Lustre / LU-8542

Soft lockup, eventually ending in a Kernel Panic

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: CentOS 7.2, NVMe devices, DNE2, LDISKFS MDTs, OPA with IFS 10.1.1.0.9 and Lustre-master Build #3419
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      A soft lockup occurs, eventually ending in a kernel panic. I have seen this issue once with ZFS as the backend, but I see it very frequently on LDISKFS.

      The workload that triggers this is mdtest with DNE2 striped directories. This instance failed with 7 MDSs and 1 MDT per MDS, but I have seen it happen with various other combinations.

      Message from syslogd@zlfs2-oss7 at Aug 25 12:27:14 ...
       kernel:BUG: soft lockup - CPU#18 stuck for 23s! [mdt02_034:4962]
      Aug 25 12:27:14 zlfs2-oss7 kernel: BUG: soft lockup - CPU#18 stuck for 23s! [mdt02_034:4962]
      Aug 25 12:27:14 zlfs2-oss7 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_powerclamp coretemp intel_rapl kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper mxm_wmi cryptd iTCO_wdt iTCO_vendor_support i2c_i801 lpc_ich sg mfd_core ipmi_devintf pcspkr mei_me mei ioatdma hfi1 ipmi_si ipmi_msghandler sb_edac edac_core wmi shpchp acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl
      Aug 25 12:27:14 zlfs2-oss7 kernel: lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib ib_sa ib_mad mlx4_en vxlan ip6_udp_tunnel udp_tunnel ib_core ib_addr raid1 sd_mod crc_t10dif crct10dif_generic mgag200 crct10dif_pclmul syscopyarea crct10dif_common sysfillrect sysimgblt crc32c_intel i2c_algo_bit drm_kms_helper ttm nvme drm ixgbe ahci libahci mlx4_core mdio i2c_core libata ptp pps_core dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
      Aug 25 12:27:14 zlfs2-oss7 kernel: CPU: 18 PID: 4962 Comm: mdt02_034 Tainted: P           OEL ------------   3.10.0-327.22.2.el7_lustre.x86_64 #1
      Aug 25 12:27:14 zlfs2-oss7 kernel: Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
      Aug 25 12:27:14 zlfs2-oss7 kernel: task: ffff88202258b980 ti: ffff881f8d9b4000 task.ti: ffff881f8d9b4000
      Aug 25 12:27:14 zlfs2-oss7 kernel: RIP: 0010:[<ffffffff8163dcd7>]  [<ffffffff8163dcd7>] _raw_spin_lock+0x37/0x50
      Aug 25 12:27:14 zlfs2-oss7 kernel: RSP: 0018:ffff881f8d9b74d0  EFLAGS: 00000206
      Aug 25 12:27:14 zlfs2-oss7 kernel: RAX: 000000000000544e RBX: ffff88102533f270 RCX: 00000000000034ec
      Aug 25 12:27:14 zlfs2-oss7 kernel: RDX: 0000000000000e3a RSI: 0000000000000e3a RDI: ffff881fe27893a0
      Aug 25 12:27:14 zlfs2-oss7 kernel: RBP: ffff881f8d9b74d0 R08: 7010000000000000 R09: 10009e4e38080000
      Aug 25 12:27:14 zlfs2-oss7 kernel: R10: efe165b1ef8b8e02 R11: 0000000000000000 R12: ffff882000fc4dd0
      Aug 25 12:27:14 zlfs2-oss7 kernel: R13: ffff8810009e4e38 R14: ffffffff8121298b R15: ffff881f8d9b74e0
      Aug 25 12:27:14 zlfs2-oss7 kernel: FS:  0000000000000000(0000) GS:ffff88203ea80000(0000) knlGS:0000000000000000
      Aug 25 12:27:14 zlfs2-oss7 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Aug 25 12:27:14 zlfs2-oss7 kernel: CR2: 00000000006dde20 CR3: 000000000194a000 CR4: 00000000001407e0
      Aug 25 12:27:14 zlfs2-oss7 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Aug 25 12:27:14 zlfs2-oss7 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Aug 25 12:27:14 zlfs2-oss7 kernel: Stack:
      Aug 25 12:27:14 zlfs2-oss7 kernel: ffff881f8d9b7558 ffffffffa0d2dbfc ffff881f8d9b7500 ffffffff8121329c
      Aug 25 12:27:14 zlfs2-oss7 kernel: 00000000e278a000 ffff881fe2789000 0000000000000000 ffffffff8121332d
      Aug 25 12:27:14 zlfs2-oss7 kernel: 0000000100000020 ffff881f8d9b7528 ffffffff812404dc 00000000b5ead192
      Aug 25 12:27:14 zlfs2-oss7 kernel: Call Trace:
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0d2dbfc>] do_get_write_access+0x32c/0x4e0 [jbd2]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff8121329c>] ? __find_get_block+0xbc/0x120
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff8121332d>] ? __getblk+0x2d/0x2e0
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff812404dc>] ? inode_reserved_space+0x1c/0x20
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0d2ddd7>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa136b37b>] __ldiskfs_journal_get_write_access+0x3b/0x80 [ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa1372197>] __ldiskfs_new_inode+0x447/0x1300 [ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa13394c7>] ldiskfs_create_inode+0x37/0xa0 [ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa14636e9>] osd_mkfile.isra.80+0x119/0x230 [osd_ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa146c2f5>] ? osd_trans_exec_op+0x25/0x310 [osd_ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa1463873>] osd_mkreg+0x33/0x70 [osd_ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa1475b65>] osd_object_ea_create+0x1f5/0xc60 [osd_ldiskfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa169a7d2>] lod_sub_object_create+0x1f2/0x480 [lod]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff811c153a>] ? kmem_cache_alloc+0x1ba/0x1d0
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa1691c4f>] lod_object_create+0xaf/0x200 [lod]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa16f4f35>] mdd_object_create_internal+0xb5/0x280 [mdd]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa16e0086>] mdd_object_create+0x76/0xa30 [mdd]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa16ec1a0>] ? mdd_declare_create+0x490/0xc60 [mdd]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa16ed637>] mdd_create+0xcc7/0x12b0 [mdd]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa15d1c1b>] mdt_reint_open+0x223b/0x31a0 [mdt]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0db4009>] ? upcall_cache_get_entry+0x3e9/0x8e0 [obdclass]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa15b7ab3>] ? ucred_set_jobid+0x53/0x70 [mdt]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa15c7080>] mdt_reint_rec+0x80/0x210 [mdt]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa15a9d62>] mdt_reint_internal+0x5b2/0x9b0 [mdt]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa15aa2c2>] mdt_intent_reint+0x162/0x430 [mdt]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa15b493c>] mdt_intent_policy+0x5bc/0xbb0 [mdt]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f30f02>] ? ldlm_resource_get+0x5e2/0xa30 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f2a1e7>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f52ce2>] ldlm_handle_enqueue0+0x772/0x16b0 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f7ac30>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0fd36b2>] tgt_enqueue+0x62/0x210 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0fd7b15>] tgt_request_handle+0x915/0x1320 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f83ccb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0be6568>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f81888>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff810b88d2>] ? default_wake_function+0x12/0x20
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff810af038>] ? __wake_up_common+0x58/0x90
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f87d80>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff81013588>] ? __switch_to+0xf8/0x4b0
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffffa0f872e0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff816469d8>] ret_from_fork+0x58/0x90
      Aug 25 12:27:14 zlfs2-oss7 kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      Aug 25 12:27:14 zlfs2-oss7 kernel: Code: 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0c 0f 1f 44 00 00 f3 90 83 e8 01 74 0a <0f> b7 0f 66 39 ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb da 66 0f
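
      For triage, a trace like the one above can be reduced to its header and its reliable frames. The following is a minimal Python sketch and is not part of the original report: `parse_soft_lockup`, `SAMPLE` and both regular expressions are illustrative names, and the patterns assume the CentOS 7 (3.10 kernel) log layout shown above, in which frames prefixed with `?` are unreliable stack entries. The `RIP:` line also matches the frame pattern, so the function that was executing when the watchdog fired comes out as the first frame.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#18 stuck for 23s! [mdt02_034:4962]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches e.g. "[<ffffffffa0d2dbfc>] do_get_write_access+0x32c/0x4e0 [jbd2]"
# The optional "? " marks a frame the kernel could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_soft_lockup(lines):
    """Return (header, frames) for the first soft lockup found in lines.

    header is a dict with cpu, seconds, comm and pid (or None if no
    soft lockup line is present). frames is a list of (symbol, module)
    tuples, innermost first, with unreliable "?" frames skipped.
    Symbols without a module suffix are reported as "kernel".
    """
    header = None
    frames = []
    for line in lines:
        if header is None:
            m = SOFT_LOCKUP_RE.search(line)
            if m:
                header = {
                    "cpu": int(m["cpu"]),
                    "seconds": int(m["secs"]),
                    "comm": m["comm"],
                    "pid": int(m["pid"]),
                }
            continue
        m = FRAME_RE.search(line)
        if m and not m["unreliable"]:
            frames.append((m["symbol"], m["module"] or "kernel"))
    return header, frames


# A few lines copied from the log above, used only as sample input.
SAMPLE = [
    "kernel:BUG: soft lockup - CPU#18 stuck for 23s! [mdt02_034:4962]",
    "kernel: RIP: 0010:[<ffffffff8163dcd7>]  [<ffffffff8163dcd7>] _raw_spin_lock+0x37/0x50",
    "kernel: [<ffffffffa0d2dbfc>] do_get_write_access+0x32c/0x4e0 [jbd2]",
    "kernel: [<ffffffff8121329c>] ? __find_get_block+0xbc/0x120",
    "kernel: [<ffffffffa0d2ddd7>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]",
]

header, frames = parse_soft_lockup(SAMPLE)
print(header)
for symbol, module in frames:
    print(f"{symbol} [{module}]")
```

      On the sample lines, this reports CPU 18 stuck for 23 seconds in thread mdt02_034 and lists `_raw_spin_lock`, `do_get_write_access` and `jbd2_journal_get_write_access` as the innermost reliable frames, which matches a reading of the full trace: the MDT service thread is spinning on a lock inside jbd2 while creating a new ldiskfs inode.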
      


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: adam.j.roe Adam Roe (Inactive)