Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.14.0
-
lustre-master-ib #437. version=2.13.54_118_g2e813f3
-
3
Description
Two clients hit the following error:
[1669499.346718] NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [ldlm_lock_repla:132092]
[1669499.356235] Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt cryptd iTCO_vendor_support ipmi_ssif joydev pcspkr sg i2c_i801 ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq wmi lpc_ich mei_me mei ioatdma auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect isci igb sysimgblt fb_sys_fops ttm mlx4_core(OE) ahci libsas libahci scsi_transport_sas ptp drm devlink pps_core crct10dif_pclmul crct10dif_common crc32c_intel dca libata mlx_compat(OE) drm_panel_orientation_quirks i2c_algo_bit [last unloaded: libcfs]
[1669499.461226] CPU: 29 PID: 132092 Comm: ldlm_lock_repla Kdump: loaded Tainted: G OEL ------------ 3.10.0-1062.18.1.el7.x86_64 #1
[1669499.475297] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[1669499.488011] task: ffff9f75181a3150 ti: ffff9f76836c4000 task.ti: ffff9f76836c4000
[1669499.496556] RIP: 0010:[<ffffffff9f3176b6>] [<ffffffff9f3176b6>] native_queued_spin_lock_slowpath+0x156/0x200
[1669499.507827] RSP: 0018:ffff9f76836c7cc8 EFLAGS: 00000202
[1669499.513946] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000e90000
[1669499.522103] RDX: 0000000000e90101 RSI: 0000000000000101 RDI: ffff9f6957afda5c
[1669499.530259] RBP: ffff9f76836c7cc8 R08: ffff9f77de55b880 R09: 0000000000000000
[1669499.538417] R10: 0000000019aefe01 R11: ffff9f6e19aee900 R12: 0000000000000000
[1669499.546575] R13: ffffffff9f423c7d R14: ffff9f76836c7cc8 R15: ffff9f6e19aefec0
[1669499.554734] FS: 0000000000000000(0000) GS:ffff9f77de540000(0000) knlGS:0000000000000000
[1669499.563957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1669499.570563] CR2: 00007f5d7cec9b80 CR3: 0000000031010000 CR4: 00000000000607e0
[1669499.578725] Call Trace:
[1669499.581653] [<ffffffff9f9754ee>] queued_spin_lock_slowpath+0xb/0xf
[1669499.588832] [<ffffffff9f983b20>] _raw_spin_lock+0x20/0x30
[1669499.595172] [<ffffffffc0fb03e2>] ldlm_resource_foreach+0x52/0x270 [ptlrpc]
[1669499.603153] [<ffffffffc0fb062f>] ldlm_res_iter_helper+0x2f/0x40 [ptlrpc]
[1669499.610928] [<ffffffffc0bdc460>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[1669499.619197] [<ffffffffc0fb0600>] ? ldlm_resource_foreach+0x270/0x270 [ptlrpc]
[1669499.627468] [<ffffffffc0fb0600>] ? ldlm_resource_foreach+0x270/0x270 [ptlrpc]
[1669499.635734] [<ffffffffc0bdf6c5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[1669499.644003] [<ffffffffc0fb0900>] __ldlm_replay_locks+0xe0/0x9e0 [ptlrpc]
[1669499.651788] [<ffffffffc0fa8bc0>] ? is_granted_or_cancelled_nolock+0x60/0x60 [ptlrpc]
[1669499.660739] [<ffffffffc0fb1200>] ? __ldlm_replay_locks+0x9e0/0x9e0 [ptlrpc]
[1669499.668816] [<ffffffffc0fb1231>] ldlm_lock_replay_thread+0x31/0xd0 [ptlrpc]
[1669499.676877] [<ffffffff9f2c6321>] kthread+0xd1/0xe0
[1669499.682513] [<ffffffff9f2c6250>] ? insert_kthread_work+0x40/0x40
[1669499.689507] [<ffffffff9f98dd37>] ret_from_fork_nospec_begin+0x21/0x21
[1669499.696986] [<ffffffff9f2c6250>] ? insert_kthread_work+0x40/0x40
[1669499.703979] Code: 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0 74 21 83 f8 03 75 10 eb 1a 66 2e 0f 1f 84 00 00 00 00 00 85 c0 74 0c f3 90 <8b> 17 0f b7 c2 83 f8 03 75 f0 be 01 00 00 00 eb 15 66 0f 1f 84
[1669519.188752] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_cb00_004:84044]
[1669519.194752] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [simul:128254]
[1669519.194776] Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt cryptd iTCO_vendor_support ipmi_ssif joydev pcspkr sg i2c_i801 ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq wmi lpc_ich mei_me mei ioatdma auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect isci igb sysimgblt fb_sys_fops ttm mlx4_core(OE) ahci libsas libahci scsi_transport_sas ptp drm devlink pps_core crct10dif_pclmul crct10dif_common crc32c_intel dca libata mlx_compat(OE) drm_panel_orientation_quirks i2c_algo_bit [last unloaded: libcfs]
[1669519.194785] CPU: 2 PID: 128254 Comm: simul Kdump: loaded Tainted: G OEL ------------ 3.10.0-1062.18.1.el7.x86_64 #1
[1669519.194786] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[1669519.194787] task: ffff9f6a933941c0 ti: ffff9f6ed9ea4000 task.ti: ffff9f6ed9ea4000
[1669519.194789] RIP: 0010:[<ffffffff9f31772e>] [<ffffffff9f31772e>] native_queued_spin_lock_slowpath+0x1ce/0x200
[1669519.194790] RSP: 0018:ffff9f6ed9ea7248 EFLAGS: 00000202
[1669519.194791] RAX: 0000000000000001 RBX: 00000000b863ca88 RCX: 0000000000000001
[1669519.194792] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff9f6957afda5c
[1669519.194792] RBP: ffff9f6ed9ea7248 R08: 0000000000000101 R09: 0000000000000000
[1669519.194793] R10: 0000000000000000 R11: ffff9f69290d6300 R12: ffff9f6ed9ea7200
[1669519.194794] R13: ffffffffc0fa36b2 R14: ffff9f6ed9ea71b0 R15: 0000000000000000
[1669519.194795] FS: 00007f746b6e3740(0000) GS:ffff9f6fdea80000(0000) knlGS:0000000000000000
[1669519.194796] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1669519.194796] CR2: 00007f7137d35c20 CR3: 00000003964da000 CR4: 00000000000607e0
[1669519.194797] Call Trace:
[1669519.194800] [<ffffffff9f9754ee>] queued_spin_lock_slowpath+0xb/0xf
[1669519.194801] [<ffffffff9f983b20>] _raw_spin_lock+0x20/0x30
[1669519.194821] [<ffffffffc0f999f8>] ldlm_lock_change_resource+0xe8/0x350 [ptlrpc]
[1669519.194836] [<ffffffffc0fab55f>] ldlm_cli_enqueue_fini+0x3ff/0xe40 [ptlrpc]
[1669519.194850] [<ffffffffc0d726e1>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[1669519.194866] [<ffffffffc0faf051>] ldlm_cli_enqueue+0x441/0xa20 [ptlrpc]
[1669519.194880] [<ffffffffc0fac270>] ? ldlm_expired_completion_wait+0x2a0/0x2a0 [ptlrpc]
[1669519.194890] [<ffffffffc122e8a0>] ? ll_md_need_convert+0x180/0x180 [lustre]
[1669519.194895] [<ffffffffc0c878a0>] ? mdc_changelog_cdev_finish+0x210/0x210 [mdc]
[1669519.194900] [<ffffffffc0c81ee0>] mdc_enqueue_base+0x330/0x1d40 [mdc]
[1669519.194904] [<ffffffffc0c84055>] mdc_intent_lock+0x135/0x570 [mdc]
[1669519.194917] [<ffffffffc0d726e1>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[1669519.194927] [<ffffffffc122e8a0>] ? ll_md_need_convert+0x180/0x180 [lustre]
[1669519.194942] [<ffffffffc0fac270>] ? ldlm_expired_completion_wait+0x2a0/0x2a0 [ptlrpc]
[1669519.194946] [<ffffffffc0c878a0>] ? mdc_changelog_cdev_finish+0x210/0x210 [mdc]
[1669519.194950] [<ffffffffc11c2996>] lmv_revalidate_slaves+0x416/0xb30 [lmv]
[1669519.194959] [<ffffffffc122e8a0>] ? ll_md_need_convert+0x180/0x180 [lustre]
[1669519.194962] [<ffffffffc11ac366>] lmv_merge_attr+0x46/0x1b0 [lmv]
[1669519.194975] [<ffffffffc0d725b9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[1669519.194984] [<ffffffffc1217d85>] ll_update_lsm_md+0xe35/0x1020 [lustre]
[1669519.195002] [<ffffffffc0fd2c57>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
[1669519.195011] [<ffffffffc121b8ab>] ll_update_inode+0x36b/0x640 [lustre]
[1669519.195013] [<ffffffff9f468e68>] ? inode_insert5+0x128/0x190
[1669519.195022] [<ffffffffc122d600>] ? ll_test_inode_by_fid+0x30/0x30 [lustre]
[1669519.195031] [<ffffffffc122d600>] ? ll_test_inode_by_fid+0x30/0x30 [lustre]
[1669519.195039] [<ffffffffc121bbe7>] ll_read_inode2+0x67/0x420 [lustre]
[1669519.195048] [<ffffffffc122e4ab>] ll_iget+0xdb/0x350 [lustre]
[1669519.195057] [<ffffffffc12206b2>] ll_prep_inode+0x212/0x9b0 [lustre]
[1669519.195074] [<ffffffffc0fd2c00>] ? lustre_msg_buf_v2+0x1a0/0x1e0 [ptlrpc]
[1669519.195092] [<ffffffffc0ffb8e7>] ? __req_capsule_get+0x427/0x6b0 [ptlrpc]
[1669519.195102] [<ffffffffc122fb98>] ll_lookup_it.constprop.26+0xc08/0x1ec0 [lustre]
[1669519.195115] [<ffffffffc0f993e3>] ? ldlm_lock_add_to_lru+0x43/0x130 [ptlrpc]
[1669519.195129] [<ffffffffc0f9b326>] ? ldlm_lock_decref+0x36/0x80 [ptlrpc]
[1669519.195135] [<ffffffffc11e82ca>] ? ll_intent_drop_lock.part.15+0x4a/0x170 [lustre]
[1669519.195152] [<ffffffffc0fc26c0>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
[1669519.195160] [<ffffffffc11f9dee>] ? ll_inode_revalidate+0x18e/0x690 [lustre]
[1669519.195167] [<ffffffffc11f6a71>] ? ll_get_acl+0x31/0xf0 [lustre]
[1669519.195180] [<ffffffffc0d725b9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[1669519.195189] [<ffffffffc12299b8>] ? ll_stats_ops_tally+0x98/0x100 [lustre]
[1669519.195198] [<ffffffffc1230f0e>] ll_lookup_nd+0xbe/0x180 [lustre]
[1669519.195200] [<ffffffff9f455973>] lookup_real+0x23/0x60
[1669519.195202] [<ffffffff9f456392>] __lookup_hash+0x42/0x60
[1669519.195203] [<ffffffff9f978067>] lookup_slow+0x42/0xa7
[1669519.195205] [<ffffffff9f45b8a8>] path_lookupat+0x838/0x8b0
[1669519.195206] [<ffffffff9f4264a5>] ? kmem_cache_alloc+0x35/0x1f0
[1669519.195208] [<ffffffff9f45c6ff>] ? getname_flags+0x4f/0x1a0
[1669519.195209] [<ffffffff9f45b94b>] filename_lookup+0x2b/0xc0
[1669519.195211] [<ffffffff9f45d897>] user_path_at_empty+0x67/0xc0
[1669519.195213] [<ffffffff9f2e28c9>] ? pick_next_entity+0xa9/0x190
[1669519.195214] [<ffffffff9f45d901>] user_path_at+0x11/0x20
[1669519.195216] [<ffffffff9f4505e3>] vfs_fstatat+0x63/0xc0
[1669519.195217] [<ffffffff9f2de7f5>] ? sched_clock_cpu+0x85/0xc0
[1669519.195218] [<ffffffff9f45099e>] SYSC_newstat+0x2e/0x60
[1669519.195220] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195221] [<ffffffff9f98de15>] ? system_call_after_swapgs+0xa2/0x146
[1669519.195223] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195224] [<ffffffff9f98de15>] ? system_call_after_swapgs+0xa2/0x146
[1669519.195225] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195226] [<ffffffff9f98de15>] ? system_call_after_swapgs+0xa2/0x146
[1669519.195228] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195229] [<ffffffff9f98de15>] ? system_call_after_swapgs+0xa2/0x146
[1669519.195230] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195231] [<ffffffff9f98de15>] ? system_call_after_swapgs+0xa2/0x146
[1669519.195232] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195234] [<ffffffff9f450e5e>] SyS_newstat+0xe/0x10
[1669519.195235] [<ffffffff9f98dede>] system_call_fastpath+0x25/0x2a
[1669519.195237] [<ffffffff9f98de21>] ? system_call_after_swapgs+0xae/0x146
[1669519.195251] Code: 37 81 fe 00 01 00 00 74 f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 <8b> 07 0f b6 c0 83 f8 03 75 f0 b8 01 00 00 00 66 89 07 5d c3 66
[1669519.947805] Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt cryptd iTCO_vendor_support ipmi_ssif joydev pcspkr sg i2c_i801 ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq wmi lpc_ich mei_me mei ioatdma auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect isci igb sysimgblt fb_sys_fops ttm mlx4_core(OE) ahci libsas libahci scsi_transport_sas ptp drm devlink pps_core crct10dif_pclmul crct10dif_common crc32c_intel dca libata mlx_compat(OE) drm_panel_orientation_quirks i2c_algo_bit [last unloaded: libcfs]