Details
Bug
Resolution: Unresolved
Minor
Description
Soft lockup in cfs_hash_for_each_relax() (thread ldlm_bl_04) seen in large-scale testing, followed by a kernel panic:
https://testing.whamcloud.com/test_sets/7b7deb56-3d93-4741-9d65-3bc7175fab1c
[ 1077.167620] Lustre: Skipped 7 previous similar messages
[ 1099.471596] watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [ldlm_bl_04:26190]
[ 1099.477093] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc virtio_balloon i2c_piix4 intel_rapl_msr intel_rapl_common joydev pcspkr fuse drm ext4 mbcache jbd2 ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ata_piix ghash_clmulni_intel libata virtio_blk net_failover failover serio_raw
[ 1099.482971] CPU: 0 PID: 26190 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OE ------- --- 5.14.0-362.24.1_lustre.el9.x86_64 #1
[ 1099.484414] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1099.485145] RIP: 0010:cfs_hash_for_each_relax+0x15d/0x480 [libcfs]
[ 1099.486001] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 18 49 8b 45 38 48 8d 74 24 30 4c 89 ef 48 8b 00 e8 ac 93 6a eb 48 85 c0 0f 84 1e 02 00 00 <4c> 8b 38 4d 85 ff 0f 84 f8 01 00 00 49 8b 45 28 4c 89 ef 4c 89 fe
[ 1099.488117] RSP: 0018:ffffa16581303d80 EFLAGS: 00010282
[ 1099.488778] RAX: ffffa16588cbd008 RBX: 0000000000000050 RCX: 000000000000000e
[ 1099.489640] RDX: ffffa16588cbb000 RSI: ffffa16581303db0 RDI: ffff95bd7714d800
[ 1099.490502] RBP: ffff95bd7b8e8908 R08: ffff95bdffc324b8 R09: ffff95bdffc324b8
[ 1099.491370] R10: 00000000000000b3 R11: 0000000000000008 R12: 0000000000000001
[ 1099.492237] R13: ffff95bd7714d800 R14: 0000000000000000 R15: 0000000000000000
[ 1099.493103] FS: 0000000000000000(0000) GS:ffff95bdffc00000(0000) knlGS:0000000000000000
[ 1099.494072] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1099.494792] CR2: 00007ffe748cbdd8 CR3: 00000000acf96006 CR4: 00000000001706f0
[ 1099.495655] Call Trace:
[ 1099.496028] <IRQ>
[ 1099.496354] ? show_trace_log_lvl+0x1c4/0x2df
[ 1099.496938] ? show_trace_log_lvl+0x1c4/0x2df
[ 1099.497505] ? cfs_hash_for_each_nolock+0x12e/0x210 [libcfs]
[ 1099.498229] ? watchdog_timer_fn+0x1b2/0x210
[ 1099.498798] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 1099.499390] ? __hrtimer_run_queues+0x12a/0x2c0
[ 1099.499980] ? hrtimer_interrupt+0xfc/0x210
[ 1099.500523] ? __sysvec_apic_timer_interrupt+0x5f/0x110
[ 1099.501179] ? sysvec_apic_timer_interrupt+0x6d/0x90
[ 1099.501817] </IRQ>
[ 1099.502142] <TASK>
[ 1099.502457] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1099.503149] ? cfs_hash_for_each_relax+0x15d/0x480 [libcfs]
[ 1099.503863] ? __pfx_ldlm_reprocess_res+0x10/0x10 [ptlrpc]
[ 1099.663929] ? __pfx_ldlm_reprocess_res+0x10/0x10 [ptlrpc]
[ 1099.664732] cfs_hash_for_each_nolock+0x12e/0x210 [libcfs]
[ 1099.665437] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
[ 1099.666283] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc]
[ 1099.667099] ldlm_bl_thread_main+0x531/0x640 [ptlrpc]
[ 1099.667871] ? __pfx_ldlm_bl_thread_main+0x10/0x10 [ptlrpc]
[ 1099.668677] kthread+0xe0/0x100
[ 1099.669132] ? __pfx_kthread+0x10/0x10
[ 1099.669633] ret_from_fork+0x2c/0x50
[ 1099.670127] </TASK>
[ 1099.670460] Kernel panic - not syncing: softlockup: hung tasks