Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.16.0
None
3
9223372036854775807
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/112570ae-2e64-4c60-bd13-b1447c7934fa
test_50A failed with the following error after both CPUs were locked up:
onyx-99vm1 crash during sanity-flr test_50A
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/101181 - 4.18.0-477.27.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/101181 - 4.18.0-477.27.1.el8_lustre.x86_64
Lustre: DEBUG MARKER: == sanity-flr test 50A: mirror split update layout generation ===== 19:25:25 (1704741925)
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
Lustre: Failing over lustre-MDT0000
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_02:77462]
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_03:80014]
CPU: 1 PID: 80014 Comm: ldlm_bl_03 4.18.0-477.27.1.el8_lustre.x86_64 #1
CPU: 0 PID: 77462 Comm: ldlm_bl_02 4.18.0-477.27.1.el8_lustre.x86_64 #1
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Call Trace:
kvm_wait+0x58/0x60
__pv_queued_spin_lock_slowpath+0x268/0x2a0
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
_raw_spin_lock+0x1e/0x30
cfs_hash_for_each_relax+0x14a/0x480 [libcfs]
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
ldlm_export_cancel_locks+0x172/0x180 [ptlrpc]
ldlm_export_cancel_locks+0x172/0x180 [ptlrpc]
ldlm_bl_thread_main+0x6df/0x940 [ptlrpc]
ldlm_bl_thread_main+0x6df/0x940 [ptlrpc]
kthread+0x134/0x150
kthread+0x134/0x150
ret_from_fork+0x35/0x40
ret_from_fork+0x35/0x40
The duplicated lines in the stack trace are most likely the result of both CPUs printing to the console at the same time. Both threads appear to be in ldlm_export_cancel_locks() and contending on the same spinlock.
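As a rough aid for reading the interleaved trace, the sketch below counts how often each frame occurs in the "Call Trace" section of the console log. It is only a heuristic and rests on two assumptions that the log does not confirm: that each thread's stack contains a given frame at most once, and that an identical symbol+offset means the same call site on both CPUs. The helper name split_frames is hypothetical.

```python
from collections import Counter

# Frames copied from the "Call Trace" section of the console log,
# in the order they were printed.
TRACE = """\
kvm_wait+0x58/0x60
__pv_queued_spin_lock_slowpath+0x268/0x2a0
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
_raw_spin_lock+0x1e/0x30
cfs_hash_for_each_relax+0x14a/0x480 [libcfs]
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
ldlm_export_cancel_locks+0x172/0x180 [ptlrpc]
ldlm_export_cancel_locks+0x172/0x180 [ptlrpc]
ldlm_bl_thread_main+0x6df/0x940 [ptlrpc]
ldlm_bl_thread_main+0x6df/0x940 [ptlrpc]
kthread+0x134/0x150
kthread+0x134/0x150
ret_from_fork+0x35/0x40
ret_from_fork+0x35/0x40
""".splitlines()


def split_frames(lines):
    """Split frames into those printed more than once and those printed once.

    Heuristic only: a frame printed twice is assumed to be present in both
    interleaved stacks; a frame printed once is assumed to belong to one CPU.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    shared = [frame for frame, n in counts.items() if n >= 2]
    single = [frame for frame, n in counts.items() if n == 1]
    return shared, single


shared, single = split_frames(TRACE)

print("Frames printed twice (likely common to both CPUs):")
for frame in shared:
    print("  " + frame)

print("Frames printed once (likely from only one CPU):")
for frame in single:
    print("  " + frame)
```

Under those assumptions, the frames from ldlm_reprocess_recovery_done() down to ret_from_fork are printed twice, while the spinlock frames (kvm_wait, __pv_queued_spin_lock_slowpath, _raw_spin_lock) are printed only once. That would be consistent with one thread spinning on the lock while the other is at the RIP in cfs_hash_for_each_relax(), but the log alone does not prove which CPU is which.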
A similar stack also appeared in LU-17349.
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-flr test_50A - onyx-99vm1 crashed during sanity-flr test_50A