Details
Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.12.4
Labels: None
Severity: 3
Rank (Obsolete): 9223372036854775807
Description
Switching the NRS policy back from "tbf uid" to fifo caused a soft lockup. I am including a backtrace of all threads from the crash dump.
From dmesg:
[-- MARK -- Mon Jan 25 15:00:00 2021]
[15694977.724675] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [mdt00_088:11264]
[15694977.724677] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [mdt00_080:11250]
[15694977.724679] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [mdt00_109:11297]
[15694977.724681] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [mdt00_102:11285]
[15694977.724683] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [mdt00_034:11187]
[15694977.724685] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mdt00_016:11166]
[15694977.724687] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [mdt00_046:11201]
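For anyone triaging a similar log, the soft-lockup lines above can be summarized with a short script. This is only a sketch: the function names `parse_soft_lockups` and `summarize` are made up for illustration, and the regular expression assumes the exact message format shown in this dmesg excerpt (other kernels may word it differently).

```python
import re
from collections import Counter

# Matches reports such as:
# [15694977.724675] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [mdt00_088:11264]
# Assumption: the format is exactly as in the dmesg excerpt above.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(text):
    """Return one dict per soft-lockup report found in text (hypothetical helper)."""
    events = []
    for m in SOFT_LOCKUP.finditer(text):
        events.append({
            "timestamp": float(m.group("ts")),
            "cpu": int(m.group("cpu")),
            "seconds": int(m.group("secs")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        })
    return events


def summarize(events):
    """Count reports per thread-name prefix (the text before the last underscore)."""
    return Counter(e["comm"].rsplit("_", 1)[0] for e in events)


if __name__ == "__main__":
    sample = (
        "[15694977.724675] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [mdt00_088:11264]\n"
        "[15694977.724681] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [mdt00_102:11285]\n"
    )
    events = parse_soft_lockups(sample)
    for e in events:
        print(e["cpu"], e["comm"], e["pid"])
    print(dict(summarize(events)))
```

Feeding it the full excerpt should list one entry per stuck CPU, which makes it easy to pick the PIDs to inspect with `bt` in crash.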
I was able to get a crash dump.
All the hung threads are in the same state:
crash> bt 11285
PID: 11285 TASK: ffffa137e72d9070 CPU: 3 COMMAND: "mdt00_102"
#0 [ffffa117fecc8e48] crash_nmi_callback at ffffffffb7658017
#1 [ffffa117fecc8e58] nmi_handle at ffffffffb7d8593c
#2 [ffffa117fecc8eb0] do_nmi at ffffffffb7d85b5d
#3 [ffffa117fecc8ef0] end_repeat_nmi at ffffffffb7d84d9c
[exception RIP: native_queued_spin_lock_slowpath+344]
RIP: ffffffffb7717478 RSP: ffffa1375e5e3d38 RFLAGS: 00000202
RAX: 0000000000000101 RBX: ffffa117fb5e1108 RCX: 0000000000190000
RDX: 0000000000590101 RSI: 0000000000000101 RDI: ffffa117fb5e1108
RBP: ffffa1375e5e3d38 R8: ffffa117fecdb880 R9: 0000000000000000
R10: ffffffffc0d37e40 R11: ffffa117fb5e1108 R12: 0000000000000000
R13: ffffa0f8eb8a3b80 R14: ffffa0f8eb8a3b80 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#4 [ffffa1375e5e3d38] native_queued_spin_lock_slowpath at ffffffffb7717478
#5 [ffffa1375e5e3d40] queued_spin_lock_slowpath at ffffffffb7d7546a
#6 [ffffa1375e5e3d50] _raw_spin_lock at ffffffffb7d83350
#7 [ffffa1375e5e3d60] nrs_resource_get_safe at ffffffffc1039402 [ptlrpc]
#8 [ffffa1375e5e3d98] ptlrpc_nrs_req_initialize at ffffffffc1039f13 [ptlrpc]
#9 [ffffa1375e5e3db0] ptlrpc_server_handle_req_in at ffffffffc1004c21 [ptlrpc]
#10 [ffffa1375e5e3df8] ptlrpc_main at ffffffffc1008d65 [ptlrpc]
#11 [ffffa1375e5e3ec8] kthread at ffffffffb76c61f1