Details
Type: Bug
Resolution: Unresolved
Priority: Medium
Severity: 3
Description
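Because of a problem in the task wakeup logic, the MCS spinlock queue of the cache->uc_lock rwlock keeps growing, and the MDS eventually hangs. In the NMI backtraces below, two MDT service threads wait on the same lock (RDI = RBP = ff4b685cee36c810 in both traces), both spinning in native_queued_spin_lock_slowpath(). One is a writer coming from upcall_cache_put_entry() and the other is a reader coming from upcall_cache_get_entry().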
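To illustrate why the queue grows, here is a textbook user-space MCS lock. It is not the kernel code: the kernel's qspinlock is an MCS variant packed into 32 bits with per-CPU queue nodes. The model still shows the relevant property. Each waiter appends its own node to the tail and spins only on its own flag, and only the current holder can clear that flag by handing the lock to its successor. If hand-off to the queue head stalls, every newly arriving CPU just appends itself and spins. The names mcs_acquire, mcs_release and run_counter_demo are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter (the kernel keeps these in per-CPU storage). */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;            /* true while this waiter must spin */
};

/* The lock itself is only a pointer to the tail of the waiter queue. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves at the tail; the old tail is our predecessor. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                    /* queue was empty: lock acquired */

    /* Link behind the predecessor, then spin on our own flag only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;                          /* waits until the predecessor hands off */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        struct mcs_node *expected = me;
        /* No known successor: try to mark the queue empty. */
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A waiter swapped itself in but has not linked yet; wait for it. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock directly to the next waiter in the queue. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Hypothetical harness: several threads bump a counter under the lock. */
static struct mcs_lock demo_lock;
static long demo_counter;
static int demo_iters;

static void *demo_worker(void *arg)
{
    (void)arg;
    struct mcs_node node;          /* each thread queues with its own node */
    for (int i = 0; i < demo_iters; i++) {
        mcs_acquire(&demo_lock, &node);
        demo_counter++;
        mcs_release(&demo_lock, &node);
    }
    return NULL;
}

static long run_counter_demo(int nthreads, int iters)
{
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;             /* clamp to the fixed-size tid array */
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

With a correct hand-off, run_counter_demo(4, 100000) returns 400000. If mcs_release never cleared succ->locked, every later caller of mcs_acquire would append to the tail and spin forever, which matches the growing queue described above.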
PID: 3758 TASK: ff4b68551ee58000 CPU: 9 COMMAND: "mdt04_001"
#0 [fffffe000020ce48] crash_nmi_callback at ffffffff82a5e8d3
#1 [fffffe000020ce50] nmi_handle at ffffffff82a2b393
#2 [fffffe000020cea8] default_do_nmi at ffffffff833ee099
#3 [fffffe000020cec8] do_nmi at ffffffff82a2b8ef
#4 [fffffe000020cef0] end_repeat_nmi at ffffffff836015e8
[exception RIP: native_queued_spin_lock_slowpath+324]
RIP: ffffffff82b5b264 RSP: ff81546d74123ad8 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ff4b685cee36c80c RCX: 0000000000000017
RDX: ff4b6874f1873d40 RSI: 0000000000280000 RDI: ff4b685cee36c810
RBP: ff4b685cee36c810 R8: 0000000000000033 R9: ff81546d74123988
R10: 8080808080808080 R11: 0000000000000029 R12: ff4b68611aae3c48
R13: 0000000000000006 R14: ff4b6852a713e110 R15: ff4b6852a713e110
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ff81546d74123ad8] native_queued_spin_lock_slowpath at ffffffff82b5b264
#6 [ff81546d74123ad8] queued_write_lock_slowpath at ffffffff82b5b4e7
#7 [ff81546d74123af0] upcall_cache_put_entry at ffffffffc10baad4 [obdclass]
#8 [ff81546d74123b18] mdt_exit_ucred at ffffffffc1b20cd9 [mdt]
#9 [ff81546d74123b38] mdt_reint_internal at ffffffffc1b0c4b8 [mdt]
#10 [ff81546d74123b68] mdt_intent_open at ffffffffc1b17b08 [mdt]
#11 [ff81546d74123ba8] mdt_intent_opc at ffffffffc1b10472 [mdt]
#12 [ff81546d74123c00] mdt_intent_policy at ffffffffc1b15b1d [mdt]
#13 [ff81546d74123c40] ldlm_lock_enqueue at ffffffffc13db2de [ptlrpc]
#14 [ff81546d74123cb0] ldlm_handle_enqueue0 at ffffffffc140375a [ptlrpc]
#15 [ff81546d74123d30] tgt_enqueue at ffffffffc1492de4 [ptlrpc]
#16 [ff81546d74123d48] tgt_request_handle at ffffffffc149c1ac [ptlrpc]
#17 [ff81546d74123dc8] ptlrpc_server_handle_request at ffffffffc1437d23 [ptlrpc]
#18 [ff81546d74123e30] ptlrpc_main at ffffffffc143c60f [ptlrpc]
#19 [ff81546d74123f10] kthread at ffffffff82b1d6a4
#20 [ff81546d74123f50] ret_from_fork at ffffffff8360024f
PID: 3793 TASK: ff4b68549075c000 CPU: 13 COMMAND: "mdt_rdpg06_000"
#0 [fffffe00002f0e48] crash_nmi_callback at ffffffff82a5e8d3
#1 [fffffe00002f0e50] nmi_handle at ffffffff82a2b393
#2 [fffffe00002f0ea8] default_do_nmi at ffffffff833ee099
#3 [fffffe00002f0ec8] do_nmi at ffffffff82a2b8ef
#4 [fffffe00002f0ef0] end_repeat_nmi at ffffffff836015e8
[exception RIP: native_queued_spin_lock_slowpath+324]
RIP: ffffffff82b5b264 RSP: ff81546d7423bbb8 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ff4b685cee36c80c RCX: 0000000000000009
RDX: ff4b6874f1973d40 RSI: 0000000000380000 RDI: ff4b685cee36c810
RBP: ff4b685cee36c810 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000001 R11: 00000000000000a3 R12: ff4b685c2a0343e0
R13: ff4b685bae02f000 R14: ff4b685543406000 R15: 000000000000373e
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ff81546d7423bbb8] native_queued_spin_lock_slowpath at ffffffff82b5b264
#6 [ff81546d7423bbb8] queued_read_lock_slowpath at ffffffff82b5b464
#7 [ff81546d7423bbd0] upcall_cache_get_entry at ffffffffc10bc1a0 [obdclass]
#8 [ff81546d7423bc88] mdt_identity_get at ffffffffc1b4760b [mdt]
#9 [ff81546d7423bca0] old_init_ucred_common at ffffffffc1b1f96e [mdt]
#10 [ff81546d7423bcd8] mdt_init_ucred_reint at ffffffffc1b21885 [mdt]
#11 [ff81546d7423bd00] mdt_close at ffffffffc1b462cd [mdt]
#12 [ff81546d7423bd48] tgt_request_handle at ffffffffc149c1ac [ptlrpc]
#13 [ff81546d7423bdc8] ptlrpc_server_handle_request at ffffffffc1437d23 [ptlrpc]
#14 [ff81546d7423be30] ptlrpc_main at ffffffffc143c60f [ptlrpc]
#15 [ff81546d7423bf10] kthread at ffffffff82b1d6a4
#16 [ff81546d7423bf50] ret_from_fork at ffffffff8360024f
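The register values in the two backtraces can be cross-checked with a little arithmetic. Assume the standard x86_64 qrwlock layout: a 4-byte cnts word followed by the 4-byte wait_lock qspinlock that both queued_read_lock_slowpath() and queued_write_lock_slowpath() take. Under that layout, RBX (ff4b685cee36c80c) would be uc_lock itself and RDI (ff4b685cee36c810) its wait_lock. The sketch also decodes RSI as a qspinlock tail, using the NR_CPUS < 16K encoding (tail index in bits 16-17, CPU+1 in bits 18-31). That part is a further assumption: it treats RSI as still holding each thread's encoded tail at native_queued_spin_lock_slowpath+324. The struct and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of the x86_64 qrwlock layout (no debug fields). */
struct qrwlock_layout {
    uint32_t cnts;        /* reader count plus writer-locked byte */
    uint32_t wait_lock;   /* queued spinlock taken by slowpath waiters */
};

/* qspinlock tail encoding used when NR_CPUS < 16K. */
#define Q_TAIL_IDX_OFFSET 16
#define Q_TAIL_IDX_MASK   0x3u
#define Q_TAIL_CPU_OFFSET 18

/* Nonzero if wait_lock_addr is the wait_lock field of the rwlock at rwlock_addr. */
static int is_wait_lock_of(uint64_t rwlock_addr, uint64_t wait_lock_addr)
{
    return rwlock_addr + offsetof(struct qrwlock_layout, wait_lock) == wait_lock_addr;
}

/* CPU number stored in an encoded tail (-1 means "no tail"). */
static int tail_cpu(uint32_t val)
{
    return (int)(val >> Q_TAIL_CPU_OFFSET) - 1;
}

/* Per-CPU node index (nesting level) stored in an encoded tail. */
static unsigned tail_idx(uint32_t val)
{
    return (val >> Q_TAIL_IDX_OFFSET) & Q_TAIL_IDX_MASK;
}
```

Decoding gives CPU 9, index 0 for RSI 0x280000 (mdt04_001) and CPU 13, index 0 for RSI 0x380000 (mdt_rdpg06_000). Each result matches the CPU column of its own trace. This is consistent with both threads sitting in the MCS queue of the same uc_lock wait_lock.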