Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
None
-
None
-
3
-
11468
Description
I've been hitting this in racer lately.
Unmount cannot finish and things hang, but I suspect the lockup is not necessarily related to shutdown.
I also have a crash dump, taken about 30 minutes after the condition was detected, if desired.
This is on a fairly recent master, too.
[ 9489.412732] LNet: Service thread pid 21501 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 9489.414125] Pid: 21501, comm: mdt00_006
[ 9489.414387]
[ 9489.414387] Call Trace:
[ 9489.414848] [<ffffffffa056d834>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
[ 9489.415165] [<ffffffffa056de4b>] lu_object_find_at+0xab/0x360 [obdclass]
[ 9489.415522] [<ffffffffa0695976>] ? lustre_msg_string+0x96/0x290 [ptlrpc]
[ 9489.415870] [<ffffffff8105ad30>] ? default_wake_function+0x0/0x20
[ 9489.416214] [<ffffffffa06958d5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
[ 9489.416615] [<ffffffffa056e116>] lu_object_find+0x16/0x20 [obdclass]
[ 9489.416928] [<ffffffffa0b1bad6>] mdt_object_find+0x56/0x170 [mdt]
[ 9489.417224] [<ffffffffa0b2bac4>] mdt_getattr_name_lock+0x804/0x19a0 [mdt]
[ 9489.417564] [<ffffffffa06958d5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
[ 9489.418207] [<ffffffffa06bc336>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
[ 9489.418555] [<ffffffffa0697b84>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
[ 9489.418870] [<ffffffffa0b2cef9>] mdt_intent_getattr+0x299/0x480 [mdt]
[ 9489.419172] [<ffffffffa0b1d5b9>] mdt_intent_policy+0x499/0xca0 [mdt]
[ 9489.419492] [<ffffffffa064e32a>] ldlm_lock_enqueue+0x2ea/0x860 [ptlrpc]
[ 9489.419814] [<ffffffffa0676c4f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[ 9489.420147] [<ffffffffa06ea772>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
[ 9489.420462] [<ffffffffa06e8cbf>] tgt_request_handle+0x5ff/0x1200 [ptlrpc]
[ 9489.420813] [<ffffffffa06a63d5>] ptlrpc_server_handle_request+0x395/0xc20 [ptlrpc]
[ 9489.421314] [<ffffffffa0ec540f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[ 9489.422763] [<ffffffffa069dd41>] ? ptlrpc_wait_event+0xc1/0x2e0 [ptlrpc]
[ 9489.423102] [<ffffffffa06a76ba>] ptlrpc_main+0xa5a/0x1690 [ptlrpc]
[ 9489.423445] [<ffffffffa06a6c60>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
[ 9489.423801] [<ffffffff81094726>] kthread+0x96/0xa0
[ 9489.424090] [<ffffffff8100c10a>] child_rip+0xa/0x20
[ 9489.424395] [<ffffffff81094690>] ? kthread+0x0/0xa0
[ 9489.424686] [<ffffffff8100c100>] ? child_rip+0x0/0x20
[ 9489.424955]
[ 9489.425179] LustreError: dumping log to /tmp/lustre-log.1383705973.21501
[ 9497.613530] LNet: Service thread pid 30740 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 9497.614414] Pid: 30740, comm: mdt01_003
[ 9497.614680]
[ 9497.614681] Call Trace:
[ 9497.615135] [<ffffffffa056d834>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
[ 9497.615456] [<ffffffffa056de4b>] lu_object_find_at+0xab/0x360 [obdclass]
[ 9497.615759] [<ffffffff8105ad30>] ? default_wake_function+0x0/0x20
[ 9497.616061] [<ffffffffa056e116>] lu_object_find+0x16/0x20 [obdclass]
[ 9497.616375] [<ffffffffa0b1bad6>] mdt_object_find+0x56/0x170 [mdt]
[ 9497.616676] [<ffffffffa0b2478d>] mdt_intent_layout+0x12d/0x640 [mdt]
[ 9497.616975] [<ffffffffa0b1d5b9>] mdt_intent_policy+0x499/0xca0 [mdt]
[ 9497.617299] [<ffffffffa064e32a>] ldlm_lock_enqueue+0x2ea/0x860 [ptlrpc]
[ 9497.617684] [<ffffffffa0676c4f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[ 9497.618028] [<ffffffffa06ea772>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
[ 9497.618340] [<ffffffffa06e8cbf>] tgt_request_handle+0x5ff/0x1200 [ptlrpc]
[ 9497.618672] [<ffffffffa06a63d5>] ptlrpc_server_handle_request+0x395/0xc20 [ptlrpc]
[ 9497.619169] [<ffffffffa0ec540f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[ 9497.619496] [<ffffffffa069dd41>] ? ptlrpc_wait_event+0xc1/0x2e0 [ptlrpc]
[ 9497.619819] [<ffffffffa06a76ba>] ptlrpc_main+0xa5a/0x1690 [ptlrpc]
[ 9497.620131] [<ffffffffa06a6c60>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
[ 9497.620422] [<ffffffff81094726>] kthread+0x96/0xa0
[ 9497.620692] [<ffffffff8100c10a>] child_rip+0xa/0x20
[ 9497.620955] [<ffffffff81094690>] ? kthread+0x0/0xa0
[ 9497.621219] [<ffffffff8100c100>] ? child_rip+0x0/0x20
[ 9497.621489]
[ 9497.622494] LustreError: dumping log to /tmp/lustre-log.1383705981.30740
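When comparing watchdog dumps like the two above, it can help to pull out each hung thread and the call frames the traces share. The sketch below is a hypothetical helper, not part of Lustre: it assumes the console log is saved with one timestamped entry per line, and it skips frames prefixed with "?", which the kernel prints for stack addresses it could not confirm as part of the real call chain.

```python
import re

# Hypothetical parser for LNet watchdog dumps in the format shown in the log.
WATCHDOG = re.compile(r"Service thread pid (\d+) was inactive for ([\d.]+)s")
COMM = re.compile(r"Pid: (\d+), comm: (\S+)")
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x")


def parse_hung_threads(log_text):
    """Return {pid: {"inactive": secs, "comm": name, "frames": [func, ...]}}."""
    threads = {}
    current = None
    for line in log_text.splitlines():
        m = WATCHDOG.search(line)
        if m:
            # A new watchdog report starts a new thread record.
            current = int(m.group(1))
            threads[current] = {"inactive": float(m.group(2)),
                                "comm": None, "frames": []}
            continue
        if current is None:
            continue
        m = COMM.search(line)
        if m and int(m.group(1)) == current:
            threads[current]["comm"] = m.group(2)
            continue
        m = FRAME.search(line)
        if m and not m.group(1):  # keep only frames not marked with "?"
            threads[current]["frames"].append(m.group(2))
    return threads


def common_frames(threads):
    """Frames present in every thread's trace, in the first trace's order."""
    traces = [t["frames"] for t in threads.values()]
    if not traces:
        return []
    shared = set(traces[0]).intersection(*traces[1:])
    return [f for f in traces[0] if f in shared]
```

Applied to the log above, the two traces differ only in the intent handler (mdt_getattr_name_lock/mdt_intent_getattr versus mdt_intent_layout). Both threads sit in lu_object_find_at, reached through lu_object_find and mdt_object_find.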
Issue Links
- duplicates: LU-4106 racer test hang (Resolved)