Details
- Bug
- Resolution: Fixed
- Major
- None
- Lustre 2.1.4
- 3
- 7064
Description
This issue has already been hit on Lustre 2.2 (see LU-1702); the traces are exactly the same as for LU-1702.
It has been hit four consecutive times, so it seems quite easy to reproduce.
2013-03-06 16:05:01 LustreError: 31751:0:(mdt_open.c:1023:mdt_reconstruct_open()) ASSERTION( (!(rc < 0) || (lustre_msg_get_transno(req->rq_repmsg) == 0)) ) failed:
2013-03-06 16:05:01 LustreError: 31751:0:(mdt_open.c:1023:mdt_reconstruct_open()) LBUG
2013-03-06 16:05:01 Pid: 31751, comm: mdt_145
2013-03-06 16:05:01
2013-03-06 16:05:01 Call Trace:
2013-03-06 16:05:01 [<ffffffffa04a27f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-03-06 16:05:01 [<ffffffffa04a2e07>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-03-06 16:05:01 [<ffffffffa0d9ed87>] mdt_reconstruct_open+0x7c7/0xa80 [mdt]
2013-03-06 16:05:01 [<ffffffffa0d908c5>] mdt_reconstruct+0x45/0x120 [mdt]
2013-03-06 16:05:01 [<ffffffffa0d7d099>] mdt_reint_internal+0x709/0x8e0 [mdt]
2013-03-06 16:05:01 [<ffffffffa0d7d53d>] mdt_intent_reint+0x1ed/0x500 [mdt]
2013-03-06 16:05:01 [<ffffffffa0d7bc09>] mdt_intent_policy+0x379/0x690 [mdt]
2013-03-06 16:05:01 [<ffffffffa06ca3c1>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
2013-03-06 16:05:01 [<ffffffffa06f03dd>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
2013-03-06 16:05:01 [<ffffffffa0d7c586>] mdt_enqueue+0x46/0x130 [mdt]
2013-03-06 16:05:01 [<ffffffffa0d71762>] mdt_handle_common+0x932/0x1750 [mdt]
2013-03-06 16:05:01 [<ffffffffa0d72655>] mdt_regular_handle+0x15/0x20 [mdt]
2013-03-06 16:05:01 [<ffffffffa071f4f6>] ptlrpc_main+0xd16/0x1a80 [ptlrpc]
2013-03-06 16:05:01 [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
2013-03-06 16:05:01 [<ffffffffa071e7e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
2013-03-06 16:05:01 [<ffffffff8100412a>] child_rip+0xa/0x20
2013-03-06 16:05:01 [<ffffffffa071e7e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
2013-03-06 16:05:01 [<ffffffffa071e7e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
2013-03-06 16:05:01 [<ffffffff81004120>] ? child_rip+0x0/0x20
In the crash dump, the file that triggered the LBUG is a file created by mpio.
The onsite support team made the following analysis:
The return status (rc) is -EREMOTE (-66), and the disposition mask appears to have been
DISP_IT_EXECD / DISP_LOOKUP_EXECD / DISP_LOOKUP_POS / DISP_OPEN_OPEN / DISP_OPEN_LOCK.
Based on this information, it is possible that, just before the LBUG, the MDS ran mdt_reint_open() and got -EREMOTE in return.
So mdt_reint_open() would return -EREMOTE, and mdt_reconstruct_open() then does not take into account that, when -EREMOTE is returned, the msg transno is not set.
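The assertion quoted in the log reduces to a simple predicate on rc and the reply transno. The following is a minimal sketch of that predicate only, not Lustre code: the function names are hypothetical, the `transno` parameter stands in for `lustre_msg_get_transno(req->rq_repmsg)`, and the transno values are arbitrary. It shows that the assertion can only fail when rc is negative and the transno is non-zero at the same time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the condition from the failed ASSERTION in the log:
 *   (!(rc < 0) || (lustre_msg_get_transno(req->rq_repmsg) == 0))
 * Hypothetical stand-alone helper; 'transno' replaces the real accessor.
 */
static bool reconstruct_open_invariant(int rc, uint64_t transno)
{
    return !(rc < 0) || transno == 0;
}

/* Hypothetical helper listing combinations that satisfy the invariant. */
static void check_passing_cases(void)
{
    /* Success with a transno set: allowed. */
    assert(reconstruct_open_invariant(0, 7));
    /* Error with no transno: allowed. */
    assert(reconstruct_open_invariant(-EREMOTE, 0));
}
```

With these definitions, `reconstruct_open_invariant(-EREMOTE, 7)` evaluates to false, which is the combination that matches the LBUG reported here: a negative rc together with a non-zero transno. On Linux, `EREMOTE` is 66, consistent with the -66 quoted in the analysis.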
In the attached file you can find the struct mdt_thread_info info data
that triggered the LBUG, as well as the req data (struct ptlrpc_request)
and the lcd data (struct lsd_client_data).
Attachments
Issue Links
- is related to LU-3987 LBUG ASSERTION( (!(rc < 0) || (lustre_msg_get_transno(req->rq_repmsg) == 0)) ) failed (Resolved)