
[LU-2943] LBUG mdt_reconstruct_open()) ASSERTION( (!(rc < 0) || (lustre_msg_get_transno(req->rq_repmsg) == 0)) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.1.4
    • Severity: 3
    • 7064

    Description

      This issue has already been hit on Lustre 2.2 (see LU-1702). The traces are exactly the same as for LU-1702.

      It has been hit four consecutive times, so it seems quite easy to reproduce.

      2013-03-06 16:05:01 LustreError: 31751:0:(mdt_open.c:1023:mdt_reconstruct_open()) ASSERTION( (!(rc < 0) || (lustre_msg_get_transno(req->rq_repmsg) == 0)) ) failed:
      2013-03-06 16:05:01 LustreError: 31751:0:(mdt_open.c:1023:mdt_reconstruct_open()) LBUG
      2013-03-06 16:05:01 Pid: 31751, comm: mdt_145
      2013-03-06 16:05:01
      2013-03-06 16:05:01 Call Trace:
      2013-03-06 16:05:01 [<ffffffffa04a27f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2013-03-06 16:05:01 [<ffffffffa04a2e07>] lbug_with_loc+0x47/0xb0 [libcfs]
      2013-03-06 16:05:01 [<ffffffffa0d9ed87>] mdt_reconstruct_open+0x7c7/0xa80 [mdt]
      2013-03-06 16:05:01 [<ffffffffa0d908c5>] mdt_reconstruct+0x45/0x120 [mdt]
      2013-03-06 16:05:01 [<ffffffffa0d7d099>] mdt_reint_internal+0x709/0x8e0 [mdt]
      2013-03-06 16:05:01 [<ffffffffa0d7d53d>] mdt_intent_reint+0x1ed/0x500 [mdt]
      2013-03-06 16:05:01 [<ffffffffa0d7bc09>] mdt_intent_policy+0x379/0x690 [mdt]
      2013-03-06 16:05:01 [<ffffffffa06ca3c1>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      2013-03-06 16:05:01 [<ffffffffa06f03dd>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      2013-03-06 16:05:01 [<ffffffffa0d7c586>] mdt_enqueue+0x46/0x130 [mdt]
      2013-03-06 16:05:01 [<ffffffffa0d71762>] mdt_handle_common+0x932/0x1750 [mdt]
      2013-03-06 16:05:01 [<ffffffffa0d72655>] mdt_regular_handle+0x15/0x20 [mdt]
      2013-03-06 16:05:01 [<ffffffffa071f4f6>] ptlrpc_main+0xd16/0x1a80 [ptlrpc]
      2013-03-06 16:05:01 [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
      2013-03-06 16:05:01 [<ffffffffa071e7e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      2013-03-06 16:05:01 [<ffffffff8100412a>] child_rip+0xa/0x20
      2013-03-06 16:05:01 [<ffffffffa071e7e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      2013-03-06 16:05:01 [<ffffffffa071e7e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      2013-03-06 16:05:01 [<ffffffff81004120>] ? child_rip+0x0/0x20

      In the crash, the file that triggered the LBUG is a file created by MPI-IO.

      The onsite support team made the following analysis:

      The return status (rc) is -EREMOTE (-66), and it seems the disposition mask was DISP_IT_EXECD | DISP_LOOKUP_EXECD | DISP_LOOKUP_POS | DISP_OPEN_OPEN | DISP_OPEN_LOCK. Based on this information, it is possible that, just prior to the LBUG, the MDS ran mdt_reint_open() and got -EREMOTE in return.

      So mdt_reint_open() would return -EREMOTE, and mdt_reconstruct_open() does not take into account that, on an -EREMOTE return, the transno in the reply message is not zeroed.

      In the attached file you can find the struct mdt_thread_info data that triggered the LBUG, as well as the req data (struct ptlrpc_request) and the lcd data (struct lsd_client_data).

    Attachments

    Issue Links

    Activity


            bfaccini Bruno Faccini (Inactive) added a comment:

            Cool, thanks for your update Seb. So I am marking this ticket as Fixed.

            sebastien.buisson Sebastien Buisson (Inactive) added a comment:

            Hi Bruno,

            The support team confirms that your fix does fix the issue.
            Thank you!

            Sebastien.

            bfaccini Bruno Faccini (Inactive) added a comment:

            Hello Alex and Seb, do you have any update for this ticket?
            Bye,
            Bruno.

            louveta Alexandre Louvet (Inactive) added a comment:

            Hello Bruno,
            The new package with your fix was delivered last Friday, when we got the approval to pick up your patch.
            It remains for us to find a time frame to install it on the system.

            I'll keep you informed.

            Cheers,
            Alex.

            sebastien.buisson Sebastien Buisson (Inactive) added a comment:

            Hi Bruno,

            Patch set #3 of http://review.whamcloud.com/5954 was rolled out at CEA for testing purposes at the end of last week.
            Hopefully we will have news soon.

            Cheers,
            Sebastien.

            bfaccini Bruno Faccini (Inactive) added a comment:

            Hello Alex,
            Reviewers approved my patch, so it should be integrated soon. In the meantime, is there a possibility for you to temporarily integrate it and test it under a production workload? I know it is not easy for you to set up, since it affects the MDS side, but I have no idea how to reproduce this locally.

            louveta Alexandre Louvet (Inactive) added a comment:

            What is the current status of the latest patch?

            bfaccini Bruno Faccini (Inactive) added a comment:

            A new version (patch set #3) of the b2_1 port http://review.whamcloud.com/5954 has been submitted and successfully passed auto-tests.

            bfaccini Bruno Faccini (Inactive) added a comment:

            Humm, sorry about that. Due to your report I am currently looking again at the original/master patch from LU-2927. It seems that I missed some cases in my first back-port, and I am testing a new version. Will keep you updated soon.

            louveta Alexandre Louvet (Inactive) added a comment:

            Bruno, sorry to say that just after installing the patch, we got a lot of crashes on 3 large clusters.

            The LBUG message is:
            (mdt_handler.c:3411:mdt_intent_reint()) ASSERTION( lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
            (mdt_handler.c:3411:mdt_intent_reint()) LBUG

            followed by this stack:
            Call Trace:
            [<ffffffffa041a7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            [<ffffffffa041ae07>] lbug_with_loc+0x47/0xb0 [libcfs]
            [<ffffffffa0c0a841>] mdt_intent_reint+0x4f1/0x530 [mdt]
            [<ffffffffa0c08c09>] mdt_intent_policy+0x379/0x690 [mdt]
            [<ffffffffa06653c1>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
            [<ffffffffa068b3dd>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
            [<ffffffffa0c09586>] mdt_enqueue+0x46/0x130 [mdt]
            [<ffffffffa0bfe762>] mdt_handle_common+0x932/0x1750 [mdt]
            [<ffffffffa0bff655>] mdt_regular_handle+0x15/0x20 [mdt]
            [<ffffffffa06ba4f6>] ptlrpc_main+0xd16/0x1a80 [ptlrpc]
            [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
            [<ffffffffa06b97e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
            [<ffffffff8100412a>] child_rip+0xa/0x20
            [<ffffffffa06b97e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
            [<ffffffffa06b97e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
            [<ffffffff81004120>] ? child_rip+0x0/0x20

            Kernel panic - not syncing: LBUG
            Pid: 60412, comm: mdt_440 Not tainted 2.6.32-220.23.1.bl6.Bull.28.8.x86_64 0000001
            Call Trace:
            [<ffffffff81484650>] ? panic+0x78/0x143
            [<ffffffffa041ae5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
            [<ffffffffa0c0a841>] ? mdt_intent_reint+0x4f1/0x530 [mdt]
            [<ffffffffa0c08c09>] ? mdt_intent_policy+0x379/0x690 [mdt]
            [<ffffffffa06653c1>] ? ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
            [<ffffffffa068b3dd>] ? ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
            [<ffffffffa0c09586>] ? mdt_enqueue+0x46/0x130 [mdt]
            [<ffffffffa0bfe762>] ? mdt_handle_common+0x932/0x1750 [mdt]
            [<ffffffffa0bff655>] ? mdt_regular_handle+0x15/0x20 [mdt]
            [<ffffffffa06ba4f6>] ? ptlrpc_main+0xd16/0x1a80 [ptlrpc]
            [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
            [<ffffffffa06b97e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
            [<ffffffff8100412a>] ? child_rip+0xa/0x20
            [<ffffffffa06b97e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
            [<ffffffffa06b97e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
            [<ffffffff81004120>] ? child_rip+0x0/0x20

            That was observed on systems running Lustre 2.1.5 plus the following patches:

            • ORNL-22 general ptlrpcd threads pool support
            • LU-1144 implement a NUMA aware ptlrpcd binding policy
            • LU-1110 MDS Oops in osd_xattr_get() during file open by FID
            • LU-2613 too much unreclaimable slab space
            • LU-2624 ptlrpc fix thread stop
            • LU-2683 client deadlock in cl_lock_mutex_get
            • LU-2943 LBUG in mdt_reconstruct_open()

            I agree this is not the same context as before, but it is located exactly where the patch modifies the source code.
            The patch was removed on the 3 affected clusters and, since then, stability is back.

            Alex.

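            For reference, the helper named in this new assertion just checks whether a lock handle has been filled in with a non-zero cookie; a minimal sketch, as it appears in 2.1-era trees (quoted from memory, so treat the exact form as approximate):

            /* lustre/include/lustre/lustre_idl.h (approximate): a handle
             * is "used" once a non-zero cookie has been assigned to it.
             * mdt_intent_reint() asserts that the reint open left a valid
             * lock handle in lhc->mlh_reg_lh before packing the intent
             * reply, so a back-port that lets mdt_reint_open() return
             * without taking that lock would trip exactly this check. */
            static inline int lustre_handle_is_used(struct lustre_handle *lh)
            {
                    return lh->cookie != 0ull;
            }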

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: dmoreno Diego Moreno (Inactive)
              Votes: 2
              Watchers: 12

              Dates

                Created:
                Updated:
                Resolved: