[LU-5686] (mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed Created: 30/Sep/14 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.6, Lustre 2.4.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Bruno Travouillon (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | p4b |
| Environment: |
Clients:
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15925 |
| Description |
|
We hit the following LBUG twice on one of our MDTs:

[78073.117731] Lustre: 31681:0:(ldlm_lib.c:952:target_handle_connect()) work2-MDT0000: connection from 38d12a48-aabd-9279-dc69-b78c4e00321c@10.100.62.72@o2ib2 t189645377601 exp ffff880b95bb1c00 cur 1410508503 last 1410508503
[78079.176124] Lustre: 31681:0:(mdt_handler.c:1005:mdt_getattr_name_lock()) Although resent, but still not get child lock parent:[0x22f2b0783:0x34b:0x0] child:[0x22d854b6e:0x85d5:0x0]
[78079.192443] LustreError: 31681:0:(mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed:
[78079.205971] LustreError: 31681:0:(mdt_handler.c:3203:mdt_intent_lock_replace()) LBUG
[78079.215326] Pid: 31681, comm: mdt_104
[78079.220352]
[78079.220353] Call Trace:
[78079.227394] [<ffffffffa051a7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[78079.236100] [<ffffffffa051ae07>] lbug_with_loc+0x47/0xb0 [libcfs]
[78079.243815] [<ffffffffa0d9671b>] mdt_intent_lock_replace+0x3bb/0x440 [mdt]
[78079.252140] [<ffffffffa0daad26>] mdt_intent_getattr+0x3a6/0x4a0 [mdt]
[78079.260391] [<ffffffffa0da6c09>] mdt_intent_policy+0x379/0x690 [mdt]
[78079.268641] [<ffffffffa07423c1>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
[78079.276846] [<ffffffffa07683cd>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
[78079.285614] [<ffffffffa0da7586>] mdt_enqueue+0x46/0x130 [mdt]
[78079.292950] [<ffffffffa0d9c762>] mdt_handle_common+0x932/0x1750 [mdt]
[78079.300987] [<ffffffffa0d9d655>] mdt_regular_handle+0x15/0x20 [mdt]
[78079.309024] [<ffffffffa07974f6>] ptlrpc_main+0xd16/0x1a80 [ptlrpc]
[78079.316979] [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
[78079.324222] [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
[78079.331896] [<ffffffff8100412a>] child_rip+0xa/0x20
[78079.338522] [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
[78079.346599] [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
[78079.354520] [<ffffffff81004120>] ? child_rip+0x0/0x20
[78079.361136]
[78079.364683] Kernel panic - not syncing: LBUG

The support engineer was able to retrieve the client node from the crash dump. Both times, the client was a login node running Lustre 2.4.3. It looks like |
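For reference, the 0x0002 value in the failed assertion is the MSG_RESENT request flag in the Lustre wire protocol: mdt_intent_lock_replace() only expects to hand back an already-granted lock when the incoming request is marked as a resend, and here the getattr-by-name resend path ("Although resent, but still not get child lock") reached that point without the flag set. The snippet below is a minimal, self-contained sketch of that invariant only; the struct layouts and helper are simplified stand-ins, not the actual mdt_handler.c code.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the real Lustre types (lustre_idl.h / ptlrpc);
 * only the MSG_RESENT value (0x0002) matches the real definition. */
#define MSG_RESENT 0x0002

struct lustre_msg { uint32_t lm_flags; };
struct ptlrpc_request { struct lustre_msg *rq_reqmsg; };

static uint32_t lustre_msg_get_flags(struct lustre_msg *msg)
{
        return msg->lm_flags;
}

/* Invariant behind the LBUG: when the MDT finds a lock it has already
 * granted to this client and tries to replace the new intent lock with it,
 * the request must carry MSG_RESENT.  The crash above shows this check
 * firing because the flag was not set on the incoming getattr request. */
static void intent_lock_replace_sketch(struct ptlrpc_request *req)
{
        assert(lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT);
        /* ... would go on to swap in the previously granted lock ... */
}

int main(void)
{
        struct lustre_msg msg = { .lm_flags = MSG_RESENT };
        struct ptlrpc_request req = { .rq_reqmsg = &msg };

        intent_lock_replace_sketch(&req);  /* passes: MSG_RESENT is set */

        msg.lm_flags = 0;
        /* intent_lock_replace_sketch(&req);  would trip the assertion, which
         * on a real MDS maps to LASSERT() -> LBUG -> kernel panic as above */
        printf("assertion holds when MSG_RESENT is set\n");
        return 0;
}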
| Comments |
| Comment by Peter Jones [ 30/Sep/14 ] |
|
Bruno T, if this is indeed a duplicate of

Bruno F, anything to add/correct?

Peter |
| Comment by Bruno Faccini (Inactive) [ 30/Sep/14 ] |
|
Hello Bruno, |
| Comment by Bruno Faccini (Inactive) [ 30/Sep/14 ] |
|
Bruno, can you also check whether you run with the additional patch from |
| Comment by Bruno Travouillon (Inactive) [ 30/Sep/14 ] |
|
Hi,

We run without the patch from

I do agree with Peter; a fix for 2.5+ should be sufficient. |
| Comment by Bruno Faccini (Inactive) [ 01/Oct/14 ] |
|
OK, fine, but can you provide the list of additional patches for both the client and server sides? Thanks in advance. |
| Comment by Bruno Travouillon (Inactive) [ 01/Oct/14 ] |
|
Client: Lustre 2.4.3 + patches
Server: Lustre 2.1.6 + patches
Hope this helps. |
| Comment by Bruno Faccini (Inactive) [ 09/Oct/14 ] |
|
Hello Bruno,

In fact there are regression issues with the b2_4 back-port (http://review.whamcloud.com/#/c/10902/) of |
| Comment by Bruno Travouillon (Inactive) [ 10/Feb/15 ] |
|
Hi,

We are now running Lustre 2.5.3 + the b2_5 patch http://review.whamcloud.com/#/c/10492/. Since the upgrade, we have been hitting several issues on the MDS/OSS around the LDLM. Are you aware of any complementary fixes that we should apply with this one? In the meantime, we are still investigating these issues on site and will report them ASAP in new JIRA tickets. |
| Comment by Bruno Faccini (Inactive) [ 11/Feb/15 ] |
|
Hello Bruno, |
| Comment by Bruno Travouillon (Inactive) [ 11/Feb/15 ] |
|
Hello Bruno,

Yes, one of our issues is very close to |
| Comment by Bruno Faccini (Inactive) [ 12/Feb/15 ] |
|
Bruno, |