[LU-4877] mdt_fix_reply()) ASSERTION( md_packed > 0 ) failed Created: 10/Apr/14  Updated: 30/Jul/14  Resolved: 15/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Critical
Reporter: John Hammond Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: dne2, migration

Issue Links:
Related
is related to LU-5429 insanity test_1: FAIL: Start of /dev/... Resolved
Severity: 3
Rank (Obsolete): 13489

 Description   

I see this when running racer on 2.5.57-72-g69ddb2e. I used MDSCOUNT=6 and modified file_create.sh to do less IO by setting SIZE=$((RANDOM % 4)).

LustreError: 14686:0:(mdt_reint.c:1446:mdt_reint_migrate_internal()) lustre-MDT0003: can not migrate striped dir [0x340000401:0xf:0x0]: rc = -1
LustreError: 14686:0:(mdt_lib.c:633:mdt_fix_reply()) ASSERTION( md_packed > 0 ) failed: 
LustreError: 14686:0:(mdt_lib.c:633:mdt_fix_reply()) LBUG
Pid: 14686, comm: mdt01_001

Call Trace:
 [<ffffffffa02a9895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa02a9e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0c15630>] mdt_fix_reply+0x5d0/0x6a0 [mdt]
 [<ffffffffa0bfee98>] mdt_reint_internal+0x548/0x7c0 [mdt]
 [<ffffffffa0bff69b>] mdt_reint+0x6b/0x120 [mdt]
 [<ffffffffa06e39ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
 [<ffffffffa069298a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
 [<ffffffffa0691c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20


 Comments   
Comment by Mikhail Pershin [ 11/Apr/14 ]

The new mdt_reint_migrate_internal() function calls mdt_stripe_get(), which may use the big buffer for xattrs. There is logic tied to that 'big buffer' usage which determines the reply size. Each time we use the big buffer and it is not going to be returned to the client, we need to mark that by clearing the mti_big_lmm_used flag. Look at the end of the mdt_reint_unlink() function, which handles the same case.

The fix could be the same: just set mti_big_lmm_used to 0 after using the big buffer. At the same time, such hidden logic is bad practice and we need to introduce a clearer way to handle this. Probably the assertion on md_packed in mdt_fix_reply() should be replaced with a check that treats the big buffer as not needed in the reply when no buffer was packed at all.
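
To illustrate the two options discussed above, here is a minimal user-space sketch in C. It is not the Lustre code and not the landed patch: the struct and every function with a _sketch suffix are hypothetical stand-ins, and only the names mti_big_lmm_used and md_packed are taken from this ticket. It models a handler that fills the big buffer without packing anything into the reply, a reply fixup that would trip the assertion, and a variant that checks instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-thread MDT state.
 * Only the flag name mti_big_lmm_used comes from the ticket. */
struct thread_info_sketch {
    bool mti_big_lmm_used; /* set while the big buffer holds xattr data */
};

/* Models a helper that reads stripe xattrs into the big buffer. */
static void stripe_get_sketch(struct thread_info_sketch *info)
{
    assert(info != NULL);
    info->mti_big_lmm_used = true;
}

/* Models a migrate handler that uses the big buffer only internally and
 * then fails, so nothing is packed into the reply.
 * drop_flag selects option 1: clear the flag once the buffer is done with. */
static int reint_migrate_sketch(struct thread_info_sketch *info, bool drop_flag)
{
    assert(info != NULL);
    stripe_get_sketch(info);
    if (drop_flag)
        info->mti_big_lmm_used = false;
    return -1; /* mirrors "rc = -1" in the log above */
}

/* Models the reply fixup. Returns -1 in the state where the real code
 * would hit ASSERTION( md_packed > 0 ), and 0 otherwise. */
static int fix_reply_sketch(const struct thread_info_sketch *info, int md_packed)
{
    assert(info != NULL);
    if (info->mti_big_lmm_used && md_packed <= 0)
        return -1;
    return 0;
}

/* Option 2: replace the assertion with a check. If nothing was packed,
 * the big buffer is simply not needed in the reply. */
static int fix_reply_checked_sketch(struct thread_info_sketch *info, int md_packed)
{
    assert(info != NULL);
    if (info->mti_big_lmm_used && md_packed <= 0)
        info->mti_big_lmm_used = false;
    return 0;
}
```

In this model, calling reint_migrate_sketch() with drop_flag set to false and then fix_reply_sketch() with md_packed equal to 0 reproduces the failing state, while either clearing the flag in the handler or using the checked fixup avoids it.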

Comment by Andreas Dilger [ 28/Apr/14 ]

Mike, any progress on making a patch for this?

Comment by Mikhail Pershin [ 29/Apr/14 ]

The patch is ready; I forgot to paste the link here: http://review.whamcloud.com/10116

Comment by Jodi Levi (Inactive) [ 15/May/14 ]

Patch landed to Master. Please reopen ticket if more work is needed.
