[LU-4877] mdt_fix_reply()) ASSERTION( md_packed > 0 ) failed Created: 10/Apr/14 Updated: 30/Jul/14 Resolved: 15/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Hammond | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne2, migration | ||
| Severity: | 3 |
| Rank (Obsolete): | 13489 |
| Description |
|
I see this running racer on 2.5.57-72-g69ddb2e. I used MDSCOUNT=6 and modified file_create.sh to do less IO by setting SIZE=$((RANDOM % 4)).

LustreError: 14686:0:(mdt_reint.c:1446:mdt_reint_migrate_internal()) lustre-MDT0003: can not migrate striped dir [0x340000401:0xf:0x0]: rc = -1
LustreError: 14686:0:(mdt_lib.c:633:mdt_fix_reply()) ASSERTION( md_packed > 0 ) failed:
LustreError: 14686:0:(mdt_lib.c:633:mdt_fix_reply()) LBUG
Pid: 14686, comm: mdt01_001

Call Trace:
[<ffffffffa02a9895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa02a9e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0c15630>] mdt_fix_reply+0x5d0/0x6a0 [mdt]
[<ffffffffa0bfee98>] mdt_reint_internal+0x548/0x7c0 [mdt]
[<ffffffffa0bff69b>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa06e39ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
[<ffffffffa069298a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
[<ffffffffa0691c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
[<ffffffff81096a36>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810969a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Mikhail Pershin [ 11/Apr/14 ] |
|
The new mdt_reint_migrate_internal() function calls mdt_stripe_get(), which may use the big buffer for xattrs. There is logic tied to that 'big buffer' usage which determines the reply size. Each time we use the big buffer and it is not going to be returned to the client, we need to mark that by dropping the mti_big_lmm_used flag. Look at the end of the mdt_reint_unlink() function, which handles the same case. The fix could be the same: just set mti_big_lmm_used to 0 after using the big buffer. At the same time, such hidden logic is bad practice and we need to introduce a clear way to solve this. Probably the assertion on md_packed in mdt_fix_reply() should be replaced with a check that indicates the big buffer is not needed in the reply because no buffer was packed at all. |
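To illustrate the two options from the comment above, here is a minimal standalone sketch in C. It is not the Lustre code and not the landed patch: struct req_info, handler_internal_use(), reply_state_inconsistent() and reply_needs_big_buffer() are hypothetical names invented for this example, and the struct only models the mti_big_lmm_used flag and the md_packed size. It shows how a handler that uses the big buffer internally leaves the flag set with nothing packed, and how either clearing the flag or replacing the assertion with a check avoids that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-request state; NOT the real mdt_thread_info. */
struct req_info {
    bool   big_lmm_used; /* models the mti_big_lmm_used flag */
    size_t md_packed;    /* bytes of metadata actually packed into the reply */
};

/*
 * Models a handler that reads xattrs through the big buffer for its own
 * use only. With clear_flag set, it drops the flag afterwards, which is
 * the approach the comment describes for mdt_reint_unlink().
 */
static void handler_internal_use(struct req_info *info, bool clear_flag)
{
    assert(info != NULL);
    info->big_lmm_used = true; /* big buffer was needed to read the xattr */
    info->md_packed = 0;       /* nothing from it goes into the reply */
    if (clear_flag)
        info->big_lmm_used = false; /* buffer is not returned to the client */
}

/* True when reply sizing would see "flag set, nothing packed". */
static bool reply_state_inconsistent(const struct req_info *info)
{
    assert(info != NULL);
    return info->big_lmm_used && info->md_packed == 0;
}

/*
 * Models the alternative: a check instead of an assertion. The reply
 * only needs the big buffer if the flag is set and data was packed.
 */
static bool reply_needs_big_buffer(const struct req_info *info)
{
    assert(info != NULL);
    return info->big_lmm_used && info->md_packed > 0;
}
```

In this model, calling handler_internal_use() with clear_flag false leaves the state that would trip an assertion on md_packed, while reply_needs_big_buffer() simply reports that no big buffer is needed in the reply.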
| Comment by Andreas Dilger [ 28/Apr/14 ] |
|
Mike, any progress on making a patch for this? |
| Comment by Mikhail Pershin [ 29/Apr/14 ] |
|
The patch is ready; I forgot to paste the link here: http://review.whamcloud.com/10116 |
| Comment by Jodi Levi (Inactive) [ 15/May/14 ] |
|
Patch landed to Master. Please reopen ticket if more work is needed. |