Details
Bug
Resolution: Duplicate
Blocker
None
Lustre 2.1.6, Lustre 2.5.0
Lustre client build: http://build.whamcloud.com/job/lustre-master/1613/
Lustre server build: http://build.whamcloud.com/job/lustre-b2_1/191/ (2.1.5)
Distro/Arch: RHEL6.4/x86_64
3
9699
Description
sanity test 24u hit the following failure on the MDS:
11:06:31:Lustre: DEBUG MARKER: == sanity test 24u: create stripe file == 11:06:31 (1376417191)
11:06:31:LustreError: 13255:0:(mdt_handler.c:224:mdt_lock_pdo_init()) ASSERTION( namelen > 0 ) failed:
11:06:31:LustreError: 13255:0:(mdt_handler.c:224:mdt_lock_pdo_init()) LBUG
11:06:31:Pid: 13255, comm: mdt_01
11:06:31:
11:06:31:Call Trace:
11:06:31: [<ffffffffa04d0785>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
11:06:31: [<ffffffffa04d0d97>] lbug_with_loc+0x47/0xb0 [libcfs]
11:06:31: [<ffffffffa0bdea65>] mdt_lock_pdo_init+0xe5/0xf0 [mdt]
11:06:31: [<ffffffffa0c127c6>] mdt_reint_open+0x1f6/0x2940 [mdt]
11:06:31: [<ffffffffa077b764>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
11:06:32: [<ffffffffa0ba256e>] ? md_ucred+0x1e/0x60 [mdd]
11:06:32: [<ffffffffa0be15d5>] ? mdt_ucred+0x15/0x20 [mdt]
11:06:32: [<ffffffffa0bf84ec>] ? mdt_root_squash+0x2c/0x3e0 [mdt]
11:06:32: [<ffffffffa0bfcc51>] mdt_reint_rec+0x41/0xe0 [mdt]
11:06:32: [<ffffffffa0bf3ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
11:06:32: [<ffffffffa0bf453d>] mdt_intent_reint+0x1ed/0x500 [mdt]
11:06:32: [<ffffffffa0bf2c09>] mdt_intent_policy+0x379/0x690 [mdt]
11:06:32: [<ffffffffa0737391>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
11:06:32: [<ffffffffa075d1ed>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
11:06:32: [<ffffffffa0bf3586>] mdt_enqueue+0x46/0x130 [mdt]
11:06:32: [<ffffffffa0be8772>] mdt_handle_common+0x932/0x1750 [mdt]
11:06:32: [<ffffffffa0be9665>] mdt_regular_handle+0x15/0x20 [mdt]
11:06:32: [<ffffffffa078bbae>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
11:06:32: [<ffffffffa078af60>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
11:06:32: [<ffffffff8100c0ca>] child_rip+0xa/0x20
11:06:32: [<ffffffffa078af60>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
11:06:32: [<ffffffffa078af60>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
11:06:32: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
11:06:32:
11:06:32:Kernel panic - not syncing: LBUG
Maloo report: https://maloo.whamcloud.com/test_sets/e3f3b3d8-0525-11e3-8d88-52540035b04c
More instances:
https://maloo.whamcloud.com/test_sets/369e054c-0059-11e3-bb00-52540035b04c
https://maloo.whamcloud.com/test_sets/0bf3fdbc-f8f5-11e2-8917-52540035b04c
https://maloo.whamcloud.com/test_sets/59b2a818-f504-11e2-a8f6-52540035b04c
Issue Links
- duplicates: LU-3544 Writing to new files under NFS export from Lustre will result in ENOENT (SLES11SP2) (Closed)
- is related to: LU-3233 tgt_cb_last_committed()) ASSERTION( ccb->llcc_exp->exp_obd == ccb->llcc_tgt->lut_obd ) failed: (Resolved)
- is related to: LU-2875 Remove LASSERT()s on return values from req_capsule_client_get() and similar (Resolved)
Andreas: Thanks for the reminder.
Reading it again, I see my comment above is flawed; I wasn't thinking clearly when I wrote it. The change in newer kernels concerns all anonymous dentries. Identifying merely the root dentries is not enough, because this issue applies to every anonymous dentry.
The problem is that when we change the names as we did, we trip the namelen assertion in 2.1 (ASSERTION( namelen > 0 ) in mdt_lock_pdo_init()). In retrospect, I think we have succeeded in our goal of isolating anonymous dentries, but there may be a difference between passing NULL to ll_prep_md_op_data for the name and passing the anonymous dentry names found in 2.6.32 kernels. Those names appear (I haven't tested this; it is my reading of the code) to be pointers to a string containing nothing but the null terminator, so they are not actually NULL.
I wonder whether this difference is significant.
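To make the NULL-versus-empty-string distinction concrete, here is a minimal C sketch. It is not Lustre code: struct name_arg, classify_name() and would_trip_namelen_check() are hypothetical names invented for illustration, and it assumes (untested, as noted above) that an anonymous dentry name on a 2.6.32 kernel reaches the client as a non-NULL pointer to an empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the name and length a client hands on. */
struct name_arg {
    const char *name;
    size_t      namelen;
};

/* The three cases a consumer of the name may need to tell apart. */
enum name_kind { NAME_ABSENT, NAME_EMPTY, NAME_PRESENT };

static enum name_kind classify_name(const struct name_arg *arg)
{
    assert(arg != NULL);

    if (arg->name == NULL)
        return NAME_ABSENT;   /* caller passed NULL: no name at all */
    if (arg->namelen == 0 || arg->name[0] == '\0')
        return NAME_EMPTY;    /* valid pointer, only a terminator behind it */
    return NAME_PRESENT;
}

/*
 * Assumption: a check in the spirit of "namelen > 0" is applied only
 * when a name is present, so only the empty-string case fails it.
 */
static int would_trip_namelen_check(const struct name_arg *arg)
{
    return classify_name(arg) == NAME_EMPTY;
}
```

Under that assumption, a NULL name is skipped entirely while an empty string is treated as a name of length zero, which is the case that would fail a namelen > 0 check.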