Details
Bug
Resolution: Fixed
Major
Lustre 2.6.0
3
11968
Description
Running racer on a single node with MDSCOUNT=2 I see this:
crash> bt
PID: 6987  TASK: ffff8801ccbbe040  CPU: 1  COMMAND: "mdt00_008"
 #0 [ffff8801c619d808] machine_kexec at ffffffff81035d6b
 #1 [ffff8801c619d868] crash_kexec at ffffffff810c0e22
 #2 [ffff8801c619d938] panic at ffffffff8150f01f
 #3 [ffff8801c619d9b8] lbug_with_loc at ffffffffa02a9eeb [libcfs]
 #4 [ffff8801c619d9d8] mdt_getattr_internal at ffffffffa0ba07b4 [mdt]
 #5 [ffff8801c619da68] mdt_getattr_name_lock at ffffffffa0ba1bc6 [mdt]
 #6 [ffff8801c619db18] mdt_intent_getattr at ffffffffa0ba2883 [mdt]
 #7 [ffff8801c619db78] mdt_intent_policy at ffffffffa0b91979 [mdt]
 #8 [ffff8801c619dbd8] ldlm_lock_enqueue at ffffffffa062f509 [ptlrpc]
 #9 [ffff8801c619dc38] ldlm_handle_enqueue0 at ffffffffa0658c4f [ptlrpc]
#10 [ffff8801c619dca8] tgt_enqueue at ffffffffa06d2562 [ptlrpc]
#11 [ffff8801c619dcc8] tgt_handle_request0 at ffffffffa06d4f5a [ptlrpc]
#12 [ffff8801c619dd58] tgt_request_handle at ffffffffa06d653a [ptlrpc]
#13 [ffff8801c619dda8] ptlrpc_main at ffffffffa068a295 [ptlrpc]
#14 [ffff8801c619dee8] kthread at ffffffff81096a36
#15 [ffff8801c619df48] kernel_thread at ffffffff8100c0ca
This is from:
if (info->mti_cross_ref) {
        ...
        if (rc == 0) {
                /* Finally, we can get attr for child. */
                mdt_set_capainfo(info, 0, mdt_object_fid(child), BYPASS_CAPA);
                rc = mdt_getattr_internal(info, child, 0);
                if (unlikely(rc != 0))
                        mdt_object_unlock(info, child, lhc, 1);
        }
        RETURN(rc);
}
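Above this we check that the parent (which is really the child) exists, but only if lname is non-NULL. When lname is NULL the existence check is skipped, so it appears a nonexistent object can reach mdt_getattr_internal() and trip the assertion there.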
There are several more assertions in mdt_getattr_name_lock(), mdt_getattr_internal(), and mdt_raw_lookup() that hold only if clients send well-formed requests. These should be collected and replaced with error handling.
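The kind of change being proposed can be illustrated with a minimal, standalone sketch. This is not the actual MDT code or the eventual fix: `struct demo_object`, `demo_getattr_assert()`, and `demo_getattr_checked()` are hypothetical names invented for illustration, and the choice of `-ENOENT` and `-EINVAL` as error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a server-side object; not the real struct mdt_object. */
struct demo_object {
        int exists;     /* nonzero if the object exists on this target */
        int mode;       /* the only attribute this sketch returns */
};

/*
 * Before: the handler trusts its caller. A request naming a missing
 * object trips the assertion and takes the whole server down.
 */
int demo_getattr_assert(const struct demo_object *obj, int *mode_out)
{
        assert(obj != NULL && obj->exists);
        *mode_out = obj->mode;
        return 0;
}

/*
 * After: the same preconditions are checked at run time and reported
 * to the caller as negative errno values, so a bad request fails
 * without crashing the server.
 */
int demo_getattr_checked(const struct demo_object *obj, int *mode_out)
{
        if (obj == NULL || mode_out == NULL)
                return -EINVAL;
        if (!obj->exists)
                return -ENOENT;

        *mode_out = obj->mode;
        return 0;
}
```

With the checked variant, a lookup of a missing object returns an error that the caller can pass back to the client, in the same way the snippet above already propagates rc from mdt_getattr_internal().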