[LU-4370] lu_object.h:853:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Created: 09/Dec/13  Updated: 07/Jan/14  Resolved: 07/Jan/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug
Priority: Major
Reporter: John Hammond
Assignee: Di Wang
Resolution: Fixed
Votes: 0
Labels: dne, mdt

Severity: 3
Rank (Obsolete): 11968

 Description   

Running racer on a single node with MDSCOUNT=2 I see this:

crash> bt
PID: 6987   TASK: ffff8801ccbbe040  CPU: 1   COMMAND: "mdt00_008"
 #0 [ffff8801c619d808] machine_kexec at ffffffff81035d6b
 #1 [ffff8801c619d868] crash_kexec at ffffffff810c0e22
 #2 [ffff8801c619d938] panic at ffffffff8150f01f
 #3 [ffff8801c619d9b8] lbug_with_loc at ffffffffa02a9eeb [libcfs]
 #4 [ffff8801c619d9d8] mdt_getattr_internal at ffffffffa0ba07b4 [mdt]
 #5 [ffff8801c619da68] mdt_getattr_name_lock at ffffffffa0ba1bc6 [mdt]
 #6 [ffff8801c619db18] mdt_intent_getattr at ffffffffa0ba2883 [mdt]
 #7 [ffff8801c619db78] mdt_intent_policy at ffffffffa0b91979 [mdt]
 #8 [ffff8801c619dbd8] ldlm_lock_enqueue at ffffffffa062f509 [ptlrpc]
 #9 [ffff8801c619dc38] ldlm_handle_enqueue0 at ffffffffa0658c4f [ptlrpc]
#10 [ffff8801c619dca8] tgt_enqueue at ffffffffa06d2562 [ptlrpc]
#11 [ffff8801c619dcc8] tgt_handle_request0 at ffffffffa06d4f5a [ptlrpc]
#12 [ffff8801c619dd58] tgt_request_handle at ffffffffa06d653a [ptlrpc]
#13 [ffff8801c619dda8] ptlrpc_main at ffffffffa068a295 [ptlrpc]
#14 [ffff8801c619dee8] kthread at ffffffff81096a36
#15 [ffff8801c619df48] kernel_thread at ffffffff8100c0ca

This is from:

        if (info->mti_cross_ref) {
                ...

                if (rc == 0) {
                        /* Finally, we can get attr for child. */
                        mdt_set_capainfo(info, 0, mdt_object_fid(child),
                                         BYPASS_CAPA);
                        rc = mdt_getattr_internal(info, child, 0);
                        if (unlikely(rc != 0))
                                mdt_object_unlock(info, child, lhc, 1);
                }
                RETURN(rc);
        }

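Above this block we check that parent (which is really child here) exists, but only if lname is non-NULL. When lname is NULL, nothing verifies that the object exists before mdt_getattr_internal() calls lu_object_attr() on it.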

There are several more assertions in mdt_getattr_name_lock(), mdt_getattr_internal(), and mdt_raw_lookup() that hold only as long as clients are well behaved. These should be collected and replaced with error handling.
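
As a rough illustration of what "replaced with error handling" could look like, here is a minimal standalone sketch. It is not the change that was merged for this ticket. The `sketch_*` types and functions and `SKETCH_LOHA_EXISTS` are hypothetical stand-ins for the Lustre structures, and returning -ENOENT is an assumption about which error would be appropriate. The idea is to keep assertions for invariants the server controls (non-NULL pointers) and return an error for state that a client request can influence (whether the object exists).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Lustre types; not the real definitions. */
#define SKETCH_LOHA_EXISTS 0x1u

struct sketch_header {
        uint32_t loh_attr;
};

struct sketch_object {
        struct sketch_header *lo_header;
};

/* Returns nonzero if the object is marked as existing. */
static int sketch_object_exists(const struct sketch_object *o)
{
        /* Server-side invariant: safe to assert, clients cannot break it. */
        assert(o != NULL && o->lo_header != NULL);
        return (o->lo_header->loh_attr & SKETCH_LOHA_EXISTS) != 0;
}

/*
 * Fetch the attribute bits of child. Existence depends on what other
 * clients have done (for example a racing unlink), so a missing object
 * is reported as an error instead of tripping an assertion.
 */
static int sketch_getattr(const struct sketch_object *child,
                          uint32_t *attr_out)
{
        if (!sketch_object_exists(child))
                return -ENOENT; /* assumed error code */

        *attr_out = child->lo_header->loh_attr;
        return 0;
}
```

With this shape, a request that races with removal of the object gets an error reply, and the service thread does not LBUG.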



 Comments   
Comment by Peter Jones [ 10/Dec/13 ]

Di

Could you please look into this one?

Thanks

Peter

Comment by Di Wang [ 10/Dec/13 ]

John, could you please tell me which line hit this assertion failure? I also tried racer with MDSCOUNT=2, and it passes for me with these two patches:

http://review.whamcloud.com/#/c/8370/
http://review.whamcloud.com/#/c/8371/

You could try these two patches if you are interested. Thanks.

Comment by John Hammond [ 10/Dec/13 ]

Di,

It's from the lu_object_attr() call in mdt_getattr_internal(). Change 8371 will address this.

Comment by John Hammond [ 07/Jan/14 ]

Fixed by http://review.whamcloud.com/#/c/8371/.
