[LU-5289] mdc_enqueue() may leave an invalid lock handle in intent Created: 02/Jul/14  Updated: 08/Jul/14  Resolved: 08/Jul/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Major
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: llite, mdc

Severity: 3
Rank (Obsolete): 14756

 Description   

I see this when running vanilla single-node racer with memory allocation fault injection enabled.

[  169.793670] LustreError: 8024:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed: 
[  169.793681] LustreError: 8024:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) LBUG
[  169.793687] Pid: 8024, comm: setfattr
[  169.793690] 
[  169.793691] Call Trace:
[  169.793731]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[  169.793757]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
[  169.793848]  [<ffffffffa0643842>] ldlm_lock_decref_internal_nolock+0xd2/0x180 [ptlrpc]
[  169.793923]  [<ffffffffa0646d40>] ldlm_lock_decref_internal+0x50/0xae0 [ptlrpc]
[  169.793993]  [<ffffffffa0438b7e>] ? class_handle2object+0x3e/0x1d0 [obdclass]
[  169.794052]  [<ffffffffa06481b9>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
[  169.794088]  [<ffffffffa0e31b6f>] ll_intent_drop_lock+0xaf/0x150 [lustre]
[  169.794113]  [<ffffffffa0e31c51>] ll_intent_release+0x41/0x1d0 [lustre]
[  169.794150]  [<ffffffffa0e7e9c8>] ll_lookup_nd+0x108/0x4a0 [lustre]
[  169.794158]  [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
[  169.794163]  [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
[  169.794168]  [<ffffffff811b398a>] path_walk+0x6a/0xe0
[  169.794172]  [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
[  169.794177]  [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
[  169.794182]  [<ffffffff8119f6c3>] ? sys_close+0x43/0x120
[  169.794187]  [<ffffffff8119f6c3>] ? sys_close+0x43/0x120
[  169.794192]  [<ffffffff811cb418>] sys_setxattr+0x48/0xe0
[  169.794200]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[  169.794203] 

This was triggered by an allocation failure in mdc_finish_enqueue(), but the underlying issue is in the error path at the bottom of mdc_enqueue():

        rc = mdc_finish_enqueue(exp, req, einfo, it, lockh, rc);
        if (rc < 0) {
                if (lustre_handle_is_used(lockh)) {
                        ldlm_lock_decref(lockh, einfo->ei_mode);
                        memset(lockh, 0, sizeof(*lockh));
                }
                ptlrpc_req_finished(req);
        }
        RETURN(rc);
}

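We should clear it_lock_handle and it_lock_mode in the intent as well. Otherwise the intent still refers to a lock whose reference has already been dropped here, and ll_intent_release() drops it a second time (see ll_intent_drop_lock() in the trace above), which trips the l_readers assertion.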
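
A minimal sketch of the intended cleanup, not the landed patch. The types below are simplified stand-ins for the Lustre structures, and toy_readers, toy_lock_decref(), enqueue_error_cleanup() and intent_drop_lock() are hypothetical names used only for illustration. The point is that the error path clears every copy of the handle, so the later intent release finds nothing to drop:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the Lustre types; not the real definitions. */
struct lustre_handle {
	uint64_t cookie;
};

struct lookup_intent {
	uint64_t it_lock_handle;	/* cookie of the lock recorded in the intent */
	int	 it_lock_mode;		/* mode of that lock, 0 if none */
};

/* Toy reader count standing in for lock->l_readers. */
static int toy_readers;

static int handle_is_used(const struct lustre_handle *lockh)
{
	return lockh->cookie != 0;
}

/* Mirrors the ASSERTION( lock->l_readers > 0 ) seen in the trace. */
static void toy_lock_decref(const struct lustre_handle *lockh)
{
	(void)lockh;
	assert(toy_readers > 0);
	toy_readers--;
}

/*
 * Hypothetical error path: drop the reference once, then clear both the
 * caller's handle and the copy recorded in the intent.
 */
static void enqueue_error_cleanup(struct lookup_intent *it,
				  struct lustre_handle *lockh)
{
	if (handle_is_used(lockh)) {
		toy_lock_decref(lockh);
		memset(lockh, 0, sizeof(*lockh));
	}
	it->it_lock_handle = 0;
	it->it_lock_mode = 0;
}

/*
 * Simplified stand-in for the intent release path: it drops a reference
 * only if the intent still records a lock mode.
 */
static void intent_drop_lock(struct lookup_intent *it)
{
	if (it->it_lock_mode != 0) {
		struct lustre_handle h = { .cookie = it->it_lock_handle };

		toy_lock_decref(&h);
		it->it_lock_mode = 0;
	}
}
```

If the two intent fields were left set after enqueue_error_cleanup(), the later intent_drop_lock() call would decrement the reader count a second time and hit the assertion, which is the failure reported above.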

More generally, mdc_enqueue() should not have a *lockh parameter at all, but to fix that we would probably need to split md_enqueue() into md_enqueue() and md_flock().



 Comments   
Comment by John Hammond [ 03/Jul/14 ]

Please see http://review.whamcloud.com/#/c/10963/.

Comment by John Hammond [ 08/Jul/14 ]

Patch landed to master.
