[LU-4081] mdc_enqueue() may return a freed lock in intent Created: 09/Oct/13  Updated: 03/Jul/14  Resolved: 03/Jul/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Dmitry Eremin (Inactive)
Resolution: Duplicate Votes: 0
Labels: mdc

Severity: 3
Rank (Obsolete): 10963

 Description   

This is the sister bug to LU-4079. If mdc_finish_enqueue() fails after the lock handle has been copied into the intent, mdc_enqueue() drops its reference to the lock but still returns the handle in the intent. The lock cookie returned through the lockh parameter is 0. Several allocations in mdc_finish_enqueue() could fail, but for simplicity I just added a fail check.

mdc_finish_enqueue()
{
        ...
        intent->it_disposition = (int)lockrep->lock_policy_res1;
        intent->it_status = (int)lockrep->lock_policy_res2;
        intent->it_lock_mode = einfo->ei_mode;
        intent->it_lock_handle = lockh->cookie;
        intent->it_data = req;

        ...

        DEBUG_REQ(D_RPCTRACE, req, "op: %d disposition: %x, status: %d",
                  it->it_op, intent->it_disposition, intent->it_status);

+       if (OBD_FAIL_CHECK(0x3000))
+               RETURN(-EPROTO);
+
        ...
}

mdc_enqueue()
{
        ...
        rc = mdc_finish_enqueue(exp, req, einfo, it, lockh, rc);
        if (rc < 0) {
                if (lustre_handle_is_used(lockh)) {
                        ldlm_lock_decref(lockh, einfo->ei_mode);
                        memset(lockh, 0, sizeof(*lockh));
                }
                ptlrpc_req_finished(req);
        }
        RETURN(rc);
}
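
The failure mode in the two excerpts above can be reproduced in a small user-space model. This is only a sketch: every toy_* name is hypothetical and stands in loosely for the Lustre structures and functions named in this report, the single global lock replaces the real handle table, and the scrub_intent flag shows one possible remedy, not necessarily the change made for this bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real Lustre definitions. */
struct toy_handle { uint64_t cookie; };
struct toy_intent { uint64_t it_lock_handle; int it_lock_mode; };
struct toy_lock   { uint64_t cookie; int readers; };

static struct toy_lock the_lock;        /* one lock replaces the handle table */

/* Models ldlm_lock_decref(): returns -1 where the real code would LBUG. */
static int toy_decref(uint64_t cookie)
{
        if (cookie != the_lock.cookie || the_lock.readers <= 0)
                return -1;
        the_lock.readers--;
        return 0;
}

/* Models mdc_finish_enqueue(): copies the handle, then may fail. */
static int toy_finish_enqueue(struct toy_intent *it,
                              const struct toy_handle *lockh, int fail)
{
        it->it_lock_mode = 1;
        it->it_lock_handle = lockh->cookie;
        if (fail)                       /* plays the role of OBD_FAIL_CHECK */
                return -EPROTO;
        return 0;
}

/* Models the tail of mdc_enqueue(); scrub_intent is an assumed remedy. */
static int toy_enqueue(struct toy_intent *it, struct toy_handle *lockh,
                       int fail, int scrub_intent)
{
        int rc;

        the_lock.cookie = 0x1234;       /* the lock was granted with one ref */
        the_lock.readers = 1;
        lockh->cookie = the_lock.cookie;

        rc = toy_finish_enqueue(it, lockh, fail);
        if (rc < 0) {
                if (lockh->cookie != 0) {
                        toy_decref(lockh->cookie);
                        memset(lockh, 0, sizeof(*lockh));
                }
                if (scrub_intent) {     /* do not leak the stale handle */
                        it->it_lock_handle = 0;
                        it->it_lock_mode = 0;
                }
        }
        return rc;
}

/* Models a caller such as ll_intent_drop_lock(). */
static int toy_intent_drop_lock(struct toy_intent *it)
{
        int rc;

        if (it->it_lock_mode == 0 || it->it_lock_handle == 0)
                return 0;               /* nothing to drop */
        rc = toy_decref(it->it_lock_handle);
        it->it_lock_mode = 0;
        return rc;
}
```

With fail set and scrub_intent clear, toy_enqueue() returns -EPROTO with a zeroed lockh but a live handle in the intent, so the later toy_intent_drop_lock() attempts a second decref and fails, which corresponds to the l_readers assertion in the trace below. With scrub_intent set, the caller finds nothing to drop.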

# llmount.sh
# sh ./lustre/tests/racer.sh
...
== racer test 1: racer on clients: t DURATION=300 == 15:04:21 (1381349061)
racers pids: 28220 28221
...
# lctl set_param fail_loc=0x3000

Lustre: *** cfs_fail_loc=3000, val=0***
LustreError: 31331:0:(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed:
LustreError: 31331:0:(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) LBUG
Pid: 31331, comm: mkdir

Call Trace:
 [<ffffffffa0d95895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0d95e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa10167f2>] ldlm_lock_decref_internal_nolock+0xd2/0x180 [ptlrpc]
 [<ffffffffa101aabd>] ldlm_lock_decref_internal+0x4d/0xad0 [ptlrpc]
 [<ffffffffa0eb6ae5>] ? class_handle2object+0x95/0x190 [obdclass]
 [<ffffffffa101bf79>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
 [<ffffffffa073bd2f>] ll_intent_drop_lock+0xaf/0x150 [lustre]
 [<ffffffffa07716cb>] ? ll_finish_md_op_data+0x2cb/0x410 [lustre]
 [<ffffffffa073e5a8>] ll_revalidate_it+0xbe8/0x1b20 [lustre]
 [<ffffffffa0786940>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
 [<ffffffffa0786940>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
 [<ffffffffa073f613>] ll_revalidate_nd+0x133/0x3e0 [lustre]
 [<ffffffff8118fa45>] __lookup_hash+0x85/0x160
 [<ffffffff8119016a>] lookup_hash+0x3a/0x50
 [<ffffffff811901ee>] lookup_create+0x6e/0xd0
 [<ffffffff81193aac>] sys_mkdirat+0x7c/0x130
 [<ffffffff811a36d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff811a36d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff81193b78>] sys_mkdir+0x18/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by John Hammond [ 03/Jul/14 ]

Fixing this in LU-5289.
