Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.5.0
3
10963
Description
This is the sister bug to LU-4079. If mdc_finish_enqueue() fails after the lock handle has been copied into the intent, mdc_enqueue() drops its reference to the lock but still returns the handle in the intent. The lock cookie returned through the lockh parameter is 0, so the caller is left with a stale handle in the intent and later tries to drop a reference it no longer holds (see the assertion in the trace below). Several allocations in mdc_finish_enqueue() could fail this way, but for simplicity I just added a fail check to reproduce it.
mdc_finish_enqueue()
{
        ...
        intent->it_disposition = (int)lockrep->lock_policy_res1;
        intent->it_status = (int)lockrep->lock_policy_res2;
        intent->it_lock_mode = einfo->ei_mode;
        intent->it_lock_handle = lockh->cookie;
        intent->it_data = req;
        ...
        DEBUG_REQ(D_RPCTRACE, req, "op: %d disposition: %x, status: %d",
                  it->it_op, intent->it_disposition, intent->it_status);
+       if (OBD_FAIL_CHECK(0x3000))
+               RETURN(-EPROTO);
+
        ...
}

mdc_enqueue()
{
        ...
        rc = mdc_finish_enqueue(exp, req, einfo, it, lockh, rc);
        if (rc < 0) {
                if (lustre_handle_is_used(lockh)) {
                        ldlm_lock_decref(lockh, einfo->ei_mode);
                        memset(lockh, 0, sizeof(*lockh));
                }
                ptlrpc_req_finished(req);
        }
        RETURN(rc);
}

# llmount.sh
# sh ./lustre/tests/racer.sh
...
== racer test 1: racer on clients: t DURATION=300 == 15:04:21 (1381349061)
racers pids: 28220 28221 ...
# lctl set_param fail_loc=0x3000

Lustre: *** cfs_fail_loc=3000, val=0***
LustreError: 31331:0:(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed:
LustreError: 31331:0:(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) LBUG
Pid: 31331, comm: mkdir

Call Trace:
 [<ffffffffa0d95895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0d95e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa10167f2>] ldlm_lock_decref_internal_nolock+0xd2/0x180 [ptlrpc]
 [<ffffffffa101aabd>] ldlm_lock_decref_internal+0x4d/0xad0 [ptlrpc]
 [<ffffffffa0eb6ae5>] ? class_handle2object+0x95/0x190 [obdclass]
 [<ffffffffa101bf79>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
 [<ffffffffa073bd2f>] ll_intent_drop_lock+0xaf/0x150 [lustre]
 [<ffffffffa07716cb>] ? ll_finish_md_op_data+0x2cb/0x410 [lustre]
 [<ffffffffa073e5a8>] ll_revalidate_it+0xbe8/0x1b20 [lustre]
 [<ffffffffa0786940>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
 [<ffffffffa0786940>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
 [<ffffffffa073f613>] ll_revalidate_nd+0x133/0x3e0 [lustre]
 [<ffffffff8118fa45>] __lookup_hash+0x85/0x160
 [<ffffffff8119016a>] lookup_hash+0x3a/0x50
 [<ffffffff811901ee>] lookup_create+0x6e/0xd0
 [<ffffffff81193aac>] sys_mkdirat+0x7c/0x130
 [<ffffffff811a36d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff811a36d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff81193b78>] sys_mkdir+0x18/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
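
To make the failure sequence above easier to follow, here is a small user-space model of it. This is only a sketch: every toy_* name and struct is hypothetical and stands in for the Lustre types, nothing here is the real mdc/ldlm code, and the clear_intent option illustrates one possible way to avoid the stale handle, not necessarily the fix that was applied upstream.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the lock, the lock handle and the intent. */
struct toy_lock   { uint64_t cookie; int readers; };
struct toy_handle { uint64_t cookie; };
struct toy_intent { uint64_t lock_handle; int lock_mode; };

static struct toy_lock the_lock;        /* the only lock in this model */

/* Drop one reader reference. Returns -1 where the real code would LBUG. */
static int toy_decref(uint64_t cookie)
{
        if (cookie == 0 || cookie != the_lock.cookie)
                return 0;               /* unknown handle: nothing to drop */
        if (the_lock.readers <= 0)
                return -1;              /* models ASSERTION(l_readers > 0) */
        the_lock.readers--;
        return 0;
}

/* Copies the handle into the intent, then fails (the injected fault). */
static int toy_finish_enqueue(struct toy_intent *it,
                              const struct toy_handle *lockh)
{
        it->lock_mode = 1;
        it->lock_handle = lockh->cookie;
        return -EPROTO;
}

/* Error path as in the report; clear_intent selects the assumed remedy. */
static int toy_enqueue(struct toy_intent *it, struct toy_handle *lockh,
                       int clear_intent)
{
        int rc;

        assert(it != NULL && lockh != NULL);

        /* Pretend the enqueue granted a lock with one reader reference. */
        the_lock.cookie = 0x1234;
        the_lock.readers = 1;
        lockh->cookie = the_lock.cookie;

        rc = toy_finish_enqueue(it, lockh);
        if (rc < 0) {
                if (lockh->cookie != 0) {
                        toy_decref(lockh->cookie);
                        memset(lockh, 0, sizeof(*lockh));
                }
                if (clear_intent) {
                        /* Assumption: also forget the handle in the intent. */
                        it->lock_handle = 0;
                        it->lock_mode = 0;
                }
        }
        return rc;
}

/* Caller-side cleanup: drop whatever lock the intent still claims to hold. */
static int toy_intent_drop_lock(struct toy_intent *it)
{
        int rc = 0;

        if (it->lock_mode != 0 && it->lock_handle != 0) {
                rc = toy_decref(it->lock_handle);
                it->lock_handle = 0;
                it->lock_mode = 0;
        }
        return rc;
}

/* Returns 1 if the caller's cleanup would trip the assertion, else 0. */
static int scenario_trips_assertion(int clear_intent)
{
        struct toy_intent it = { 0, 0 };
        struct toy_handle lockh = { 0 };

        toy_enqueue(&it, &lockh, clear_intent);
        return toy_intent_drop_lock(&it) < 0;
}
```

Calling scenario_trips_assertion(0) reproduces the reported pattern: the intent still carries the handle after the enqueue path has dropped the only reference, so the caller's cleanup performs a second decref. With clear_intent set to 1 the intent no longer references the lock and the cleanup does nothing.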