[LU-5289] mdc_enqueue() may leave an invalid lock handle in intent Created: 02/Jul/14 Updated: 08/Jul/14 Resolved: 08/Jul/14 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llite, mdc | ||
| Severity: | 3 |
| Rank (Obsolete): | 14756 |
| Description |
|
I see this when running vanilla single-node racer with memory allocation fault injection:

[ 169.793670] LustreError: 8024:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed:
[ 169.793681] LustreError: 8024:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) LBUG
[ 169.793687] Pid: 8024, comm: setfattr
[ 169.793690]
[ 169.793691] Call Trace:
[ 169.793731] [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[ 169.793757] [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
[ 169.793848] [<ffffffffa0643842>] ldlm_lock_decref_internal_nolock+0xd2/0x180 [ptlrpc]
[ 169.793923] [<ffffffffa0646d40>] ldlm_lock_decref_internal+0x50/0xae0 [ptlrpc]
[ 169.793993] [<ffffffffa0438b7e>] ? class_handle2object+0x3e/0x1d0 [obdclass]
[ 169.794052] [<ffffffffa06481b9>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
[ 169.794088] [<ffffffffa0e31b6f>] ll_intent_drop_lock+0xaf/0x150 [lustre]
[ 169.794113] [<ffffffffa0e31c51>] ll_intent_release+0x41/0x1d0 [lustre]
[ 169.794150] [<ffffffffa0e7e9c8>] ll_lookup_nd+0x108/0x4a0 [lustre]
[ 169.794158] [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
[ 169.794163] [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
[ 169.794168] [<ffffffff811b398a>] path_walk+0x6a/0xe0
[ 169.794172] [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
[ 169.794177] [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
[ 169.794182] [<ffffffff8119f6c3>] ? sys_close+0x43/0x120
[ 169.794187] [<ffffffff8119f6c3>] ? sys_close+0x43/0x120
[ 169.794192] [<ffffffff811cb418>] sys_setxattr+0x48/0xe0
[ 169.794200] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[ 169.794203]

This was triggered by an allocation failure in mdc_finish_enqueue().
But the issue is at the bottom of mdc_enqueue():

        rc = mdc_finish_enqueue(exp, req, einfo, it, lockh, rc);
        if (rc < 0) {
                if (lustre_handle_is_used(lockh)) {
                        ldlm_lock_decref(lockh, einfo->ei_mode);
                        memset(lockh, 0, sizeof(*lockh));
                }
                ptlrpc_req_finished(req);
        }
        RETURN(rc);
}

On failure, this drops the lock reference and zeroes *lockh, but the intent still records the lock, so ll_intent_release() later drops the same reference a second time. We should clear it_lock_handle and it_lock_mode as well. More generally, mdc_enqueue() should not have a *lockh parameter at all, but to fix this we probably need to split md_enqueue() into md_enqueue() and md_flock(). |
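The double release described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the Lustre code or the landed patch: struct lock_handle, struct intent, struct mock_lock, mock_decref(), enqueue_error_path(), intent_drop_lock() and run_scenario() are all hypothetical stand-ins, and the model assumes the intent already records the lock handle and mode at the point where the enqueue fails.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the Lustre structures; not the real definitions. */
struct lock_handle {
	uint64_t cookie;
};

struct intent {
	uint64_t it_lock_handle;	/* cookie of the lock the intent records */
	int	 it_lock_mode;		/* nonzero while the intent holds a reference */
};

struct mock_lock {
	uint64_t cookie;
	int	 readers;		/* models lock->l_readers */
};

/* Drop one reader reference. Returns -1 where the real code would LBUG. */
static int mock_decref(struct mock_lock *lock, uint64_t cookie)
{
	if (cookie != lock->cookie || lock->readers <= 0)
		return -1;
	lock->readers--;
	return 0;
}

/* Models the error path at the bottom of mdc_enqueue(). With clear_intent
 * set, the intent's copy of the handle and mode is reset as well. */
static void enqueue_error_path(struct mock_lock *lock,
			       struct lock_handle *lockh,
			       struct intent *it, int clear_intent)
{
	if (lockh->cookie != 0) {
		mock_decref(lock, lockh->cookie);
		memset(lockh, 0, sizeof(*lockh));
	}
	if (clear_intent) {
		it->it_lock_handle = 0;
		it->it_lock_mode = 0;
	}
}

/* Models ll_intent_drop_lock(): release whatever the intent still records. */
static int intent_drop_lock(struct mock_lock *lock, struct intent *it)
{
	int rc = 0;

	if (it->it_lock_mode != 0) {
		rc = mock_decref(lock, it->it_lock_handle);
		it->it_lock_mode = 0;
	}
	return rc;
}

/* Runs the failure scenario. Returns 0 if the later intent release is
 * harmless, -1 if it attempts a second decref on the same lock. */
int run_scenario(int clear_intent)
{
	struct mock_lock lock = { .cookie = 0x1234, .readers = 1 };
	struct lock_handle lockh = { .cookie = lock.cookie };
	struct intent it = { .it_lock_handle = lock.cookie, .it_lock_mode = 1 };

	enqueue_error_path(&lock, &lockh, &it, clear_intent);
	assert(lockh.cookie == 0);	/* the caller's handle is always zeroed */
	return intent_drop_lock(&lock, &it);
}
```

In this model, run_scenario(0) reproduces the stale-handle case (the intent release hits the condition that corresponds to the l_readers assertion), while run_scenario(1) shows that resetting the intent fields in the error path makes the later release a no-op.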
| Comments |
| Comment by John Hammond [ 03/Jul/14 ] |
|
Please see http://review.whamcloud.com/#/c/10963/. |
| Comment by John Hammond [ 08/Jul/14 ] |
|
Patch landed to master. |