[LU-4725] wrong lock ordering in rename leads to deadlocks Created: 06/Mar/14  Updated: 08/Mar/23  Resolved: 16/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6, Lustre 2.6.0, Lustre 2.5.2, Lustre 2.4.3
Fix Version/s: Lustre 2.6.0, Lustre 2.5.4

Type: Bug Priority: Blocker
Reporter: Vitaly Fertman Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
duplicates LU-5178 object leak in mdt_rename_sanity() Closed
Related
is related to LU-5514 After upgrade from 1.8.7 to 2.5.2 sta... Resolved
is related to LU-5111 interop 2.5 server and higher version... Resolved
is related to LU-16589 sanityn test_55d: FAIL: (2) mv succeeded Resolved
is related to LU-5178 object leak in mdt_rename_sanity() Closed
Severity: 3
Rank (Obsolete): 12984

 Description   

the current rename code locks objects in the order: src parent, dst parent, src child, dst child. it may happen that dst is a parent of src, what may lead to deadlock.

example from a core dump:
res1 - dst parent
res2 - dst parent, PDO
res3 - src parent

Thread 1 (T1), rename:
Has RES3 (CW,0x2)
Wants RES1 (CW,0x2)

Thread 2 (T2), getattr:
Has RES1(CR,0x2)
Has RES2(PR,0x2)
Wants RES3(PR,0x2) - blocked by T1

Thread 3 (T3), create or open|create
Has RES1(CW,0x2)
Wants RES2(PW,0x2) - blocked by T2

Thread4 (T4), getattr or similar
Wants RES1(PR,0x2) - blocked by T3

T1 has no conflicts, but is sitting in the waiting queue behind T4, thus not granted.



 Comments   
Comment by Vitaly Fertman [ 06/Mar/14 ]

CODE: http://review.whamcloud.com/9538

Comment by Andreas Dilger [ 11/Mar/14 ]

Vitaly, does this problem also affect 2.5 or 2.4?

Comment by Jodi Levi (Inactive) [ 12/Mar/14 ]

HongChao,
Could you please keep an eye on this patch?

Comment by Patrick Farrell (Inactive) [ 14/Mar/14 ]

Andreas - This problem report's originally from Cray. I can confirm it seems to affect all current versions of Lustre. At least, we've seen it in 2.1 (Xyratex's server code), and Cray's 2.4 and 2.5 servers.

And looking at the proposed cause (the rename locking order that comes up when moving an object up in to its grandparent), that's the same in all of those versions, so it makes sense.

Comment by Vitaly Fertman [ 17/Mar/14 ]

Andreas, this ordering has not changed since was introduced in 2.0.
(sorry for the delay, I was on PTO)

Comment by Jodi Levi (Inactive) [ 17/Apr/14 ]

HongChao,
Can you please refresh this patch?
Thank you!

Comment by Hongchao Zhang [ 18/Apr/14 ]

the patch has been updated

Comment by Jodi Levi (Inactive) [ 21/Apr/14 ]

Test was retriggered today.

Comment by Peter Jones [ 09/May/14 ]

Landed for 2.6

Comment by Andreas Dilger [ 16/May/14 ]

Reopened because of a new problem found. http://review.whamcloud.com/10342

Comment by Oleg Drokin [ 20/May/14 ]

Sothe current deadlocks in rename I am observinghave a trace like this:

[24234.786010] Call Trace:
[24234.786802]  [<ffffffffa052a304>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
[24234.787351]  [<ffffffffa052a91b>] lu_object_find_at+0xab/0x350 [obdclass]
[24234.787869]  [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
[24234.788398]  [<ffffffffa052abd6>] lu_object_find+0x16/0x20 [obdclass]
[24234.788918]  [<ffffffffa0ce4e06>] mdt_object_find+0x56/0x170 [mdt]
[24234.789480]  [<ffffffffa0d03e32>] mdt_rename_sanity+0xa2/0x4a0 [mdt]
[24234.790006]  [<ffffffffa0d08ad8>] mdt_reint_rename_internal+0xe08/0x1a80 [mdt]
[24234.790890]  [<ffffffffa06be630>] ? ldlm_lock_enqueue+0x1d0/0x8b0 [ptlrpc]
[24234.791424]  [<ffffffffa0d09994>] mdt_reint_rename_or_migrate+0x244/0x660 [mdt]
[24234.792291]  [<ffffffffa06dc6c0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
[24234.792832]  [<ffffffffa06ddd30>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
[24234.793414]  [<ffffffffa0d09de0>] mdt_reint_rename+0x10/0x20 [mdt]
[24234.793959]  [<ffffffffa0d024b1>] mdt_reint_rec+0x41/0xe0 [mdt]
[24234.795042]  [<ffffffffa0ce7e03>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
[24234.795374]  [<ffffffffa0ce868b>] mdt_reint+0x6b/0x120 [mdt]
[24234.795915]  [<ffffffffa0767b7c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
[24234.796475]  [<ffffffffa0718fb8>] ptlrpc_main+0xcc8/0x1950 [ptlrpc]
[24234.797078]  [<ffffffffa07182f0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[24234.797579]  [<ffffffff81098c06>] kthread+0x96/0xa0
[24234.798041]  [<ffffffff8100c24a>] child_rip+0xa/0x20
[24234.798495]  [<ffffffff81098b70>] ? kthread+0x0/0xa0
[24234.798981]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
[24234.799449] 
(gdb) l *mdt_rename_sanity+0xa2
0x23e82 is in mdt_rename_sanity (/home/green/git/lustre-release-double/lustre/mdt/mdt_reint.c:1263).
1258			return 0;
1259	
1260		do {
1261			LASSERT(fid_is_sane(&dst_fid));
1262			dst = mdt_object_find(info->mti_env, info->mti_mdt, &dst_fid);
1263			if (!IS_ERR(dst)) {
1264				rc = mdo_is_subdir(info->mti_env,
1265						   mdt_object_child(dst), fid,
1266						   &dst_fid);
1267				mdt_object_put(info->mti_env, dst);
(gdb) l *(mdt_reint_rename_internal+0xe08)
0x28b28 is in mdt_reint_rename_internal (/home/green/git/lustre-release-double/lustre/mdt/mdt_reint.c:1783).
1778			GOTO(out_unlock_parents, rc = PTR_ERR(mold));
1779	
1780		/* Check if @mtgtdir is subdir of @mold, before locking child
1781		 * to avoid reverse locking. */
1782		rc = mdt_rename_sanity(info, rr->rr_fid2, old_fid);
1783		if (rc)
1784			GOTO(out_put_old, rc);
1785	
1786		tgt_vbr_obj_set(info->mti_env, mdt_obj2dt(mold));
1787		/* save version after locking */

The apparent problem is such that while we are holding object reference on target (or it's parent) after we looked it up it got unlinked and attempted to be destroyed, but we are trying to find it again ewhile lholding one reference and get deadlocked in find_object_at waiting for all referenced to go away due to HEARD_BANSHEE being set.

Comment by Vitaly Fertman [ 29/May/14 ]

alternatives for the last problem:
http://review.whamcloud.com/10342
http://review.whamcloud.com/10484

Comment by Vitaly Fertman [ 03/Jun/14 ]

independently of the previous fixes, but still related to rename, another deadlock is found:
http://review.whamcloud.com/10570

Comment by Vitaly Fertman [ 04/Jun/14 ]

the last patch is moved to a separate ticket: LU-5144

Comment by Jodi Levi (Inactive) [ 16/Jun/14 ]

Patches landed to Master. Closing this ticket and LU-5144 as the patch moved to that ticket was also landed.

Comment by Rahul Deshmukh (Inactive) [ 01/Jul/14 ]

I have back ported LU-4725 patches to b2_5 and here are the links:

http://review.whamcloud.com/#/c/10916/
http://review.whamcloud.com/#/c/10917/

Not sure if I need to create new bug for this? Please help.

Comment by Rahul Deshmukh (Inactive) [ 04/Aug/14 ]

The result for first patch shows two tests, this because first patch is not complete.
All 19 tests pass for second patch. (I was not sure where to mention it in gerrit, so updating here).

http://review.whamcloud.com/#/c/10916/
http://review.whamcloud.com/#/c/10917/

Comment by Jian Yu [ 27/Aug/14 ]

Just combined the above two patches into one: http://review.whamcloud.com/11615

Generated at Sat Feb 10 01:45:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.