[LU-4152]  layout locks can cause deadlock Created: 28/Oct/13  Updated: 16/Dec/14  Resolved: 12/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Blocker
Reporter: Andriy Skulysh Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: HSM
Environment:

Issue Links:
Duplicate
is duplicated by LU-3601 HSM release causes running restore to... Resolved
Related
is related to LU-6015 mds evicting clients - lock timed out? Resolved
is related to LU-1876 Layout Lock Server Patch Landings to ... Resolved
is related to LU-4398 mdt_object_open_lock() may not flush ... Resolved
is related to LU-3601 HSM release causes running restore to... Resolved
Severity: 3
Rank (Obsolete): 11269

 Description   

LU-1876 added mdt_object_open_lock(), which acquires the lock in two steps for layout locks.
A deadlock is possible because the two-step acquisition is not atomic, and ibits locks are only reprocessed until the first blocking lock is found.

Such a situation was hit with mdt_reint_open() and mdt_intent_getattr():

1. mdt_reint_open()->mdt_open_by_fid_lock() takes the first part of the lock (ibits=5).
2. mdt_intent_getattr() tries to obtain a lock (ibits=17) and waits behind it.
3. mdt_open_by_fid_lock() tries to obtain the second part but fails due to a conflict with another layout lock, lock2. During cancellation of lock2, only the getattr lock is reprocessed, so the open's layout lock is never granted.

http://review.whamcloud.com/#/c/7148/1 can help, but it is better to fix mdt_open_by_fid_lock().
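
For reference when decoding the ibits values above, the masks below reproduce the inode lock bits from the 2.5-era lustre/include/lustre/lustre_idl.h. This is a minimal standalone decoder for illustration, not Lustre source, and the bit values are assumptions taken from that header. It shows that ibits=5 is LOOKUP|OPEN, ibits=17 is LOOKUP|PERM, and the "second part" of the open lock is the LAYOUT bit:

#include <stdio.h>

/* Inode lock bits as defined in the Lustre 2.5-era lustre_idl.h
 * (values copied from that header; standalone decoder, not Lustre code). */
#define MDS_INODELOCK_LOOKUP 0x01
#define MDS_INODELOCK_UPDATE 0x02
#define MDS_INODELOCK_OPEN   0x04
#define MDS_INODELOCK_LAYOUT 0x08
#define MDS_INODELOCK_PERM   0x10

static void decode_ibits(unsigned int ibits)
{
        printf("ibits=0x%02x:", ibits);
        if (ibits & MDS_INODELOCK_LOOKUP) printf(" LOOKUP");
        if (ibits & MDS_INODELOCK_UPDATE) printf(" UPDATE");
        if (ibits & MDS_INODELOCK_OPEN)   printf(" OPEN");
        if (ibits & MDS_INODELOCK_LAYOUT) printf(" LAYOUT");
        if (ibits & MDS_INODELOCK_PERM)   printf(" PERM");
        printf("\n");
}

int main(void)
{
        decode_ibits(5);   /* first part taken by mdt_open_by_fid_lock() */
        decode_ibits(17);  /* lock requested by mdt_intent_getattr() */
        decode_ibits(8);   /* layout bit requested in the second step */
        return 0;
}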



 Comments   
Comment by Jodi Levi (Inactive) [ 28/Oct/13 ]

Duplicate of LU-3601

Comment by John Hammond [ 28/Oct/13 ]

Not a duplicate of LU-3601: that issue describes a similar problem that could have been addressed using http://review.whamcloud.com/#/c/7148/ but was eventually fixed by changing the locking used by the coordinator.

Comment by Oleg Drokin [ 29/Oct/13 ]

I believe I have a patch for this in http://review.whamcloud.com/8083

Now, can you please share exact reproduction details? (And please also give the patch a try.)

Comment by Oleg Drokin [ 29/Oct/13 ]

Also, there's a patch with an alternative approach at http://review.whamcloud.com/8088.
We might want this one to go into master only and not 2.5, since it's somewhat more involved.
Please also give this one a try.

Comment by Patrick Farrell (Inactive) [ 29/Oct/13 ]

Our first test run with the simpler of the two patches has locked up. The original lock re-ordering patch from LU-3601 allowed us to complete the test run successfully.
I'm going to make the dump available here shortly. It's possible this is the same issue, or there may be some other layout-lock-related livelock that John's original LU-3601 patch also allowed us to avoid.

The second patch LBUGged almost immediately:
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: LustreError: 17452:0:(mdt_handler.c:3841:mdt_intent_reint()) ASSERTION( lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: LustreError: 17452:0:(mdt_handler.c:3841:mdt_intent_reint()) LBUG
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: Pid: 17452, comm: mdt02_003
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel:
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: Call Trace:
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0a20895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0a20e97>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa13ab367>] mdt_intent_reint+0x4e7/0x520 [mdt]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa13a8f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0cc1831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0ce893f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa13a93a6>] mdt_enqueue+0x46/0xe0 [mdt]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa13afa97>] mdt_handle_common+0x647/0x16d0 [mdt]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d0b40c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa13e94d5>] mds_regular_handle+0x15/0x20 [mdt]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d1ac28>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0a215de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0a32ebf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d11f89>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d1bfbe>] ptlrpc_main+0xace/0x1700 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d1b4f0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d1b4f0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffffa0d1b4f0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Oct 29 15:52:23 perses-esf-mds001 perses-esf-mds001 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Comment by Patrick Farrell (Inactive) [ 29/Oct/13 ]

The MDS dump from the apparent livelock with the first patch is up on Cray's FTP at:
ftp.cray.com
u: anonymous
p: anonymous

Then:
cd outbound/LU-4152
And then the file is:
LU-4152-patch_8083.tar.gz

Note that anonymous doesn't have permission to ls in outbound or in LU-4152, but a get command will work fine.

Comment by Oleg Drokin [ 29/Oct/13 ]

Apparently the vmcore that's part of this file is from the second panic (the LBUG), because I see the panic message inside.

Comment by Patrick Farrell (Inactive) [ 29/Oct/13 ]

Oops. You're right.

My apologies, Oleg, I grabbed the wrong dump package. Just a minute here.

Comment by Patrick Farrell (Inactive) [ 29/Oct/13 ]

The correct dump is now there.

There are now two files in outbound/LU-4152:
LU-4152-patch_8083.tar.gz
LU-4152-patch_8088.tar.gz

The file named for patch 8083 NOW correctly contains the dump from the hang we saw with patch 8083.
The one named for patch 8088 is the dump I uploaded previously, which has the LBUG in it.

Comment by Oleg Drokin [ 29/Oct/13 ]

I see.
Duh, apparently I made a check with a wrong lock handle.
Please update the patch with this fix and try again. (Note it introduces some sort of a crash on client shutdown, which I'll tackle soon, but I want to make sure this really kills the problem this time.)

--- a/lustre/mdt/mdt_open.c
+++ b/lustre/mdt/mdt_open.c
@@ -1278,7 +1278,7 @@ static int mdt_object_open_lock(struct mdt_thread_info *in
                 * As such we'll drop the open lock we just got above here,
                 * it's ok not to have this open lock as it's main purpose is to
                 * flush unused cached client open handles. */
-               if (lustre_handle_is_used(&ll->mlh_reg_lh))
+               if (lustre_handle_is_used(&lhc->mlh_reg_lh))
                        mdt_object_unlock(info, obj, lhc, 1);
 
                LASSERT(!try_layout);
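
(This is the wrong-handle check Oleg mentions: the guard tested ll->mlh_reg_lh, a different lock handle, instead of lhc, the handle the mdt_object_unlock() call below actually operates on — presumably how the lustre_handle_is_used() LASSERT in mdt_intent_reint() came to fire.)
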
Comment by Patrick Farrell (Inactive) [ 30/Oct/13 ]

Oleg,

We've just tested the fixed patch and it appears to resolve the issue. We've run our reproducer twice now with the patch and haven't hit the deadlock, whereas it hit every time we ran with unpatched code.
Looks like it fixes it. If you can address the client shutdown crash, I think it's good to go.

Comment by Patrick Farrell (Inactive) [ 30/Oct/13 ]

By the way, here's the stack trace from our client dump on shutdown. It seems simple enough, just an unreleased reference, presumably to one of those locks:
2013-10-30T16:34:38.526134-05:00 c0-0c0s2n3 LustreError: 11673:0:(lu_object.c:1141:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
2013-10-30T16:34:38.526158-05:00 c0-0c0s2n3 LustreError: 11673:0:(lu_object.c:1141:lu_device_fini()) LBUG
2013-10-30T16:34:38.526169-05:00 c0-0c0s2n3 Pid: 11673, comm: umount
2013-10-30T16:34:38.526175-05:00 c0-0c0s2n3 Call Trace:
2013-10-30T16:34:38.555669-05:00 c0-0c0s2n3 [<ffffffff81007e59>] try_stack_unwind+0x1a9/0x200
2013-10-30T16:34:38.555680-05:00 c0-0c0s2n3 [<ffffffff81006625>] dump_trace+0x95/0x300
2013-10-30T16:34:38.555693-05:00 c0-0c0s2n3 [<ffffffffa02a28d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
2013-10-30T16:34:38.555700-05:00 c0-0c0s2n3 [<ffffffffa02a2e27>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-10-30T16:34:38.555725-05:00 c0-0c0s2n3 [<ffffffffa05190c7>] lu_device_fini+0x87/0xc0 [obdclass]
2013-10-30T16:34:38.555734-05:00 c0-0c0s2n3 [<ffffffffa094efd4>] lovsub_device_free+0x24/0x160 [lov]
2013-10-30T16:34:38.585779-05:00 c0-0c0s2n3 [<ffffffffa051cd0e>] lu_stack_fini+0x7e/0xc0 [obdclass]
2013-10-30T16:34:38.585790-05:00 c0-0c0s2n3 [<ffffffffa0522c1e>] cl_stack_fini+0xe/0x10 [obdclass]
2013-10-30T16:34:38.585801-05:00 c0-0c0s2n3 [<ffffffffa093e9a8>] lov_device_fini+0x58/0x120 [lov]
2013-10-30T16:34:38.585866-05:00 c0-0c0s2n3 [<ffffffffa051ccda>] lu_stack_fini+0x4a/0xc0 [obdclass]
2013-10-30T16:34:38.585887-05:00 c0-0c0s2n3 [<ffffffffa0522c1e>] cl_stack_fini+0xe/0x10 [obdclass]
2013-10-30T16:34:38.585896-05:00 c0-0c0s2n3 [<ffffffffa0a1edec>] cl_sb_fini+0x6c/0x1a0 [lustre]
2013-10-30T16:34:38.615774-05:00 c0-0c0s2n3 [<ffffffffa09e4fc5>] client_common_put_super+0x55/0xad0 [lustre]
2013-10-30T16:34:38.615784-05:00 c0-0c0s2n3 [<ffffffffa09e5b05>] ll_put_super+0xc5/0x330 [lustre]
2013-10-30T16:34:38.647596-05:00 c0-0c0s2n3 [<ffffffff8111546d>] generic_shutdown_super+0x5d/0x110
2013-10-30T16:34:38.647614-05:00 c0-0c0s2n3 [<ffffffff81115586>] kill_anon_super+0x16/0x60
2013-10-30T16:34:38.647620-05:00 c0-0c0s2n3 [<ffffffffa050f2c6>] lustre_kill_super+0x36/0x50 [obdclass]
2013-10-30T16:34:38.647626-05:00 c0-0c0s2n3 [<ffffffff81115a23>] deactivate_super+0x73/0x90
2013-10-30T16:34:38.673196-05:00 c0-0c0s2n3 [<ffffffff8112db02>] mntput_no_expire+0xc2/0xf0
2013-10-30T16:34:38.673208-05:00 c0-0c0s2n3 [<ffffffff8112debc>] sys_umount+0x7c/0x360
2013-10-30T16:34:38.673213-05:00 c0-0c0s2n3 [<ffffffff8100305b>] system_call_fastpath+0x16/0x1b
2013-10-30T16:34:38.673237-05:00 c0-0c0s2n3 [<00007fe4da8afd07>] 0x7fe4da8afd07
2013-10-30T16:34:38.673245-05:00 c0-0c0s2n3 Kernel panic - not syncing: LBUG
2013-10-30T16:34:38.698848-05:00 c0-0c0s2n3 Pid: 11673, comm: umount Tainted: P 2.6.32.59-0.7.1_1.0402.7496-cray_gem_s #1
2013-10-30T16:34:38.698859-05:00 c0-0c0s2n3 Call Trace:
2013-10-30T16:34:38.698882-05:00 c0-0c0s2n3 [<ffffffff81007e59>] try_stack_unwind+0x1a9/0x200
2013-10-30T16:34:38.698891-05:00 c0-0c0s2n3 [<ffffffff81006625>] dump_trace+0x95/0x300
2013-10-30T16:34:38.698896-05:00 c0-0c0s2n3 [<ffffffff8100786c>] show_trace_log_lvl+0x5c/0x80
2013-10-30T16:34:38.698911-05:00 c0-0c0s2n3 [<ffffffff810078a5>] show_trace+0x15/0x20
2013-10-30T16:34:38.698916-05:00 c0-0c0s2n3 [<ffffffff81427e15>] dump_stack+0x77/0x82
2013-10-30T16:34:38.698921-05:00 c0-0c0s2n3 [<ffffffff81427e9a>] panic+0x7a/0x165
2013-10-30T16:34:38.724451-05:00 c0-0c0s2n3 [<ffffffffa02a2e7b>] lbug_with_loc+0x9b/0xb0 [libcfs]
2013-10-30T16:34:38.724462-05:00 c0-0c0s2n3 [<ffffffffa05190c7>] lu_device_fini+0x87/0xc0 [obdclass]
2013-10-30T16:34:38.724484-05:00 c0-0c0s2n3 [<ffffffffa094efd4>] lovsub_device_free+0x24/0x160 [lov]
2013-10-30T16:34:38.750105-05:00 c0-0c0s2n3 [<ffffffffa051cd0e>] lu_stack_fini+0x7e/0xc0 [obdclass]
2013-10-30T16:34:38.750118-05:00 c0-0c0s2n3 [<ffffffffa0522c1e>] cl_stack_fini+0xe/0x10 [obdclass]
2013-10-30T16:34:38.750142-05:00 c0-0c0s2n3 [<ffffffffa093e9a8>] lov_device_fini+0x58/0x120 [lov]
2013-10-30T16:34:38.750151-05:00 c0-0c0s2n3 [<ffffffffa051ccda>] lu_stack_fini+0x4a/0xc0 [obdclass]
2013-10-30T16:34:38.750156-05:00 c0-0c0s2n3 [<ffffffffa0522c1e>] cl_stack_fini+0xe/0x10 [obdclass]
2013-10-30T16:34:38.750168-05:00 c0-0c0s2n3 [<ffffffffa0a1edec>] cl_sb_fini+0x6c/0x1a0 [lustre]
2013-10-30T16:34:38.775711-05:00 c0-0c0s2n3 [<ffffffffa09e4fc5>] client_common_put_super+0x55/0xad0 [lustre]
2013-10-30T16:34:38.775723-05:00 c0-0c0s2n3 [<ffffffffa09e5b05>] ll_put_super+0xc5/0x330 [lustre]
2013-10-30T16:34:38.775749-05:00 c0-0c0s2n3 [<ffffffff8111546d>] generic_shutdown_super+0x5d/0x110
2013-10-30T16:34:38.775760-05:00 c0-0c0s2n3 [<ffffffff81115586>] kill_anon_super+0x16/0x60
2013-10-30T16:34:38.775766-05:00 c0-0c0s2n3 [<ffffffffa050f2c6>] lustre_kill_super+0x36/0x50 [obdclass]
2013-10-30T16:34:38.801373-05:00 c0-0c0s2n3 [<ffffffff81115a23>] deactivate_super+0x73/0x90
2013-10-30T16:34:38.801384-05:00 c0-0c0s2n3 [<ffffffff8112db02>] mntput_no_expire+0xc2/0xf0
2013-10-30T16:34:38.801410-05:00 c0-0c0s2n3 [<ffffffff8112debc>] sys_umount+0x7c/0x360
2013-10-30T16:34:38.801419-05:00 c0-0c0s2n3 [<ffffffff8100305b>] system_call_fastpath+0x16/0x1b
2013-10-30T16:34:38.801429-05:00 c0-0c0s2n3 [<00007fe4da8afd07>] 0x7fe4da8afd07

Comment by Patrick Farrell (Inactive) [ 06/Nov/13 ]

We've now observed this same deadlock in a non-NFS situation as well.

Oleg - If you can give a pointer on how to get the client reference releasing correctly, I'd be happy to generate and test the patch.

Comment by Andriy Skulysh [ 07/Nov/13 ]
2013-10-30T16:34:38.526134-05:00 c0-0c0s2n3 LustreError: 11673:0:(lu_object.c:1141:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
2013-10-30T16:34:38.526158-05:00 c0-0c0s2n3 LustreError: 11673:0:(lu_object.c:1141:lu_device_fini()) LBUG
2013-10-30T16:34:38.526169-05:00 c0-0c0s2n3 Pid: 11673, comm: umount

This LBUG appeared due to

commit d10c9e05ab593f361fdfd27842766a1924e63e58
Author: yangsheng <ys@whamcloud.com>
Date:   Sat May 26 02:12:36 2012 +0800

    LU-506 kernel: FC15 - support dcache scalability changes.

During shutdown, ll_file_release() and the final dput() are called before ll_md_blocking_ast() and d_lustre_invalidate(), so the condition to call d_kill() is never met.

Comment by Patrick Farrell (Inactive) [ 07/Nov/13 ]

Andriy - Can you be more specific about what portion of that patch you believe caused this issue, and why you think it's at fault here from the stack trace?

Comment by Oleg Drokin [ 19/Nov/13 ]

Well, I came to the conclusion that the assertion is in fact a preexisting bug, since I can reproduce it at will, without any patches, with this:

sh llmount.sh
echo 0 >/proc/sys/lnet/panic_on_lbug
service nfs start
mount localhost:/mnt/lustre /mnt/nfs -t nfs
touch /mnt/nfs/file
ls -l /mnt/lustre
cp -f /etc/passwd /mnt/nfs/file

umount /mnt/nfs
service nfs stop
sh llmountcleanup.sh

I am going to open a new bug for this issue since it seems to be separate.

Comment by Oleg Drokin [ 19/Nov/13 ]

Opened LU-4272
Updated patch 8083 to fix the typo.

Comment by Patrick Farrell (Inactive) [ 19/Nov/13 ]

Thank you for the update, Oleg. It's nice to know the crash is unrelated.

Comment by Andreas Dilger [ 10/Feb/14 ]

http://review.whamcloud.com/8083 landed to master. What else is still needed to close this bug?

Comment by Jodi Levi (Inactive) [ 12/Feb/14 ]

Patch landed to Master. Closing ticket.

Comment by Aurelien Degremont (Inactive) [ 13/Feb/14 ]

Are we sure this problem should not be fixed in 2.5 too?

Comment by Peter Jones [ 13/Feb/14 ]

Aurelien

That would be tracked separately

Peter
