[LU-12017] Truncate vs setxattr deadlock with DoM Created: 25/Feb/19  Updated: 29/Apr/20  Resolved: 27/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Major
Reporter: Andriy Skulysh Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-13467 truncate deadlock with DoM files Resolved
is related to LU-11285 don't stop on the first blocked lock ... Resolved
is related to LU-13456 Fix LU-12017 Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

setxattr takes the inode lock and sends a reint RPC to the MDS.
truncate takes the MDS_INODELOCK_DOM lock and wants to acquire the inode lock.

PID: 14942 TASK: ffff88007659cf10 CPU: 3 COMMAND: "truncate"
 #0 [ffff88011f397af8] __schedule at ffffffff816b3de4
 #1 [ffff88011f397b88] schedule_preempt_disabled at ffffffff816b5329
 #2 [ffff88011f397b98] __mutex_lock_slowpath at ffffffff816b30d7
 #3 [ffff88011f397bf0] mutex_lock at ffffffff816b24bf
 #4 [ffff88011f397c08] vvp_io_setattr_start at ffffffffc118993d [lustre]
 #5 [ffff88011f397c40] cl_io_start at ffffffffc06e7a25 [obdclass]
 #6 [ffff88011f397c68] cl_io_loop at ffffffffc06e9e01 [obdclass]
 #7 [ffff88011f397cd8] cl_setattr_ost at ffffffffc11847ef [lustre]
 #8 [ffff88011f397d28] ll_setattr_raw at ffffffffc11614d8 [lustre]
 #9 [ffff88011f397df0] ll_setattr at ffffffffc11617d3 [lustre]
#10 [ffff88011f397e00] notify_change at ffffffff81223bc4
#11 [ffff88011f397e48] do_truncate at ffffffff81203445
#12 [ffff88011f397ec0] vfs_truncate at ffffffff8120361c
#13 [ffff88011f397ef8] do_sys_truncate at ffffffff8120370c
#14 [ffff88011f397f40] sys_truncate at ffffffff812038de

PID: 15194 TASK: ffff880077f18000 CPU: 1 COMMAND: "setfattr"
 #0 [ffff88011d33b8b8] __schedule at ffffffff816b3de4
 #1 [ffff88011d33b948] schedule at ffffffff816b4409
 #2 [ffff88011d33b958] schedule_timeout at ffffffff816b1ca4
 #3 [ffff88011d33ba00] ptlrpc_set_wait at ffffffffc09070a0 [ptlrpc]
 #4 [ffff88011d33baf0] ptlrpc_queue_wait at ffffffffc09074e3 [ptlrpc]
 #5 [ffff88011d33bb10] mdc_xattr_common at ffffffffc0b52186 [mdc]
 #6 [ffff88011d33bb90] mdc_setxattr at ffffffffc0b522de [mdc]
 #7 [ffff88011d33bbd0] lmv_setxattr at ffffffffc0872524 [lmv]
 #8 [ffff88011d33bc48] ll_xattr_set_common at ffffffffc1175b54 [lustre]
 #9 [ffff88011d33bcc8] ll_xattr_set_common_3_11 at ffffffffc11769ab [lustre]
#10 [ffff88011d33bcd8] generic_setxattr at ffffffff8122c2d8
#11 [ffff88011d33bd10] __vfs_setxattr_noperm at ffffffff8122cb45
#12 [ffff88011d33bd58] vfs_setxattr at ffffffff8122cd45
#13 [ffff88011d33bd98] setxattr at ffffffff8122ce7e
#14 [ffff88011d33bef0] sys_setxattr at ffffffff8122d177

The MDS locks are for different bits (MDS_INODELOCK_UPDATE|MDS_INODELOCK_XATTR vs
MDS_INODELOCK_DOM), but they block each other if some blocking lock was present earlier, because Lustre tries to grant only the first lock in the waiting list.
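
In short, the two paths take the same two resources in opposite order, which is the classic AB-BA shape. Below is a minimal standalone sketch of that pattern in plain pthreads; it is not Lustre code, lock_a only stands in for the client inode lock (i_mutex) and lock_b for the server-side DoM lock dependency, and the mapping to the stacks above is an analogy only:

/* ab_ba_deadlock.c - compile with: cc -pthread ab_ba_deadlock.c
 * Illustration only: lock_a ~ client i_mutex, lock_b ~ DoM/MDS lock
 * dependency.  The program intentionally deadlocks and never exits. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* "i_mutex" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* "DoM lock" */

/* "truncate" analogue: takes the DoM-side lock first, then the inode lock */
static void *truncate_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock_b);
        sleep(1);                    /* let the other thread grab lock_a */
        pthread_mutex_lock(&lock_a); /* blocks forever: lock_a is held by setxattr_path */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
}

/* "setxattr" analogue: takes the inode lock first, then waits on the DoM side */
static void *setxattr_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock_a);
        sleep(1);
        pthread_mutex_lock(&lock_b); /* blocks forever: lock_b is held by truncate_path */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, truncate_path, NULL);
        pthread_create(&t2, NULL, setxattr_path, NULL);
        pthread_join(t1, NULL);      /* never returns: AB-BA deadlock */
        pthread_join(t2, NULL);
        puts("not reached");
        return 0;
}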



 Comments   
Comment by Andriy Skulysh [ 25/Feb/19 ]

I suppose the easiest way to resolve the deadlock is to add a separate lock type for the DoM lock.

Comment by Mikhail Pershin [ 01/Mar/19 ]

Are you able to reproduce this behavior? There is LU-11285, which should help.

Comment by Andriy Skulysh [ 10/May/19 ]

I can confirm that LU-11285 helps with my reproducer.

Comment by Patrick Farrell (Inactive) [ 10/May/19 ]

Are you able to share your reproducer, maybe as a testcase?

Comment by Mikhail Pershin [ 11/May/19 ]

I don't see how the deadlock happens there. If truncate and setxattr are not dependent, then some third lock can't cause a deadlock, because it will be granted at some point and let the waiting queue move on. The only case where it causes a deadlock is when truncate and setxattr are somehow dependent, so that one needs the other to be granted first. So I also wonder about a reproducer, to investigate this case more closely.

LU-11285 was made for other reasons: a DOM lock, being an IO lock, can be held for a significant time, and any other DOM lock in the waiting queue might block all MD locks in that queue for no reason. So it is a sort of optimization where locks with non-intersecting bits don't block each other. Strictly speaking that is not a deadlock but a lock timeout case; increasing the lock timeout resolves it, which is not possible with a real deadlock. So LU-11285 is not a deadlock resolver, though it may help with some deadlocks; the right way is to fix the source of the deadlock.

Comment by Alexey Lyashkov [ 13/May/19 ]

This deadlock is caused because the CLIO code takes the i_mutex after the DoM lock is obtained.
Fixing the client side solves this issue completely, without any LDLM code changes.

Comment by Mikhail Pershin [ 13/May/19 ]

Exactly, that is what I meant: solving it on the client side is the right thing. Alexey, could you give the particular function name where that happens?

Comment by Alexey Lyashkov [ 13/May/19 ]

ll_setattr_raw releases the i_mutex only to take it again later.

Here is a test patch (it passed sanity / sanity-dom), including Andriy's test for this particular deadlock:

diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
index 1839163936..d33d02519e 100644
--- a/lustre/llite/llite_lib.c
+++ b/lustre/llite/llite_lib.c
@@ -1566,11 +1566,11 @@ static int ll_md_setattr(struct dentry *dentry, struct md_op_data *op_data)
        /* inode size will be in ll_setattr_ost, can't do it now since dirty
         * cache is not cleared yet. */
        op_data->op_attr.ia_valid &= ~(TIMES_SET_FLAGS | ATTR_SIZE);
-       if (S_ISREG(inode->i_mode))
-               inode_lock(inode);
+//     if (S_ISREG(inode->i_mode))
+//             inode_lock(inode);
        rc = simple_setattr(dentry, &op_data->op_attr);
-       if (S_ISREG(inode->i_mode))
-               inode_unlock(inode);
+//     if (S_ISREG(inode->i_mode))
+//             inode_unlock(inode);
        op_data->op_attr.ia_valid = ia_valid;

        rc = ll_update_inode(inode, &md);
@@ -1657,11 +1657,11 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
                        LTIME_S(attr->ia_mtime), LTIME_S(attr->ia_ctime),
                       (s64)ktime_get_real_seconds());

-       if (S_ISREG(inode->i_mode)) {
-               if (attr->ia_valid & ATTR_SIZE)
-                       inode_dio_write_done(inode);
-               inode_unlock(inode);
-       }
+//     if (S_ISREG(inode->i_mode)) {
+//             if (attr->ia_valid & ATTR_SIZE)
+//                     inode_dio_write_done(inode);
+//             inode_unlock(inode);
+//     }

        /* We always do an MDS RPC, even if we're only changing the size;
         * only the MDS knows whether truncate() should fail with -ETXTBUSY */
@@ -1746,7 +1746,7 @@ out:
                ll_finish_md_op_data(op_data);

        if (S_ISREG(inode->i_mode)) {
-               inode_lock(inode);
+//             inode_lock(inode);
                if ((attr->ia_valid & ATTR_SIZE) && !hsm_import)
                        inode_dio_wait(inode);
                /* Once we've got the i_mutex, it's safe to set the S_NOSEC
diff --git a/lustre/llite/vvp_io.c b/lustre/llite/vvp_io.c
index ac2cd0ea20..f9cbf6a9cc 100644
--- a/lustre/llite/vvp_io.c
+++ b/lustre/llite/vvp_io.c
@@ -712,10 +712,10 @@ static int vvp_io_setattr_start(const struct lu_env *env,

        if (cl_io_is_trunc(io)) {
                down_write(&lli->lli_trunc_sem);
-               inode_lock(inode);
+//             inode_lock(inode);
                inode_dio_wait(inode);
        } else {
-               inode_lock(inode);
+//             inode_lock(inode);
        }

        if (io->u.ci_setattr.sa_avalid & TIMES_SET_FLAGS)
@@ -736,10 +736,10 @@ static void vvp_io_setattr_end(const struct lu_env *env,
                 * because osc has already notified to destroy osc_extents. */
                vvp_do_vmtruncate(inode, io->u.ci_setattr.sa_attr.lvb_size);
                inode_dio_write_done(inode);
-               inode_unlock(inode);
+//             inode_unlock(inode);
                up_write(&lli->lli_trunc_sem);
        } else {
-               inode_unlock(inode);
+//             inode_unlock(inode);
        }
 }

Comment by Alexey Lyashkov [ 13/May/19 ]

As for the deadlock scenario, it is a deadlock via the client side.
truncate releases the i_mutex in ll_setattr_raw and starts to take the DoM lock; setxattr removes a parent lookup lock, so open starts from the lookup step, but the MDT has the option to always take the DOM lock on open, so the open conflicts with the DoM lock while the i_mutex is held. Setattr holds the DoM lock and starts to take the i_mutex...
It looks like setxattr could deadlock with truncate by itself, but it should not conflict with the DoM lock (I have not actually checked that).

OOPS. Deadlock.

Comment by Mikhail Pershin [ 13/May/19 ]

I wonder why the inode lock was released and taken back later in vvp_io_setattr_start(). Probably that was also intended to resolve other lock conflicts. Anyway, thanks for the analysis; if possible, please attach a test for that particular deadlock so I can use it for testing too.

Comment by Mikhail Pershin [ 13/May/19 ]

Ah, that is because of the lli_trunc_sem/i_mutex lock ordering, LU-7927, commit 5d60fd75152d10.
Well, there are things to think about.

Comment by Alexey Lyashkov [ 13/May/19 ]

No, it's not LU-7927. The inode_lock existed before it; that ticket changed where it is taken, but did not add the locking itself:

-       inode_lock(inode);
        if (cl_io_is_trunc(io)) {
                down_write(&lli->lli_trunc_sem);
+               inode_lock(inode);
                inode_dio_wait(inode);
+       } else {
+               inode_lock(inode);
        }

Comment by Gerrit Updater [ 04/Jun/19 ]

Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/35057
Subject: LU-12017 test: DoM truncate deadlock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2f0a4e2a2073fa4c0fced7f1e155ff7b4372298a

Comment by Andriy Skulysh [ 05/Jun/19 ]

A similar deadlock (getattr vs rename) is triggered:

granted lock:
     <struct ldlm_lock 0xffff88007905eb40>
     0x73f93784e720791e  [0x200000402:0x14c8:0x0] pid 20988
     MDS_INODELOCK_LOOKUP|MDS_INODELOCK_UPDATE
waiting locks:
     <struct ldlm_lock 0xffff8801243d8000>
     0x73f93784e72079b1 [0x200000402:0x14c8:0x0] pid 10035
     MDS_INODELOCK_LOOKUP|MDS_INODELOCK_UPDATE|MDS_INODELOCK_PERM
 
     <struct ldlm_lock 0xffff880090f23440>
     0x73f93784e72079f0 [0x200000402:0x14c8:0x0] pid 20988
     MDS_INODELOCK_DOM

Comment by Andriy Skulysh [ 12/Jun/19 ]

Lock 0xffff88007905eb40 is for mnew in mdt_reint_rename().
Lock 0xffff880090f23440 is requested by the same thread via mdt_reint_rename()->mdt_dom_discard_data(),
so the DOM lock can't be granted, and the mnew lock can't be released since we need to send the reply first.

Comment by Mikhail Pershin [ 12/Jun/19 ]

Andriy, currently mdt_dom_discard_data() does not wait for the lock to be granted; that sort of deadlock should already be fixed by LU-11359.

Comment by Cory Spitz [ 29/Jul/19 ]

pjones, can we target this for 2.13.0?

Comment by Cory Spitz [ 08/Aug/19 ]

I've added L2.13 to the Fix Version.

Comment by Gerrit Updater [ 27/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35057/
Subject: LU-12017 ldlm: DoM truncate deadlock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2250e072c37855d611aa64027945981fe2c8f4d7

Comment by Peter Jones [ 27/Aug/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 27/Aug/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35937
Subject: LU-12017 ldlm: DoM truncate deadlock
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 3f8d06297971efbf773df70c97edb1a7f96a34ad

Comment by Mikhail Pershin [ 31/Aug/19 ]

The problem is not fully fixed. The patch helps to avoid some deadlock situations where the IBITS of the conflicting locks don't intersect, but the deadlock still exists and may happen in the future when another IBITS combination occurs. The root cause of the deadlock on the client side is not yet fixed.

I'd like to keep this ticket open with lowered severity and priority, or create a new ticket linked to this one, to fix the root cause of the deadlock on the client side.

Comment by Peter Jones [ 31/Aug/19 ]

As long as there is some value in landing the original patch, I would suggest moving the remaining work to a distinct ticket.

Comment by Gerrit Updater [ 12/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35937/
Subject: LU-12017 ldlm: DoM truncate deadlock
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 671ece3104edd3342838e2c0eda350e72219dc97

Comment by Mikhail Pershin [ 06/Apr/20 ]

A short update for everyone interested. After a discussion with Vitaly Fertman, the conclusion is that new deadlocks are possible due to the reverse locking order of the i_mutex and the DOM LDLM lock in the truncate path. That can still cause deadlocks in the generic case, like the following:

th1: setattr took the DOM lock and is going to take the inode lock
th2: wants some lock with DOM|XXX bits and is stuck waiting on th1's DOM lock
th3: already holds the inode lock, enqueues some XXX bits and waits for those bits behind th2's waiting lock.

Due to the reverse order of taking the inode lock and the LDLM lock in th1 and th3, any th2 with DOM and XXX bits that are common with th3's bits can cause a deadlock. Ideally that should be resolved by changing truncate to use the proper lock order, though that can be non-trivial. Until then we can just avoid combining the DOM bit with others if they cannot be granted immediately. In most cases that is already so; the only exception is the OPEN path, mdt_object_open_lock() in particular. In case of similar deadlocks it can be switched to 'trybits' mode by setting the mdt.*.dom_lock=trylock parameter.
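
For reference, that tunable would presumably be applied on the MDS via the standard lctl interface, e.g. (usage sketch only; value string as quoted above):

lctl set_param mdt.*.dom_lock=trylock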

Comment by Alexey Lyashkov [ 07/Apr/20 ]

Mike,

i_mutex is useless for the truncate case now. The IO path has used an unlocked version with a range lock for a long time, and truncate vs IO is protected by a separate mutex.

Comment by Cory Spitz [ 25/Apr/20 ]

This comment relates to LU-13467.
