[LU-5994] DT transaction start and object lock ordering Created: 05/Dec/14  Updated: 05/Jun/18  Resolved: 05/Jun/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: John Hammond Assignee: nasf (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10048 osd-ldiskfs to truncate outside of ma... Resolved
is related to LU-8806 LFSCK hangs on MDT - osp_precreate_cl... Resolved
Severity: 3
Rank (Obsolete): 16718

 Description   
lfsck_layout_slave_conditional_destroy()

and

lfsck_layout_slave_repair_pfid()

both call dt_trans_start_local() while holding a DT write lock on an object. This was found by code inspection.

I also ran with the following:

diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.c
index 80d1e67..ed15b0c 100644
--- a/lustre/osd-ldiskfs/osd_handler.c
+++ b/lustre/osd-ldiskfs/osd_handler.c
@@ -1058,6 +1058,8 @@ int osd_trans_start(const struct lu_env *env, struct dt_device *d,
         if (!IS_ERR(jh)) {
                 oh->ot_handle = jh;
                 LASSERT(oti->oti_txns == 0);
+               LASSERT(oti->oti_w_locks == 0);
+               LASSERT(oti->oti_r_locks == 0);
                 lu_context_init(&th->th_ctx, th->th_tags);
                 lu_context_enter(&th->th_ctx);

Running sanity-lfsck.sh I was able to trigger a crash from

lfsck_layout_slave_repair_pfid()

:

[ 1827.711965] Lustre: DEBUG MARKER: == sanity-lfsck test 19b: OST-object inconsistency self repair == 11:01:32 (1417798892)
[ 1827.813194] Lustre: DEBUG MARKER: cancel_lru_locks osc start
[ 1827.838214] Lustre: *** cfs_fail_loc=1611, val=0***
[ 1827.839821] Lustre: Skipped 3 previous similar messages
[ 1827.877070] Lustre: DEBUG MARKER: cancel_lru_locks osc stop
[ 1827.957214] LustreError: 21862:0:(osd_handler.c:1061:osd_trans_start()) ASSERTION( oti->oti_w_locks == 0 ) failed:
[ 1827.960100] LustreError: 21862:0:(osd_handler.c:1061:osd_trans_start()) LBUG
[ 1827.961698] Pid: 21862, comm: inconsistency_v
[ 1827.962700]
[ 1827.962702] Call Trace:
[ 1827.963692]  [<ffffffffa052f8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[ 1827.965393]  [<ffffffffa052fec7>] lbug_with_loc+0x47/0xb0 [libcfs]
[ 1827.966913]  [<ffffffffa0d98809>] osd_trans_start+0x389/0x730 [osd_ldiskfs]
[ 1827.968553]  [<ffffffffa0c44ed2>] lfsck_layout_slave_in_notify+0x982/0xcf0 [lfsck]
[ 1827.969963]  [<ffffffffa0c0cd37>] lfsck_in_notify+0xf7/0x5a0 [lfsck]
[ 1827.971155]  [<ffffffffa0fe6037>] ofd_inconsistency_verification_main+0x367/0xdb0 [ofd]
[ 1827.972639]  [<ffffffff8105e59d>] ? finish_task_switch+0x7d/0x110
[ 1827.973774]  [<ffffffff8105e568>] ? finish_task_switch+0x48/0x110
[ 1827.974896]  [<ffffffff81061d90>] ? default_wake_function+0x0/0x20
[ 1827.976039]  [<ffffffffa0fe5cd0>] ? ofd_inconsistency_verification_main+0x0/0xdb0 [ofd]
[ 1827.977539]  [<ffffffff8109e856>] kthread+0x96/0xa0
[ 1827.978448]  [<ffffffff8100c30a>] child_rip+0xa/0x20
[ 1827.979368]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
[ 1827.980507]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[ 1827.981704]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
[ 1827.982787]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
[ 1827.983912]

I also found that OFD also does this in enough places that I added

 || strstr(current->comm, "ll_ost") != NULL

to the assertions above. Should OFD be following the same order?



 Comments   
Comment by nasf (Inactive) [ 09/Dec/14 ]

Hm, I remembered that why LFSCK used such transaction/dt_lock order. Normally, or say on MDT, we start transaction firstly, then take dt_lock; but on the OST, we use the reverse order, for example, ofd_object_destroy().

I do not know why the OFD has different behaviour as MDT does.
Alex, what's your suggestion for that?

Comment by Alex Zhuravlev [ 09/Dec/14 ]

yes, this is a known technical debt.. we always wanted to have locks-the-transaction order (this is what the majority of the filesystem do) and implemented OFD this way, but that would serialize directory operations on MDS..

Comment by nasf (Inactive) [ 09/Dec/14 ]

If according to local filesystem order, we should take dt_lock before transaction started, right? But as a distributed filesystem, what's the potential trouble if we change MDS side order to match OST side? And what we should do for this ticket? We have kept such confused orders for a long time.

Comment by Alex Zhuravlev [ 09/Dec/14 ]

well, I think it'd be very good to have a same order of course. as for MDS: basically we use dt_write_lock() to control nlink+/nlink+. IIRC. if we take the lock outside of a transaction, then basically all the operations in a directory are serialized. actually both OSDs protect nlink internally and I wonder wouldn't it be enough. originally this "local" locking was introduced in 2.0 to support a mode where LDLM lock is taken for a node, not for a thread. something to take into account, I guess.

Comment by nasf (Inactive) [ 09/Dec/14 ]

Then I would say that it will not be a small change. We need to unify the dt_lock and transaction order on both MDT and OST, that may be affect not only scalabilities but also potential deadlock. Currently, the order of dt_lock and transaction in LFSCK matches both MDT and OST (different functions use different orders).
Since Lustre-2.7 feature landing has been already closed, we can resolve this known main technical debt in next release.

Comment by nasf (Inactive) [ 29/Dec/16 ]

Part of the issues for the order of starting transaction and locking object will be handled via the patch http://review.whamcloud.com/23689

Comment by nasf (Inactive) [ 05/Jun/18 ]

It will be fixed via LU-10048.

Generated at Sat Feb 10 01:56:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.