[LU-5994] DT transaction start and object lock ordering Created: 05/Dec/14 Updated: 05/Jun/18 Resolved: 05/Jun/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | John Hammond | Assignee: | nasf (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 16718 | ||||||||||||
| Description |
lfsck_layout_slave_conditional_destroy() and lfsck_layout_slave_repair_pfid() both call dt_trans_start_local() while holding a DT write lock on an object. This was found by code inspection. I also ran with the following: diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.c
index 80d1e67..ed15b0c 100644
--- a/lustre/osd-ldiskfs/osd_handler.c
+++ b/lustre/osd-ldiskfs/osd_handler.c
@@ -1058,6 +1058,8 @@ int osd_trans_start(const struct lu_env *env, struct dt_device *d,
if (!IS_ERR(jh)) {
oh->ot_handle = jh;
LASSERT(oti->oti_txns == 0);
+ LASSERT(oti->oti_w_locks == 0);
+ LASSERT(oti->oti_r_locks == 0);
lu_context_init(&th->th_ctx, th->th_tags);
lu_context_enter(&th->th_ctx);
Running sanity-lfsck.sh I was able to trigger a crash from lfsck_layout_slave_repair_pfid() : [ 1827.711965] Lustre: DEBUG MARKER: == sanity-lfsck test 19b: OST-object inconsistency self repair == 11:01:32 (1417798892) [ 1827.813194] Lustre: DEBUG MARKER: cancel_lru_locks osc start [ 1827.838214] Lustre: *** cfs_fail_loc=1611, val=0*** [ 1827.839821] Lustre: Skipped 3 previous similar messages [ 1827.877070] Lustre: DEBUG MARKER: cancel_lru_locks osc stop [ 1827.957214] LustreError: 21862:0:(osd_handler.c:1061:osd_trans_start()) ASSERTION( oti->oti_w_locks == 0 ) failed: [ 1827.960100] LustreError: 21862:0:(osd_handler.c:1061:osd_trans_start()) LBUG [ 1827.961698] Pid: 21862, comm: inconsistency_v [ 1827.962700] [ 1827.962702] Call Trace: [ 1827.963692] [<ffffffffa052f8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 1827.965393] [<ffffffffa052fec7>] lbug_with_loc+0x47/0xb0 [libcfs] [ 1827.966913] [<ffffffffa0d98809>] osd_trans_start+0x389/0x730 [osd_ldiskfs] [ 1827.968553] [<ffffffffa0c44ed2>] lfsck_layout_slave_in_notify+0x982/0xcf0 [lfsck] [ 1827.969963] [<ffffffffa0c0cd37>] lfsck_in_notify+0xf7/0x5a0 [lfsck] [ 1827.971155] [<ffffffffa0fe6037>] ofd_inconsistency_verification_main+0x367/0xdb0 [ofd] [ 1827.972639] [<ffffffff8105e59d>] ? finish_task_switch+0x7d/0x110 [ 1827.973774] [<ffffffff8105e568>] ? finish_task_switch+0x48/0x110 [ 1827.974896] [<ffffffff81061d90>] ? default_wake_function+0x0/0x20 [ 1827.976039] [<ffffffffa0fe5cd0>] ? ofd_inconsistency_verification_main+0x0/0xdb0 [ofd] [ 1827.977539] [<ffffffff8109e856>] kthread+0x96/0xa0 [ 1827.978448] [<ffffffff8100c30a>] child_rip+0xa/0x20 [ 1827.979368] [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40 [ 1827.980507] [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [ 1827.981704] [<ffffffff8109e7c0>] ? kthread+0x0/0xa0 [ 1827.982787] [<ffffffff8100c300>] ? child_rip+0x0/0x20 [ 1827.983912] I also found that OFD also does this in enough places that I added || strstr(current->comm, "ll_ost") != NULL
to the assertions above. Should OFD be following the same order? |
| Comments |
| Comment by nasf (Inactive) [ 09/Dec/14 ] |
|
Hm, I remembered that why LFSCK used such transaction/dt_lock order. Normally, or say on MDT, we start transaction firstly, then take dt_lock; but on the OST, we use the reverse order, for example, ofd_object_destroy(). I do not know why the OFD has different behaviour as MDT does. |
| Comment by Alex Zhuravlev [ 09/Dec/14 ] |
|
yes, this is a known technical debt.. we always wanted to have locks-the-transaction order (this is what the majority of the filesystem do) and implemented OFD this way, but that would serialize directory operations on MDS.. |
| Comment by nasf (Inactive) [ 09/Dec/14 ] |
|
If according to local filesystem order, we should take dt_lock before transaction started, right? But as a distributed filesystem, what's the potential trouble if we change MDS side order to match OST side? And what we should do for this ticket? We have kept such confused orders for a long time. |
| Comment by Alex Zhuravlev [ 09/Dec/14 ] |
|
well, I think it'd be very good to have a same order of course. as for MDS: basically we use dt_write_lock() to control nlink+/nlink+. IIRC. if we take the lock outside of a transaction, then basically all the operations in a directory are serialized. actually both OSDs protect nlink internally and I wonder wouldn't it be enough. originally this "local" locking was introduced in 2.0 to support a mode where LDLM lock is taken for a node, not for a thread. something to take into account, I guess. |
| Comment by nasf (Inactive) [ 09/Dec/14 ] |
|
Then I would say that it will not be a small change. We need to unify the dt_lock and transaction order on both MDT and OST, that may be affect not only scalabilities but also potential deadlock. Currently, the order of dt_lock and transaction in LFSCK matches both MDT and OST (different functions use different orders). |
| Comment by nasf (Inactive) [ 29/Dec/16 ] |
|
Part of the issues for the order of starting transaction and locking object will be handled via the patch http://review.whamcloud.com/23689 |
| Comment by nasf (Inactive) [ 05/Jun/18 ] |
|
It will be fixed via |