[LU-14621] Broken lock-transaction ordering in MDS code Created: 18/Apr/21  Updated: 29/Aug/23  Resolved: 31/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Upstream
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-17048 Crash in lod_declare_update_extents Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There are a few places where a local lock is taken before the transaction starts, which breaks the transaction-then-locks rule:

 lbug_with_loc.cold.6+0x18/0x18 [libcfs]
 ? osd_trans_start+0x2f1/0x5a0 [osd_ldiskfs]
 osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
 top_trans_start+0x40c/0x940 [ptlrpc]
 ? mdd_orphan_declare_delete+0x176/0x5c0 [mdd]
 mdd_orphan_cleanup_thread+0xaa1/0x18f0 [mdd]
 ? mdd_orphan_declare_delete+0x5c0/0x5c0 [mdd]
 kthread+0x11a/0x130

[<0>] libcfs_call_trace+0x76/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3e/0x80 [libcfs]
[<0>] osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
[<0>] top_trans_start+0x40c/0x940 [ptlrpc]
[<0>] mdd_swap_layouts+0x12f5/0x2350 [mdd]
[<0>] mdt_swap_layouts+0x40e/0x9a0 [mdt]

[<0>] osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
[<0>] top_trans_start+0x40c/0x940 [ptlrpc]
[<0>] mdd_xattr_set+0x18c7/0x2e50 [mdd]
[<0>] mdt_close_handle_layouts+0xe23/0x1160 [mdt]
[<0>] mdt_mfd_close+0x5af/0x3110 [mdt]
[<0>] mdt_close_internal+0xfd/0x230 [mdt]
[<0>] mdt_close+0x60a/0x840 [mdt]

Call Trace:
[<0>] libcfs_call_trace+0x76/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3e/0x80 [libcfs]
[<0>] osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
[<0>] __lfsck_layout_update_pfid+0x15f/0x580 [lfsck]
[<0>] lfsck_layout_slave_in_notify_local+0x4ed/0x710 [lfsck]
[<0>] lfsck_in_notify_local+0x81/0x3f0 [lfsck]
[<0>] ofd_inconsistency_verification_main+0x1f2/0xa70 [ofd]

It is trivial to reproduce with the following debugging patch, which asserts that the thread holds no local locks when a transaction starts:

index 04984f5d9f..b91b7c73e5 100644
--- a/lustre/osd-ldiskfs/osd_handler.c
+++ b/lustre/osd-ldiskfs/osd_handler.c
@@ -1913,6 +1913,9 @@ static int osd_trans_start(const struct lu_env *env, struct dt_device *d,
 
        ENTRY;
 
+       LASSERT(oti->oti_w_locks == 0);
+       LASSERT(oti->oti_r_locks == 0);
+
        LASSERT(current->journal_info == NULL);
 
        oh = container_of(th, struct osd_thandle, ot_super);
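
For readers unfamiliar with the rule, the check in the patch can be modelled outside the kernel. The following is only a toy sketch, not Lustre code: the struct and every toy_* function are hypothetical, and only the counter names oti_w_locks and oti_r_locks are borrowed from the patch. It assumes the counters track the locks currently held by the thread, and it returns false at the point where the patched osd_trans_start() would hit LASSERT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-thread state. Only the two counter
 * names are taken from the debugging patch; the rest is illustrative. */
struct toy_thread_info {
	int oti_w_locks;	/* write locks currently held by this thread */
	int oti_r_locks;	/* read locks currently held by this thread */
};

static void toy_write_lock(struct toy_thread_info *oti)
{
	oti->oti_w_locks++;
}

static void toy_write_unlock(struct toy_thread_info *oti)
{
	assert(oti->oti_w_locks > 0);	/* unlock must pair with a lock */
	oti->oti_w_locks--;
}

/* Toy transaction start: reports false where the patch would assert,
 * i.e. when any local lock is already held by the thread. */
static bool toy_trans_start(const struct toy_thread_info *oti)
{
	return oti->oti_w_locks == 0 && oti->oti_r_locks == 0;
}

/* Broken order described in this ticket: lock first, then transaction. */
static bool toy_lock_then_trans(void)
{
	struct toy_thread_info oti = {0};
	bool ok;

	toy_write_lock(&oti);
	ok = toy_trans_start(&oti);	/* a lock is held here: check fails */
	toy_write_unlock(&oti);
	return ok;
}

/* Order the rule asks for: transaction first, then lock. */
static bool toy_trans_then_lock(void)
{
	struct toy_thread_info oti = {0};
	bool ok = toy_trans_start(&oti);	/* no locks held: check passes */

	toy_write_lock(&oti);
	toy_write_unlock(&oti);
	return ok;
}
```

In this model toy_lock_then_trans() fails the check and toy_trans_then_lock() passes it, which mirrors why the traces above all end in osd_trans_start once the assertions are added.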


 Comments   
Comment by Gerrit Updater [ 18/Apr/21 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43362
Subject: LU-14621 mdd: fix lock-tx order in orphan cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3150af6df4128db727e3aab1dce9334555fa8afa

Comment by Gerrit Updater [ 19/Apr/21 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43366
Subject: LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: afb757d02a34b7c8a2227f6a316b51dfab8698e7

Comment by Gerrit Updater [ 05/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43362/
Subject: LU-14621 mdd: fix lock-tx order in orphan cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f7396ce80fc780eca2645e371785cba256c55fa1

Comment by Gerrit Updater [ 31/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43366/
Subject: LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b7bd4e3422935fec82d13348d90ec205ac2f4da4

Comment by Peter Jones [ 31/Jul/21 ]

Landed for 2.15
