[LU-11250] Wrong transaction order in mdd_migrate_entries() Created: 14/Aug/18  Updated: 14/Mar/19  Resolved: 14/Mar/19

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: Andriy Skulysh
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Most mdd functions start a transaction and then take mdd_write_lock, but mdd_migrate_entries() does this in the reverse order, which leads to a deadlock:

crash> bt 12342 
PID: 12342  TASK: ffff88009b538000  CPU: 1   COMMAND: "mdt00_003" 
 #0 [ffff88009cd6f6d8] __schedule at ffffffff816b3de4 
 #1 [ffff88009cd6f768] schedule at ffffffff816b4409 
 #2 [ffff88009cd6f778] wait_transaction_locked at ffffffffc00bd085 [jbd2] 
 #3 [ffff88009cd6f7d0] add_transaction_credits at ffffffffc00bd368 [jbd2] 
 #4 [ffff88009cd6f830] start_this_handle at ffffffffc00bd5e1 [jbd2] 
 #5 [ffff88009cd6f8c8] jbd2__journal_start at ffffffffc00bdae3 [jbd2] 
 #6 [ffff88009cd6f910] __ldiskfs_journal_start_sb at ffffffffc04f5179 [ldiskfs] 
 #7 [ffff88009cd6f950] osd_trans_start at ffffffffc0abd24e [osd_ldiskfs] 
 #8 [ffff88009cd6f988] top_trans_start at ffffffffc0900fed [ptlrpc] 
 #9 [ffff88009cd6f9e8] lod_trans_start at ffffffffc0d653f1 [lod] 
#10 [ffff88009cd6f9f8] mdd_trans_start at ffffffffc0e1488a [mdd] 
#11 [ffff88009cd6fa08] mdd_migrate_entries at ffffffffc0dfc846 [mdd] 
#12 [ffff88009cd6faa8] mdd_migrate at ffffffffc0dfd1af [mdd] 
#13 [ffff88009cd6fb18] mdt_reint_migrate_internal at ffffffffc0c9802e [mdt] 
#14 [ffff88009cd6fbc8] mdt_reint_rename_or_migrate at ffffffffc0c98515 [mdt] 
#15 [ffff88009cd6fc58] mdt_reint_migrate at ffffffffc0c98b20 [mdt] 
#16 [ffff88009cd6fc68] mdt_reint_rec at ffffffffc0c9cc53 [mdt] 
#17 [ffff88009cd6fc90] mdt_reint_internal at ffffffffc0c7c1bb [mdt] 
#18 [ffff88009cd6fcc8] mdt_reint at ffffffffc0c87187 [mdt] 
#19 [ffff88009cd6fcf8] tgt_request_handle at ffffffffc08ee6ba [ptlrpc] 
#20 [ffff88009cd6fd40] ptlrpc_server_handle_request at ffffffffc0893d43 [ptlrpc] 
#21 [ffff88009cd6fde0] ptlrpc_main at ffffffffc08974f2 [ptlrpc] 
#22 [ffff88009cd6fec8] kthread at ffffffff810b4031 
#23 [ffff88009cd6ff50] ret_from_fork at ffffffff816c1577 
crash> bt 13673 
PID: 13673  TASK: ffff88012c6c8000  CPU: 1   COMMAND: "mdt00_027" 
 #0 [ffff88012c9e39a8] __schedule at ffffffff816b3de4 
 #1 [ffff88012c9e3a30] schedule at ffffffff816b4409 
 #2 [ffff88012c9e3a40] rwsem_down_write_failed at ffffffff816b5cf5 
 #3 [ffff88012c9e3ad8] call_rwsem_down_write_failed at ffffffff81338247 
 #4 [ffff88012c9e3b20] down_write at ffffffff816b356d 
 #5 [ffff88012c9e3b38] osd_write_lock at ffffffffc0ab1b0c [osd_ldiskfs] 
 #6 [ffff88012c9e3b60] lod_write_lock at ffffffffc0d7e33b [lod] 
 #7 [ffff88012c9e3b70] mdd_write_lock at ffffffffc0dff7cb [mdd] 
 #8 [ffff88012c9e3b80] mdd_xattr_set at ffffffffc0e0cdb8 [mdd] 
 #9 [ffff88012c9e3be8] mdt_reint_setxattr at ffffffffc0c9e64b [mdt] 
#10 [ffff88012c9e3c68] mdt_reint_rec at ffffffffc0c9cc53 [mdt] 
#11 [ffff88012c9e3c90] mdt_reint_internal at ffffffffc0c7c1bb [mdt] 
#12 [ffff88012c9e3cc8] mdt_reint at ffffffffc0c87187 [mdt] 
#13 [ffff88012c9e3cf8] tgt_request_handle at ffffffffc08ee6ba [ptlrpc] 
#14 [ffff88012c9e3d40] ptlrpc_server_handle_request at ffffffffc0893d43 [ptlrpc] 
#15 [ffff88012c9e3de0] ptlrpc_main at ffffffffc08974f2 [ptlrpc] 
#16 [ffff88012c9e3ec8] kthread at ffffffff810b4031 
#17 [ffff88012c9e3f50] ret_from_fork at ffffffff816c1577
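
The two backtraces above fit the ordering problem in the description: PID 12342 is blocked starting a transaction from mdd_migrate_entries(), while PID 13673 is blocked in mdd_write_lock() from mdd_xattr_set(). The following is a minimal Python sketch of that inversion, not Lustre code. It treats "open transaction" and "write lock" as two abstract resources and records the order in which each code path acquires them, loosely in the spirit of a lock-order checker. The names LockOrderChecker, usual_path, reversed_path and check are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Abstract stand-ins for the two resources named in the ticket (assumption:
# an open transaction handle is modelled as if it were a lock).
TRANS = "transaction"
WLOCK = "write_lock"


class LockOrderChecker:
    """Records 'X was held while Y was acquired' edges per thread."""

    def __init__(self):
        self.edges = defaultdict(set)   # held resource -> resources taken later
        self.held = defaultdict(list)   # thread name -> resources currently held

    def acquire(self, thread, resource):
        for h in self.held[thread]:
            self.edges[h].add(resource)
        self.held[thread].append(resource)

    def release(self, thread, resource):
        self.held[thread].remove(resource)

    def inversions(self):
        """Return sorted pairs (a, b) that were acquired in both orders."""
        return sorted(
            (a, b)
            for a in self.edges
            for b in self.edges[a]
            if a < b and a in self.edges.get(b, ())
        )


def usual_path(chk, thread):
    # Order the description attributes to most mdd functions:
    # start the transaction, then take the write lock.
    chk.acquire(thread, TRANS)
    chk.acquire(thread, WLOCK)
    chk.release(thread, WLOCK)
    chk.release(thread, TRANS)


def reversed_path(chk, thread):
    # Order the description attributes to mdd_migrate_entries():
    # take the write lock, then start the transaction.
    chk.acquire(thread, WLOCK)
    chk.acquire(thread, TRANS)
    chk.release(thread, TRANS)
    chk.release(thread, WLOCK)


def check(paths):
    """Run each path as its own thread and report order inversions."""
    chk = LockOrderChecker()
    for i, path in enumerate(paths):
        path(chk, f"t{i}")
    return chk.inversions()


print(check([usual_path, usual_path]))      # consistent order
print(check([usual_path, reversed_path]))   # mixed order
```

With a consistent order the checker reports nothing. Mixing the two orders yields the pair (transaction, write_lock), meaning each thread can end up holding one resource while waiting for the other, which is the shape of the hang shown in the backtraces.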


 Comments   
Comment by Gerrit Updater [ 14/Aug/18 ]

Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/33000
Subject: LU-11250 migrate: Wrong transaction order in mdd_migrate_entries
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 546d862f16570d86a820c621d9aa5ff5525eb04e

Comment by Andriy Skulysh [ 14/Mar/19 ]

This patch isn't needed after LU-4684.
