[LU-2971] deadlock of changelog storing & canceling Created: 15/Mar/13  Updated: 15/Mar/13  Resolved: 15/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 7240

 Description   

The MDT changelog code uses an incorrect locking order. It should start the transaction before writing to the llog.
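
As a rough illustration of the ordering rule stated above, the sketch below models it in Python rather than kernel C. Everything in it is a hypothetical stand-in: `journal` and `llog_lock` are plain mutexes, not real jbd2 or llog structures, and a real journal allows several open handles at once. It only shows that when every path starts the transaction first and takes the llog lock second, concurrent store and cancel paths cannot block each other in opposite orders.

```python
import threading

# Hypothetical stand-ins: neither object is a real Lustre or jbd2 structure.
journal = threading.Lock()    # models "a transaction has been started"
llog_lock = threading.Lock()  # models the llog catalog semaphore

completed = []


def write_changelog_record(name):
    # Same order on every path: start the transaction, then take the llog lock.
    with journal:
        with llog_lock:
            completed.append(name)


def run_ordered(names=("store", "cancel"), rounds=1000):
    """Run one worker per name and return how many records were written."""
    completed.clear()

    def worker(name):
        for _ in range(rounds):
            write_changelog_record(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)
```

Calling `run_ordered()` finishes with every record written, because no thread ever holds the second resource while waiting for the first.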



 Comments   
Comment by Andriy Skulysh [ 15/Mar/13 ]

The incorrect locking order results in the following deadlock:

All mdt threads are waiting to start a transaction, but the current one is deadlocked by the following threads:

PID: 13564  TASK: ffff880e3ca94100  CPU: 8   COMMAND: "mdt_384"
 #0 [ffff880e2f567650] schedule at ffffffff814d6f09
 #1 [ffff880e2f567718] rwsem_down_failed_common at ffffffff814d9375
 #2 [ffff880e2f567778] rwsem_down_write_failed at ffffffff814d94d3
 #3 [ffff880e2f5677b8] call_rwsem_down_write_failed at ffffffff8126ee83
 #4 [ffff880e2f567818] llog_cat_current_log.clone.0 at ffffffffa058f1cb [obdclass]
 #5 [ffff880e2f5678b8] llog_cat_add_rec at ffffffffa058feca [obdclass]
 #6 [ffff880e2f567908] llog_obd_origin_add at ffffffffa0595ad7 [obdclass]
 #7 [ffff880e2f567938] llog_add at ffffffffa0595cb1 [obdclass]
 #8 [ffff880e2f567988] mdd_changelog_llog_write at ffffffffa0bf74dc [mdd]
 #9 [ffff880e2f5679d8] mdd_changelog_ns_store at ffffffffa0be8534 [mdd]
#10 [ffff880e2f567a58] mdd_create at ffffffffa0beee7e [mdd]
#11 [ffff880e2f567b98] cml_create at ffffffffa0d95467 [cmm]
#12 [ffff880e2f567be8] mdt_pdir_hash_lock.clone.0 at ffffffffa0c6792f [mdt]
#13 [ffff880e2f567c68] mdt_reint_create at ffffffffa0c67cc8 [mdt]

PID: 13670  TASK: ffff880e2bb3d580  CPU: 2   COMMAND: "mdt_472"
 #0 [ffff880e2bb3f5c0] schedule at ffffffff814d6f09
 #1 [ffff880e2bb3f688] start_this_handle at ffffffffa03e409a [jbd2]
 #2 [ffff880e2bb3f748] jbd2_journal_start at ffffffffa03e4510 [jbd2]
 #3 [ffff880e2bb3f798] ldiskfs_journal_start_sb at ffffffffa0d26b28 [ldiskfs]
 #4 [ffff880e2bb3f7a8] fsfilt_ldiskfs_write_record at ffffffffa0d7152a [fsfilt_ldiskfs]
 #5 [ffff880e2bb3f7f8] llog_lvfs_write_blob at ffffffffa059144d [obdclass]
 #6 [ffff880e2bb3f868] llog_lvfs_write_rec at ffffffffa0592d07 [obdclass]
 #7 [ffff880e2bb3f908] llog_cat_add_rec at ffffffffa058ff69 [obdclass]
 #8 [ffff880e2bb3f958] llog_obd_origin_add at ffffffffa0595ad7 [obdclass]
 #9 [ffff880e2bb3f988] llog_add at ffffffffa0595cb1 [obdclass]
#10 [ffff880e2bb3f9d8] mdd_changelog_llog_write at ffffffffa0bf74dc [mdd]
#11 [ffff880e2bb3fa28] mdd_changelog_write_header at ffffffffa0bf776b [mdd]
#12 [ffff880e2bb3fa78] mdd_changelog_llog_cancel at ffffffffa0bf7bb9 [mdd]
#13 [ffff880e2bb3fab8] mdd_changelog_user_purge at ffffffffa0bf81c0 [mdd]
#14 [ffff880e2bb3fb18] mdd_iocontrol at ffffffffa0bf857c [mdd]
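
One way to read the two backtraces is as a wait-for cycle. The sketch below encodes that reading in Python. The edges are an interpretation and are not stated directly in the dump: it assumes mdt_384 already holds an open transaction handle while it waits for the llog semaphore, and that mdt_472 holds that semaphore while it waits in jbd2_journal_start. The `find_cycle` helper is hypothetical and exists only for this illustration.

```python
# Wait-for edges inferred from the two backtraces (an assumption, see above).
waits_for = {
    # Blocked on the llog rwsem, assumed to be held by mdt_472.
    "mdt_384": "mdt_472",
    # Blocked in jbd2_journal_start, assumed to wait on mdt_384's open handle.
    "mdt_472": "mdt_384",
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle, or [] if none."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return []  # the chain ended at a thread that is not waiting
    return path[path.index(node):]


cycle = find_cycle(waits_for, "mdt_384")
print(cycle)  # ['mdt_384', 'mdt_472']
```

Under these assumptions each thread waits on a resource the other holds, which matches the description of all other mdt threads piling up behind the stalled transaction.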

Comment by Andriy Skulysh [ 15/Mar/13 ]

Please ignore this issue. It is already fixed in master.

Comment by Peter Jones [ 15/Mar/13 ]

ok - thanks Andriy!

Generated at Sat Feb 10 01:29:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.