[LU-2971] deadlock of changelog storing & canceling Created: 15/Mar/13 Updated: 15/Mar/13 Resolved: 15/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andriy Skulysh | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 7240 |
| Description |
|
The MDT changelog uses an incorrect locking order: it should start the transaction before writing to the llog. |
| Comments |
| Comment by Andriy Skulysh [ 15/Mar/13 ] |
|
The incorrect locking order results in the following deadlock. All mdt threads are waiting to start a transaction, but the current one is deadlocked by the following threads:

PID: 13564 TASK: ffff880e3ca94100 CPU: 8 COMMAND: "mdt_384"
#0 [ffff880e2f567650] schedule at ffffffff814d6f09
#1 [ffff880e2f567718] rwsem_down_failed_common at ffffffff814d9375
#2 [ffff880e2f567778] rwsem_down_write_failed at ffffffff814d94d3
#3 [ffff880e2f5677b8] call_rwsem_down_write_failed at ffffffff8126ee83
#4 [ffff880e2f567818] llog_cat_current_log.clone.0 at ffffffffa058f1cb [obdclass]
#5 [ffff880e2f5678b8] llog_cat_add_rec at ffffffffa058feca [obdclass]
#6 [ffff880e2f567908] llog_obd_origin_add at ffffffffa0595ad7 [obdclass]
#7 [ffff880e2f567938] llog_add at ffffffffa0595cb1 [obdclass]
#8 [ffff880e2f567988] mdd_changelog_llog_write at ffffffffa0bf74dc [mdd]
#9 [ffff880e2f5679d8] mdd_changelog_ns_store at ffffffffa0be8534 [mdd]
#10 [ffff880e2f567a58] mdd_create at ffffffffa0beee7e [mdd]
#11 [ffff880e2f567b98] cml_create at ffffffffa0d95467 [cmm]
#12 [ffff880e2f567be8] mdt_pdir_hash_lock.clone.0 at ffffffffa0c6792f [mdt]
#13 [ffff880e2f567c68] mdt_reint_create at ffffffffa0c67cc8 [mdt]

PID: 13670 TASK: ffff880e2bb3d580 CPU: 2 COMMAND: "mdt_472"
#0 [ffff880e2bb3f5c0] schedule at ffffffff814d6f09
#1 [ffff880e2bb3f688] start_this_handle at ffffffffa03e409a [jbd2]
#2 [ffff880e2bb3f748] jbd2_journal_start at ffffffffa03e4510 [jbd2]
#3 [ffff880e2bb3f798] ldiskfs_journal_start_sb at ffffffffa0d26b28 [ldiskfs]
#4 [ffff880e2bb3f7a8] fsfilt_ldiskfs_write_record at ffffffffa0d7152a [fsfilt_ldiskfs]
#5 [ffff880e2bb3f7f8] llog_lvfs_write_blob at ffffffffa059144d [obdclass]
#6 [ffff880e2bb3f868] llog_lvfs_write_rec at ffffffffa0592d07 [obdclass]
#7 [ffff880e2bb3f908] llog_cat_add_rec at ffffffffa058ff69 [obdclass]
#8 [ffff880e2bb3f958] llog_obd_origin_add at ffffffffa0595ad7 [obdclass]
#9 [ffff880e2bb3f988] llog_add at ffffffffa0595cb1 [obdclass]
#10 [ffff880e2bb3f9d8] mdd_changelog_llog_write at ffffffffa0bf74dc [mdd]
#11 [ffff880e2bb3fa28] mdd_changelog_write_header at ffffffffa0bf776b [mdd]
#12 [ffff880e2bb3fa78] mdd_changelog_llog_cancel at ffffffffa0bf7bb9 [mdd]
#13 [ffff880e2bb3fab8] mdd_changelog_user_purge at ffffffffa0bf81c0 [mdd]
#14 [ffff880e2bb3fb18] mdd_iocontrol at ffffffffa0bf857c [mdd] |
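
The ordering problem in the two traces above can be modelled as a small Python sketch. This is an illustration only, not Lustre code. The resource labels `journal_transaction` and `llog_catalog_lock` and the helper `find_inversions` are hypothetical names. It assumes that `mdd_create` has already started its transaction when it reaches the llog code, as the description implies, and that the cancel path takes the llog lock before calling `jbd2_journal_start`. The `after_fix` table only mirrors the ordering the description asks for; the actual change in master is not shown in this ticket.

```python
from itertools import combinations


def lock_order_edges(path):
    """Return every (earlier, later) pair of resources acquired on one code path."""
    return set(combinations(path, 2))


def find_inversions(paths):
    """Report resource pairs that two code paths acquire in opposite orders."""
    seen = {}        # (first, second) -> name of the first path that used this order
    inversions = []
    for name, path in paths.items():
        for first, second in sorted(lock_order_edges(path)):
            if (second, first) in seen:
                # Another path already took the same two resources the other way round.
                inversions.append((seen[(second, first)], name, (second, first)))
            seen.setdefault((first, second), name)
    return inversions


# Acquisition orders read off the two stack traces (labels are illustrative).
before_fix = {
    "mdd_create": ["journal_transaction", "llog_catalog_lock"],
    "mdd_changelog_llog_cancel": ["llog_catalog_lock", "journal_transaction"],
}

# Ordering suggested by the description: start the transaction first on both paths.
after_fix = {
    "mdd_create": ["journal_transaction", "llog_catalog_lock"],
    "mdd_changelog_llog_cancel": ["journal_transaction", "llog_catalog_lock"],
}

if __name__ == "__main__":
    print(find_inversions(before_fix))
    print(find_inversions(after_fix))
```

With `before_fix`, the two paths take the same pair of resources in opposite orders, which is the ABBA pattern that lets each thread block on what the other holds. With `after_fix`, both paths agree on the order and no inversion is reported.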
| Comment by Andriy Skulysh [ 15/Mar/13 ] |
|
Please ignore this issue. It is already fixed in master. |
| Comment by Peter Jones [ 15/Mar/13 ] |
|
ok - thanks Andriy! |