[LU-7927] Deadlock between ll_setattr() and ll_file_write()->ll_fsync() Created: 28/Mar/16 Updated: 02/Sep/16 Resolved: 02/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andriy Skulysh | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
commit 85bd36cc69563d7a79e3ed34f8fadb4ed1a72b7c
Author: Henri Doreau <henri.doreau@cea.fr>
Date: Fri Apr 18 16:17:01 2014 +0200
LU-4840 lfs: Use file lease to implement migration
moves lli_trunc_sem into the vvp layer, so i_mutex should now be taken after lli_trunc_sem. |
| Comments |
| Comment by Gerrit Updater [ 28/Mar/16 ] |
|
Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/19165 |
| Comment by Zhenyu Xu [ 29/Mar/16 ] |
|
Can you elaborate on which call path takes lli_trunc_sem before i_mutex? ll_fsync()->[takes i_mutex]->cl_sync_file_range()->vvp_io_fsync_start() => (returns to ll_fsync()) [releases i_mutex] * no lli_trunc_sem is involved * |
| Comment by Andriy Skulysh [ 29/Mar/16 ] |
|
lli_trunc_sem was taken in vvp_io_write_start().

[ffff8817eb705bd8] schedule_preempt_disabled at ffffffff81495629
[ffff8817eb705be8] __mutex_lock_slowpath at ffffffff814961ab
[ffff8817eb705c40] mutex_lock at ffffffff814962a7
[ffff8817eb705c58] ll_fsync at ffffffffa089f8b4 [lustre]
[ffff8817eb705cb0] generic_write_sync at ffffffff811afead
[ffff8817eb705cc0] vvp_io_write_start at ffffffffa08f81dd [lustre]
[ffff8817eb705cd0] cl_lock_request at ffffffffa05260c7 [obdclass]
[ffff8817eb705d20] cl_io_start at ffffffffa0528015 [obdclass]
[ffff8817eb705d48] cl_io_loop at ffffffffa052b605 [obdclass]
[ffff8817eb705d78] ll_file_io_generic at ffffffffa08948da [lustre]
[ffff8817eb705e60] ll_file_aio_write at ffffffffa089507c [lustre]
[ffff8817eb705eb0] ll_file_write at ffffffffa089579b [lustre]
[ffff8817eb705f00] vfs_write at ffffffff811823bd

and

#0 [ffff8818a20c1b58] schedule at ffffffff81494c75
#1 [ffff8818a20c1bd8] rwsem_down_write_failed at ffffffff81496c85
#2 [ffff8818a20c1c50] call_rwsem_down_write_failed at ffffffff8126cc23
#3 [ffff8818a20c1ca0] vvp_io_setattr_start at ffffffffa08f4cd9 [lustre]
#4 [ffff8818a20c1ce0] cl_io_start at ffffffffa0528015 [obdclass]
#5 [ffff8818a20c1d08] cl_io_loop at ffffffffa052b605 [obdclass]
#6 [ffff8818a20c1d38] cl_setattr_ost at ffffffffa08ef250 [lustre]
#7 [ffff8818a20c1d80] ll_setattr_raw at ffffffffa08c2009 [lustre]
#8 [ffff8818a20c1e68] ll_setattr at ffffffffa08c2313 [lustre]
#9 [ffff8818a20c1e78] notify_change at ffffffff8119d371
#10 [ffff8818a20c1eb8] do_truncate at ffffffff811805dd
#11 [ffff8818a20c1f28] do_sys_ftruncate.constprop.20 at ffffffff8118092b
#12 [ffff8818a20c1f70] sys_ftruncate at ffffffff811809be |
| Comment by Zhenyu Xu [ 29/Mar/16 ] |
|
thank you. |
| Comment by Patrick Farrell (Inactive) [ 04/Apr/16 ] |
|
After adding this patch, we see https://jira.hpdd.intel.com/browse/LU-7981. However, I feel strongly that this patch did not cause that bug; it only exposed it. I think that's very clear from the code. I'm about to post a fix for that new deadlock in |
| Comment by Gerrit Updater [ 02/Sep/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19165/ |
| Comment by Peter Jones [ 02/Sep/16 ] |
|
Landed for 2.9 |