[LU-7927] Deadlock between ll_setattr() and ll_file_write()->ll_fsync() Created: 28/Mar/16  Updated: 02/Sep/16  Resolved: 02/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
commit 85bd36cc69563d7a79e3ed34f8fadb4ed1a72b7c
Author: Henri Doreau <henri.doreau@cea.fr>
Date:   Fri Apr 18 16:17:01 2014 +0200

    LU-4840 lfs: Use file lease to implement migration

The commit above moves lli_trunc_sem into the vvp layer, which violates the lli_trunc_sem/i_mutex locking order.

i_mutex should now be taken after lli_trunc_sem.



 Comments   
Comment by Gerrit Updater [ 28/Mar/16 ]

Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/19165
Subject: LU-7927 llite: Deadlock between ll_setattr and write/ll_fsync
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5cd324bc8c3adac656fd65a4d30e0ae3d48dfc7d

Comment by Zhenyu Xu [ 29/Mar/16 ]

Can you elaborate on which call path takes lli_trunc_sem before i_mutex?

ll_fsync() -> [takes i_mutex] -> cl_sync_file_range() -> vvp_io_fsync_start() => (returns to ll_fsync()) [releases i_mutex]   * no lli_trunc_sem is involved *
Comment by Andriy Skulysh [ 29/Mar/16 ]

lli_trunc_sem was taken in vvp_io_write_start(). The writing thread then blocks on i_mutex in ll_fsync():

   [ffff8817eb705bd8] schedule_preempt_disabled at ffffffff81495629
   [ffff8817eb705be8] __mutex_lock_slowpath at ffffffff814961ab
   [ffff8817eb705c40] mutex_lock at ffffffff814962a7
   [ffff8817eb705c58] ll_fsync at ffffffffa089f8b4 [lustre]
   [ffff8817eb705cb0] generic_write_sync at ffffffff811afead
   [ffff8817eb705cc0] vvp_io_write_start at ffffffffa08f81dd [lustre]
   [ffff8817eb705cd0] cl_lock_request at ffffffffa05260c7 [obdclass]
   [ffff8817eb705d20] cl_io_start at ffffffffa0528015 [obdclass]
   [ffff8817eb705d48] cl_io_loop at ffffffffa052b605 [obdclass]
   [ffff8817eb705d78] ll_file_io_generic at ffffffffa08948da [lustre]
   [ffff8817eb705e60] ll_file_aio_write at ffffffffa089507c [lustre]
   [ffff8817eb705eb0] ll_file_write at ffffffffa089579b [lustre]
   [ffff8817eb705f00] vfs_write at ffffffff811823bd

and

  #0 [ffff8818a20c1b58] schedule at ffffffff81494c75
  #1 [ffff8818a20c1bd8] rwsem_down_write_failed at ffffffff81496c85
  #2 [ffff8818a20c1c50] call_rwsem_down_write_failed at ffffffff8126cc23
  #3 [ffff8818a20c1ca0] vvp_io_setattr_start at ffffffffa08f4cd9 [lustre]
  #4 [ffff8818a20c1ce0] cl_io_start at ffffffffa0528015 [obdclass]
  #5 [ffff8818a20c1d08] cl_io_loop at ffffffffa052b605 [obdclass]
  #6 [ffff8818a20c1d38] cl_setattr_ost at ffffffffa08ef250 [lustre]
  #7 [ffff8818a20c1d80] ll_setattr_raw at ffffffffa08c2009 [lustre]
  #8 [ffff8818a20c1e68] ll_setattr at ffffffffa08c2313 [lustre]
  #9 [ffff8818a20c1e78] notify_change at ffffffff8119d371
 #10 [ffff8818a20c1eb8] do_truncate at ffffffff811805dd
 #11 [ffff8818a20c1f28] do_sys_ftruncate.constprop.20 at ffffffff8118092b
 #12 [ffff8818a20c1f70] sys_ftruncate at ffffffff811809be
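
The two traces above amount to an ABBA lock inversion, which can be modelled outside the kernel. The following is a minimal sketch, not Lustre code: `enum lock_id`, `struct lock_graph`, `record_path()`, `has_inversion()` and the three path arrays are hypothetical names introduced only for illustration. It also assumes that the VFS truncate path already holds i_mutex when it reaches ll_setattr(), which the second trace does not show directly, and that the reordered path simply follows the ordering stated in the description rather than the actual contents of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for this model; they only name the two locks
 * seen in the traces and are not Lustre data structures. */
enum lock_id { I_MUTEX, LLI_TRUNC_SEM, NR_LOCKS };

/* order[a][b] is true when some path acquired b while holding a. */
struct lock_graph {
    bool order[NR_LOCKS][NR_LOCKS];
};

/* Write path from the first trace: vvp_io_write_start() holds
 * lli_trunc_sem, then ll_fsync() tries to take i_mutex. */
const enum lock_id write_path[] = { LLI_TRUNC_SEM, I_MUTEX };

/* Truncate path from the second trace. Assumption: i_mutex is already
 * held by the VFS caller before vvp_io_setattr_start() waits on
 * lli_trunc_sem. */
const enum lock_id truncate_path[] = { I_MUTEX, LLI_TRUNC_SEM };

/* Ordering stated in the description: i_mutex after lli_trunc_sem.
 * This is an illustration of that rule, not the merged patch. */
const enum lock_id reordered_truncate_path[] = { LLI_TRUNC_SEM, I_MUTEX };

/* Record one code path: each lock is acquired while every earlier lock
 * in the array is still held. */
void record_path(struct lock_graph *g, const enum lock_id *locks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            assert(locks[i] < NR_LOCKS && locks[j] < NR_LOCKS);
            g->order[locks[i]][locks[j]] = true;
        }
    }
}

/* An ABBA inversion exists when two locks were taken in both orders. */
bool has_inversion(const struct lock_graph *g)
{
    for (int a = 0; a < NR_LOCKS; a++)
        for (int b = a + 1; b < NR_LOCKS; b++)
            if (g->order[a][b] && g->order[b][a])
                return true;
    return false;
}
```

Recording `write_path` and `truncate_path` into one zero-initialised graph makes `has_inversion()` return true, which matches the hang in the traces. Recording `write_path` together with `reordered_truncate_path` leaves the graph free of inversions.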
Comment by Zhenyu Xu [ 29/Mar/16 ]

Thank you.

Comment by Patrick Farrell (Inactive) [ 04/Apr/16 ]

After adding this patch, we see https://jira.hpdd.intel.com/browse/LU-7981. However, I feel strongly that this patch did not cause that bug; it only exposed it. I think that is very clear from the code. I'm about to post a fix for that new deadlock in LU-7981.

Comment by Gerrit Updater [ 02/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19165/
Subject: LU-7927 llite: Deadlock between ll_setattr and write/ll_fsync
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5d60fd75152d10d699ce6e1cc128f12aa6cc86a6

Comment by Peter Jones [ 02/Sep/16 ]

Landed for 2.9
