  Lustre / LU-7927

Deadlock between ll_setattr() and ll_file_write()->ll_fsync()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • None
    • Severity: 3

    Description

      commit 85bd36cc69563d7a79e3ed34f8fadb4ed1a72b7c
      Author: Henri Doreau <henri.doreau@cea.fr>
      Date:   Fri Apr 18 16:17:01 2014 +0200
      
          LU-4840 lfs: Use file lease to implement migration
      

      This commit moves lli_trunc_sem into the vvp layer, which breaks the established lli_trunc_sem/i_mutex locking order.

      As a result, i_mutex should now be taken after lli_trunc_sem.
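      An inversion like this can be caught mechanically, in the spirit of the kernel's lockdep: record every "held A while acquiring B" pair and flag any pair that has also been seen in the opposite direction. The C sketch below is a simplified model under stated assumptions: it only tracks direct pairs (real lockdep also follows transitive chains), and the lock classes and function names (model_acquire, model_write_path, model_setattr_path) are hypothetical; they merely borrow the Lustre lock names. The write path follows the title (lli_trunc_sem in vvp_io_write_start(), then i_mutex in ll_fsync()); the truncate path takes i_mutex first, as the VFS does before calling ll_setattr(), and then lli_trunc_sem in vvp_io_setattr_start().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this model; not Lustre identifiers. */
enum lock_class { LC_TRUNC_SEM, LC_I_MUTEX, LC_NR };

/* seen[a][b] becomes true once some path acquires b while holding a. */
struct lock_order {
	bool seen[LC_NR][LC_NR];
	int inversions;
};

/* Locks held by one modelled thread, in acquisition order. */
struct held_locks {
	enum lock_class stack[LC_NR];
	int depth;
};

/* Record acquiring class c; count an inversion if the opposite order was seen earlier. */
static void model_acquire(struct lock_order *o, struct held_locks *h,
			  enum lock_class c)
{
	for (int i = 0; i < h->depth; i++) {
		enum lock_class prev = h->stack[i];

		if (o->seen[c][prev])
			o->inversions++;
		o->seen[prev][c] = true;
	}
	assert(h->depth < LC_NR);
	h->stack[h->depth++] = c;
}

/* Write path: lli_trunc_sem first, then i_mutex (from ll_fsync()). */
static void model_write_path(struct lock_order *o)
{
	struct held_locks h = { .depth = 0 };

	model_acquire(o, &h, LC_TRUNC_SEM);
	model_acquire(o, &h, LC_I_MUTEX);
}

/* Truncate path: i_mutex first (held by the VFS), then lli_trunc_sem. */
static void model_setattr_path(struct lock_order *o)
{
	struct held_locks h = { .depth = 0 };

	model_acquire(o, &h, LC_I_MUTEX);
	model_acquire(o, &h, LC_TRUNC_SEM);
}
```

      Running the write path any number of times records no inversion; running the truncate path afterwards records exactly one, because it acquires the same two classes in the opposite order.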


        Activity

          pjones Peter Jones added a comment -

          Landed for 2.9

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19165/
          Subject: LU-7927 llite: Deadlock between ll_setattr and write/ll_fsync
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 5d60fd75152d10d699ce6e1cc128f12aa6cc86a6

          paf Patrick Farrell (Inactive) added a comment -

          After adding this patch, we see https://jira.hpdd.intel.com/browse/LU-7981. However, I feel strongly that this patch did not cause that bug; it only exposed it, which I think is very clear from the code. I'm about to post a fix for that new deadlock in LU-7981.

          bobijam Zhenyu Xu added a comment -

          Thank you.

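          askulysh Andriy Skulysh added a comment -

          lli_trunc_sem is taken in vvp_io_write_start(). The writing thread holds it while generic_write_sync() calls ll_fsync(), which blocks on i_mutex; the truncating thread already holds i_mutex (taken by the VFS in do_truncate()) and blocks in vvp_io_setattr_start() waiting to take lli_trunc_sem for write: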

            [ffff8817eb705bd8] schedule_preempt_disabled at ffffffff81495629
             [ffff8817eb705be8] __mutex_lock_slowpath at ffffffff814961ab
             [ffff8817eb705c40] mutex_lock at ffffffff814962a7
             [ffff8817eb705c58] ll_fsync at ffffffffa089f8b4 [lustre]
             [ffff8817eb705cb0] generic_write_sync at ffffffff811afead
             [ffff8817eb705cc0] vvp_io_write_start at ffffffffa08f81dd [lustre]
             [ffff8817eb705cd0] cl_lock_request at ffffffffa05260c7 [obdclass]
             [ffff8817eb705d20] cl_io_start at ffffffffa0528015 [obdclass]
             [ffff8817eb705d48] cl_io_loop at ffffffffa052b605 [obdclass]
             [ffff8817eb705d78] ll_file_io_generic at ffffffffa08948da [lustre]
             [ffff8817eb705e60] ll_file_aio_write at ffffffffa089507c [lustre]
             [ffff8817eb705eb0] ll_file_write at ffffffffa089579b [lustre]
             [ffff8817eb705f00] vfs_write at ffffffff811823bd
          

          and

            #0 [ffff8818a20c1b58] schedule at ffffffff81494c75
            #1 [ffff8818a20c1bd8] rwsem_down_write_failed at ffffffff81496c85
            #2 [ffff8818a20c1c50] call_rwsem_down_write_failed at ffffffff8126cc23
            #3 [ffff8818a20c1ca0] vvp_io_setattr_start at ffffffffa08f4cd9 [lustre]
            #4 [ffff8818a20c1ce0] cl_io_start at ffffffffa0528015 [obdclass]
            #5 [ffff8818a20c1d08] cl_io_loop at ffffffffa052b605 [obdclass]
            #6 [ffff8818a20c1d38] cl_setattr_ost at ffffffffa08ef250 [lustre]
            #7 [ffff8818a20c1d80] ll_setattr_raw at ffffffffa08c2009 [lustre]
            #8 [ffff8818a20c1e68] ll_setattr at ffffffffa08c2313 [lustre]
            #9 [ffff8818a20c1e78] notify_change at ffffffff8119d371
           #10 [ffff8818a20c1eb8] do_truncate at ffffffff811805dd
           #11 [ffff8818a20c1f28] do_sys_ftruncate.constprop.20 at ffffffff8118092b
           #12 [ffff8818a20c1f70] sys_ftruncate at ffffffff811809be
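
          The two traces form a classic ABBA cycle. The sketch below is a user-space model of it, not Lustre code: a POSIX rwlock stands in for lli_trunc_sem and a mutex for i_mutex, and the names writer, truncater, and run_abba are hypothetical. Each thread takes its first lock, meets the other at a barrier so that both are holding, then tries its second lock with a 200 ms timeout; a second barrier keeps both first locks held until both attempts finish, so the outcome is deterministic. In the kernel there is no timeout, so the two threads would simply block forever, as in the traces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-ins, not Lustre code: rwlock ~ lli_trunc_sem, mutex ~ i_mutex. */
static pthread_rwlock_t trunc_sem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first, both_tried_second;

/* Absolute CLOCK_REALTIME deadline, as the POSIX timed lock calls expect. */
static struct timespec deadline_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Writer: holds trunc_sem (vvp_io_write_start), then wants i_mutex (ll_fsync). */
static void *writer(void *arg)
{
	int *rc = arg;
	struct timespec ts;

	pthread_rwlock_rdlock(&trunc_sem);
	pthread_barrier_wait(&both_hold_first);
	ts = deadline_ms(200);
	*rc = pthread_mutex_timedlock(&i_mutex, &ts);
	pthread_barrier_wait(&both_tried_second);
	if (*rc == 0)
		pthread_mutex_unlock(&i_mutex);
	pthread_rwlock_unlock(&trunc_sem);
	return NULL;
}

/* Truncater: holds i_mutex (do_truncate), then wants trunc_sem for write. */
static void *truncater(void *arg)
{
	int *rc = arg;
	struct timespec ts;

	pthread_mutex_lock(&i_mutex);
	pthread_barrier_wait(&both_hold_first);
	ts = deadline_ms(200);
	*rc = pthread_rwlock_timedwrlock(&trunc_sem, &ts);
	pthread_barrier_wait(&both_tried_second);
	if (*rc == 0)
		pthread_rwlock_unlock(&trunc_sem);
	pthread_mutex_unlock(&i_mutex);
	return NULL;
}

/* Runs both threads once; each rc is ETIMEDOUT if its second lock was unavailable. */
static void run_abba(int *writer_rc, int *truncater_rc)
{
	pthread_t w, t;

	pthread_barrier_init(&both_hold_first, NULL, 2);
	pthread_barrier_init(&both_tried_second, NULL, 2);
	pthread_create(&w, NULL, writer, writer_rc);
	pthread_create(&t, NULL, truncater, truncater_rc);
	pthread_join(w, NULL);
	pthread_join(t, NULL);
	pthread_barrier_destroy(&both_hold_first);
	pthread_barrier_destroy(&both_tried_second);
}
```

          Both threads report ETIMEDOUT: each is waiting for the lock the other holds. Making both paths acquire the two locks in the same order removes the cycle.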
          
          bobijam Zhenyu Xu added a comment -

          Can you elaborate on which call path takes lli_trunc_sem before i_mutex?

          ll_fsync() [takes i_mutex] -> cl_sync_file_range() -> vvp_io_fsync_start() -> (returns to ll_fsync()) [releases i_mutex]; no lli_trunc_sem is involved.
          
          gerrit Gerrit Updater added a comment -

          Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/19165
          Subject: LU-7927 llite: Deadlock between ll_setattr and write/ll_fsync
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 5cd324bc8c3adac656fd65a4d30e0ae3d48dfc7d


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Andriy Skulysh (askulysh)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated:
              Resolved: