Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12977

fix i_mutex for ldiskfs_truncate() in osd_execute_truncate()

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0
    • None
    • None
    • 9223372036854775807

    Description

      The inode->i_mutex should be held when calling osd_execute_truncate->ldiskfs_truncate(). However, due to locking order in osd-ldiskfs we cannot just grab i_mutex around ldiskfs_truncate() as that can lead to deadlocks as described in LU-6446 and LU-4252.

      Currently, there is an ldiskfs patch to remove the WARN_ON() in ext4_truncate() when it is called without i_mutex held. This is safe because the osd-ldiskfs code holds its own lock (oo_ext_idx_sem) on the object over the ldiskfs_truncate() call, but potentially there is a way to improve this without patching the ldiskfs code.

      The truncate-warning patch was first added with:

       commit 3daba43a191a0a2c9358d39b3f48b1b759a41819
       Author:     Yang Sheng <yang.sheng@intel.com>
      
          LU-6446 ldiskfs: remove WARN_ON from ldiskfs_orphan_add{del}
          
          RHEL7.1 kernel carefully check i_mutex whether locked in
          many places. It will be triggered while ldiskfs_truncate()
          was invoked in osd layer. A dead lock would occured if we
          just locked i_mutex around it since Lustre using a reverse
          order for start journal==>lock i_mutex. Consider Lustre has
          own ldlm lock. So just remove such messages in ldiskfs patches.
      
          Reviewed-on: http://review.whamcloud.com/14690
      

      so I think moving the lock is not as easily done as what is done in the current patch, see comments in LU-6446 and LU-4252 for details. The code used to look like the following:

             if (!(inode->i_state & (I_NEW|I_FREEING)))
                     mutex_lock(&inode->i_mutex);
             ldiskfs_truncate(inode);
             if (!(inode->i_state & (I_NEW|I_FREEING)))
                     mutex_unlock(&inode->i_mutex);
      

      but this was removed as part of the patch:

       commit b98792ae96ad2bb3c73ca3ebebdd8edfdfbc35df
       Author:     yangsheng <yang.sheng@intel.com>
      
          LU-4252 osd: remove locking i_mutex in osd_punch
          
          This piece of code intent to calm down the kernel
          WARN_ON messsage show up since 3.9 upstream. So we
          can just remove it for now. We'll add a ldiskfs
          patch to resolve this issue while 3.9 server support
          come up.
      
          Reviewed-on: http://review.whamcloud.com/9644
      

      In more recent developments, LU-10048 changed the truncate code so that it is called asynchronously from the main transaction, so that it doesn't bloat transaction sizes, and avoids lock ordering issues:

      commit cf29a5e7bfa254ccfcea023028fe7da80503c512
      Author:     Alex Zhuravlev <alexey.zhuravlev@intel.com>
      
          LU-10048 osd: async truncate
          
          osd-ldiskfs should execute truncate outside of main transaction
          handle. This avoids restarting truncate transaction handles in
          main transaction, and allows "transaction first, locking second"
          model on OST.
          
          Reviewed-on: https://review.whamcloud.com/27488
      

      It may now be OK to grab i_mutex around ldiskfs_truncate() in osd_execute_truncate() and remove ext4-remove-truncate-warning.patch, but this should be reviewed and tested properly (e.g. with the racer workload that triggered LU-4252 originally).

      Attachments

        Issue Links

          Activity

            [LU-12977] fix i_mutex for ldiskfs_truncate() in osd_execute_truncate()
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-5195 [ DDN-5195 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-13183 [ LU-13183 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-13234 [ LU-13234 ]
            simmonsja James A Simmons made changes -
            Fix Version/s New: Lustre 2.14.0 [ 14490 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37116/
            Subject: LU-12977 ldiskfs: properly take inode_lock() for truncates
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f64e9f19f68e3983024520bd80122f12367a235a

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37116/ Subject: LU-12977 ldiskfs: properly take inode_lock() for truncates Project: fs/lustre-release Branch: master Current Patch Set: Commit: f64e9f19f68e3983024520bd80122f12367a235a
            simmonsja James A Simmons made changes -
            Assignee Original: WC Triage [ wc-triage ] New: James A Simmons [ simmonsja ]

            James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/37116
            Subject: LU-12977 ldiskfs: properly take inode_lock() for truncates
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b8c9dbc6da6215c828b59e5654f66155da7d1382

            gerrit Gerrit Updater added a comment - James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/37116 Subject: LU-12977 ldiskfs: properly take inode_lock() for truncates Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: b8c9dbc6da6215c828b59e5654f66155da7d1382
            adilger Andreas Dilger made changes -
            Description Original: The {{inode->i_mutex}} should be held when calling {{osd_execute_truncate->ldiskfs_truncate()}}. However, due to locking order in osd-ldiskfs we cannot just grab {{i_mutex}} around {{ldiskfs_truncate()}} as that can lead to deadlocks as described in LU-6446 and LU-4252.

            Currently, there is an ldiskfs patch to remove the {{WARN_ON()}} in {{ext4_truncate()}} when it is called without {{i_mutex}} held. This is safe because the {{osd-ldiskfs}} code holds its own lock ({{oo_ext_idx_sem}}) on the object over the {{ldiskfs_truncate()}} call, but potentially there is a way to improve this without patching the ldiskfs code.

            The truncate-warning patch was first added with:
            {noformat}
             commit 3daba43a191a0a2c9358d39b3f48b1b759a41819
             Author: Yang Sheng <yang.sheng@intel.com>

                LU-6446 ldiskfs: remove WARN_ON from ldiskfs_orphan_add{del}
                
                RHEL7.1 kernel carefully check i_mutex whether locked in
                many places. It will be triggered while ldiskfs_truncate()
                was invoked in osd layer. A dead lock would occured if we
                just locked i_mutex around it since Lustre using a reverse
                order for start journal==>lock i_mutex. Consider Lustre has
                own ldlm lock. So just remove such messages in ldiskfs patches.

                Reviewed-on: http://review.whamcloud.com/14690
            {noformat}

            so I think moving the lock is not as easily done as what is done in the current patch, see comments in LU-6446 and LU-4252 for details. The code _used_ to look like the following:
            {code}
                   if (!(inode->i_state & (I_NEW|I_FREEING)))
                           mutex_lock(&inode->i_mutex);
                   ldiskfs_truncate(inode);
                   if (!(inode->i_state & (I_NEW|I_FREEING)))
                           mutex_unlock(&inode->i_mutex);
            {code}

            but this was removed as part of the patch:
            {noformat}
             commit b98792ae96ad2bb3c73ca3ebebdd8edfdfbc35df
             Author: yangsheng <yang.sheng@intel.com>

                LU-4252 osd: remove locking i_mutex in osd_punch
                
                This piece of code intent to calm down the kernel
                WARN_ON messsage show up since 3.9 upstream. So we
                can just remove it for now. We'll add a ldiskfs
                patch to resolve this issue while 3.9 server support
                come up.

                Reviewed-on: http://review.whamcloud.com/9644
            {noformat}

            In more recent developments, LU-10048 changed the truncate code so that it is called asynchronously from the main transaction, so that it doesn't bloat transaction sizes, and avoids lock ordering issues:
            {noformat}
            commit cf29a5e7bfa254ccfcea023028fe7da80503c512
            Author: Alex Zhuravlev <alexey.zhuravlev@intel.com>

                LU-10048 osd: async truncate
                
                osd-ldiskfs should execute truncate outside of main transaction
                handle. This avoids restarting truncate transaction handles in
                main transaction, and allows "transaction first, locking second"
                model on OST.
                
                Reviewed-on: https://review.whamcloud.com/27488
            {noformat}

            It _may_ now be OK to grab {{i_mutex}} around {{ldiskfs_truncate()}} in {{osd_execute_truncate()}}, but this should be reviewed and tested properly (e.g. with the racer workload that triggered LU-4252 originally)/
            New: The {{inode->i_mutex}} should be held when calling {{osd_execute_truncate->ldiskfs_truncate()}}. However, due to locking order in osd-ldiskfs we cannot just grab {{i_mutex}} around {{ldiskfs_truncate()}} as that can lead to deadlocks as described in LU-6446 and LU-4252.

            Currently, there is an ldiskfs patch to remove the {{WARN_ON()}} in {{ext4_truncate()}} when it is called without {{i_mutex}} held. This is safe because the {{osd-ldiskfs}} code holds its own lock ({{oo_ext_idx_sem}}) on the object over the {{ldiskfs_truncate()}} call, but potentially there is a way to improve this without patching the ldiskfs code.

            The truncate-warning patch was first added with:
            {noformat}
             commit 3daba43a191a0a2c9358d39b3f48b1b759a41819
             Author: Yang Sheng <yang.sheng@intel.com>

                LU-6446 ldiskfs: remove WARN_ON from ldiskfs_orphan_add{del}
                
                RHEL7.1 kernel carefully check i_mutex whether locked in
                many places. It will be triggered while ldiskfs_truncate()
                was invoked in osd layer. A dead lock would occured if we
                just locked i_mutex around it since Lustre using a reverse
                order for start journal==>lock i_mutex. Consider Lustre has
                own ldlm lock. So just remove such messages in ldiskfs patches.

                Reviewed-on: http://review.whamcloud.com/14690
            {noformat}

            so I think moving the lock is not as easily done as what is done in the current patch, see comments in LU-6446 and LU-4252 for details. The code _used_ to look like the following:
            {code}
                   if (!(inode->i_state & (I_NEW|I_FREEING)))
                           mutex_lock(&inode->i_mutex);
                   ldiskfs_truncate(inode);
                   if (!(inode->i_state & (I_NEW|I_FREEING)))
                           mutex_unlock(&inode->i_mutex);
            {code}

            but this was removed as part of the patch:
            {noformat}
             commit b98792ae96ad2bb3c73ca3ebebdd8edfdfbc35df
             Author: yangsheng <yang.sheng@intel.com>

                LU-4252 osd: remove locking i_mutex in osd_punch
                
                This piece of code intent to calm down the kernel
                WARN_ON messsage show up since 3.9 upstream. So we
                can just remove it for now. We'll add a ldiskfs
                patch to resolve this issue while 3.9 server support
                come up.

                Reviewed-on: http://review.whamcloud.com/9644
            {noformat}

            In more recent developments, LU-10048 changed the truncate code so that it is called asynchronously from the main transaction, so that it doesn't bloat transaction sizes, and avoids lock ordering issues:
            {noformat}
            commit cf29a5e7bfa254ccfcea023028fe7da80503c512
            Author: Alex Zhuravlev <alexey.zhuravlev@intel.com>

                LU-10048 osd: async truncate
                
                osd-ldiskfs should execute truncate outside of main transaction
                handle. This avoids restarting truncate transaction handles in
                main transaction, and allows "transaction first, locking second"
                model on OST.
                
                Reviewed-on: https://review.whamcloud.com/27488
            {noformat}

            It _may_ now be OK to grab {{i_mutex}} around {{ldiskfs_truncate()}} in {{osd_execute_truncate()}} and remove {{ext4-remove-truncate-warning.patch}}, but this should be reviewed and tested properly (e.g. with the racer workload that triggered LU-4252 originally).
            adilger Andreas Dilger made changes -
            Summary Original: fix i_mutex for ldiskfs_truncate() in osd_punch() New: fix i_mutex for ldiskfs_truncate() in osd_execute_truncate()
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-10048 [ LU-10048 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-4252 [ LU-4252 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-11832 [ LU-11832 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-6446 [ LU-6446 ]

            People

              simmonsja James A Simmons
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: