Details
-
Improvement
-
Resolution: Fixed
-
Minor
-
None
-
None
-
9223372036854775807
Description
The inode->i_mutex should be held when calling osd_execute_truncate->ldiskfs_truncate(). However, due to locking order in osd-ldiskfs we cannot just grab i_mutex around ldiskfs_truncate() as that can lead to deadlocks as described in LU-6446 and LU-4252.
Currently, there is an ldiskfs patch to remove the WARN_ON() in ext4_truncate() when it is called without i_mutex held. This is safe because the osd-ldiskfs code holds its own lock (oo_ext_idx_sem) on the object over the ldiskfs_truncate() call, but potentially there is a way to improve this without patching the ldiskfs code.
The truncate-warning patch was first added with:
commit 3daba43a191a0a2c9358d39b3f48b1b759a41819 Author: Yang Sheng <yang.sheng@intel.com> LU-6446 ldiskfs: remove WARN_ON from ldiskfs_orphan_add{del} RHEL7.1 kernel carefully check i_mutex whether locked in many places. It will be triggered while ldiskfs_truncate() was invoked in osd layer. A dead lock would occured if we just locked i_mutex around it since Lustre using a reverse order for start journal==>lock i_mutex. Consider Lustre has own ldlm lock. So just remove such messages in ldiskfs patches. Reviewed-on: http://review.whamcloud.com/14690
so I think moving the lock is not as easily done as what is done in the current patch, see comments in LU-6446 and LU-4252 for details. The code used to look like the following:
if (!(inode->i_state & (I_NEW|I_FREEING))) mutex_lock(&inode->i_mutex); ldiskfs_truncate(inode); if (!(inode->i_state & (I_NEW|I_FREEING))) mutex_unlock(&inode->i_mutex);
but this was removed as part of the patch:
commit b98792ae96ad2bb3c73ca3ebebdd8edfdfbc35df Author: yangsheng <yang.sheng@intel.com> LU-4252 osd: remove locking i_mutex in osd_punch This piece of code intent to calm down the kernel WARN_ON messsage show up since 3.9 upstream. So we can just remove it for now. We'll add a ldiskfs patch to resolve this issue while 3.9 server support come up. Reviewed-on: http://review.whamcloud.com/9644
In more recent developments, LU-10048 changed the truncate code so that it is called asynchronously from the main transaction, so that it doesn't bloat transaction sizes, and avoids lock ordering issues:
commit cf29a5e7bfa254ccfcea023028fe7da80503c512 Author: Alex Zhuravlev <alexey.zhuravlev@intel.com> LU-10048 osd: async truncate osd-ldiskfs should execute truncate outside of main transaction handle. This avoids restarting truncate transaction handles in main transaction, and allows "transaction first, locking second" model on OST. Reviewed-on: https://review.whamcloud.com/27488
It may now be OK to grab i_mutex around ldiskfs_truncate() in osd_execute_truncate() and remove ext4-remove-truncate-warning.patch, but this should be reviewed and tested properly (e.g. with the racer workload that triggered LU-4252 originally).
Attachments
Issue Links
- is related to
-
LU-13234 ldiskfs/namei.c:3310 ldiskfs_orphan_add+0x11e/0x2a0 [ldiskfs]
-
- Resolved
-
- is related to
-
LU-4252 Failure on test suite racer test_1
-
- Resolved
-
-
LU-6446 Warn-on in ldiskfs_orphan_add/del
-
- Resolved
-
-
LU-10048 osd-ldiskfs to truncate outside of main transaction
-
- Resolved
-
-
LU-11832 ARM servers crashing on MDS startup
-
- Resolved
-
-
LU-13183 Linux 5.4 ldiskfs build (drop: ext4-remove-truncate-warning.patch)
-
- Resolved
-
Activity
Link | New: This issue is related to DDN-5195 [ DDN-5195 ] |
Fix Version/s | New: Lustre 2.14.0 [ 14490 ] | |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Assignee | Original: WC Triage [ wc-triage ] | New: James A Simmons [ simmonsja ] |
Description |
Original:
The {{inode->i_mutex}} should be held when calling {{osd_execute_truncate->ldiskfs_truncate()}}. However, due to locking order in osd-ldiskfs we cannot just grab {{i_mutex}} around {{ldiskfs_truncate()}} as that can lead to deadlocks as described in Currently, there is an ldiskfs patch to remove the {{WARN_ON()}} in {{ext4_truncate()}} when it is called without {{i_mutex}} held. This is safe because the {{osd-ldiskfs}} code holds its own lock ({{oo_ext_idx_sem}}) on the object over the {{ldiskfs_truncate()}} call, but potentially there is a way to improve this without patching the ldiskfs code. The truncate-warning patch was first added with: {noformat} commit 3daba43a191a0a2c9358d39b3f48b1b759a41819 Author: Yang Sheng <yang.sheng@intel.com> RHEL7.1 kernel carefully check i_mutex whether locked in many places. It will be triggered while ldiskfs_truncate() was invoked in osd layer. A dead lock would occured if we just locked i_mutex around it since Lustre using a reverse order for start journal==>lock i_mutex. Consider Lustre has own ldlm lock. So just remove such messages in ldiskfs patches. Reviewed-on: http://review.whamcloud.com/14690 {noformat} so I think moving the lock is not as easily done as what is done in the current patch, see comments in {code} if (!(inode->i_state & (I_NEW|I_FREEING))) mutex_lock(&inode->i_mutex); ldiskfs_truncate(inode); if (!(inode->i_state & (I_NEW|I_FREEING))) mutex_unlock(&inode->i_mutex); {code} but this was removed as part of the patch: {noformat} commit b98792ae96ad2bb3c73ca3ebebdd8edfdfbc35df Author: yangsheng <yang.sheng@intel.com> This piece of code intent to calm down the kernel WARN_ON messsage show up since 3.9 upstream. So we can just remove it for now. We'll add a ldiskfs patch to resolve this issue while 3.9 server support come up. Reviewed-on: http://review.whamcloud.com/9644 {noformat} In more recent developments, {noformat} commit cf29a5e7bfa254ccfcea023028fe7da80503c512 Author: Alex Zhuravlev <alexey.zhuravlev@intel.com> osd-ldiskfs should execute truncate outside of main transaction handle. This avoids restarting truncate transaction handles in main transaction, and allows "transaction first, locking second" model on OST. Reviewed-on: https://review.whamcloud.com/27488 {noformat} It _may_ now be OK to grab {{i_mutex}} around {{ldiskfs_truncate()}} in {{osd_execute_truncate()}}, but this should be reviewed and tested properly (e.g. with the racer workload that triggered |
New:
The {{inode->i_mutex}} should be held when calling {{osd_execute_truncate->ldiskfs_truncate()}}. However, due to locking order in osd-ldiskfs we cannot just grab {{i_mutex}} around {{ldiskfs_truncate()}} as that can lead to deadlocks as described in Currently, there is an ldiskfs patch to remove the {{WARN_ON()}} in {{ext4_truncate()}} when it is called without {{i_mutex}} held. This is safe because the {{osd-ldiskfs}} code holds its own lock ({{oo_ext_idx_sem}}) on the object over the {{ldiskfs_truncate()}} call, but potentially there is a way to improve this without patching the ldiskfs code. The truncate-warning patch was first added with: {noformat} commit 3daba43a191a0a2c9358d39b3f48b1b759a41819 Author: Yang Sheng <yang.sheng@intel.com> RHEL7.1 kernel carefully check i_mutex whether locked in many places. It will be triggered while ldiskfs_truncate() was invoked in osd layer. A dead lock would occured if we just locked i_mutex around it since Lustre using a reverse order for start journal==>lock i_mutex. Consider Lustre has own ldlm lock. So just remove such messages in ldiskfs patches. Reviewed-on: http://review.whamcloud.com/14690 {noformat} so I think moving the lock is not as easily done as what is done in the current patch, see comments in {code} if (!(inode->i_state & (I_NEW|I_FREEING))) mutex_lock(&inode->i_mutex); ldiskfs_truncate(inode); if (!(inode->i_state & (I_NEW|I_FREEING))) mutex_unlock(&inode->i_mutex); {code} but this was removed as part of the patch: {noformat} commit b98792ae96ad2bb3c73ca3ebebdd8edfdfbc35df Author: yangsheng <yang.sheng@intel.com> This piece of code intent to calm down the kernel WARN_ON messsage show up since 3.9 upstream. So we can just remove it for now. We'll add a ldiskfs patch to resolve this issue while 3.9 server support come up. Reviewed-on: http://review.whamcloud.com/9644 {noformat} In more recent developments, {noformat} commit cf29a5e7bfa254ccfcea023028fe7da80503c512 Author: Alex Zhuravlev <alexey.zhuravlev@intel.com> osd-ldiskfs should execute truncate outside of main transaction handle. This avoids restarting truncate transaction handles in main transaction, and allows "transaction first, locking second" model on OST. Reviewed-on: https://review.whamcloud.com/27488 {noformat} It _may_ now be OK to grab {{i_mutex}} around {{ldiskfs_truncate()}} in {{osd_execute_truncate()}} and remove {{ext4-remove-truncate-warning.patch}}, but this should be reviewed and tested properly (e.g. with the racer workload that triggered |
Summary | Original: fix i_mutex for ldiskfs_truncate() in osd_punch() | New: fix i_mutex for ldiskfs_truncate() in osd_execute_truncate() |
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37116/
Subject:
LU-12977ldiskfs: properly take inode_lock() for truncatesProject: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f64e9f19f68e3983024520bd80122f12367a235a