[LU-16044] osd: truncate vs write deadlock Created: 25/Jul/22 Updated: 19/Jun/23 Resolved: 16/Oct/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Description |
|
| Comments |
| Comment by Gerrit Updater [ 25/Jul/22 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48033 |
| Comment by Gerrit Updater [ 01/Sep/22 ] |
|
"Stephane Thiell <sthiell@stanford.edu>" uploaded a new patch: https://review.whamcloud.com/48410 |
| Comment by Stephane Thiell [ 01/Sep/22 ] |
|
Alex, this is a backport of your patch to b2_12. Basically, I just removed the encryption part, which is not available in 2.12. Can you please double-check that this looks OK to you? Once I get your go-ahead, we'll try it in production. Thanks! |
| Comment by Stephane Thiell [ 09/Sep/22 ] |
|
Unfortunately, even with 2.12.9 with both patches:
We hit a deadlock situation last night. Attaching "foreach bt" of the new crash dump as foreach_bt_fir-io2-s2_20220909.txt |
| Comment by Stephane Thiell [ 09/Sep/22 ] |
|
We may have identified the source of the deadlock. A group of users had jobs using GNU parallel with --tmpdir set to Lustre, which apparently uses unlinked tmp files that are kept open and does frequent ftruncate(0) on them. The command used is:
parallel --tmpdir $folder/tmp --delay 2 -j $threads < $folder/calls.$cmd.txt
with $folder set to Lustre. We have asked the users to change their scripts and avoid storing such temporary files on Lustre, and we'll see if that reduces the number of OSS deadlocks. |
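The access pattern described in the comment above (an unlinked file that stays open and is repeatedly truncated to zero and rewritten) can be sketched as follows. This is only an illustration of the pattern as reported, not GNU parallel's actual implementation and not a confirmed reproducer for the deadlock; the function name `truncate_write_loop`, the 4096-byte chunk, and the iteration count are assumptions made for the example.

```python
import os
import tempfile


def truncate_write_loop(directory, iterations=100, chunk=b"x" * 4096):
    """Rewrite an unlinked, still-open file, truncating it to zero each round.

    Returns the file size after the last write.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # After unlink the file only lives through the open descriptor.
        os.unlink(path)
        for _ in range(iterations):
            os.ftruncate(fd, 0)           # the frequent ftruncate(0)
            os.lseek(fd, 0, os.SEEK_SET)  # rewind before writing again
            os.write(fd, chunk)           # new data right after the truncate
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumption: a local scratch directory is used here; pointing the
    # function at a directory on a Lustre mount is what would send the
    # truncates and writes to the OSS.
    with tempfile.TemporaryDirectory() as scratch:
        print(truncate_write_loop(scratch, iterations=10))
```

Running several such loops in parallel would come closer to the reported workload, where many jobs were active at once.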
| Comment by Alex Zhuravlev [ 10/Sep/22 ] |
schedule,io_schedule,bit_wait_io,__wait_on_bit_lock,__lock_page,mpage_prepare_extent_to_map,ldiskfs_writepages,do_writepages,__writeback_single_inode,writeback_sb_inodes,__writeback_inodes_wb,wb_writeback,bdi_writeback_workfn,process_one_work,worker_thread
PIDs(1): "kworker/u259:2":94708
This is not the problem I tried to fix; it is probably better to call it a related issue. I need to think about it a bit more. Sorry for the inconvenience. |
| Comment by Alex Zhuravlev [ 10/Sep/22 ] |
Please provide the output of sysctl -a | grep vm.dirty from the OSTs. |
| Comment by Stephane Thiell [ 10/Sep/22 ] |
|
Thanks Alex. This is the result from all our OSS on this system. I believe we use the default settings for CentOS 7.9.
---------------
fir-io[1-8]-s[1-2] (16)
---------------
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 3
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500
What we change is the following:
vm.min_free_kbytes = 2097152
vm.swappiness = 1
vm.zone_reclaim_mode = 1
OSS are based on AMD EPYC Naples, single socket 7401P. 512GB of RAM each. |
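To put the ratios above in perspective, here is a rough calculation of what they could mean on a 512GB OSS. It is a sketch under stated assumptions: the kernel applies the ratios to dirtyable (free plus reclaimable) memory rather than to total RAM, so these figures are upper bounds, and "512GB" is taken as 512 GiB. The helper names are made up for the example.

```python
GIB = 1024 ** 3


def dirty_limits(mem_bytes, background_ratio, dirty_ratio):
    """Upper-bound estimate (bytes) of the background and hard dirty limits.

    Assumption: ratios are applied to all of mem_bytes; the kernel actually
    uses dirtyable memory, which is smaller than total RAM.
    """
    background = mem_bytes * background_ratio // 100
    hard = mem_bytes * dirty_ratio // 100
    return background, hard


def centisecs_to_seconds(value):
    """Convert a vm.dirty_*_centisecs value to seconds."""
    return value / 100


if __name__ == "__main__":
    background, hard = dirty_limits(512 * GIB, background_ratio=3, dirty_ratio=10)
    print(f"background writeback starts near {background / GIB:.1f} GiB dirty")
    print(f"writers are throttled near {hard / GIB:.1f} GiB dirty")
    print(f"dirty data expires after {centisecs_to_seconds(3000):.0f} s")
    print(f"flusher wakes every {centisecs_to_seconds(500):.0f} s")
```

Under these assumptions the background limit is on the order of 15 GiB and the hard limit on the order of 51 GiB of dirty page cache per OSS.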
| Comment by Gerrit Updater [ 15/Oct/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48033/ |
| Comment by Peter Jones [ 16/Oct/22 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 19/Jun/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51360 |