[LU-16044] osd: truncate vs write deadlock Created: 25/Jul/22  Updated: 19/Jun/23  Resolved: 16/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File foreach_bt_fir-io2-s2_20220909.txt    
Issue Links:
Related
Severity: 3

 Description   

PID: 12333 TASK: ffff8d84a294c200 CPU: 8 COMMAND: "ll_ost_io02_086"
#0 [ffff8d84a29937e0] __schedule at ffffffffa8988e18
#1 [ffff8d84a2993848] schedule at ffffffffa89891e9
#2 [ffff8d84a2993858] schedule_timeout at ffffffffa8986eb1
#3 [ffff8d84a2993908] io_schedule_timeout at ffffffffa8988a9d
#4 [ffff8d84a2993938] io_schedule at ffffffffa8988b38
#5 [ffff8d84a2993948] bit_wait_io at ffffffffa8987501
#6 [ffff8d84a2993960] __wait_on_bit_lock at ffffffffa89870b1
#7 [ffff8d84a29939a0] __lock_page at ffffffffa83bd2a4
#8 [ffff8d84a29939f8] truncate_inode_pages_range at ffffffffa83cf2fb
#9 [ffff8d84a2993b50] truncate_pagecache at ffffffffa83cf3f7
#10 [ffff8d84a2993b78] osd_punch at ffffffffc14beecc [osd_ldiskfs]
#11 [ffff8d84a2993bd0] ofd_object_punch at ffffffffc15e7e26 [ofd]
#12 [ffff8d84a2993c48] ofd_punch_hdl at ffffffffc15d442f [ofd]
#13 [ffff8d84a2993cd0] tgt_checksum_niobuf_t10pi at ffffffffc0fe909e [ptlrpc]
#14 [ffff8d84a2993d58] ptlrpc_server_handle_request at ffffffffc0f9090b [ptlrpc]
#15 [ffff8d84a2993df8] ptlrpc_main at ffffffffc0f94274 [ptlrpc]
#16 [ffff8d84a2993ec8] kthread at ffffffffa82c5e31
......

PID: 12603 TASK: ffff8d8490db0000 CPU: 14 COMMAND: "ll_ost_io05_068"
#0 [ffff8d8490daf8a8] __schedule at ffffffffa8988e18
#1 [ffff8d8490daf910] schedule at ffffffffa89891e9
#2 [ffff8d8490daf920] rwsem_down_read_failed at ffffffffa898abd5
#3 [ffff8d8490daf9a0] call_rwsem_down_read_failed at ffffffffa8598068
#4 [ffff8d8490daf9f0] down_read at ffffffffa89886b0
#5 [ffff8d8490dafa08] osd_read_lock at ffffffffc148e03c [osd_ldiskfs]
#6 [ffff8d8490dafa30] ofd_commitrw_write at ffffffffc15eb76c [ofd]
#7 [ffff8d8490dafac0] ofd_commitrw at ffffffffc15efe4f [ofd]
#8 [ffff8d8490dafb58] tgt_request_preprocess at ffffffffc0fee11b [ptlrpc]
#9 [ffff8d8490dafcd0] tgt_checksum_niobuf_t10pi at ffffffffc0fe909e [ptlrpc]
#10 [ffff8d8490dafd58] ptlrpc_server_handle_request at ffffffffc0f9090b [ptlrpc]
#11 [ffff8d8490dafdf8] ptlrpc_main at ffffffffc0f94274 [ptlrpc]
#12 [ffff8d8490dafec8] kthread at ffffffffa82c5e31
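
The two traces above show only what each thread is waiting for: the punch thread is blocked in __lock_page, and the write thread is blocked in down_read via osd_read_lock. A minimal sketch of the lock-order inversion this suggests follows. It assumes (this is not stated in the traces) that the punch thread already holds the object lock and the write thread already holds the page lock. The names object_lock, page_lock, and simulate_lock_inversion are hypothetical, and plain Python locks with timeouts stand in for the kernel primitives.

```python
import threading


def simulate_lock_inversion(timeout=0.2):
    """Model two threads taking the same two locks in opposite order.

    Returns a dict mapping thread name to True if that thread failed
    to obtain its second lock within the timeout.
    """
    object_lock = threading.Lock()  # assumption: stands in for the per-object rwsem
    page_lock = threading.Lock()    # assumption: stands in for a page lock
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    blocked = {}

    def worker(name, first, second):
        with first:
            # Make sure both threads hold their first lock before either
            # attempts the second one.
            both_hold_first.wait()
            got = second.acquire(timeout=timeout)
            blocked[name] = not got
            # Keep the first lock until the peer has finished its attempt.
            both_tried_second.wait()
            if got:
                second.release()

    punch = threading.Thread(target=worker, args=("punch", object_lock, page_lock))
    write = threading.Thread(target=worker, args=("write", page_lock, object_lock))
    punch.start()
    write.start()
    punch.join()
    write.join()
    return blocked


if __name__ == "__main__":
    print(simulate_lock_inversion())
```

In the kernel there is no timeout, so both threads would wait forever; the timeout here exists only so the sketch terminates and can report that neither side made progress.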



 Comments   
Comment by Gerrit Updater [ 25/Jul/22 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48033
Subject: LU-16044 osd: discard pagecache in truncate's declaration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f7f107f3f0144b5f15309506dc6bc3509c1d8d70

Comment by Gerrit Updater [ 01/Sep/22 ]

"Stephane Thiell <sthiell@stanford.edu>" uploaded a new patch: https://review.whamcloud.com/48410
Subject: LU-16044 osd: discard pagecache in truncate's declaration
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: de4c30e20f4d474ec363835f2ce2456d23896cc4

Comment by Stephane Thiell [ 01/Sep/22 ]

Alex, this is a backport of your patch to b2_12. I only removed the encryption part, which is not available in 2.12. Could you please double-check that it looks OK to you? Once I have your go-ahead, we'll try it in production. Thanks!

Comment by Stephane Thiell [ 09/Sep/22 ]

Unfortunately, even with 2.12.9 and both patches applied, we hit a deadlock last night. I am attaching the "foreach bt" output from the new crash dump as foreach_bt_fir-io2-s2_20220909.txt.

Comment by Stephane Thiell [ 09/Sep/22 ]

We may have identified the source of the deadlock. A group of users had jobs using GNU parallel with --tmpdir pointing to Lustre. GNU parallel apparently uses unlinked temporary files that it keeps open, and it frequently calls ftruncate(0) on them.

The command used is:

parallel --tmpdir $folder/tmp --delay 2 -j $threads < $folder/calls.$cmd.txt

with $folder located on Lustre.

We have asked the users to change their scripts and avoid storing such temporary files on Lustre, and we'll see whether that reduces the number of OSS deadlocks.
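
For anyone trying to reproduce the load, the I/O pattern described above can be sketched as follows. This is only an illustration of "unlinked temporary file, kept open, truncated to zero repeatedly"; it is not taken from GNU parallel's source, and the function name churn_unlinked_tmpfile and its parameters are hypothetical.

```python
import os
import tempfile


def churn_unlinked_tmpfile(tmpdir, rounds=100, chunk=b"x" * 4096):
    """Write to an unlinked temp file and truncate it to zero, repeatedly.

    Returns (size, link_count) of the still-open file after the last round.
    """
    fd, path = tempfile.mkstemp(dir=tmpdir)
    try:
        # Remove the name; the file lives on while the descriptor is open.
        os.unlink(path)
        for _ in range(rounds):
            os.write(fd, chunk)           # dirty some pages
            os.ftruncate(fd, 0)           # discard them again
            os.lseek(fd, 0, os.SEEK_SET)  # next write starts at offset 0
        st = os.fstat(fd)
        return st.st_size, st.st_nlink
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Point tmpdir at a directory on the filesystem under test.
    print(churn_unlinked_tmpfile(tempfile.gettempdir(), rounds=10))
```

Running several copies in parallel against a Lustre directory would mix writes and truncates on the same objects, which is the combination seen in the traces. Whether that is enough to trigger the deadlock is untested.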

 

Comment by Alex Zhuravlev [ 10/Sep/22 ]
schedule,io_schedule,bit_wait_io,__wait_on_bit_lock,__lock_page,mpage_prepare_extent_to_map,ldiskfs_writepages,do_writepages,__writeback_single_inode,writeback_sb_inodes,__writeback_inodes_wb,wb_writeback,bdi_writeback_workfn,process_one_work,worker_thread
	PIDs(1): "kworker/u259:2":94708 

This is not the problem I tried to fix; it is probably better described as a related issue. I need to think about it a bit more. Sorry for the inconvenience.

Comment by Alex Zhuravlev [ 10/Sep/22 ]

"We have asked the users to change their scripts and avoid storing such temporary files on Lustre, and we'll see whether that reduces the number of OSS deadlocks."

Please collect the output of sysctl -a | grep vm.dirty from the OSTs.

Comment by Stephane Thiell [ 10/Sep/22 ]

Thanks, Alex. These are the results from all the OSSes on this system. I believe we use the default settings for CentOS 7.9.

---------------
fir-io[1-8]-s[1-2] (16)
---------------
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 3
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500

The settings we do change are the following:

vm.min_free_kbytes = 2097152
vm.swappiness = 1
vm.zone_reclaim_mode = 1

The OSSes are single-socket AMD EPYC 7401P (Naples) machines with 512GB of RAM each.
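
To put the vm.dirty values above in perspective, here is a rough upper-bound calculation of the writeback thresholds they imply. It assumes 512GB means 512 GiB and applies the ratios to total RAM; the kernel actually applies them to available (free plus reclaimable) memory, so the real thresholds are lower. The function name dirty_threshold_bytes is hypothetical.

```python
GIB = 1024 ** 3


def dirty_threshold_bytes(mem_bytes, ratio_percent, bytes_override=0):
    """Approximate a dirty-page threshold in bytes.

    A non-zero *_bytes setting takes precedence over the matching
    *_ratio setting; with *_bytes = 0 the ratio is used.
    """
    if bytes_override > 0:
        return bytes_override
    return mem_bytes * ratio_percent // 100


if __name__ == "__main__":
    total_ram = 512 * GIB  # assumption: 512GB of RAM taken as 512 GiB
    # vm.dirty_background_ratio = 3, vm.dirty_background_bytes = 0
    background = dirty_threshold_bytes(total_ram, 3)
    # vm.dirty_ratio = 10, vm.dirty_bytes = 0
    hard_limit = dirty_threshold_bytes(total_ram, 10)
    print(f"background writeback starts near {background / GIB:.1f} GiB")
    print(f"writers are throttled near {hard_limit / GIB:.1f} GiB")
```

Under these assumptions background writeback starts at roughly 15 GiB of dirty data and writers are throttled at roughly 51 GiB, so a large number of dirty pages can accumulate on each OSS before the flusher threads run.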

Comment by Gerrit Updater [ 15/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48033/
Subject: LU-16044 osd: discard pagecache in truncate's declaration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0bb491b2ecf494c3f78fa08a101af8af7853a0fe

Comment by Peter Jones [ 16/Oct/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 19/Jun/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51360
Subject: LU-16044 osd: discard pagecache in truncate's declaration
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 536d362534f37e53bae1868b4ea1a044306b69a4
