Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Components: None
    • Labels: None
    • Severity: 3

    Description

      PID: 12333 TASK: ffff8d84a294c200 CPU: 8 COMMAND: "ll_ost_io02_086"
      #0 [ffff8d84a29937e0] __schedule at ffffffffa8988e18
      #1 [ffff8d84a2993848] schedule at ffffffffa89891e9
      #2 [ffff8d84a2993858] schedule_timeout at ffffffffa8986eb1
      #3 [ffff8d84a2993908] io_schedule_timeout at ffffffffa8988a9d
      #4 [ffff8d84a2993938] io_schedule at ffffffffa8988b38
      #5 [ffff8d84a2993948] bit_wait_io at ffffffffa8987501
      #6 [ffff8d84a2993960] __wait_on_bit_lock at ffffffffa89870b1
      #7 [ffff8d84a29939a0] __lock_page at ffffffffa83bd2a4
      #8 [ffff8d84a29939f8] truncate_inode_pages_range at ffffffffa83cf2fb
      #9 [ffff8d84a2993b50] truncate_pagecache at ffffffffa83cf3f7
      #10 [ffff8d84a2993b78] osd_punch at ffffffffc14beecc [osd_ldiskfs]
      #11 [ffff8d84a2993bd0] ofd_object_punch at ffffffffc15e7e26 [ofd]
      #12 [ffff8d84a2993c48] ofd_punch_hdl at ffffffffc15d442f [ofd]
      #13 [ffff8d84a2993cd0] tgt_checksum_niobuf_t10pi at ffffffffc0fe909e [ptlrpc]
      #14 [ffff8d84a2993d58] ptlrpc_server_handle_request at ffffffffc0f9090b [ptlrpc]
      #15 [ffff8d84a2993df8] ptlrpc_main at ffffffffc0f94274 [ptlrpc]
      #16 [ffff8d84a2993ec8] kthread at ffffffffa82c5e31
      ......

      PID: 12603 TASK: ffff8d8490db0000 CPU: 14 COMMAND: "ll_ost_io05_068"
      #0 [ffff8d8490daf8a8] __schedule at ffffffffa8988e18
      #1 [ffff8d8490daf910] schedule at ffffffffa89891e9
      #2 [ffff8d8490daf920] rwsem_down_read_failed at ffffffffa898abd5
      #3 [ffff8d8490daf9a0] call_rwsem_down_read_failed at ffffffffa8598068
      #4 [ffff8d8490daf9f0] down_read at ffffffffa89886b0
      #5 [ffff8d8490dafa08] osd_read_lock at ffffffffc148e03c [osd_ldiskfs]
      #6 [ffff8d8490dafa30] ofd_commitrw_write at ffffffffc15eb76c [ofd]
      #7 [ffff8d8490dafac0] ofd_commitrw at ffffffffc15efe4f [ofd]
      #8 [ffff8d8490dafb58] tgt_request_preprocess at ffffffffc0fee11b [ptlrpc]
      #9 [ffff8d8490dafcd0] tgt_checksum_niobuf_t10pi at ffffffffc0fe909e [ptlrpc]
      #10 [ffff8d8490dafd58] ptlrpc_server_handle_request at ffffffffc0f9090b [ptlrpc]
      #11 [ffff8d8490dafdf8] ptlrpc_main at ffffffffc0f94274 [ptlrpc]
      #12 [ffff8d8490dafec8] kthread at ffffffffa82c5e31
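
      The two stacks above form an ABBA-style inversion: the punch thread holds the osd object lock and then waits on a page lock inside truncate_pagecache(), while the bulk-write thread, which interacts with locked pages, waits on the same object lock in osd_read_lock(). The fix that landed ("discard pagecache in truncate's declaration") moves the pagecache discard into the declare phase, before the object lock is taken, so both paths touch the two resources in a compatible order. Below is a minimal pthread sketch of that ordering; the lock names and the demo() helper are hypothetical stand-ins, not Lustre code.

      ```c
      #include <pthread.h>
      #include <stddef.h>

      /* Hypothetical stand-ins for the two resources involved in the hang:
       * object_lock ~ the osd object rwsem (osd_read_lock/write lock)
       * page_lock   ~ a pagecache page lock (lock_page())
       *
       * Before the fix, the punch thread took object_lock first and then
       * waited for page locks inside truncate_pagecache(), while a
       * bulk-write thread that held page state waited for object_lock:
       * a classic ABBA deadlock. */
      static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t page_lock   = PTHREAD_MUTEX_INITIALIZER;

      static void *punch_thread(void *arg)
      {
          (void)arg;
          /* declare phase: discard the pagecache BEFORE the object lock */
          pthread_mutex_lock(&page_lock);
          /* ... truncate_pagecache() work ... */
          pthread_mutex_unlock(&page_lock);

          /* execute phase: the object lock is now safe to take */
          pthread_mutex_lock(&object_lock);
          /* ... on-disk punch/truncate ... */
          pthread_mutex_unlock(&object_lock);
          return NULL;
      }

      static void *write_thread(void *arg)
      {
          (void)arg;
          pthread_mutex_lock(&page_lock);
          /* ... lock pages for bulk IO ... */
          pthread_mutex_unlock(&page_lock);

          pthread_mutex_lock(&object_lock);   /* cf. osd_read_lock() */
          /* ... ofd_commitrw_write() work ... */
          pthread_mutex_unlock(&object_lock);
          return NULL;
      }

      /* Runs both threads; returns 0 once both complete (no inversion). */
      int demo(void)
      {
          pthread_t t1, t2;
          if (pthread_create(&t1, NULL, punch_thread, NULL) != 0)
              return -1;
          if (pthread_create(&t2, NULL, write_thread, NULL) != 0)
              return -1;
          pthread_join(t1, NULL);
          pthread_join(t2, NULL);
          return 0;
      }
      ```

      Compile with -pthread; with the deadlock-prone ordering (object_lock taken before waiting on page_lock in punch_thread), the two threads could block each other forever, whereas this ordering always completes.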

    Attachments

    Activity

    [LU-16044] osd: truncate vs write deadlock

          Gerrit Updater added a comment:

          "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51360
          Subject: LU-16044 osd: discard pagecache in truncate's declaration
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set: 1
          Commit: 536d362534f37e53bae1868b4ea1a044306b69a4
          Peter Jones added a comment:

          Landed for 2.16

          Gerrit Updater added a comment:

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48033/
          Subject: LU-16044 osd: discard pagecache in truncate's declaration
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 0bb491b2ecf494c3f78fa08a101af8af7853a0fe

          Stephane Thiell added a comment:

          Thanks Alex. These are the results from all of our OSSes on this system. I believe we use the default settings for CentOS 7.9.

          ---------------
          fir-io[1-8]-s[1-2] (16)
          ---------------
          vm.dirty_background_bytes = 0
          vm.dirty_background_ratio = 3
          vm.dirty_bytes = 0
          vm.dirty_expire_centisecs = 3000
          vm.dirty_ratio = 10
          vm.dirty_writeback_centisecs = 500

          What we changed is the following:

          vm.min_free_kbytes = 2097152
          vm.swappiness = 1
          vm.zone_reclaim_mode = 1

          The OSSes are based on AMD EPYC Naples, single-socket 7401P, with 512 GB of RAM each.

          Alex Zhuravlev added a comment:

          > We have asked the users to change their scripts and avoid using Lustre to store such temporary files, and we'll see if that reduces the number of OSS deadlocks.

          Please ask for the output of sysctl -a | grep vm.dirty from the OSTs.
          Alex Zhuravlev added a comment:

          schedule,io_schedule,bit_wait_io,__wait_on_bit_lock,__lock_page,mpage_prepare_extent_to_map,ldiskfs_writepages,do_writepages,__writeback_single_inode,writeback_sb_inodes,__writeback_inodes_wb,wb_writeback,bdi_writeback_workfn,process_one_work,worker_thread
          	PIDs(1): "kworker/u259:2":94708

          This is not the problem I tried to fix; it is probably better to say it's a related issue. I need to think a bit more. Sorry for the inconvenience.

          Stephane Thiell added a comment:

          We may have identified the source of the deadlock. A group of users had jobs using GNU parallel with --tmpdir set to a Lustre directory. GNU parallel apparently uses unlinked temporary files that are kept open, and it performs frequent ftruncate(0) calls on them.

          The command used is:

          parallel --tmpdir $folder/tmp --delay 2 -j $threads < $folder/calls.$cmd.txt

          with $folder set to a Lustre path.

          We have asked the users to change their scripts and avoid using Lustre to store such temporary files, and we'll see if that reduces the number of OSS deadlocks.
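
          The access pattern described above (an unlinked temporary file kept open, hit with repeated ftruncate(0)) can be reproduced with a short standalone program. This is a sketch, not GNU parallel's actual code: the function name, directory, and iteration count are placeholders, and on a real test the directory would sit on a Lustre-backed mount.

          ```c
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/stat.h>
          #include <unistd.h>

          /* Sketch of the reported workload: create a temp file, unlink it
           * while keeping the fd open, then alternate writes with
           * ftruncate(fd, 0).  Returns the final file size (0 when every
           * truncate succeeded) or -1 on setup failure. */
          int hammer_truncate(const char *dir, int iterations)
          {
              char path[4096];
              snprintf(path, sizeof(path), "%s/repro_tmp.XXXXXX", dir);

              int fd = mkstemp(path);
              if (fd < 0)
                  return -1;
              unlink(path);          /* file lives on only via the open fd */

              for (int i = 0; i < iterations; i++) {
                  const char buf[] = "temporary output\n";
                  if (write(fd, buf, sizeof(buf) - 1) < 0)
                      break;
                  if (ftruncate(fd, 0) < 0)   /* the frequent ftruncate(0) */
                      break;
                  lseek(fd, 0, SEEK_SET);
              }

              struct stat st;
              fstat(fd, &st);
              long final_size = (long)st.st_size;
              close(fd);
              return (int)final_size;
          }
          ```

          Running many instances of this against OST-backed files, concurrently with bulk writers, would approximate the punch-vs-write contention seen in the crash dumps.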

          Stephane Thiell added a comment:

          Unfortunately, we hit a deadlock situation last night even with 2.12.9 carrying both patches:

          LU-16044 osd: discard pagecache in truncate's declaration (https://review.whamcloud.com/48410)
          LU-15117 ofd: don't take lock for dt_bufs_get() (https://review.whamcloud.com/47925)

          Attaching "foreach bt" of the new crash dump as foreach_bt_fir-io2-s2_20220909.txt.

          Stephane Thiell added a comment:

          Alex, this is a backport of your patch to b2_12. Basically, I just removed the encryption part that is not available in 2.12. Can you please double-check that this looks OK to you? Once I get your go-ahead, we'll try it in production. Thanks!

          Gerrit Updater added a comment:

          "Stephane Thiell <sthiell@stanford.edu>" uploaded a new patch: https://review.whamcloud.com/48410
          Subject: LU-16044 osd: discard pagecache in truncate's declaration
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: de4c30e20f4d474ec363835f2ce2456d23896cc4

          People

            Assignee: Alex Zhuravlev (bzzz)
            Reporter: Alex Zhuravlev (bzzz)
            Votes: 0
            Watchers: 6

          Dates

            Created:
            Updated:
            Resolved: