
HSM: high lock contention for cdt_llog_lock


    Description

      There is a serious lock contention problem around cdt_llog_lock when adding new HSM requests.

      # time wc -l /proc/fs/lustre/mdt/snx11133-MDT0000/hsm/actions
      219759 /proc/fs/lustre/mdt/snx11133-MDT0000/hsm/actions
      
      real    11m45.068s
      user    0m0.020s
      sys     0m21.372s
      

      Taking 11 minutes to read the list is far too long. Such an operation should take a couple of seconds at most.
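To put the timing above in perspective, here is a small back-of-the-envelope calculation. It only uses the figures printed by `time wc -l`; the variable names are mine.

```python
# Figures copied from the `time wc -l` output above.
lines = 219759
real_seconds = 11 * 60 + 45.068   # real 11m45.068s
sys_seconds = 21.372              # sys  0m21.372s

# Effective read rate of the actions file.
lines_per_second = lines / real_seconds

# Share of wall-clock time actually spent executing in the kernel.
cpu_fraction = sys_seconds / real_seconds

print(f"{lines_per_second:.0f} lines/s")
print(f"{cpu_fraction:.1%} of wall time spent on CPU in the kernel")
```

That is roughly 312 lines per second, with only about 3% of the wall time spent on CPU. The rest is time spent sleeping, which is consistent with waiting on a lock, although the timing alone does not prove it.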

      The contention appears to come from the coordinator. Every time a new request is posted, the whole list of requests is walked while holding that lock. That's not a problem when there are only a handful of requests, but it doesn't scale when there are hundreds of thousands of them.
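A toy model of why this doesn't scale. This is not Lustre code: it assumes each new request causes exactly one pass over every record already in the list, and that nothing is removed in the meantime (as in a run without a copytool). The function name is hypothetical.

```python
def records_visited(n_requests: int) -> int:
    """Count records examined when every new request triggers a full
    scan of the existing list before being appended to it."""
    visited = 0
    pending = 0
    for _ in range(n_requests):
        visited += pending   # full scan of the current list, under the lock
        pending += 1         # then the new request joins the list
    return visited


for n in (10, 10_000, 219_759):
    print(n, records_visited(n))
```

Under these assumptions the total work is n(n-1)/2 record visits: 45 for 10 requests, about 5 × 10^7 for 10,000 requests, and about 2.4 × 10^10 for the 219,759 entries seen above. All of that work happens with the lock held.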

      I recompiled a CentOS 7 kernel with CONFIG_LOCK_STAT on a VM, then ran a test that creates 10000 files and archives them without a copytool present. Total time was 146 seconds. Lock contention results:

      lock_stat version 0.3
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                    class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      [...]
      
                           &cdt->cdt_llog_lock:          6296           6296          15.45       23074.17    43436574.06          17791          27134          25.09       37745.03   138558199.24
                           -------------------
                           &cdt->cdt_llog_lock           6296          [<ffffffffa0fb096d>] cdt_llog_process+0x9d/0x3a0 [mdt]
                           -------------------
                           &cdt->cdt_llog_lock           6296          [<ffffffffa0fb096d>] cdt_llog_process+0x9d/0x3a0 [mdt]
      [...]
      

      (Time units are microseconds.)

      With waittime-total of about 43 seconds and holdtime-total of about 138 seconds over a 146-second run, this is a very contentious lock, far above the other locks in Lustre or the rest of the system.
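The conversion from the lock_stat row to these figures can be checked with a few lines. The inputs are copied from the cdt_llog_lock row above; the derived ratios assume that the counters cover only the 146-second test run, which the report does not state explicitly.

```python
US_PER_S = 1_000_000

# Values from the &cdt->cdt_llog_lock row of the lock_stat output.
run_seconds = 146
contentions = 6296
acquisitions = 27134
waittime_total_us = 43436574.06
holdtime_total_us = 138558199.24

wait_s = waittime_total_us / US_PER_S
hold_s = holdtime_total_us / US_PER_S

# Assumption: the counters were reset just before the test started.
held_fraction = hold_s / run_seconds
contended_fraction = contentions / acquisitions
mean_hold_ms = holdtime_total_us / acquisitions / 1000
mean_wait_ms = waittime_total_us / contentions / 1000

print(f"wait {wait_s:.1f} s, hold {hold_s:.1f} s")
print(f"lock held {held_fraction:.0%} of the run")
print(f"{contended_fraction:.0%} of acquisitions contended")
print(f"mean hold {mean_hold_ms:.1f} ms, mean wait {mean_wait_ms:.1f} ms")
```

Under that assumption the lock is held for about 95% of the run, about 23% of acquisitions have to wait, and each acquisition holds the lock for about 5.1 ms on average.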

      AFAICS, contention is between these mechanisms:

      • adding a new request (lfs hsm_archive, ...)
      • changing a request status (WAITING->STARTED->SUCCEED)
      • removing a request (archive completed)
      • housekeeping (coordinator loop every 10 seconds)
      • dumping the list of actions from /proc

      The net result is that when there are a lot of requests, they only trickle down to the copytool, which exacerbates the problem by letting the list grow even longer.

          Activity

            [LU-7988] HSM: high lock contention for cdt_llog_lock

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28908/
            Subject: LU-7988 hsm: added coordinator housekeeping flag
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 387dca70723affc0b45b87730e994395b9627eef

            adegremont Aurelien Degremont (Inactive) added a comment -

            It seems this ticket was wrongly flagged for 2.10.6.

            The last patch from this series to land on b2_10 landed in 2.10.4.

            There is still one backport that has not landed and is worth considering: https://review.whamcloud.com/#/c/28908/2

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29583/
            Subject: LU-7988 hsm: split mdt_hsm_add_actions()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 03fc1a015f5869b68d21eee69531a201045532e1

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30538/
            Subject: LU-7988 hsm: wake up cdt when requests are empty
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: d5a7563a46acf15252929b6a045eef8c58022314

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30538
            Subject: LU-7988 hsm: wake up cdt when requests are empty
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 7121b1d335453f31766f2f59909d20b7cde3e948

            pjones Peter Jones added a comment -

            As we finally seem to be at the end of the patches queued up for this ticket, let's close it and open a new ticket to track any further fixes identified in this area of the code.

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29742/
            Subject: LU-7988 hsm: wake up cdt when requests are empty
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7251fea8dc3c4d29e30c5a3f763c4c33d35f90a7

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28973/
            Subject: LU-7988 hsm: update many cookie status at once
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 019c48b649d0f668563cf72495d1d1b02f4f69c0

            Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/29742
            Subject: LU-7988 hsm: wake up cdt when requests are empty
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0556c7c9bf289ac34cda099819d23ce0063302a2

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29583
            Subject: LU-7988 hsm: split mdt_hsm_add_actions()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 80ce8998119a9bf125c014c94f8fa712a47e0b47

            People

              fzago Frank Zago (Inactive)