
reduce ptlrpcd wakeups on idle system


    Description

      When a client is not actively using Lustre, background activity (dirty page flushing, lock cancellation, pings, etc.) still wakes a number of different threads. These wakeups can cause small delays in user-space threads. In tightly coupled HPC applications running across a large number of clients (typically MPI with barriers), every barrier waits for the slowest client, so this jitter ends up delaying all of the clients nearly all of the time.

      It would be beneficial to determine which threads are commonly being woken when there is no work to be done (in particular ptlrpcd) and:

      • avoid waking them on a periodic basis (e.g. every 1s) and instead wake them only when there is work to be done (e.g. send a ping, cancel a lock, etc.)
      • coordinate wakeups across clients (e.g. on even multiples of 1s, 5s, etc.) so that the delays affect all clients at the same time rather than being spread continuously across the application timesteps. This could be problematic if it causes large load spikes on the servers (essentially a DDoS).
      • pre-emptively perform work that would otherwise happen in the near future (e.g. cancel locks that will expire in the next few seconds, send a ping earlier, etc.).
      • completely disconnect from the server(s) if there is no activity (LU-7236), to avoid pings entirely and to reduce the number of clients the servers need to recover when the system is idle.
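
      The first three ideas above can be sketched together as a deadline-driven sleep. This is only an illustration, not the code from the landed patch: the function names, the NO_DEADLINE sentinel, and the use of whole-second non-negative timestamps are all assumptions made for the example. The thread sleeps until the earliest pending deadline instead of a fixed 1s tick, and that deadline is rounded down to a shared period so that clients with synchronized clocks wake on the same boundary and do the work slightly early rather than late.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: nothing is pending, so the thread can sleep
 * until it is explicitly woken by new work. */
#define NO_DEADLINE INT64_MAX

/* Return the earliest deadline (in seconds) among pending work items,
 * e.g. the next ping or the next lock expiry; NO_DEADLINE if none. */
int64_t earliest_deadline(const int64_t *deadlines, size_t count)
{
    int64_t earliest = NO_DEADLINE;

    for (size_t i = 0; i < count; i++)
        if (deadlines[i] < earliest)
            earliest = deadlines[i];
    return earliest;
}

/* Round a non-negative deadline down to a multiple of `period`, so every
 * client wakes on the same boundary and the work is done early, not late. */
int64_t align_down(int64_t deadline, int64_t period)
{
    assert(period > 0);
    if (deadline == NO_DEADLINE)
        return NO_DEADLINE;
    assert(deadline >= 0);
    return deadline - (deadline % period);
}

/* How long to sleep from `now`: NO_DEADLINE means "wait for an explicit
 * wakeup", 0 means "there is work due already". */
int64_t sleep_seconds(int64_t now, const int64_t *deadlines, size_t count,
                      int64_t period)
{
    int64_t wake = align_down(earliest_deadline(deadlines, count), period);

    if (wake == NO_DEADLINE)
        return NO_DEADLINE;
    return wake > now ? wake - now : 0;
}
```

      With pending deadlines at 17s, 23s and 42s and a 5s period, a thread at time 10s would sleep until 15s instead of waking five times. Aligning every client to the same boundary is exactly what could produce the server load spike mentioned above, so a real implementation would need to weigh that trade-off.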

          Activity

            [LU-9660] reduce ptlrpcd wakeups on idle system
            mdiep Minh Diep made changes -
            Labels Original: LTS medium New: medium

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29969/
            Subject: LU-9660 ptlrpc: do not wakeup every second
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 7faa2fec32b9d8a8f9dd646ca0b418c754cd8d0a

            mdiep Minh Diep made changes -
            Fix Version/s New: Lustre 2.10.3 [ 13591 ]
            Fix Version/s Original: Lustre 2.10.2 [ 13494 ]
            mdiep Minh Diep made changes -
            Fix Version/s New: Lustre 2.10.2 [ 13494 ]

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29969
            Subject: LU-9660 ptlrpc: do not wakeup every second
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 2142990299805080eab5a17ad7a80486072487f0

            mdiep Minh Diep made changes -
            Labels Original: medium New: LTS medium
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.11.0 [ 13091 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.11

            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Alex Zhuravlev [ bzzz ]

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28776/
            Subject: LU-9660 ptlrpc: do not wakeup every second
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e81847bd06519dd9847c31244abe6da978bcd016


            People

              bzzz Alex Zhuravlev
              adilger Andreas Dilger
