Lustre / LU-15821

Server driven blocking callbacks can wait behind general lru_size management


    Description

      The current code places bl_ast lock callbacks at the end of the global blocking-callback (BL) queue. This is a problem because urgent requests from the server then wait behind non-urgent cleanup work whose only purpose is to keep the lock LRU at lru_size.

      If the global queue holds a large backlog, the callback may not be serviced in time, which can lead to client evictions.
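      To make the timing argument concrete, here is a small C sketch that estimates how long a callback appended to the tail of a FIFO queue waits before any worker even starts it, and compares that with a deadline. The queue depth, worker count, per-item service time and deadline are hypothetical parameters for illustration, not Lustre tunables; the deadline stands in for the server-side timeout after which a client that has not answered the callback would be evicted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model (not Lustre code): a callback appended behind
 * `queued` items, served by `workers` threads that each spend about
 * `service_ms` per item, starts after ceil(queued / workers) rounds.
 */
static double fifo_wait_ms(unsigned queued, unsigned workers, double service_ms)
{
	assert(workers > 0);
	unsigned rounds = (queued + workers - 1) / workers; /* ceiling division */

	return rounds * service_ms;
}

/* True if the queueing delay alone already exceeds the deadline. */
static bool misses_deadline(unsigned queued, unsigned workers,
			    double service_ms, double deadline_ms)
{
	return fifo_wait_ms(queued, workers, service_ms) > deadline_ms;
}
```

      For example, with a hypothetical backlog of 100000 LRU cancels, 4 workers and 1 ms per item, the model predicts about 25 s of queueing before the urgent callback is touched, regardless of how quickly the callback itself could be handled.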

      Put bl_ast callbacks on the priority queue so they do not wait behind the background traffic.
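      The proposed fix amounts to a two-level queue in which workers drain a priority list before the regular list. Below is a minimal, single-threaded C sketch of that idea. The names (struct bl_item, struct bl_pool, bl_enqueue, bl_dequeue, the is_bl_ast flag) are hypothetical and are not the actual Lustre ldlm code, which additionally needs locking and worker wakeups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item: a server-driven bl_ast callback or a
 * background LRU cancel issued to keep the LRU at lru_size. */
struct bl_item {
	int id;
	bool is_bl_ast;
	struct bl_item *next;
};

/* Singly linked FIFO with head and tail pointers. */
struct fifo {
	struct bl_item *head;
	struct bl_item *tail;
};

/* Hypothetical pool: one urgent queue, one background queue. */
struct bl_pool {
	struct fifo prio;    /* server blocking callbacks */
	struct fifo regular; /* LRU size management */
};

static void fifo_push(struct fifo *q, struct bl_item *it)
{
	it->next = NULL;
	if (q->tail)
		q->tail->next = it;
	else
		q->head = it;
	q->tail = it;
}

static struct bl_item *fifo_pop(struct fifo *q)
{
	struct bl_item *it = q->head;

	if (it) {
		q->head = it->next;
		if (!q->head)
			q->tail = NULL;
		it->next = NULL;
	}
	return it;
}

/* bl_ast items go to the priority queue instead of the tail of the
 * regular queue, so they never wait behind background work. */
static void bl_enqueue(struct bl_pool *p, struct bl_item *it)
{
	assert(p != NULL && it != NULL);
	fifo_push(it->is_bl_ast ? &p->prio : &p->regular, it);
}

/* Workers always drain the priority queue first. */
static struct bl_item *bl_dequeue(struct bl_pool *p)
{
	struct bl_item *it = fifo_pop(&p->prio);

	return it ? it : fifo_pop(&p->regular);
}
```

      If two LRU cancels are enqueued and then one bl_ast, the bl_ast is dequeued first even though it arrived last. Strict priority means a continuous stream of bl_ast items could starve the regular queue; since that queue only carries background cleanup, the sketch accepts this trade-off.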


          Activity


            Gerrit Updater added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49610/
            Subject: LU-15821 ldlm: Prioritize blocking callbacks
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 8ca1186151faa778edd5abd361e92fcd5d8ff56b

            Stephane Thiell added a comment:
            Andreas, since we applied this patch in October 2022, we have not seen the following problems again with two workloads that were previously causing trouble:

            • GNU parallel with --tmpdir
            • sort with --temporary-directory

            In both use cases, files are created in a temporary directory and used after being unlinked: they are no longer visible in the directory but are still open. This may have triggered some sort of contention in Lustre that led to evictions.

            At the same time, we have also tried to redirect our users to local scratch filesystems to avoid further issues, since a parallel filesystem was not really needed for these workloads. So I can't say for sure that this patch resolves these issues, but at least it didn't introduce any problems, and we would like to keep it for now. It would be convenient for us if it were added to 2.15; otherwise, I will continue to backport it. I hope the context helps a bit.

            Thanks also for the pointer to the other patch from Yang Sheng.
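            The access pattern described in this comment (create a file in a temporary directory, unlink it, keep using it through the open descriptor) can be reproduced with a few POSIX calls. The C sketch below is illustrative only: the helper name unlinked_tmp_roundtrip and its arguments are hypothetical, and it reproduces the file access pattern, not the Lustre lock contention itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temp file under `dir`, unlink it at once,
 * then write `msg` and read it back through the still-open descriptor.
 * Returns 0 on success, -1 on error. The name vanishes from the
 * directory, but the file's data lives until the descriptor is closed.
 */
static int unlinked_tmp_roundtrip(const char *dir, const char *msg)
{
	char path[4096];
	char buf[256];
	size_t len = strlen(msg);
	int n, fd, rc = -1;

	assert(len < sizeof(buf));
	n = snprintf(path, sizeof(path), "%s/tmpXXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (unlink(path) != 0)              /* now invisible in dir */
		goto out;
	if (write(fd, msg, len) != (ssize_t)len)
		goto out;
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read(fd, buf, len) != (ssize_t)len)
		goto out;
	rc = memcmp(buf, msg, len) == 0 ? 0 : -1;
	/* The directory entry really is gone. */
	if (rc == 0 && access(path, F_OK) == 0)
		rc = -1;
out:
	close(fd);
	return rc;
}
```

            Calling unlinked_tmp_roundtrip("/tmp", "hello") exercises a local file system; passing a directory on a Lustre mount instead would exercise the same pattern against Lustre.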

            Andreas Dilger added a comment:
            "we have been running it for a while on 2.15.1 clients with good results."

            Stephane, by "good results" do you mean "it doesn't cause problems" or "it visibly improved or removed a problem you were seeing with client evictions"? In the use case that drove the initial development of this patch, it didn't totally solve the issue. Yang Sheng has also just developed patch https://review.whamcloud.com/49527 "LU-16285 ldlm: improvement of bl lock queue" to further improve the handling of highly contended DLM locks.

            Gerrit Updater added a comment:
            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49610
            Subject: LU-15821 ldlm: Prioritize blocking callbacks
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: b3e9e0eeadd783d22065871df28ea32f2d3c6934

            Stephane Thiell added a comment:
            It would be nice to have this patch backported to 2.15.x; we have been running it for a while on 2.15.1 clients with good results.

            Gerrit Updater added a comment:
            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48764
            Subject: LU-15821 ldlm: Prioritize blocking callbacks
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 74666c9fff24126922e5635fbaf2394bf7eda118

            Gerrit Updater added a comment (edited):
            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48122
            Subject: LU-15821 ldlm: Fix unsafe blwi access
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 97326667eb8f960fa2996099a6bd2b96496d026e
            (Patch not needed)

            Peter Jones added a comment:
            Landed for 2.16

            Gerrit Updater added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47215/
            Subject: LU-15821 ldlm: Prioritize blocking callbacks
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2d59294d52b696125acc464e5910c893d9aef237

            Gerrit Updater added a comment:
            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47215
            Subject: LU-15821 ldlm: Prioritize blocking callbacks
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 65a5b8d27e6d6a0acf8bc87458b8837509e60b23

            People

              Assignee: Patrick Farrell
              Reporter: Patrick Farrell
              Votes: 0
              Watchers: 8
