[LU-11509] LDLM: replace client lock LRU with improved cache algorithm

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Lustre 2.17.0

    Description

      The LDLM LRU algorithm is sub-optimal for managing locks on the client, since it can flush often-used locks when a large number of files is accessed in a short time (e.g. during a filesystem scan). It would be better to implement a more sophisticated cache mechanism, such as ARC, LFRU, or similar (see https://en.wikipedia.org/wiki/Cache_replacement_policies), that takes into account a frequency count on the lock in addition to its age. That would ensure that the more important top-level locks are more likely to stay cached on the client instead of being flushed.
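
      As a rough illustration of the idea, here is a minimal user-space sketch (not LDLM code; the structure and function names are made up for the example) of an LFRU-style victim selection that discounts a lock's age by its use count, so a heavily reused lock outlives many used-once locks of similar age:

      #include <stdio.h>
      #include <time.h>

      struct demo_lock {
          const char   *dl_name;      /* resource the lock covers */
          time_t        dl_last_used; /* recency, what plain LRU uses today */
          unsigned int  dl_use_count; /* frequency, the added dimension */
      };

      /* Higher score == better candidate for cancellation. */
      static double evict_score(const struct demo_lock *lk, time_t now)
      {
          double age = difftime(now, lk->dl_last_used);

          /* Discount age by reuse count so a frequently used top-level
           * directory lock outlives many used-once leaf locks. */
          return age / (1.0 + lk->dl_use_count);
      }

      int main(void)
      {
          time_t now = time(NULL);
          struct demo_lock locks[] = {
              { "/mnt/lustre/home",         now - 600, 500 }, /* old but hot  */
              { "/mnt/lustre/home/u/leaf1", now - 30,    1 }, /* young, 1 use */
              { "/mnt/lustre/home/u/leaf2", now - 300,   1 }, /* older, 1 use */
          };
          int i, victim = 0;

          for (i = 1; i < 3; i++)
              if (evict_score(&locks[i], now) > evict_score(&locks[victim], now))
                  victim = i;

          /* Plain LRU would cancel the oldest lock (the hot directory);
           * the frequency-aware score picks a used-once leaf instead. */
          printf("cancel candidate: %s\n", locks[victim].dl_name);
          return 0;
      }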

          Activity

            gerrit Gerrit Updater added a comment - edited

            "Kiet <dekisugi@cau.ac.kr>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56679
            Subject: LU-11509 ldlm: Implement LFRU cache eviction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4385a0992e74cdf4474821d4b6213a40b42e5283

            dekisugi Kiet Tuan Pham added a comment - edited

            Sorry for taking so long to catch up on the patch. Currently it implements a very simple LFRU algorithm. There is no maximum limit on the priv list yet, so I think that could cause some problems. Also, the frequency threshold for a lock to be considered "frequently accessed" has not been measured in any way; I just set it to a value that seemed reasonable.

            For a simple test, I did some work at /, then tarred some archive files and ran find and ls to fill the LRU list. After that, I set lru_size to a small value (~100) to see the impact. Compared with the original LRU scheme, 7.3% more of the high-frequency locks remain.

            I haven't found a workload that shows a performance difference yet; maybe the network would need to be slow to clearly see the additional time needed to enqueue an ldlm_lock again?
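
            For reference, here is a very rough user-space sketch of the two-list idea described above (this is not the actual patch; FREQ_THRESHOLD and all identifiers are made up for the example): locks whose use count crosses the threshold are promoted to a protected "priv" list, and the cancel pass drains the ordinary LRU list before touching the protected one.

            #include <stdio.h>

            #define FREQ_THRESHOLD 4   /* assumed value, not measured */
            #define NLOCKS         6

            enum lru_list { LRU_NORMAL, LRU_PROTECTED };

            struct demo_lock {
                int           dl_id;
                unsigned int  dl_use_count;
                enum lru_list dl_list;
                int           dl_cancelled;
            };

            /* Called on each reuse of a lock: promote once the threshold is crossed. */
            static void lock_touch(struct demo_lock *lk)
            {
                lk->dl_use_count++;
                if (lk->dl_list == LRU_NORMAL && lk->dl_use_count >= FREQ_THRESHOLD)
                    lk->dl_list = LRU_PROTECTED;
            }

            /* Cancel up to 'count' locks: normal list first, protected list last. */
            static void lru_cancel(struct demo_lock *locks, int n, int count)
            {
                int pass, i;

                for (pass = LRU_NORMAL; pass <= LRU_PROTECTED && count > 0; pass++)
                    for (i = 0; i < n && count > 0; i++)
                        if (!locks[i].dl_cancelled && locks[i].dl_list == pass) {
                            locks[i].dl_cancelled = 1;
                            count--;
                        }
            }

            int main(void)
            {
                struct demo_lock locks[NLOCKS] = { { 0 }, { 1 }, { 2 }, { 3 }, { 4 }, { 5 } };
                int i;

                /* Lock 0 is "hot": reuse it enough times to get promoted. */
                for (i = 0; i < FREQ_THRESHOLD; i++)
                    lock_touch(&locks[0]);
                /* The others are touched once, as a single "find" pass would do. */
                for (i = 1; i < NLOCKS; i++)
                    lock_touch(&locks[i]);

                /* Shrinking the LRU now cancels the used-once locks, not lock 0. */
                lru_cancel(locks, NLOCKS, 5);
                for (i = 0; i < NLOCKS; i++)
                    printf("lock %d: %s\n", locks[i].dl_id,
                           locks[i].dl_cancelled ? "cancelled" : "kept");
                return 0;
            }

            As noted above, a real implementation would also need an upper bound on the protected list so it cannot grow without limit.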


            "Kiet <tuankiet.rickystudio@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56648
            Subject: LU-11509 ldlm: Implement LFRU cache eviction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d3c3286e3ff8d16efaf49fd70651068aae0cff1c


            adilger Andreas Dilger added a comment -

            Hi dekisugi, your account in Gerrit has been enabled for you to push your patch(es). I would be quite interested to see what you have developed, even if it is not totally ready yet.

            If the patch is still in a "prototype" stage, which often is the case for new/complex patches, then you can add the following line to the commit message so that it will not be considered for landing before it is ready:

            Test-Parameters: fortestonly
            

            Thank you for your contribution.


            adilger Andreas Dilger added a comment -

            dekisugi,
            I understand that you are working on this project to implement LFRU for the Lustre DLM lock cache. Thank you for your contribution.

            Could you please provide some background on what you are implementing, so that it is available here for future reference? It would be useful to know whether you have done any performance analysis of the LFRU and how it compares to the existing LRU.

            When you push your patch to Gerrit (which I've now given you permission to do), the patch will undergo fairly substantial regression testing, but it will not necessarily do extensive performance testing specific to this code. You may be interested to review patch https://review.whamcloud.com/54204 "LU-11085 tests: Add performance test for ldlm_extent code" (sanity.sh test_842) to see how a unit test could be written for the DLM LRU.

            At a system level, running some filesystem workload that has both short- and long-lived DLM locks on a single client would be useful. Possibly "find" on a large directory tree, with some large directories at the bottom, should keep the DLM locks from the frequently-used top-level directories in the client cache, while single-use DLM locks for the leaf files should be dropped relatively quickly. I'm not sure if the DLM LRU behaviour would show up in a workload like "compilebench" or if that would be overwhelmed by other factors.


            adilger Andreas Dilger added a comment -

            From discussion in patch https://review.whamcloud.com/53682 "LU-17428 ldlm: reduce default lru_max_age":

            tim> Poking around the code, the LRU in Lustre seems pretty simplistic.
            tim> It seems like the core logic is encapsulated in ldlm_prepare_lru_list()?
            tim> We just cycle over ns_unused_list and cancel according to a couple preset
            tim> policies. If we added a couple more lists and new policies for shifting
            tim> locks between them, that looks like it'd be enough to implement LFRU or ARC.
            tim>
            tim> Has anyone looked at this in-depth before? Anything experimental floating around?
            
            patrick> While client side lock management could be improved, the real problem
            patrick> with lock management isn't the client side code - it's the monstrosity that is the
            patrick> slv code.  It buggily applies an overly complicated model of "lock volume" which
            patrick> has (imo) nothing to do with how locks work in Lustre.
            patrick> But hey, at least it's almost incomprehensible.
            patrick> 
            patrick> Apologies for the tone - a client side LRU rework is probably worth the trouble.
            patrick> While it would be more ambitious, a simplification of the server side LRU might
            patrick> be even more valuable.  I'd have to think a bit about what's most desirable,
            patrick> but I think it's probably letting clients have as many locks as they want until
            patrick> lru_size (on the server) is hit, or it runs low on memory.
            patrick> When it does, do something simple.
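
            As a point of reference, here is a stripped-down user-space sketch of the loop shape tim describes above: a single unused-lock list scanned with a pluggable cancel policy, where a frequency-aware variant only needs to change the policy. All identifiers are illustrative, not the actual ldlm_prepare_lru_list() code, and the thresholds are arbitrary.

            #include <stdio.h>
            #include <stddef.h>
            #include <time.h>

            struct demo_lock {
                struct demo_lock *dl_next;      /* stand-in for the unused-list linkage */
                time_t            dl_last_used;
                unsigned int      dl_use_count;
            };

            enum policy_result { POLICY_KEEP, POLICY_CANCEL };

            typedef enum policy_result (*lru_policy_t)(const struct demo_lock *, time_t);

            /* Roughly today's behaviour: cancel purely by age. */
            static enum policy_result policy_aged(const struct demo_lock *lk, time_t now)
            {
                return difftime(now, lk->dl_last_used) > 3600 ? POLICY_CANCEL : POLICY_KEEP;
            }

            /* A frequency-aware policy differs only here: keep often-reused
             * locks even when they are old. */
            static enum policy_result policy_lfru(const struct demo_lock *lk, time_t now)
            {
                if (lk->dl_use_count >= 4)
                    return POLICY_KEEP;
                return policy_aged(lk, now);
            }

            /* Walk the unused list and count the locks the policy would cancel. */
            static int count_cancels(struct demo_lock *head, lru_policy_t policy, time_t now)
            {
                int count = 0;

                for (; head != NULL; head = head->dl_next)
                    if (policy(head, now) == POLICY_CANCEL)
                        count++;
                return count;
            }

            int main(void)
            {
                time_t now = time(NULL);
                struct demo_lock leaf = { NULL,  now - 7200, 1 };  /* old, used once    */
                struct demo_lock top  = { &leaf, now - 7200, 50 }; /* old, reused often */

                printf("aged policy cancels %d, lfru policy cancels %d\n",
                       count_cancels(&top, policy_aged, now),
                       count_cancels(&top, policy_lfru, now));
                return 0;
            }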
            

            I think LFRU or ARC would be a real and immediate benefit for clients, for a relatively small effort. It would keep a "find" or "ls -lR" from blowing out a few dozen busy locks on a login node (e.g. /mnt/lustre, .../home, .../home/$USER, .../home/$USER/bin, etc.), along with all of their cached positive and negative entries, and would keep a busy data mover node doing a filesystem traversal from thrashing the top-level directory locks in favour of used-once leaf locks.

            I have to admit that I have the same feeling as Patrick about the SLV implementation (remediation of which is discussed a bit under LU-7266). Maybe it works sometimes, but nobody understands it, and it definitely doesn't work all the time or there wouldn't be a need to reduce lru_max_age or similar.

            LU-6529 did some work to have a more "direct" server lock reclaim, but it never implemented a "generic client callback" from a low-memory server that informed the client "you need to cancel 50 locks right now, you pick which". Instead, it just randomly picks some locks and issues cancels, which may help or may just drive up the server load further.


            People

              Assignee: dekisugi Kiet Tuan Pham
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 22
