[LU-11509] LDLM: replace client lock LRU with improved cache algorithm

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Lustre 2.17.0

    Description

      The LDLM LRU algorithm is sub-optimal for managing locks on the client, since it can flush often-used locks when a large number of files is accessed in a short time (e.g. during a filesystem scan). It would be better to implement a more sophisticated cache mechanism, such as ARC, LFRU, or similar (see https://en.wikipedia.org/wiki/Cache_replacement_policies), that takes into account a frequency count on the lock in addition to its age. That would ensure that the more important top-level locks are more likely to stay cached on the client instead of being flushed.
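
      As a rough illustration of the idea, here is a minimal user-space sketch (not LDLM code; the structure and function names are made up for the example) of an LFRU-style victim selection that discounts a lock's age by its use count, so a heavily reused lock outlives many used-once locks of similar age:

      #include <stdio.h>
      #include <time.h>

      struct demo_lock {
          const char   *dl_name;      /* resource the lock covers */
          time_t        dl_last_used; /* recency, what plain LRU uses today */
          unsigned int  dl_use_count; /* frequency, the added dimension */
      };

      /* Higher score == better candidate for cancellation. */
      static double evict_score(const struct demo_lock *lk, time_t now)
      {
          double age = difftime(now, lk->dl_last_used);

          /* Discount age by reuse count so a frequently used top-level
           * directory lock outlives many used-once leaf locks. */
          return age / (1.0 + lk->dl_use_count);
      }

      int main(void)
      {
          time_t now = time(NULL);
          struct demo_lock locks[] = {
              { "/mnt/lustre/home",         now - 600, 500 }, /* old but hot  */
              { "/mnt/lustre/home/u/leaf1", now - 30,    1 }, /* young, 1 use */
              { "/mnt/lustre/home/u/leaf2", now - 300,   1 }, /* older, 1 use */
          };
          int i, victim = 0;

          for (i = 1; i < 3; i++)
              if (evict_score(&locks[i], now) > evict_score(&locks[victim], now))
                  victim = i;

          /* Plain LRU would cancel the oldest lock (the hot directory);
           * the frequency-aware score picks a used-once leaf instead. */
          printf("cancel candidate: %s\n", locks[victim].dl_name);
          return 0;
      }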

          Activity

            gerrit Gerrit Updater added a comment - edited

            "Kiet <dekisugi@cau.ac.kr>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56679
            Subject: LU-11509 ldlm: Implement LFRU cache eviction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4385a0992e74cdf4474821d4b6213a40b42e5283

            dekisugi Kiet Tuan Pham added a comment - edited

            Sorry for taking so long to catch up on the patch. Currently it implements a very simple LFRU algorithm. There is no maximum limit on the priv list yet, so I think that could cause some problems. Also, the frequency threshold for a lock to be considered "frequently accessed" has not been measured in any way; I just set it to a value that seemed reasonable.

            For a simple test, I did some work at /, then tarred some archive files and ran find and ls to fill the LRU list. After that, I set lru_size to a small value (~100) to see the impact. Compared with the original LRU scheme, 7.3% more of the high-frequency locks remain.

            I haven't found a workload that shows a performance difference yet; maybe the network would need to be slow to clearly see the additional time needed to enqueue an ldlm_lock again?
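
            For reference, here is a very rough user-space sketch of the two-list idea described above (this is not the actual patch; FREQ_THRESHOLD and all identifiers are made up for the example): locks whose use count crosses the threshold are promoted to a protected "priv" list, and the cancel pass drains the ordinary LRU list before touching the protected one.

            #include <stdio.h>

            #define FREQ_THRESHOLD 4   /* assumed value, not measured */
            #define NLOCKS         6

            enum lru_list { LRU_NORMAL, LRU_PROTECTED };

            struct demo_lock {
                int           dl_id;
                unsigned int  dl_use_count;
                enum lru_list dl_list;
                int           dl_cancelled;
            };

            /* Called on each reuse of a lock: promote once the threshold is crossed. */
            static void lock_touch(struct demo_lock *lk)
            {
                lk->dl_use_count++;
                if (lk->dl_list == LRU_NORMAL && lk->dl_use_count >= FREQ_THRESHOLD)
                    lk->dl_list = LRU_PROTECTED;
            }

            /* Cancel up to 'count' locks: normal list first, protected list last. */
            static void lru_cancel(struct demo_lock *locks, int n, int count)
            {
                int pass, i;

                for (pass = LRU_NORMAL; pass <= LRU_PROTECTED && count > 0; pass++)
                    for (i = 0; i < n && count > 0; i++)
                        if (!locks[i].dl_cancelled && locks[i].dl_list == pass) {
                            locks[i].dl_cancelled = 1;
                            count--;
                        }
            }

            int main(void)
            {
                struct demo_lock locks[NLOCKS] = { { 0 }, { 1 }, { 2 }, { 3 }, { 4 }, { 5 } };
                int i;

                /* Lock 0 is "hot": reuse it enough times to get promoted. */
                for (i = 0; i < FREQ_THRESHOLD; i++)
                    lock_touch(&locks[0]);
                /* The others are touched once, as a single "find" pass would do. */
                for (i = 1; i < NLOCKS; i++)
                    lock_touch(&locks[i]);

                /* Shrinking the LRU now cancels the used-once locks, not lock 0. */
                lru_cancel(locks, NLOCKS, 5);
                for (i = 0; i < NLOCKS; i++)
                    printf("lock %d: %s\n", locks[i].dl_id,
                           locks[i].dl_cancelled ? "cancelled" : "kept");
                return 0;
            }

            As noted above, a real implementation would also need an upper bound on the protected list so it cannot grow without limit.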


            "Kiet <tuankiet.rickystudio@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56648
            Subject: LU-11509 ldlm: Implement LFRU cache eviction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d3c3286e3ff8d16efaf49fd70651068aae0cff1c


            adilger Andreas Dilger added a comment -

            Hi dekisugi, your account in Gerrit has been enabled for you to push your patch(es). I would be quite interested to see what you have developed, even if it is not totally ready yet.

            If the patch is still in a "prototype" stage, which often is the case for new/complex patches, then you can add the following line to the commit message so that it will not be considered for landing before it is ready:

            Test-Parameters: fortestonly
            

            Thank you for your contribution.


            adilger Andreas Dilger added a comment -

            dekisugi,
            I understand that you are working on this project to implement LFRU for the Lustre DLM lock cache. Thank you for your contribution.

            Could you please provide some background on what you are implementing, so that it is available here for future reference? It would be useful to know whether you have done any performance analysis of the LFRU and how it compares to the existing LRU.

            When you push your patch to Gerrit (which I've now given you permission to do), the patch will undergo fairly substantial regression testing, but it will not necessarily do extensive performance testing specific to this code. You may be interested to review patch https://review.whamcloud.com/54204 "LU-11085 tests: Add performance test for ldlm_extent code" (sanity.sh test_842) to see how a unit test could be written for the DLM LRU.

            At a system level, running some filesystem workload that has both short- and long-lived DLM locks on a single client would be useful. Possibly "find" on a large directory tree, with some large directories at the bottom, should keep the DLM locks from the frequently-used top-level directories in the client cache, while single-use DLM locks for the leaf files should be dropped relatively quickly. I'm not sure if the DLM LRU behaviour would show up in a workload like "compilebench" or if that would be overwhelmed by other factors.


            adilger Andreas Dilger added a comment -

            From discussion in patch https://review.whamcloud.com/53682 "LU-17428 ldlm: reduce default lru_max_age":

            tim> Poking around the code, the LRU in Lustre seems pretty simplistic.
            tim> It seems like the core logic is encapsulated in ldlm_prepare_lru_list()?
            tim> We just cycle over ns_unused_list and cancel according to a couple preset
            tim> policies. If we added a couple more lists and new policies for shifting
            tim> locks between them, that looks like it'd be enough to implement LFRU or ARC.
            tim>
            tim> Has anyone looked at this in-depth before? Anything experimental floating around?
            
            patrick> While client side lock management could be improved, the real problem
            patrick> with lock management isn't the client side code - it's the monstrosity that is the
            patrick> slv code.  It buggily applies an overly complicated model of "lock volume" which
            patrick> has (imo) nothing to do with how locks work in Lustre.
            patrick> But hey, at least it's almost incomprehensible.
            patrick> 
            patrick> Apologies for the tone - a client side LRU rework is probably worth the trouble.
            patrick> While it would be more ambitious, a simplification of the server side LRU might
            patrick> be even more valuable.  I'd have to think a bit about what's most desirable,
            patrick> but I think it's probably letting clients have as many locks as they want until
            patrick> lru_size (on the server) is hit, or it runs low on memory.
            patrick> When it does, do something simple.
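
            As a point of reference, here is a stripped-down user-space sketch of the loop shape tim describes above: a single unused-lock list scanned with a pluggable cancel policy, where a frequency-aware variant only needs to change the policy. All identifiers are illustrative, not the actual ldlm_prepare_lru_list() code, and the thresholds are arbitrary.

            #include <stdio.h>
            #include <stddef.h>
            #include <time.h>

            struct demo_lock {
                struct demo_lock *dl_next;      /* stand-in for the unused-list linkage */
                time_t            dl_last_used;
                unsigned int      dl_use_count;
            };

            enum policy_result { POLICY_KEEP, POLICY_CANCEL };

            typedef enum policy_result (*lru_policy_t)(const struct demo_lock *, time_t);

            /* Roughly today's behaviour: cancel purely by age. */
            static enum policy_result policy_aged(const struct demo_lock *lk, time_t now)
            {
                return difftime(now, lk->dl_last_used) > 3600 ? POLICY_CANCEL : POLICY_KEEP;
            }

            /* A frequency-aware policy differs only here: keep often-reused
             * locks even when they are old. */
            static enum policy_result policy_lfru(const struct demo_lock *lk, time_t now)
            {
                if (lk->dl_use_count >= 4)
                    return POLICY_KEEP;
                return policy_aged(lk, now);
            }

            /* Walk the unused list and count the locks the policy would cancel. */
            static int count_cancels(struct demo_lock *head, lru_policy_t policy, time_t now)
            {
                int count = 0;

                for (; head != NULL; head = head->dl_next)
                    if (policy(head, now) == POLICY_CANCEL)
                        count++;
                return count;
            }

            int main(void)
            {
                time_t now = time(NULL);
                struct demo_lock leaf = { NULL,  now - 7200, 1 };  /* old, used once    */
                struct demo_lock top  = { &leaf, now - 7200, 50 }; /* old, reused often */

                printf("aged policy cancels %d, lfru policy cancels %d\n",
                       count_cancels(&top, policy_aged, now),
                       count_cancels(&top, policy_lfru, now));
                return 0;
            }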
            

            I think LFRU or ARC would be a real and immediate benefit for clients, for a relatively small effort. It would keep a "find" or "ls -lR" from blowing out a few dozen busy locks on a login node (e.g. /mnt/lustre, .../home, .../home/$USER, .../home/$USER/bin, etc.), along with all of their cached positive and negative entries, and would keep a busy data mover node doing a filesystem traversal from thrashing the top-level directory locks in favour of used-once leaf locks.

            I have to admit that I have the same feeling as Patrick about the SLV implementation (remediation of which is discussed a bit under LU-7266). Maybe it works sometimes, but nobody understands it, and it definitely doesn't work all the time or there wouldn't be a need to reduce lru_max_age or similar.

            LU-6529 did some work to have a more "direct" server lock reclaim, but it never implemented a "generic client callback" from a low-memory server that informed the client "you need to cancel 50 locks right now, you pick which". Instead, it just randomly picks some locks and issues cancels, which may help or may just drive up the server load further.


            People

              Assignee: dekisugi Kiet Tuan Pham
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 22
