[LU-141] port lustre client page cache shrinker back to clio Created: 17/Mar/11  Updated: 13/Mar/14  Resolved: 13/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.4.0

Type: Improvement Priority: Minor
Reporter: Jinshan Xiong (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-744 Single client's performance degradati... Resolved
Rank (Obsolete): 10222

 Description   

This feature was lost when clio was implemented. We may need to port it back from 1.8.

This will require a lot of changes, since we need to handle it under the cl_page infrastructure; it might also be better to implement this in obdclass/ instead of in llite, as 1.8 does.

This is what we should do:
1. Implement a generic cl_page LRU mechanism in obdclass/. This LRU list should avoid spinlocks for SMP scalability and must be able to enforce the async_page_max cap.
2. Add a shrinker for cl_page in llite (a rough sketch follows below) - not sure if we still need this.
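
For illustration only, here is a minimal sketch of what the llite shrinker from item 2 might look like, assuming the count_objects/scan_objects shrinker interface (Linux 3.12 and later). cl_page_lru_count() and cl_page_lru_reclaim() are hypothetical placeholders for the obdclass LRU from item 1, not existing Lustre functions:

    /* Hypothetical sketch - not actual Lustre code. */
    #include <linux/shrinker.h>

    static unsigned long cl_page_shrink_count(struct shrinker *s,
                                              struct shrink_control *sc)
    {
            /* Report how many cl_pages currently sit on the global LRU. */
            return cl_page_lru_count();
    }

    static unsigned long cl_page_shrink_scan(struct shrinker *s,
                                             struct shrink_control *sc)
    {
            /* Try to release up to sc->nr_to_scan clean pages from the
             * LRU, returning the number actually freed. */
            return cl_page_lru_reclaim(sc->nr_to_scan);
    }

    static struct shrinker cl_page_shrinker = {
            .count_objects = cl_page_shrink_count,
            .scan_objects  = cl_page_shrink_scan,
            .seeks         = DEFAULT_SEEKS,
    };

    /* register_shrinker(&cl_page_shrinker) would be called from module
     * init, and unregister_shrinker() from module exit. */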



 Comments   
Comment by Oleg Drokin [ 17/Mar/11 ]

Actually I wonder if we can just forego this completely and depend on the generic VFS fs cache tunables instead?

Comment by Jinshan Xiong (Inactive) [ 17/Mar/11 ]

We may still need this for some special tasks - for example, as a customer mentioned on lustre-discuss@, a task may read data once and rarely use it again. In that case, max_cached_mb helps keep the system memory cache from being polluted too much.

Comment by Bryon Neitzel (Inactive) [ 30/Mar/11 ]

Andreas, can you provide an opinion on whether we should forego this or implement per Jay's suggestion?

Comment by Andreas Dilger [ 01/Apr/11 ]

While I think this is an important function, I don't think it is a 2.1 release blocker.

There have been reports of out-of-memory on the client due to async journal commit, and the lack of a cache limit on the client in 2.x may make that problem worse. However, until we have solid proof of memory problems on the client, I agree that the normal Linux VM tunables can be used for this (e.g. /proc/sys/vm/dirty_ratio and friends, and posix_fadvise(POSIX_FADV_{NOREUSE,DONTNEED})).
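
For reference, a minimal userspace sketch of the POSIX_FADV_DONTNEED approach for a read-once workload (error handling trimmed; the file path argument is a placeholder):

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            char buf[1 << 16];
            ssize_t n;
            int fd = open(argv[1], O_RDONLY);

            if (fd < 0)
                    return 1;

            while ((n = read(fd, buf, sizeof(buf))) > 0)
                    ;       /* consume the data once */

            /* Tell the kernel the cached pages for this file will not be
             * reused, so they can be dropped ahead of memory pressure. */
            posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
            close(fd);
            return n < 0;
    }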

Comment by Jinshan Xiong (Inactive) [ 01/Apr/11 ]

Andreas, thanks for your advice. Let's hold this issue until it really needs fixing.

Comment by Bryon Neitzel (Inactive) [ 09/Nov/11 ]

Can anyone from CEA provide a test case that exhibits memory issues because this feature is not in 2.1? It would help justify the effort required to add this back in.

Comment by jc.lafoucriere@cea.fr [ 12/Nov/11 ]

Hello Andreas

I am not totally aware of this issue, but we will look at what we can do to clarify our need (use case or reproducer).

Bye

JC

Comment by Stéphane Thiell (Inactive) [ 27/Nov/11 ]

At CEA we have some special Lustre client nodes (with 4 or more Lustre filesystems mounted) whose role is to continuously copy files from one filesystem to another. Bandwidth drops rapidly after the first copies, and all transfer nodes become heavily loaded, spending their time doing memory reclaim in Lustre (if I remember correctly). These nodes have quite a large amount of memory, from 24GB to 64GB depending on the cluster, and I think the problem is particularly visible on large nodes because memory allocations in Lustre are not NUMA aware.

Also, we don't want to remove physical memory, as it is potentially used to run other filesystem tools (like robinhood and its large databases). That's why we would like to limit the maximum number of cached pages in Lustre, with max_cached_mb or in some other way, but I didn't find anything convincing among the Linux VM tunables (and this doesn't concern dirty pages, so I don't think dirty_ratio and friends can help in this case...). I would be happy to try other ideas though.

Our workaround for this issue is to use direct I/O for file copies between Lustre filesystems on these nodes. The overall performance is much better, but individual transfers are not very fast, which is a problem when a user is eagerly waiting for a particular file.
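
For what it's worth, a rough sketch of the direct-I/O copy workaround, assuming a 4 KiB alignment requirement; the trailing partial block of a file whose size is not block-aligned is not handled here and would need extra care in a real tool:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Copy src to dst with O_DIRECT so the client page cache is bypassed. */
    static int copy_direct(const char *src, const char *dst)
    {
            int in = open(src, O_RDONLY | O_DIRECT);
            int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
            void *buf = NULL;
            ssize_t n = 0;

            if (in < 0 || out < 0 || posix_memalign(&buf, 4096, 1 << 20))
                    return -1;

            /* Full, aligned chunks only; a real tool must handle the tail. */
            while ((n = read(in, buf, 1 << 20)) > 0)
                    if (write(out, buf, n) != n)
                            break;

            free(buf);
            close(in);
            close(out);
            return n < 0 ? -1 : 0;
    }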

Comment by Andreas Dilger [ 13/Mar/14 ]

The per-filesystem max_cached_mb limit was added in http://review.whamcloud.com/2514 for 2.4.0 via LU-744.
