[LU-141] port lustre client page cache shrinker back to clio Created: 17/Mar/11 Updated: 13/Mar/14 Resolved: 13/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Rank (Obsolete): | 10222 |
| Description |
|
This feature was lost when clio was implemented. We may need to port it back from 1.8. There will be a lot of changes, since we need to handle it under the cl_page infrastructure; it might also be better to implement it in obdclass/ instead of in llite as 1.8 does. This is what we should do: |
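For context, a minimal sketch of the generic Linux shrinker registration pattern that such a port would hook into. This is not the actual Lustre code: the names ll_cache_count, ll_cache_scan, ll_cache_drop_pages and the page accounting are hypothetical placeholders, and the count_objects/scan_objects API shown is the form used by ~3.12+ kernels.

```c
/* Hypothetical sketch of hooking a client page-cache shrinker into the
 * kernel memory-reclaim path (3.12+ shrinker API); the helper names and
 * accounting below are illustrative, not the actual obdclass code. */
#include <linux/atomic.h>
#include <linux/mm.h>
#include <linux/shrinker.h>

static atomic_long_t ll_cached_pages;	/* pages tracked by the cl_page layer */

/* Assumed helper that walks the cl_page LRU and frees up to 'nr' clean pages. */
extern unsigned long ll_cache_drop_pages(unsigned long nr);

static unsigned long ll_cache_count(struct shrinker *s,
				    struct shrink_control *sc)
{
	/* Report how many cached pages are potentially reclaimable. */
	return atomic_long_read(&ll_cached_pages);
}

static unsigned long ll_cache_scan(struct shrinker *s,
				   struct shrink_control *sc)
{
	unsigned long freed = ll_cache_drop_pages(sc->nr_to_scan);

	return freed ? freed : SHRINK_STOP;
}

static struct shrinker ll_cache_shrinker = {
	.count_objects	= ll_cache_count,
	.scan_objects	= ll_cache_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* Called from module init. */
int ll_cache_shrinker_init(void)
{
	return register_shrinker(&ll_cache_shrinker);
}
```

The idea is that count_objects reports how many cached client pages could be reclaimed and scan_objects frees up to sc->nr_to_scan of them, so the kernel can shrink the Lustre client cache under memory pressure like any other cache.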
| Comments |
| Comment by Oleg Drokin [ 17/Mar/11 ] |
|
Actually I wonder if we can just forego this completely and depend on the generic VFS fs cache tunables instead? |
| Comment by Jinshan Xiong (Inactive) [ 17/Mar/11 ] |
|
We may still need this for some special tasks - for example, as a customer mentioned on lustre-discuss@, a task that just reads data once and rarely uses it again. In this case, max_cached_mb helps avoid polluting the system memory cache too much. |
| Comment by Bryon Neitzel (Inactive) [ 30/Mar/11 ] |
|
Andreas, can you provide an opinion on whether we should forego this or implement per Jay's suggestion? |
| Comment by Andreas Dilger [ 01/Apr/11 ] |
|
While I think this is an important function, I don't think it is a 2.1 release blocker. There have been reports of out-of-memory on the client due to async journal commit, and the lack of a cache limit on the client in 2.x may make that problem worse. However, until we have solid proof of memory problems on the client, I agree that normal Linux VM tunables can be used for this (e.g. /proc/sys/vm/dirty_ratio and friends, and posix_fadvise(POSIX_FADV_{NOREUSE,DONTNEED})). |
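As an aside, a minimal userspace sketch of the posix_fadvise() approach mentioned above for a read-once task. The calls are standard POSIX; the 64 KiB buffer and single-pass loop are arbitrary choices.

```c
/* Read a file once and tell the kernel its cached pages can be dropped
 * (POSIX_FADV_DONTNEED), so a read-once task does not pollute the cache. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[1 << 16];
	ssize_t n;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Hint that the data will be accessed only once. */
	posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;	/* process the data */

	/* Ask the kernel to drop the cached pages for this file. */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return 0;
}
```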
| Comment by Jinshan Xiong (Inactive) [ 01/Apr/11 ] |
|
Andreas, thanks for your advice. Let's hold this issue until it really needs fixing. |
| Comment by Bryon Neitzel (Inactive) [ 09/Nov/11 ] |
|
Can anyone from CEA provide a test case that exhibits memory issues because this feature is not in 2.1? It would help justify the effort required to add this back in. |
| Comment by jc.lafoucriere@cea.fr [ 12/Nov/11 ] |
|
Hello Andreas, I am not totally aware of this issue, but we will look at what we can do. Bye JC |
| Comment by Stéphane Thiell (Inactive) [ 27/Nov/11 ] |
|
We have at CEA some special Lustre client nodes (4 Lustre filesystems or more are mounted) whose role is to continuously copy files from one filesystem to another. Bandwidth drops rapidly after the first copies, and all transfer nodes become heavily loaded, spending their time doing memory reclaim in Lustre (if I remember correctly). These nodes have quite a large amount of memory, from 24GB to 64GB depending on the cluster, and I think the problem is particularly visible on large nodes because memory allocations in Lustre are not NUMA aware. Also, we didn't want to remove physical memory, as it is potentially used to run other filesystem tools (like robinhood and its large databases). That's why we would like to limit the maximum number of cached pages in Lustre with max_cached_mb or some other way, but I didn't find anything convincing in the Linux VM tunables (and this doesn't concern dirty pages, so I don't think dirty_ratio and friends could help in that case...). I would be happy to try some other ideas though. Our workaround for this issue is to use direct I/O for file copies between Lustre filesystems on these nodes. The global performance is much better, but individual transfers are not very fast, and this is a problem when a user is eagerly waiting for a particular file. |
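For reference, a minimal sketch of the direct-I/O copy pattern described in this comment. It is illustrative only: the 1 MiB chunk and 4 KiB alignment are assumptions, and a real copy tool must also handle short reads/writes and an unaligned final tail (O_DIRECT requires buffer, offset and length to meet the filesystem's alignment constraints).

```c
/* Copy a file using O_DIRECT so the data bypasses the client page cache. */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1 << 20)		/* 1 MiB, assumed to satisfy alignment */

int main(int argc, char **argv)
{
	void *buf;
	ssize_t n;
	int in, out;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}
	if (posix_memalign(&buf, 4096, CHUNK))
		return 1;

	in  = open(argv[1], O_RDONLY | O_DIRECT);
	out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}
	while ((n = read(in, buf, CHUNK)) > 0) {
		/* Note: an unaligned tail write may need a non-O_DIRECT fallback. */
		if (write(out, buf, n) != n) {
			perror("write");
			return 1;
		}
	}
	close(in);
	close(out);
	free(buf);
	return 0;
}
```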
| Comment by Andreas Dilger [ 13/Mar/14 ] |
|
The per-filesystem max_cached_mb limit was added in http://review.whamcloud.com/2514 for 2.4.0. |
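For completeness, a small sketch of how an administrator might apply the limit across all mounted Lustre filesystems on a client. The /proc/fs/lustre/llite/*/max_cached_mb path and the 4096 MB value are assumptions about a typical 2.4 client; `lctl set_param llite.*.max_cached_mb=<N>` is the usual equivalent.

```c
/* Set max_cached_mb for every mounted Lustre client instance by writing
 * to its llite proc entry (path assumed; adjust for your release). */
#include <glob.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	const char *mb = argc > 1 ? argv[1] : "4096";
	glob_t g;
	size_t i;

	if (glob("/proc/fs/lustre/llite/*/max_cached_mb", 0, NULL, &g))
		return 1;

	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");

		if (!f) {
			perror(g.gl_pathv[i]);
			continue;
		}
		fprintf(f, "%s\n", mb);
		fclose(f);
	}
	globfree(&g);
	return 0;
}
```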