[LU-823] Lustre breaks cgroups accounting Created: 03/Nov/11 Updated: 24/Nov/22 Resolved: 10/Apr/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 2.2.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Christopher Morrone | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 6.1, 6.2 (possibly earlier), other kernels with cgroups support |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 6527 | ||||
| Description |
|
Lustre implements its own copy of truncate_complete_page() in lustre_patchless_compat.h to allow the client to build against an unpatched kernel. Unfortunately, this means that Lustre can break the kernel if it doesn't keep its copy in sync. Lustre's copy of truncate_complete_page() is out of date and breaks, at a minimum, Linux's cgroup accounting. To work around this problem we have decided to temporarily patch our kernel to have it export truncate_complete_page() and allow Lustre to use the kernel's function. Obviously the quick-and-dirty fix for Lustre is to add additional autoconf checks and update its copy of truncate_complete_page() and other associated functions. But that whole approach is pretty unsettling. I would first like to check whether there is some other similar function that the kernel exports now that we can start using. Failing that, perhaps we can borrow Brian's trick from ZFS for using a symbol that hasn't been exported. |
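For readers unfamiliar with the pattern described above, here is a minimal user-space sketch of a configure-guarded fallback. Everything in it is an assumption made for illustration: `HAVE_EXPORTED_TRUNCATE_PAGE`, `struct mock_page`, `mock_kernel_truncate_page()` and `compat_truncate_page()` are hypothetical names, not the contents of lustre_patchless_compat.h or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct mock_page {
    bool in_cache;
};

enum truncate_source {
    TRUNCATE_FROM_KERNEL,       /* the kernel's own function was called */
    TRUNCATE_FROM_PRIVATE_COPY  /* the hand-maintained copy was called */
};

/* Stand-in for the kernel's implementation, which is always current. */
void mock_kernel_truncate_page(struct mock_page *page)
{
    assert(page != NULL);
    page->in_cache = false;
}

/*
 * HAVE_EXPORTED_TRUNCATE_PAGE is a hypothetical macro that a configure-time
 * check would define when the running kernel exports the function.
 */
#ifdef HAVE_EXPORTED_TRUNCATE_PAGE
enum truncate_source compat_truncate_page(struct mock_page *page)
{
    mock_kernel_truncate_page(page);
    return TRUNCATE_FROM_KERNEL;
}
#else
enum truncate_source compat_truncate_page(struct mock_page *page)
{
    /* Private copy: any step the kernel adds later must be mirrored here by hand. */
    assert(page != NULL);
    page->in_cache = false;
    return TRUNCATE_FROM_PRIVATE_COPY;
}
#endif
```

Built without the macro, the private copy is selected, which is the situation this ticket is about: nothing forces that branch to track later kernel changes.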
| Comments |
| Comment by Peter Jones [ 03/Nov/11 ] |
|
Chris, could you check whether the tip of master still exhibits this problem? Oleg wonders whether this landing may have helped: http://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=1515e409cc57af5eaef809eee6d8f8d6725d092b Peter |
| Comment by Christopher Morrone [ 03/Nov/11 ] |
|
I think that patch could have used a longer commit comment. Why is it ok to switch from calling truncate_complete_page() to calling remove_from_page_cache() when remove_from_page_cache() is only one of several things that truncate_complete_page() does? But unfortunately, RHEL 6.2 doesn't export either delete_from_page_cache or remove_from_page_cache, so that patch doesn't address the problem. |
| Comment by Christopher Morrone [ 03/Nov/11 ] |
|
The upstream Linux commit (a52116aba5b3eed0ee41f70b794cc1937acd5cb8) to export remove_from_page_cache is just a one-liner that doesn't do anything else. We are going to take a stab at getting RHEL to cherry-pick that into the RHEL 6.2 kernel, but they are close to freezing the kernel, so we probably shouldn't hold our breath. In any event, it would not help folks using cgroups on earlier RHEL 6 releases. |
| Comment by Christopher Morrone [ 03/Nov/11 ] |
|
Oh, never mind my comment about replacing truncate_complete_page with the other calls. Now I see how the functions nest. But the problem still remains: there is no call to mem_cgroup_uncharge_cache_page. And isn't it incorrect to use cfs_* lock functions when the kernel is using the normal kernel locking functions on the same locks? Granted, the cfs_* functions are just #defines to the kernel functions, but it doesn't seem correct to use cfs_* functions when these are not cfs_* locks. |
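To make the missing mem_cgroup_uncharge_cache_page call concrete, here is a small user-space model of the accounting leak. It is only a sketch under stated assumptions: `struct mock_cgroup`, `struct mock_page` and every `mock_*`, `stale_*` and `synced_*` function are hypothetical stand-ins, the real kernel path also takes the mapping's tree lock (omitted here), and the "stale" variant is a guess at the shape of an out-of-date copy rather than Lustre's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
struct mock_cgroup {
    long charged_pages;         /* pages currently charged to this group */
};

struct mock_page {
    bool in_cache;
    bool locked;
    struct mock_cgroup *memcg;  /* group charged for this page, or NULL */
};

/* Add a locked page to the mock cache and charge it to a group. */
static void mock_add_page(struct mock_page *page, struct mock_cgroup *cg)
{
    page->in_cache = true;
    page->locked = true;
    page->memcg = cg;
    cg->charged_pages++;
}

/* Models the uncharge step that the out-of-date copy lacks. */
static void mock_uncharge_cache_page(struct mock_page *page)
{
    if (page->memcg != NULL) {
        page->memcg->charged_pages--;
        page->memcg = NULL;
    }
}

/* Out-of-date copy: drops the page from the cache but never uncharges it. */
static void stale_remove_from_page_cache(struct mock_page *page)
{
    assert(page->locked);       /* callers must hold the page lock */
    page->in_cache = false;
}

/* In-sync copy: same removal, followed by the uncharge. */
static void synced_remove_from_page_cache(struct mock_page *page)
{
    assert(page->locked);
    page->in_cache = false;
    mock_uncharge_cache_page(page);
}

/* Cache and remove n pages; return how many pages the group is still charged for. */
long leaked_charge(int n, bool use_stale)
{
    struct mock_cgroup cg = { 0 };

    for (int i = 0; i < n; i++) {
        struct mock_page page = { false, false, NULL };

        mock_add_page(&page, &cg);
        if (use_stale)
            stale_remove_from_page_cache(&page);
        else
            synced_remove_from_page_cache(&page);
    }
    return cg.charged_pages;
}
```

In this model, `leaked_charge(4, true)` leaves all four pages charged to the group even though none is cached any more, while `leaked_charge(4, false)` returns the group to zero. That is the kind of drift a memory cgroup would see if the removal path skips the uncharge.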
| Comment by Andreas Dilger [ 03/Nov/11 ] |
|
I agree - cfs_* wrappers shouldn't be used on kernel structures. I've noticed this in a few places. |
| Comment by Peter Jones [ 07/Nov/11 ] |
|
Niu, could you please look into what changes are needed here? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 08/Nov/11 ] |
|
truncate_inode_pages_range() can serve a similar function to truncate_complete_page(), but it's not as efficient as calling truncate_complete_page() directly, since it will look up and lock the page again internally. I think that's why we didn't use it at the very beginning. Given that remove_from_page_cache() will be exported in later kernels, I don't think it's wise to make the changes (quite a few) needed to use truncate_inode_pages_range(). If users really want both cgroups and a patchless client, I think we have to adopt the hack of using un-exported symbols (Chris, could you provide the details of this?); otherwise, we can document that the patchless client doesn't support cgroups on those kernels (early 2.6.32). |
| Comment by Niu Yawei (Inactive) [ 10/Apr/12 ] |
|
Chris, are you ok with my previous comment? That is, declare that cgroups aren't supported on the patchless client with early 2.6.32 kernels, and use remove_from_page_cache() whenever it's exported in later kernel versions. Thanks. |
| Comment by Christopher Morrone [ 10/Apr/12 ] |
|
I suppose I don't care too much since we decided to backport the kernel export. This business of copying kernel functions into Lustre and hoping they are correct is very, very ugly. But since this particular problem will go away in the future with newer kernels that export more useful symbols, I think we can just leave it as it is for now. |
| Comment by Peter Jones [ 10/Apr/12 ] |
|
OK, thanks Chris |
| Comment by Mark Hills [ 11/Apr/12 ] |
|
FYI, I think this is a straight duplicate of |