[LU-5801] client 2.5.2 - cgroups compatibility problem Created: 23/Oct/14 Updated: 24/Nov/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.5.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Lukasz Flis | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Environment: |
Scientific Linux 6.5 |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 16267 | ||||
| Description |
|
During internal tests, we found that a dd process gets killed by the OOM killer when it belongs to a cgroup with memory enforcement. In our scenario we use dd to write a 10G file. Tests on local filesystems and GPFS were fine: no OOM killer invocations were seen. Only Lustre operations fail because the cgroup's memory limit is exceeded. I'm attaching the OOM details from dmesg. |
| Comments |
| Comment by Andreas Dilger [ 24/Oct/14 ] |
|
As yet, we haven't done any testing with Lustre and cgroups. I suspect this is related to the way clients determine their cache limits: they use the total RAM and not the memory available to the cgroup. This probably needs fixing in several places (max_dirty_mb, max_cached_mb, lu_cache limits, DLM pool limits, etc.) so that they use the memory available to the cgroup instead of totalram_pages. That said, I don't yet know enough about cgroups to know what to check... |
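The idea in the comment above, sizing a cache limit from the smaller of total RAM and the cgroup's limit rather than from totalram_pages alone, can be illustrated with a minimal user-space sketch. It assumes the limit is read as text from the cgroup v2 `memory.max` file (`"max"` means unlimited) or the cgroup v1 `memory.limit_in_bytes` file (a value near 2**63 means unlimited). The helper names and the 0.5 fraction are hypothetical and are not Lustre's actual code or defaults.

```python
"""Hypothetical sketch: size a cache limit from the memory actually available
to a cgroup (min of total RAM and the cgroup limit), not from total RAM alone."""

from typing import Optional

MIB = 1 << 20

# cgroup v1 reports "unlimited" as a huge value close to 2**63; treat
# anything at or above this (assumed) threshold as "no limit".
_V1_UNLIMITED_THRESHOLD = 1 << 62


def parse_cgroup_limit(text: str) -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None if unlimited."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    limit = int(value)
    if limit >= _V1_UNLIMITED_THRESHOLD:  # cgroup v1 sentinel
        return None
    return limit


def effective_memory_bytes(total_ram_bytes: int, cgroup_limit_text: str) -> int:
    """Memory a cache limit should be sized from: min(total RAM, cgroup limit)."""
    limit = parse_cgroup_limit(cgroup_limit_text)
    if limit is None:
        return total_ram_bytes
    return min(total_ram_bytes, limit)


def cache_limit_mb(total_ram_bytes: int, cgroup_limit_text: str,
                   fraction: float = 0.5) -> int:
    """Hypothetical cache limit in MiB as a fraction of effective memory.

    The default fraction of 0.5 is an illustrative assumption only.
    """
    effective = effective_memory_bytes(total_ram_bytes, cgroup_limit_text)
    return int(effective * fraction) // MIB
```

On a node with 64 GiB of RAM and a 4 GiB cgroup limit, `cache_limit_mb(64 << 30, "4294967296\n")` gives 2048 MiB. The same calculation based on total RAM alone would give 32768 MiB, which is far more than the cgroup can hold.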
| Comment by Lukasz Flis [ 24/Oct/14 ] |
|
I was looking for a workaround and tried setting lustre.max_dirty_mb below the cgroup limit. Unfortunately, that did not help. |
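Lowering max_dirty_mb changes only one of the limits listed in the earlier comment. The others, such as max_cached_mb, may still be sized from total RAM. The following is a minimal sketch of checking each configured limit against the cgroup's memory limit. The function name is hypothetical. The limit values are assumed to have been collected separately, for example with `lctl get_param`, and converted to MiB. This check only reports mismatches; it does not prevent the OOM kill.

```python
"""Hypothetical check: which client-side limits (in MiB) exceed a cgroup's memory limit."""

from typing import Dict, List

MIB = 1 << 20


def limits_exceeding_cgroup(limits_mb: Dict[str, int], cgroup_limit_bytes: int) -> List[str]:
    """Return the sorted names of limits that are larger than the cgroup limit."""
    cgroup_limit_mb = cgroup_limit_bytes // MIB
    return sorted(name for name, mb in limits_mb.items() if mb > cgroup_limit_mb)
```

With illustrative values of `{"max_dirty_mb": 1024, "max_cached_mb": 32768}` and a 4 GiB cgroup, only `max_cached_mb` is reported. In that case, lowering max_dirty_mb alone would leave another limit sized beyond what the cgroup allows.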
| Comment by Lukasz Flis [ 13/Nov/14 ] |
|
Hello, since cgroup memory limiting is quite important functionality for us, I'd like to ask whether the fix is likely to appear in the 2.5.x line. Best regards |
| Comment by Andreas Dilger [ 14/Nov/14 ] |
|
All new features appear in master before they may be backported to a maintenance branch. Since master is currently in feature freeze for 2.7.0, this feature is unlikely to be implemented in the next few months. |
| Comment by Lukasz Flis [ 12/Jun/15 ] |
|
Hello Andreas, is there any news regarding memory cgroup support in new Lustre client versions? |
| Comment by Tyson Whitehead [ 08/Jan/18 ] |
|
Just adding another voice here for this bug to get some TLC, as we (Compute Canada) have now run into it in a couple of cases with our users. It would be great if it could be addressed. Thanks! -Tyson |
| Comment by Peter Jones [ 26/Jan/18 ] |
|
Has this issue been seen running on a current 2.10.x LTS release? |
| Comment by Patrick Farrell (Inactive) [ 26/Jan/18 ] |
|
Peter, you can check with Andreas, but I'm pretty confident the problem still exists. It's both a design issue and a design question. (I recall discussing it with Andreas in a JIRA at one point, but either my memory is bad or it was another JIRA.) We may not want to solve the problem in the way the reporter requested, but we haven't changed anything here yet either. |