[LU-3771] stuck 56G of SUnreclaim memory Created: 16/Aug/13 Updated: 03/Oct/13 Resolved: 04/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Jay Lan (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Server: 2.1.4, centos 6.3 |
| Issue Links: | |
| Severity: | 2 |
| Rank (Obsolete): | 9716 |
| Description |
|
We have an ongoing problem of unreclaimable slab memory stuck in Lustre. It is different from

This is an ongoing problem and has created a lot of problems in our production systems. I will append /proc/meminfo and a 'slabtop' output below. Let me know what other information you need.

bridge2 /proc # cat meminfo

bridge2 ~ # slabtop --once
Active / Total Objects (% used) : 2291913 / 500886088 (0.5%)
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME |
| Comments |
| Comment by Peter Jones [ 16/Aug/13 ] |
|
Niu, here is some further information from NASA. What do you advise? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 19/Aug/13 ] |
|
The memory is consumed by the slabs, and those slabs will be destroyed when the Lustre modules (obdclass, lov, ...) are unloaded. Could you try unloading the Lustre modules to see if that helps? Thanks. |
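For reference, a minimal sketch of the slab-cache lifecycle described above, using the generic kernel slab API with hypothetical names (not actual Lustre code): the caches are created at module load and only hand their pages back to the kernel when kmem_cache_destroy() runs at module unload, which is why unloading the modules releases the memory.

#include <linux/module.h>
#include <linux/slab.h>

/* Stand-in for a CLIO per-page structure; the name is hypothetical. */
struct demo_page {
        unsigned long flags;
};

static struct kmem_cache *demo_page_cachep;

static int __init demo_init(void)
{
        /* Created once at module load, like the cl/lov/osc page caches. */
        demo_page_cachep = kmem_cache_create("demo_page_kmem",
                                             sizeof(struct demo_page),
                                             0, 0, NULL);
        return demo_page_cachep ? 0 : -ENOMEM;
}

static void __exit demo_exit(void)
{
        /* All cached objects and their backing pages are returned here. */
        kmem_cache_destroy(demo_page_cachep);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");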
| Comment by Andreas Dilger [ 19/Aug/13 ] |
|
These are all CLIO metadata structures in the slabs. It also seems unusual that the number of slab objects is over 150M, which would be enough for over 600GB of pages and is totally unreasonable for a node with 64GB of RAM. Are there any patches that landed after 2.1 that might fix this problem? |
| Comment by Niu Yawei (Inactive) [ 20/Aug/13 ] |
|
Those slabs have been removed by commit 3bffa4d32bc5b0bc71ba6873e262ddbca436bae1. |
| Comment by Jinshan Xiong (Inactive) [ 20/Aug/13 ] |
|
I tend to think this shows a `problem' of slab allocation. Let's take a look at the first line:

133434868 41138 0% 0.04K 1450379 92 5801516K lovsub_page_kmem

It has allocated a huge number of pages, enough to hold 133434868 lovsub_page{} structures and coming to about 5G of memory, yet only 41138 lovsub_page{} are actively used. Hi Jay, have you ever seen this cause any real problems? The kernel usually caches slab objects until memory comes under pressure, so it is okay as long as the memory used by the slabs can be freed later on. |
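To spell out the arithmetic behind that slabtop line (the numbers are copied from the comment above; the 4 KB page size and one page per slab are assumptions for this cache): the cache holds roughly 5.5 GB of pages while only about 1.6 MB worth of objects are actually in use. A small userspace sketch of the same calculation:

#include <stdio.h>

int main(void)
{
        /* Columns copied from the slabtop line for lovsub_page_kmem. */
        const unsigned long objs     = 133434868;  /* OBJS          */
        const unsigned long active   = 41138;      /* ACTIVE        */
        const double        obj_kb   = 0.04;       /* OBJ SIZE (KB) */
        const unsigned long slabs    = 1450379;    /* SLABS         */
        const unsigned long per_slab = 92;         /* OBJ/SLAB      */

        /* Assuming 4 KB pages and one page per slab for this cache. */
        printf("cache size      : %lu KB (matches the 5801516K column)\n",
               slabs * 4);
        printf("object capacity : %lu (slabs * obj/slab)\n",
               slabs * per_slab);
        printf("objects' space  : %.0f KB (~5.1 GB)\n", objs * obj_kb);
        printf("actively used   : %.0f KB (~1.6 MB)\n", active * obj_kb);
        return 0;
}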
| Comment by Jay Lan (Inactive) [ 20/Aug/13 ] |
|
Niu, yes, unloading the modules freed up the SUnreclaim slabs.

The systems have 62G of memory. One system constantly has a large amount of memory in slab (> 45G), most of it in SUnreclaim (~99%). Over the past two days I checked many times and found that "Active / Total Slabs (% used)" showed between 4.5% and 8.5% slab usage. I was told that when the usage dropped below, say, 1.5%, the system would become very sluggish and unusable. 45G x (100 - 8)% = 41.4G of unused SUnreclaim memory is a lot. It is OK if the memory just parks there while we still have enough memory for normal operation, but it should be freed up when the system needs it.

I would cherry-pick 3bffa4d. |
| Comment by Jay Lan (Inactive) [ 20/Aug/13 ] |
|
It seems to me patch " |
| Comment by Jinshan Xiong (Inactive) [ 20/Aug/13 ] |
|
Patch 3bffa4d would mitigate the problem a little because fewer slab data structures will be used per page, but that is definitely not a fix. Actually we can't do anything about it, because it's up to the Linux kernel VM management to decide when to free that memory. Niu, we should probably take a look at the slab implementation to check whether there are any tunable parameters for this. |
| Comment by Jinshan Xiong (Inactive) [ 20/Aug/13 ] |
|
There is /proc/sys/vm/min_slab_ratio in the Linux kernel, with a default of 5; you may set it higher and see if it helps. |
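For completeness, a sketch of what that suggestion amounts to. The usual way is simply sysctl -w vm.min_slab_ratio=10 or echoing into the proc file as root; this does the same write from C. The tunable only exists on NUMA-enabled kernels, and the value 10 here is just an example.

#include <stdio.h>

int main(void)
{
        const char *path = "/proc/sys/vm/min_slab_ratio";
        int ratio = -1;

        /* Read the current value (the kernel default is 5). */
        FILE *f = fopen(path, "r");
        if (f) {
                if (fscanf(f, "%d", &ratio) != 1)
                        ratio = -1;
                fclose(f);
        }
        printf("%s = %d\n", path, ratio);

        /* Raise it; requires root. 10 is only an example value. */
        f = fopen(path, "w");
        if (f) {
                fprintf(f, "10\n");
                fclose(f);
        }
        return 0;
}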
| Comment by Jay Lan (Inactive) [ 20/Aug/13 ] |
|
min_slab_ratio defines the threshold at which the kernel will free reclaimable slab memory. But in our case, the slabs held by Lustre were in the unreclaimable slabs, so changing that value would not help.
|
| Comment by Jinshan Xiong (Inactive) [ 21/Aug/13 ] |
|
Yes, you're right about this. The slab memory should be in SReclaimable, but it was in SUnreclaim for some unknown reason. Based on the low number of `active objs' in slabinfo, it doesn't look like a memory leak problem - was the memory all released after unloading the Lustre modules? |
| Comment by Niu Yawei (Inactive) [ 21/Aug/13 ] |
Slab memory is accounted in SUnreclaim when the slab cache is created without the SLAB_RECLAIM_ACCOUNT flag. The cl/lov/osc page slabs are created without this flag, so they show up in SUnreclaim. I think adding the flag and a shrinker callback won't help, because the problem now is that the slab cache isn't being reaped, not that the slab objects aren't being freed.
Right, it's not a memory leak problem, and all the slab memory is freed after unloading the Lustre modules (see Jay's previous comment). I don't think it's a Lustre problem: the slab objects are already freed and put back in the slab cache after umount, so the problem is that the kernel didn't reap the slab cache for some reason (actually, I don't know how to reap the slab cache proactively in a 2.6 kernel). |
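A minimal sketch of the accounting rule described above, using the generic kernel slab API with hypothetical cache names (not the actual Lustre code): whether a cache's pages show up in SReclaimable or SUnreclaim in /proc/meminfo is fixed at creation time by SLAB_RECLAIM_ACCOUNT, and the flag only changes the accounting, not when the cache gets reaped.

#include <linux/errno.h>
#include <linux/slab.h>

/* Hypothetical stand-in for a per-page CLIO structure. */
struct foo_page {
        unsigned long flags;
};

static struct kmem_cache *foo_unreclaim_cachep;
static struct kmem_cache *foo_reclaim_cachep;

static int foo_create_caches(void)
{
        /* No SLAB_RECLAIM_ACCOUNT: pages counted in SUnreclaim, the way
         * the cl/lov/osc page caches are created today. */
        foo_unreclaim_cachep = kmem_cache_create("foo_page_kmem",
                                                 sizeof(struct foo_page),
                                                 0, 0, NULL);

        /* With SLAB_RECLAIM_ACCOUNT: pages counted in SReclaimable instead.
         * Accounting only; the objects are not reclaimed any differently. */
        foo_reclaim_cachep = kmem_cache_create("foo_page_reclaim_kmem",
                                               sizeof(struct foo_page),
                                               0, SLAB_RECLAIM_ACCOUNT, NULL);

        if (!foo_unreclaim_cachep || !foo_reclaim_cachep)
                return -ENOMEM;
        return 0;
}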
| Comment by Niu Yawei (Inactive) [ 21/Aug/13 ] |
I think the reason for such a high object count is that the filesystem has been mounted/unmounted and used for a very long time, so lots of objects were created. |
| Comment by Jay Lan (Inactive) [ 21/Aug/13 ] |
|
Niu, it is not exactly as you said, that "the slab objects are already freed and put back in the slab cache after umount". Bridge2 was last rebooted 2 days ago, at Aug 19 04:38, and none of the 8 Lustre filesystems have been unmounted since. Here is the 'slabtop' output:

Active / Total Objects (% used) : 8844277 / 385193960 (2.3%)
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
104933521 1057165 1% 0.05K 1362773 77 5451092K lov_page_kmem
68093040 1050624 1% 0.08K 1418605 48 5674420K vvp_page_kmem
53508260 2119860 3% 0.19K 2675413 20 10701652K cl_page_kmem
12914768 27449 0% 0.50K 1614346 8 6457384K size-512
11721315 1058247 9% 0.26K 781421 15 3125684K osc_page_kmem
5503680 81827 1% 0.03K 49140 112 196560K size-32
639760 4178 0% 0.19K 31988 20 127952K cred_jar

We have not unmounted any Lustre filesystem. From past observation, unmounting does not free up the slab memory until we unload the Lustre modules. The fact that unloading the Lustre modules frees up the slabs suggests something is not right in the communication between the kernel and the Lustre modules. Why and how? I do not know. |
| Comment by Niu Yawei (Inactive) [ 22/Aug/13 ] |
Lustre should have freed all slab objects (via kmem_cache_free()) after umount, but that doesn't mean the slab cache will free the memory used by the objects immediately. The slab cache will still hold the memory for the next use; the memory will only be freed when the slab allocator thinks memory is tight or the slab cache is destroyed (when the Lustre modules are unloaded, the slab caches are destroyed). If the slab cache consumed too much memory and that resulted in an unusable/sluggish system, I think there could be some defects in the slab reap mechanism (the slab cache is run by the kernel, not Lustre). What we can do is reduce the use of slab in Lustre, the fix of |
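A sketch of the object-vs-page distinction made above, using the generic kernel slab API (not Lustre code): kmem_cache_free() only returns the object to the cache's free lists, and the pages stay with the cache until the kernel reaps it under memory pressure or the cache is shrunk or destroyed from kernel code, which matches the observation that only unloading the modules (kmem_cache_destroy()) gives the memory back.

#include <linux/slab.h>

struct foo_obj {
        unsigned long data;
};

static struct kmem_cache *foo_cachep;

static void foo_slab_lifecycle(void)
{
        struct foo_obj *obj;

        foo_cachep = kmem_cache_create("foo_kmem", sizeof(struct foo_obj),
                                       0, 0, NULL);
        if (!foo_cachep)
                return;

        obj = kmem_cache_alloc(foo_cachep, GFP_KERNEL);
        if (obj)
                /* Returns the object to the cache's free lists only; the
                 * backing pages are NOT given back to the system here. */
                kmem_cache_free(foo_cachep, obj);

        /* In-kernel call that releases completely empty slab pages. */
        kmem_cache_shrink(foo_cachep);

        /* Gives everything back to the VM; this is what module unload does. */
        kmem_cache_destroy(foo_cachep);
}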
| Comment by Jay Lan (Inactive) [ 03/Sep/13 ] |
|
I now think the problem was probably caused by a certain application run by certain user(s). For about a week after the crash, about 90% of system memory was in slab. Last Friday I checked again, and the slab percentage had dropped to 38%. Today it was 30%. We can close this ticket. Should the problem happen again, we will track down the user and help him/her figure out how to address it. |
| Comment by Peter Jones [ 04/Sep/13 ] |
|
ok - thanks Jay! |