Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • None
    • None
    • None
    • Environment: Server: 2.1.4, CentOS 6.3
      Client: 2.1.5, SLES 11 SP1
    • 2
    • 9716

    Description

      We have an ongoing problem of unreclaimable slab memory stuck in Lustre. It is different from LU-2613 in that unmounting the Lustre filesystem did not release the stuck memory. We also tried lflush and the write technique suggested by Niu Yawei in LU-2613 at 15/Jan/13 8:54 AM. Neither worked for us.
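
      (For reference, the generic form of those reclamation attempts looks roughly like the following; this is only a sketch, and it does not reproduce the exact write technique described in LU-2613. The mount point is an example.)

      sync
      echo 3 > /proc/sys/vm/drop_caches      # drop pagecache plus reclaimable slab (dentries/inodes)
      umount /mnt/lustre                     # unmounting did not give the memory back either
      grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo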

      This is an ongoing problem and has caused a lot of trouble on our production systems.

      I will append /proc/meminfo and a 'slabtop' output below. Let me know what other information you need.

      bridge2 /proc # cat meminfo
      MemTotal: 65978336 kB
      MemFree: 4417544 kB
      Buffers: 7804 kB
      Cached: 183036 kB
      SwapCached: 6068 kB
      Active: 101840 kB
      Inactive: 183404 kB
      Active(anon): 83648 kB
      Inactive(anon): 13036 kB
      Active(file): 18192 kB
      Inactive(file): 170368 kB
      Unevictable: 3480 kB
      Mlocked: 3480 kB
      SwapTotal: 2000052 kB
      SwapFree: 1669420 kB
      Dirty: 288 kB
      Writeback: 0 kB
      AnonPages: 92980 kB
      Mapped: 16964 kB
      Shmem: 136 kB
      Slab: 57633936 kB
      SReclaimable: 1029472 kB
      SUnreclaim: 56604464 kB
      KernelStack: 5280 kB
      PageTables: 15928 kB
      NFS_Unstable: 0 kB
      Bounce: 0 kB
      WritebackTmp: 0 kB
      CommitLimit: 34989220 kB
      Committed_AS: 737448 kB
      VmallocTotal: 34359738367 kB
      VmallocUsed: 2348084 kB
      VmallocChunk: 34297775112 kB
      HardwareCorrupted: 0 kB
      HugePages_Total: 0
      HugePages_Free: 0
      HugePages_Rsvd: 0
      HugePages_Surp: 0
      Hugepagesize: 2048 kB
      DirectMap4k: 7104 kB
      DirectMap2M: 67100672 kB
      bridge2 /proc #

      bridge2 ~ # slabtop --once

      Active / Total Objects (% used) : 2291913 / 500886088 (0.5%)
      Active / Total Slabs (% used) : 170870 / 14351991 (1.2%)
      Active / Total Caches (% used) : 151 / 249 (60.6%)
      Active / Total Size (% used) : 838108.56K / 53998141.57K (1.6%)
      Minimum / Average / Maximum Object : 0.01K / 0.11K / 4096.00K

      OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
      133434868 41138 0% 0.04K 1450379 92 5801516K lovsub_page_kmem
      124369720 77440 0% 0.19K 6218486 20 24873944K cl_page_kmem
      115027759 41264 0% 0.05K 1493867 77 5975468K lov_page_kmem
      77597568 41174 0% 0.08K 1616616 48 6466464K vvp_page_kmem
      44004405 38371 0% 0.26K 2933627 15 11734508K osc_page_kmem
      1558690 9106 0% 0.54K 222670 7 890680K radix_tree_node
      1435785 457262 31% 0.25K 95719 15 382876K size-256
      991104 24455 2% 0.50K 123888 8 495552K size-512
      591420 573510 96% 0.12K 19714 30 78856K size-128
      583038 507363 87% 0.06K 9882 59 39528K size-64
      399080 4356 1% 0.19K 19954 20 79816K cred_jar
      112112 81796 72% 0.03K 1001 112 4004K size-32
      106368 106154 99% 0.08K 2216 48 8864K sysfs_dir_cache
      89740 26198 29% 1.00K 22435 4 89740K size-1024
      87018 1601 1% 0.62K 14503 6 58012K proc_inode_cache
      53772 2845 5% 0.58K 8962 6 35848K inode_cache
      44781 44746 99% 8.00K 44781 1 358248K size-8192
      42700 28830 67% 0.19K 2135 20 8540K dentry
      38990 2213 5% 0.79K 7798 5 31192K ext3_inode_cache
      25525 24880 97% 0.78K 5105 5 20420K shmem_inode_cache
      23394 16849 72% 0.18K 1114 21 4456K vm_area_struct
      22340 6262 28% 0.19K 1117 20 4468K filp
      20415 19243 94% 0.25K 1361 15 5444K skbuff_head_cache
      19893 2152 10% 0.20K 1047 19 4188K ll_obdo_cache
      15097 15006 99% 4.00K 15097 1 60388K size-4096
      14076 1837 13% 0.04K 153 92 612K osc_req_kmem
      12696 1448 11% 0.04K 138 92 552K lovsub_req_kmem
      11684 1444 12% 0.04K 127 92 508K lov_req_kmem
      10028 1477 14% 0.04K 109 92 436K ccc_req_kmem
      9750 3000 30% 0.12K 325 30 1300K nfs_page

    Attachments

    Issue Links

    Activity

            [LU-3771] stuck 56G of SUnreclaim memory

            niu Niu Yawei (Inactive) added a comment:

            > Yes, you're right about this. The slab memory should be in SReclaimable but it was in SUnreclaim for some unknown reason.

            The slab memory is accounted in SUnreclaim when the slab cache is created without the SLAB_RECLAIM_ACCOUNT flag. The cl/lov/osc page slabs are created without this flag, so they show up in SUnreclaim. I don't think adding the flag and a shrinker callback would help, because the problem now is that the slab caches aren't being reaped, not that the slab objects aren't being freed.

            > Based on the low number of 'active objs' in slabinfo, it doesn't look like a memory leak problem - was the memory all released after unloading lustre modules?

            Right, it's not a memory leak problem, and all the slab memory is freed after unloading the Lustre modules (see Jay's previous comment).

            I don't think it's a Lustre problem: the slab objects are already freed and put back into the slab cache after umount, so the issue is that the kernel didn't reap the slab caches for some reason (actually, I don't know how to reap a slab cache proactively in the 2.6 kernel).
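
            (For context, a rough way to see how much of SUnreclaim the clio page caches account for, using the cache names from the slabtop output above; objsize * num_objs is only an approximation of the real footprint, and /proc/slabinfo needs root to read.)

            awk '/^(cl|lov|lovsub|osc|vvp)_page_kmem / {
                     kb = $3 * $4 / 1024      # total objects * object size, in KB
                     printf "%-18s active=%-10d total=%-12d ~%d KB\n", $1, $2, $3, kb
                     sum += kb
                 }
                 END { printf "clio page caches: ~%d KB total\n", sum }' /proc/slabinfo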

            jay Jinshan Xiong (Inactive) added a comment:

            Yes, you're right about this. The slab memory should be in SReclaimable but it was in SUnreclaim for some unknown reason.

            Based on the low number of 'active objs' in slabinfo, it doesn't look like a memory leak problem - was the memory all released after unloading lustre modules?

            jaylan Jay Lan (Inactive) added a comment:

            The min_slab_ratio defines the threshold at which the kernel will free reclaimable slab memory. But in our case, the slabs held up by Lustre are unreclaimable, so changing that value would not help.

            LU-2613 found a case where unreclaimable slabs should have been released but were not. We may be hitting another such case; I don't know. But, as Andreas commented, those numbers are unreasonably high.

            jay Jinshan Xiong (Inactive) added a comment:

            There is /proc/sys/vm/min_slab_ratio in the Linux kernel, and the default is 5. You may set it higher and see if it helps.
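
            (A sketch of checking and adjusting the tunable mentioned above; the value 10 is only an example, and this sysctl is consulted by zone reclaim when it decides whether to shrink reclaimable slab.)

            cat /proc/sys/vm/min_slab_ratio         # default: 5 (percent of a zone's pages that may be slab)
            echo 10 > /proc/sys/vm/min_slab_ratio   # try a higher threshold, as suggested above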

            jay Jinshan Xiong (Inactive) added a comment:

            Patch 3bffa4d would mitigate the problem a little, because fewer slab data structures will be used per page, but it is definitely not a fix. Actually, we can't do anything about this, because it is up to the Linux kernel VM management to decide when to free that memory.

            Niu, we should probably take a look at the slab implementation to check whether there are any tunable parameters for this.

            jaylan Jay Lan (Inactive) added a comment:

            It seems to me that the patch "LU-744 clio: save memory allocations for cl_page" decreases the number of memory allocations per clio page from 6 down to 2. It will certainly ease the pressure, but it does not seem to address the problem of memory stuck in SUnreclaim.

            jaylan Jay Lan (Inactive) added a comment:

            Niu,

            Yes, unloading the modules freed up the SUnreclaim slabs.

            The systems have 62G of memory. One system constantly has a large amount of memory in slab (> 45G), most of it in SUnreclaim (~99%). Over the past two days I checked many times and found that the "Active / Total Slabs (% used)" figure stayed between 4.5% and 8.5%. I was told that when the usage dropped below, say, 1.5%, the system would become very sluggish and unusable.

            45G x (100 - 8)% = 41.4G of unused SUnreclaim memory is a lot. It would be OK for that memory to just sit there if we still had enough memory for normal operation, but it should be freed when the system needs it.

            I would like to cherry-pick 3bffa4d (LU-744) into our 2.1.5. Do you think it would solve our problem? Thanks!
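
            (A back-of-the-envelope version of the estimate above, reading SUnreclaim from /proc/meminfo; the 8% figure is the slabtop "% used" quoted above, not something the kernel exports directly.)

            awk '/^SUnreclaim:/ { gb = $2 / 1048576
                                  printf "SUnreclaim: %.1f GB, ~%.1f GB of it idle at 8%% object usage\n", gb, gb * 0.92 }' /proc/meminfo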

            jay Jinshan Xiong (Inactive) added a comment:

            I tend to think this shows a 'problem' of slab allocation. Let's take a look at the first line:

            133434868 41138 0% 0.04K 1450379 92 5801516K lovsub_page_kmem
            

            So it allocated a huge number of pages that can hold 133434868 lovsub_page{} objects, coming to about 5.8G of memory, yet only 41138 of them are actively being used.

            Hi Jay, have you ever seen this cause any real problem? Usually the kernel tends to cache slab objects until memory comes under pressure, so it is okay as long as the memory used by the slabs can be freed later on.
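
            (Re-deriving those numbers from the slabtop columns above: each slab of this cache is a single 4 KB page holding 92 of the 0.04K objects, which matches the CACHE SIZE column.)

            echo $(( 1450379 * 4 ))     # 5801516 KB of pages backing lovsub_page_kmem
            echo $(( 1450379 * 92 ))    # 133434868 object slots, of which only 41138 are in use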

            niu Niu Yawei (Inactive) added a comment:

            Those slabs have been removed by 3bffa4d32bc5b0bc71ba6873e262ddbca436bae1 (LU-744) in master.

            adilger Andreas Dilger added a comment:

            These are all CLIO metadata structures in the slabs. It also seems unusual that the number of slab objects is over 150M; at 4 KB per data page that would be enough to track over 600GB of pages, which is totally unreasonable for a node with 64GB of RAM.

            Are there any patches that have landed after 2.1 that might fix this problem?
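
            (A rough check of the figure above: ~150M page-tracking objects, each describing one 4 KB data page.)

            echo $(( 150000000 * 4 / 1000000 )) GB      # ~600 GB worth of pages on a 64 GB node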

            niu Niu Yawei (Inactive) added a comment:

            The memory is consumed by the slabs, and those slabs will be destroyed when the Lustre modules (obdclass, lov, ...) are unloaded. Could you try unloading the Lustre modules and see if that helps? Thanks.
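
            (A sketch of what unloading the modules looks like in practice; lustre_rmmod is the helper script shipped with Lustre for unloading the whole module stack, and the meminfo check just confirms whether the slab memory came back.)

            umount -a -t lustre      # unmount all Lustre filesystems first
            lustre_rmmod             # unload lustre, lov, osc, obdclass, lnet, ... destroying their slab caches
            grep -E 'Slab|SUnreclaim' /proc/meminfo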

    People

      Assignee: niu Niu Yawei (Inactive)
      Reporter: jaylan Jay Lan (Inactive)
      Votes: 0
      Watchers: 8

    Dates

      Created:
      Updated:
      Resolved: