<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:36:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3771] stuck 56G of SUnreclaim memory</title>
                <link>https://jira.whamcloud.com/browse/LU-3771</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have an ongoing problem of unreclaimable slab memory stuck in Lustre. It differs from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; in that unmounting the Lustre FS did not release the stuck memory. We also tried lflush, as well as the write technique suggested by Niu Yawei in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; at 15/Jan/13 8:54 AM. None of these worked for us.&lt;/p&gt;

&lt;p&gt;This is an ongoing problem and has created a lot of problems on our production systems.&lt;/p&gt;

&lt;p&gt;I will append /proc/meminfo and a &apos;slabtop&apos; output below. Let me know what other information you need.&lt;/p&gt;

&lt;p&gt;bridge2 /proc # cat meminfo&lt;br/&gt;
MemTotal:       65978336 kB&lt;br/&gt;
MemFree:         4417544 kB&lt;br/&gt;
Buffers:            7804 kB&lt;br/&gt;
Cached:           183036 kB&lt;br/&gt;
SwapCached:         6068 kB&lt;br/&gt;
Active:           101840 kB&lt;br/&gt;
Inactive:         183404 kB&lt;br/&gt;
Active(anon):      83648 kB&lt;br/&gt;
Inactive(anon):    13036 kB&lt;br/&gt;
Active(file):      18192 kB&lt;br/&gt;
Inactive(file):   170368 kB&lt;br/&gt;
Unevictable:        3480 kB&lt;br/&gt;
Mlocked:            3480 kB&lt;br/&gt;
SwapTotal:       2000052 kB&lt;br/&gt;
SwapFree:        1669420 kB&lt;br/&gt;
Dirty:               288 kB&lt;br/&gt;
Writeback:             0 kB&lt;br/&gt;
AnonPages:         92980 kB&lt;br/&gt;
Mapped:            16964 kB&lt;br/&gt;
Shmem:               136 kB&lt;br/&gt;
Slab:           57633936 kB&lt;br/&gt;
SReclaimable:    1029472 kB&lt;br/&gt;
SUnreclaim:     56604464 kB&lt;br/&gt;
KernelStack:        5280 kB&lt;br/&gt;
PageTables:        15928 kB&lt;br/&gt;
NFS_Unstable:          0 kB&lt;br/&gt;
Bounce:                0 kB&lt;br/&gt;
WritebackTmp:          0 kB&lt;br/&gt;
CommitLimit:    34989220 kB&lt;br/&gt;
Committed_AS:     737448 kB&lt;br/&gt;
VmallocTotal:   34359738367 kB&lt;br/&gt;
VmallocUsed:     2348084 kB&lt;br/&gt;
VmallocChunk:   34297775112 kB&lt;br/&gt;
HardwareCorrupted:     0 kB&lt;br/&gt;
HugePages_Total:       0&lt;br/&gt;
HugePages_Free:        0&lt;br/&gt;
HugePages_Rsvd:        0&lt;br/&gt;
HugePages_Surp:        0&lt;br/&gt;
Hugepagesize:       2048 kB&lt;br/&gt;
DirectMap4k:        7104 kB&lt;br/&gt;
DirectMap2M:    67100672 kB&lt;br/&gt;
bridge2 /proc # &lt;/p&gt;

&lt;p&gt;bridge2 ~ #   slabtop --once&lt;/p&gt;

&lt;p&gt; Active / Total Objects (% used)    : 2291913 / 500886088 (0.5%)&lt;br/&gt;
 Active / Total Slabs (% used)      : 170870 / 14351991 (1.2%)&lt;br/&gt;
 Active / Total Caches (% used)     : 151 / 249 (60.6%)&lt;br/&gt;
 Active / Total Size (% used)       : 838108.56K / 53998141.57K (1.6%)&lt;br/&gt;
 Minimum / Average / Maximum Object : 0.01K / 0.11K / 4096.00K&lt;/p&gt;

&lt;p&gt;  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   &lt;br/&gt;
133434868  41138   0%    0.04K 1450379       92   5801516K lovsub_page_kmem&lt;br/&gt;
124369720  77440   0%    0.19K 6218486       20  24873944K cl_page_kmem&lt;br/&gt;
115027759  41264   0%    0.05K 1493867       77   5975468K lov_page_kmem&lt;br/&gt;
77597568  41174   0%    0.08K 1616616       48   6466464K vvp_page_kmem&lt;br/&gt;
44004405  38371   0%    0.26K 2933627       15  11734508K osc_page_kmem&lt;br/&gt;
1558690   9106   0%    0.54K 222670        7    890680K radix_tree_node&lt;br/&gt;
1435785 457262  31%    0.25K  95719       15    382876K size-256&lt;br/&gt;
991104  24455   2%    0.50K 123888        8    495552K size-512&lt;br/&gt;
591420 573510  96%    0.12K  19714       30     78856K size-128&lt;br/&gt;
583038 507363  87%    0.06K   9882       59     39528K size-64&lt;br/&gt;
399080   4356   1%    0.19K  19954       20     79816K cred_jar&lt;br/&gt;
112112  81796  72%    0.03K   1001      112      4004K size-32&lt;br/&gt;
106368 106154  99%    0.08K   2216       48      8864K sysfs_dir_cache&lt;br/&gt;
 89740  26198  29%    1.00K  22435        4     89740K size-1024&lt;br/&gt;
 87018   1601   1%    0.62K  14503        6     58012K proc_inode_cache&lt;br/&gt;
 53772   2845   5%    0.58K   8962        6     35848K inode_cache&lt;br/&gt;
 44781  44746  99%    8.00K  44781        1    358248K size-8192&lt;br/&gt;
 42700  28830  67%    0.19K   2135       20      8540K dentry&lt;br/&gt;
 38990   2213   5%    0.79K   7798        5     31192K ext3_inode_cache&lt;br/&gt;
 25525  24880  97%    0.78K   5105        5     20420K shmem_inode_cache&lt;br/&gt;
 23394  16849  72%    0.18K   1114       21      4456K vm_area_struct&lt;br/&gt;
 22340   6262  28%    0.19K   1117       20      4468K filp&lt;br/&gt;
 20415  19243  94%    0.25K   1361       15      5444K skbuff_head_cache&lt;br/&gt;
 19893   2152  10%    0.20K   1047       19      4188K ll_obdo_cache&lt;br/&gt;
 15097  15006  99%    4.00K  15097        1     60388K size-4096&lt;br/&gt;
 14076   1837  13%    0.04K    153       92       612K osc_req_kmem&lt;br/&gt;
 12696   1448  11%    0.04K    138       92       552K lovsub_req_kmem&lt;br/&gt;
 11684   1444  12%    0.04K    127       92       508K lov_req_kmem&lt;br/&gt;
 10028   1477  14%    0.04K    109       92       436K ccc_req_kmem&lt;br/&gt;
  9750   3000  30%    0.12K    325       30      1300K nfs_page&lt;/p&gt;</description>
                <environment>Server: 2.1.4, centos 6.3&lt;br/&gt;
Client: 2.1.5, sles11sp1</environment>
        <key id="20377">LU-3771</key>
            <summary>stuck 56G of SUnreclaim memory</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="jaylan">Jay Lan</reporter>
                        <labels>
                    </labels>
                <created>Fri, 16 Aug 2013 19:01:40 +0000</created>
                <updated>Thu, 3 Oct 2013 16:36:55 +0000</updated>
                            <resolved>Wed, 4 Sep 2013 02:20:25 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="64426" author="pjones" created="Fri, 16 Aug 2013 21:17:46 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Here is some further information from NASA. What do you advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="64446" author="niu" created="Mon, 19 Aug 2013 03:33:06 +0000"  >&lt;p&gt;The memory is consumed by the slabs, and those slabs will be destroyed when the Lustre modules (obdclass, lov, ...) are unloaded. Could you try unloading the modules to see if that helps? Thanks.&lt;/p&gt;</comment>
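A sketch of that check (mount points are illustrative; lustre_rmmod is the helper shipped with the Lustre tools, and running this requires root):

```shell
# Unmounting alone only returns objects to their slab caches; it is
# destroying the caches at module unload that returns the pages.
umount -a -t lustre
lustre_rmmod                      # unloads obdclass, lov, osc, ...
grep SUnreclaim /proc/meminfo     # should now show a small value
```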
                            <comment id="64473" author="adilger" created="Mon, 19 Aug 2013 16:15:41 +0000"  >&lt;p&gt;These are all CLIO metadata structures in the slabs. It also seems unusual that the number of slab objects is over 150M, which would be enough for over 600GB of pages and is totally unreasonable for a node with 64GB of RAM.&lt;/p&gt;

&lt;p&gt;Are there any patches landed after 2.1 that might fix this problem?&lt;/p&gt;</comment>
                            <comment id="64486" author="niu" created="Tue, 20 Aug 2013 03:11:03 +0000"  >&lt;p&gt;Those slabs have been removed by 3bffa4d32bc5b0bc71ba6873e262ddbca436bae1 (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-744&quot; title=&quot;Single client&amp;#39;s performance degradation on 2.1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-744&quot;&gt;&lt;del&gt;LU-744&lt;/del&gt;&lt;/a&gt;) in master.&lt;/p&gt;</comment>
                            <comment id="64488" author="jay" created="Tue, 20 Aug 2013 03:42:02 +0000"  >&lt;p&gt;I tend to think this points to a problem in slab allocation. Let&apos;s look at the first line:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;133434868 41138 0% 0.04K 1450379 92 5801516K lovsub_page_kmem
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so the slab cache has allocated enough pages to hold 133434868 lovsub_page{} structures, coming to roughly 5.5G of memory, yet only 41138 of them are actively in use.&lt;/p&gt;

&lt;p&gt;Hi Jay, have you ever seen this cause any real problem? Usually the kernel caches slab objects until memory comes under pressure, so it is okay as long as the memory used by the slabs can be freed later on.&lt;/p&gt;
</comment>
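The arithmetic in the comment above can be checked directly. A minimal sketch (the field layout matches the slabtop line quoted above; the one-page-per-slab assumption for this cache is inferred from the numbers, not from slabtop itself):

```python
# Verify the lovsub_page_kmem line quoted above: the CACHE SIZE column
# equals SLABS * 4 KiB (each slab is one 4 KiB page for this cache),
# and utilisation is ACTIVE / OBJS.
line = "133434868 41138 0% 0.04K 1450379 92 5801516K lovsub_page_kmem"
fields = line.split()
objs, active = int(fields[0]), int(fields[1])
slabs = int(fields[4])
cache_kib = int(fields[6].rstrip("K"))

assert slabs * 4 == cache_kib   # 1450379 slabs * 4 KiB = 5801516 KiB
utilisation = active / objs     # roughly 0.0003, i.e. 0.03% in use
print(f"{cache_kib / 2**20:.1f} GiB held, {utilisation:.2%} active")
```

So about 5.5 GiB of pages are pinned by this one cache while only about 0.03% of its objects are live, which is the point the comment is making.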
                            <comment id="64645" author="jaylan" created="Tue, 20 Aug 2013 17:59:27 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;Yes, unloading the modules freed up the SUnreclaim slabs. &lt;/p&gt;

&lt;p&gt;The systems have 62G of memory. One system constantly has a large amount of memory in slab (&amp;gt; 45G), most of it SUnreclaim (~ 99%). Over the past two days I checked many times and found that the &quot;Active / Total Slabs (% used)&quot; figure stayed between 4.5% and 8.5%. I was told that when the usage dropped below, say, 1.5%, the system would become very sluggish and unusable.&lt;/p&gt;

&lt;p&gt;45G x (100 - 8)% = 41.4G of unused SUnreclaim memory is a lot. It is OK for that memory to just park there while we still have enough memory for normal operation, but it should be freed up when the system needs it.&lt;/p&gt;

&lt;p&gt;I would like to cherry-pick 3bffa4d (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-744&quot; title=&quot;Single client&amp;#39;s performance degradation on 2.1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-744&quot;&gt;&lt;del&gt;LU-744&lt;/del&gt;&lt;/a&gt;) into our 2.1.5. Do you know whether it would solve our problem? Thanks!&lt;/p&gt;</comment>
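The estimate two paragraphs up is straightforward arithmetic; as a sketch (the 45G and 8% figures are the ones quoted above):

```python
# About 45 GiB sitting in slab with only about 8% of objects active
# leaves roughly 41.4 GiB of SUnreclaim memory parked unused.
slab_gib = 45.0
active_fraction = 0.08
unused_gib = slab_gib * (1 - active_fraction)
print(f"{unused_gib:.1f} GiB of slab memory idle")   # 41.4
```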
                            <comment id="64666" author="jaylan" created="Tue, 20 Aug 2013 20:15:02 +0000"  >&lt;p&gt;It seems to me that patch &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-744&quot; title=&quot;Single client&amp;#39;s performance degradation on 2.1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-744&quot;&gt;&lt;del&gt;LU-744&lt;/del&gt;&lt;/a&gt; clio: save memory allocations for cl_page&quot; decreases the number of memory allocations per cl_page from 6 down to 2. It will certainly ease the pressure, but it does not seem to address the problem of memory stuck in SUnreclaim.&lt;/p&gt;</comment>
                            <comment id="64667" author="jay" created="Tue, 20 Aug 2013 20:18:18 +0000"  >&lt;p&gt;patch 3bffa4d would mitigate the problem a little because fewer slab data structures will be used per page, but it is definitely not a fix. Actually we can&apos;t do anything about it, because it is up to the Linux kernel VM management to decide when to free that memory.&lt;/p&gt;

&lt;p&gt;Niu, we should probably take a look at the slab implementation to check whether there are any tunable parameters for this.&lt;/p&gt;</comment>
                            <comment id="64670" author="jay" created="Tue, 20 Aug 2013 20:33:54 +0000"  >&lt;p&gt;There is /proc/sys/vm/min_slab_ratio in the Linux kernel; the default is 5. You may set it higher and see if that helps.&lt;/p&gt;</comment>
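For reference, a sketch of that tuning (the value 10 is illustrative and this needs root; note that vm.min_slab_ratio only governs reclaimable slab during NUMA zone reclaim, which is the limitation pointed out in the reply below):

```shell
# vm.min_slab_ratio (default 5) is the percentage of a zone's pages
# allowed to sit in reclaimable slab before zone reclaim shrinks it;
# it has no effect on SUnreclaim caches.
cat /proc/sys/vm/min_slab_ratio
sysctl -w vm.min_slab_ratio=10    # illustrative value; requires root
```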
                            <comment id="64678" author="jaylan" created="Tue, 20 Aug 2013 22:27:25 +0000"  >&lt;p&gt;min_slab_ratio defines the threshold at which the kernel will free reclaimable slab. But in our case, the slabs held up by Lustre were unreclaimable. Changing that value would not help.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; found a case where unreclaimable slabs should have been released but were not. We may have hit another case; I don&apos;t know. But, as Andreas commented, those numbers were unreasonably high.&lt;/p&gt;</comment>
                            <comment id="64703" author="jay" created="Wed, 21 Aug 2013 01:36:19 +0000"  >&lt;p&gt;Yes, you&apos;re right about this. The slab memory should be in SReclaimable, but it was in SUnreclaim for an unknown reason.&lt;/p&gt;

&lt;p&gt;Based on the low number of `active objs&apos; in slabinfo, it doesn&apos;t look like a memory leak problem - was the memory all released after unloading lustre modules?&lt;/p&gt;</comment>
                            <comment id="64711" author="niu" created="Wed, 21 Aug 2013 05:25:31 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Yes, you&apos;re right about this. The slab memory should be in SReclaimable, but it was in SUnreclaim for an unknown reason.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The slab memory is accounted in SUnreclaim when the slab cache is created without the SLAB_RECLAIM_ACCOUNT flag. The cl/lov/osc page slabs are created without this flag, so they show up in SUnreclaim. I think adding the flag and a shrinker callback won&apos;t help, because the problem now is that the slab cache isn&apos;t being reaped, not that the slab objects aren&apos;t being freed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Based on the low number of `active objs&apos; in slabinfo, it doesn&apos;t look like a memory leak problem - was the memory all released after unloading lustre modules?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Right, it&apos;s not a memory leak problem, and all the slab memory will be freed after unloading the Lustre modules (see Jay&apos;s previous comment).&lt;/p&gt;

&lt;p&gt;I don&apos;t think it&apos;s a Lustre problem: the slab objects are already freed and put back into the slab cache after umount, so the problem is that the kernel didn&apos;t reap the slab cache for some reason (actually, I don&apos;t know of a way to reap a slab cache on demand in the 2.6 kernel).&lt;/p&gt;</comment>
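For what it is worth, the closest thing to an on-demand reap on 2.6 kernels is /proc/sys/vm/drop_caches, but it only runs the registered shrinkers (dentry and inode caches); it cannot reap caches that, like the cl/lov/osc page slabs here, have no shrinker. A hedged sketch:

```shell
# echo 2 frees reclaimable slab by invoking the registered shrinkers;
# it will not shrink SUnreclaim, so at best it confirms the diagnosis.
sync
echo 2 > /proc/sys/vm/drop_caches    # requires root
grep -E 'SReclaimable|SUnreclaim' /proc/meminfo
```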
                            <comment id="64716" author="niu" created="Wed, 21 Aug 2013 08:19:50 +0000"  >&lt;blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; found a case where unreclaimable slabs should have been released but not. We may hit another case? Don&apos;t know. But, as Andreas commented those number were unreasonably high.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; is totally different: in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt;, Lustre holds the slab objects, whereas in this ticket Lustre doesn&apos;t hold any objects (the fs is already unmounted and all objects freed), but the kernel doesn&apos;t free the memory in the slab cache.&lt;/p&gt;

&lt;p&gt;I think the reason for such a high object number is that the filesystem has been mounted/unmounted and used for a very long time, so lots of objects were created.&lt;/p&gt;</comment>
                            <comment id="64768" author="jaylan" created="Wed, 21 Aug 2013 17:53:32 +0000"  >&lt;p&gt;Niu, it is not exactly as you said: &quot;the slab objects are already freed and put back in the slab cache after umount&quot;.&lt;/p&gt;

&lt;p&gt;Bridge2 was last rebooted 2 days ago, at Aug 19 04:38. None of the 8 Lustre filesystems have been unmounted since. Here is the &apos;slabtop&apos; output:&lt;/p&gt;

&lt;p&gt; Active / Total Objects (% used)    : 8844277 / 385193960 (2.3%)&lt;br/&gt;
 Active / Total Slabs (% used)      : 411903 / 9494044 (4.3%)&lt;br/&gt;
 Active / Total Caches (% used)     : 151 / 249 (60.6%)&lt;br/&gt;
 Active / Total Size (% used)       : 1729034.95K / 35916957.66K (4.8%)&lt;br/&gt;
 Minimum / Average / Maximum Object : 0.01K / 0.09K / 4096.00K&lt;/p&gt;

&lt;p&gt;  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   &lt;br/&gt;
124673156 1052956   0%    0.04K 1355143       92   5420572K lovsub_page_kmem&lt;/p&gt;

&lt;p&gt;104933521 1057165   1%    0.05K 1362773       77   5451092K lov_page_kmem&lt;/p&gt;

&lt;p&gt;68093040 1050624   1%    0.08K 1418605       48   5674420K vvp_page_kmem&lt;/p&gt;

&lt;p&gt;53508260 2119860   3%    0.19K 2675413       20  10701652K cl_page_kmem&lt;/p&gt;

&lt;p&gt;12914768  27449   0%    0.50K 1614346        8   6457384K size-512&lt;/p&gt;

&lt;p&gt;11721315 1058247   9%    0.26K 781421       15   3125684K osc_page_kmem&lt;/p&gt;

&lt;p&gt;5503680  81827   1%    0.03K  49140      112    196560K size-32&lt;br/&gt;
1856400 1690306  91%    0.12K  61880       30    247520K size-128&lt;/p&gt;

&lt;p&gt;639760   4178   0%    0.19K  31988       20    127952K cred_jar&lt;br/&gt;
475658 124818  26%    0.06K   8062       59     32248K size-64&lt;br/&gt;
250560  87321  34%    0.25K  16704       15     66816K size-256&lt;br/&gt;
106416 106154  99%    0.08K   2217       48      8868K sysfs_dir_cache&lt;br/&gt;
 65527  65103  99%    0.54K   9361        7     37444K radix_tree_node&lt;br/&gt;
 46062  46027  99%    8.00K  46062        1    368496K size-8192&lt;br/&gt;
 43160  41625  96%    0.19K   2158       20      8632K dentry&lt;br/&gt;
 42600  35980  84%    1.00K  10650        4     42600K size-1024&lt;br/&gt;
 37962  11173  29%    0.10K   1026       37      4104K buffer_head&lt;br/&gt;
 28440   1072   3%    0.12K    948       30      3792K nfs_page&lt;br/&gt;
 27798   5355  19%    0.58K   4633        6     18532K inode_cache&lt;br/&gt;
 25545  24886  97%    0.78K   5109        5     20436K shmem_inode_cache&lt;br/&gt;
 20910  19187  91%    0.25K   1394       15      5576K skbuff_head_cache&lt;br/&gt;
 17346   4255  24%    0.62K   2891        6     11564K proc_inode_cache&lt;br/&gt;
 15049  14950  99%    4.00K  15049        1     60196K size-4096&lt;br/&gt;
 13902  12067  86%    0.18K    662       21      2648K vm_area_struct&lt;br/&gt;
 10200   6255  61%    0.19K    510       20      2040K filp&lt;br/&gt;
  8408   8192  97%    0.44K   1051        8      4204K ib_mad&lt;br/&gt;
  5900   5586  94%    2.00K   2950        2     11800K size-2048&lt;br/&gt;
  4445   3003  67%    0.56K    635        7      2540K ldlm_locks&lt;br/&gt;
  4032   2810  69%    0.02K     28      144       112K anon_vma&lt;br/&gt;
  3696   3360  90%    0.08K     77       48       308K Acpi-State&lt;br/&gt;
  3638   2106  57%    0.11K    107       34       428K journal_head&lt;br/&gt;
  3498   2873  82%    0.07K     66       53       264K Acpi-Operand&lt;br/&gt;
  3312     72   2%    0.02K     23      144        92K journal_handle&lt;br/&gt;
  3017   2366  78%    0.50K    431        7      1724K skbuff_fclone_cache&lt;/p&gt;


&lt;p&gt;We have not unmounted any Lustre fs. From past observation, unmounting would not free up the slab memory until we unloaded the Lustre modules. The fact that unloading the Lustre modules frees up the slabs suggests that some communication between the kernel and the Lustre modules is not right. Why and how? I do not know.&lt;/p&gt;</comment>
                            <comment id="64806" author="niu" created="Thu, 22 Aug 2013 00:26:34 +0000"  >&lt;blockquote&gt;
&lt;p&gt;We have not unmounted any Lustre fs. From past observation, unmounting would not free up the slab memory until we unloaded the Lustre modules. The fact that unloading the Lustre modules frees up the slabs suggests that some communication between the kernel and the Lustre modules is not right. Why and how? I do not know.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Lustre should have freed all slab objects (via kmem_cache_free()) after umount, but that doesn&apos;t mean the slab cache frees the memory used by those objects immediately: the slab cache will hold the memory for future use, and the memory is only released when the slab layer thinks memory is tight or when the slab cache is destroyed (unloading the Lustre modules destroys the slabs).&lt;/p&gt;

&lt;p&gt;If the slab cache consumed too much memory and that resulted in an unusable/sluggish system, I think there could be some defects in the slab reap mechanism (the slab cache is run by the kernel, not Lustre). What we can do is reduce Lustre&apos;s use of slab; the fix of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-744&quot; title=&quot;Single client&amp;#39;s performance degradation on 2.1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-744&quot;&gt;&lt;del&gt;LU-744&lt;/del&gt;&lt;/a&gt; could be helpful.&lt;/p&gt;</comment>
                            <comment id="65668" author="jaylan" created="Tue, 3 Sep 2013 23:40:56 +0000"  >&lt;p&gt;I now think the problem was probably caused by a certain application run by certain user(s). For about a week after a crash, about 90% of system memory was in slab. Last Friday I checked again and the slab percentage had dropped to 38%. Today it was 30%.&lt;/p&gt;

&lt;p&gt;We can close this ticket. Should the problem happen again, we will track down the user and help him/her figure out how to address it.&lt;/p&gt;</comment>
                            <comment id="65676" author="pjones" created="Wed, 4 Sep 2013 02:20:25 +0000"  >&lt;p&gt;ok - thanks Jay!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="21245">LU-4053</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvy5b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9716</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>