<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4264] Excessive slab usage on 1.8.9 server</title>
                <link>https://jira.whamcloud.com/browse/LU-4264</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;NOAA has been having a problem with OOM on their OSSes causing failover. Looking at collectl output from right before the crash, it appears that all the memory is being consumed by the size-256 slab:&lt;br/&gt;
size-256                     168M  41122M    168M  41122M  11229K  43864M  11229K  43864M 200704    0.0&lt;/p&gt;

&lt;p&gt;Is there a way to determine what those objects are and reduce the amount of memory they are taking? The vmcore is available if necessary. &lt;/p&gt;</description>
                <environment></environment>
        <key id="22142">LU-4264</key>
            <summary>Excessive slab usage on 1.8.9 server</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="green">Oleg Drokin</assignee>
                                    <reporter username="orentas">Oz Rentas</reporter>
                        <labels>
                    </labels>
                <created>Mon, 18 Nov 2013 14:28:05 +0000</created>
                <updated>Fri, 1 Sep 2017 14:55:31 +0000</updated>
                            <resolved>Fri, 1 Sep 2017 14:55:31 +0000</resolved>
                                    <version>Lustre 1.8.9</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="71788" author="green" created="Mon, 18 Nov 2013 14:51:22 +0000"  >&lt;p&gt;You can do &quot;echo +malloc &amp;gt;/proc/sys/lnet/debug&quot; (no quotes) on the affected servers, and then as the number keeps growing, you can do lctl dk &amp;gt;/tmp/somewhere&lt;br/&gt;
This will tell you what allocations (and freeings) lustre has been doing. But if the allocation does not originate from Lustre, this won&apos;t tell you anything.&lt;/p&gt;

&lt;p&gt;You can also enable the in-kernel memory leak tracer if you suspect you have a genuine memory leak (it&apos;s a kernel config option in the debug options).&lt;br/&gt;
This will print warnings in the kernel dmesg log, and you will be able to get the suspected leaked allocations and the backtraces that led to them from a special debugfs file.&lt;/p&gt;</comment>
                            <comment id="71790" author="kitwestneat" created="Mon, 18 Nov 2013 14:58:31 +0000"  >&lt;p&gt;The log from crash and the collectl from the time of the OOM. &lt;/p&gt;</comment>
                            <comment id="71795" author="kitwestneat" created="Mon, 18 Nov 2013 15:24:21 +0000"  >&lt;p&gt;I found an OSS whose slab appears to be growing larger than on the other OSSes. oss-2-18 is typical: &lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;pdsh -a cat /proc/slabinfo |grep size-256&lt;br/&gt;
lfs-oss-2-18: size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0&lt;br/&gt;
lfs-oss-2-18: size-256          1633652 1646400    256   15    1 : tunables  120   60    8 : slabdata 109760 109760     76&lt;br/&gt;
...&lt;br/&gt;
lfs-oss-2-15: size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0&lt;br/&gt;
lfs-oss-2-15: size-256          168811996 168812010    256   15    1 : tunables  120   60    8 : slabdata 11254134 11254134     39&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;So there is a couple of orders of magnitude difference, and it keeps increasing.&lt;/p&gt;

&lt;p&gt;Slab size-256 allocations within a minute or two:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@lfs-oss-2-15 ~&amp;#93;&lt;/span&gt;# cat /proc/slabinfo |grep e-256&lt;br/&gt;
size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0&lt;br/&gt;
size-256          168812385 168812475    256   15    1 : tunables  120   60    8 : slabdata 11254165 11254165    376&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@lfs-oss-2-15 ~&amp;#93;&lt;/span&gt;# cat /proc/slabinfo |grep e-256&lt;br/&gt;
size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0&lt;br/&gt;
size-256          168813438 168813450    256   15    1 : tunables  120   60    8 : slabdata 11254230 11254230     27&lt;/p&gt;
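&lt;p&gt;For reference, a slab cache&apos;s footprint can be estimated directly from these columns. A minimal sketch, assuming the classic SLAB-era /proc/slabinfo layout (name, active_objs, num_objs, objsize, ...):&lt;/p&gt;

```python
def slab_bytes(line):
    """Estimate the memory held by one slab cache from a /proc/slabinfo row.

    Assumes the classic column layout: name active_objs num_objs objsize ...
    """
    fields = line.split()
    name = fields[0]
    num_objs = int(fields[2])  # total object slots (allocated plus free)
    objsize = int(fields[3])   # bytes per object
    return name, num_objs * objsize

row = "size-256          168813438 168813450    256   15    1 : tunables  120   60    8"
name, nbytes = slab_bytes(row)
print(name, nbytes // (1024 * 1024), "MiB")
```

&lt;p&gt;For the size-256 row above this works out to roughly 41,000 MiB, which lines up with the ~41122M collectl figure in the description.&lt;/p&gt;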

&lt;p&gt;I ran malloc and grepped for slab-alloc:&lt;br/&gt;
00010000:00000010:15:1384787443.947162:0:24815:0:(ldlm_lock.c:338:ldlm_lock_new()) slab-alloced &apos;lock&apos;: 512 at ffff810abf0b8400.&lt;br/&gt;
00010000:00000010:15:1384787443.947167:0:24815:0:(ldlm_extent.c:799:ldlm_interval_alloc()) slab-alloced &apos;node&apos;: 72 at ffff810b4013c1c0.&lt;br/&gt;
00010000:00000010:15:1384787444.003597:0:24908:0:(ldlm_resource.c:757:ldlm_resource_new()) slab-alloced &apos;res&apos;: 344 at ffff8109e26d0cc0.&lt;br/&gt;
00002000:00000010:15:1384787444.197912:0:5041:0:(fsfilt-ldiskfs.c:809:fsfilt_ldiskfs_add_journal_cb()) slab-alloced &apos;fcb&apos;: 56 at ffff810ae623be98.&lt;/p&gt;

&lt;p&gt;I checked the number of locks to see if they were very large, but it is comparable to other OSSes:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@lfs-oss-2-15 ~&amp;#93;&lt;/span&gt;# cat /proc/fs/lustre/ldlm/namespaces/filter-scratch2-OST*/lock_count&lt;br/&gt;
22085&lt;br/&gt;
9504&lt;br/&gt;
22975&lt;br/&gt;
12472&lt;br/&gt;
7644&lt;br/&gt;
14597&lt;br/&gt;
6662&lt;br/&gt;
8624&lt;br/&gt;
20551&lt;br/&gt;
8725&lt;br/&gt;
16634&lt;/p&gt;
</comment>
                            <comment id="71805" author="kitwestneat" created="Mon, 18 Nov 2013 16:29:15 +0000"  >&lt;p&gt;Any ideas? The OSSes keep crashing.&lt;/p&gt;</comment>
                            <comment id="71807" author="kitwestneat" created="Mon, 18 Nov 2013 16:32:39 +0000"  >&lt;p&gt;Attached the malloc dk log&lt;/p&gt;</comment>
                            <comment id="71814" author="kitwestneat" created="Mon, 18 Nov 2013 18:39:01 +0000"  >&lt;p&gt;It appears to be a memory leak of some kind - the servers having issues had uptimes of 155 days, while the other OSSes were more recently rebooted. I wasn&apos;t able to determine where the leak was. I looked at some of the objects in the slab of a vmcore we got, and some were definitely Lustre related, but I couldn&apos;t narrow it down to anything. I also tried unloading modules and stopping processes to try to get the memory back, but it didn&apos;t budge. &lt;/p&gt;

&lt;p&gt;Any tips for debugging this kind of problem?&lt;/p&gt;</comment>
                            <comment id="71833" author="jay" created="Mon, 18 Nov 2013 21:53:26 +0000"  >&lt;p&gt;I didn&apos;t find any leak information in the dk1 log file you posted. How long does each server survive after a restart? The log collected lustre&apos;s memory allocations for 80 seconds, so if the leaking allocation were from lustre it should have hit thousands of times in that window.&lt;/p&gt;

&lt;p&gt;The 256-byte slab is created by the linux kernel to serve generic kmalloc(), and lustre creates its own slab cache for dlm locks, so the leak is certainly not from the dlm lock slab. I used the following command to check the log and try to find the leaking blocks:&lt;/p&gt;

&lt;p&gt;egrep &apos;kmalloced|kfreed&apos; dk1 | awk &apos;{if($4&amp;gt;128 &amp;amp;&amp;amp; $4&amp;lt;=256)print}&apos; &amp;gt; temp_log&lt;br/&gt;
perl ~/srcs/lustre/lustre/tests/leak_finder.pl temp_log&lt;/p&gt;

&lt;p&gt;(you can find leak_finder from lustre-tests rpm).&lt;/p&gt;

&lt;p&gt;the output was:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;   ...
   freed    144 bytes at ffff810a83297980 called desc (client.c:ptlrpc_free_bulk:188)
   freed    144 bytes at ffff8107f9ca16c0 called desc (client.c:ptlrpc_free_bulk:188)
   freed    144 bytes at ffff810825d80980 called desc (client.c:ptlrpc_free_bulk:188)
   freed    144 bytes at ffff81090a21d1c0 called desc (client.c:ptlrpc_free_bulk:188)
malloced    176 bytes at ffff810accc66880 called blwi (ldlm_lockd.c:ldlm_bl_to_thread:1672)
*** Leak: 176 bytes allocated at ffff810accc66880 (ldlm_lockd.c:ldlm_bl_to_thread:1672, debug file line 15320)
maximum used: 4912, amount leaked: 176
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the last entry is not a true memory leak, because the free log was not yet collected.&lt;/p&gt;
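&lt;p&gt;For what it&apos;s worth, leak_finder.pl essentially pairs each allocation with its free by address; the same matching logic fits in a few lines of Python (a sketch of the idea, not a replacement for the script):&lt;/p&gt;

```python
import re

# Matches dk-log lines such as:
#   malloced 176 bytes at ffff810accc66880 called blwi (...)
#   freed    144 bytes at ffff810a83297980 called desc (...)
PAT = re.compile(r"(malloced|kmalloced|freed|kfreed)\s+(\d+) bytes at ([0-9a-f]+)")

def find_leaks(lines):
    """Return {addr: size} for allocations that never saw a matching free."""
    live = {}
    for line in lines:
        m = PAT.search(line)
        if not m:
            continue
        op, size, addr = m.group(1), int(m.group(2)), m.group(3)
        if op.endswith("malloced"):
            live[addr] = size
        else:
            # A free with no recorded alloc was allocated before tracing began.
            live.pop(addr, None)
    return live
```

&lt;p&gt;As with leak_finder.pl, anything still unmatched at the end of a short capture may simply not have been freed yet, so only call sites that keep accumulating across longer traces are real suspects.&lt;/p&gt;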

&lt;p&gt;Is this node different from the others in any way (kernel, drivers, or recent updates)? This is probably not a Lustre problem.&lt;/p&gt;

&lt;p&gt;Anyway, let&apos;s drill down a little bit. Let&apos;s try ftrace to see if we can find it out.&lt;/p&gt;

&lt;p&gt;Please follow the instruction here: &lt;a href=&quot;http://elinux.org/Kernel_dynamic_memory_analysis&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://elinux.org/Kernel_dynamic_memory_analysis&lt;/a&gt;, please read the Debugfs section and then go to Dynamic section.&lt;/p&gt;

&lt;p&gt;You can trace kmalloc and kfree events as follows:&lt;/p&gt;

&lt;p&gt;echo &quot;kmem:kmalloc_node kmem:kfree kmem:kmalloc&quot; &amp;gt; /sys/kernel/debug/tracing/set_event&lt;/p&gt;

&lt;p&gt;Then enable the trace by:&lt;/p&gt;

&lt;p&gt;echo &quot;1&quot; &amp;gt; /sys/kernel/debug/tracing/tracing_on&lt;/p&gt;

&lt;p&gt;After a while, once you believe you have observed the memory reduction, you can dump the memory allocation information by:&lt;/p&gt;

&lt;p&gt;echo &quot;0&quot; &amp;gt; /sys/kernel/debug/tracing/tracing_on&lt;br/&gt;
cat /sys/kernel/debug/tracing/trace &amp;gt; trace_log&lt;/p&gt;

&lt;p&gt;Also, you need to dump the kernel symbol of running kernel by:&lt;/p&gt;

&lt;p&gt;cat /proc/kallsyms &amp;gt; kallsyms.txt&lt;/p&gt;
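&lt;p&gt;Once trace_log and kallsyms.txt are captured, the kmalloc/kfree events can be paired offline to see which call sites accumulate memory. A rough sketch, assuming the standard kmem event fields (call_site=, ptr=, bytes_alloc=); resolving call_site addresses against kallsyms.txt is left out:&lt;/p&gt;

```python
import re

ALLOC = re.compile(r"kmalloc(?:_node)?: call_site=(\S+) ptr=(\S+) "
                   r"bytes_req=\d+ bytes_alloc=(\d+)")
FREE = re.compile(r"kfree: call_site=\S+ ptr=(\S+)")

def outstanding_by_site(lines):
    """Sum bytes still allocated per call site at the end of the trace."""
    live = {}  # ptr -> (call_site, bytes_alloc)
    for line in lines:
        m = ALLOC.search(line)
        if m:
            live[m.group(2)] = (m.group(1), int(m.group(3)))
            continue
        m = FREE.search(line)
        if m:
            live.pop(m.group(1), None)
    totals = {}
    for site, nbytes in live.values():
        totals[site] = totals.get(site, 0) + nbytes
    return totals
```

&lt;p&gt;A call site whose total keeps growing from one capture to the next is the address to look up in kallsyms.txt.&lt;/p&gt;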

&lt;p&gt;Then we can do further analysis.&lt;/p&gt;</comment>
                            <comment id="71836" author="kitwestneat" created="Mon, 18 Nov 2013 22:34:14 +0000"  >&lt;p&gt;Unfortunately, it looks like RHEL5 doesn&apos;t have trace support compiled in, so we would have to reboot.&lt;/p&gt;

&lt;p&gt;The memory leak seems to be occurring on all the servers, we still have one system up that has been up for 155 days and it has very high memory usage in slab-256. It is the backup MDS, so we have not rebooted it yet, in case there is information we can still get. &lt;/p&gt;

&lt;p&gt;It seems that the servers go about 150 days before starting to have problems. I thought that these servers had already been rebooted earlier today, but that was not the case. The file system is mostly stable now, so the severity can be reduced. NOAA is very anxious about the memory leaks, however, so we still need to figure out where they are coming from.&lt;/p&gt;</comment>
                            <comment id="71854" author="green" created="Tue, 19 Nov 2013 08:26:27 +0000"  >&lt;p&gt;Well, given that you cannot identify a culprit in the lustre log (or is it a really-really slow leak?), that it takes 150 days to manifest, and assuming you don&apos;t want to recompile your kernel in a way that would allow tracking leaks, the only realistic options left to you, I guess, are:&lt;/p&gt;

&lt;p&gt;schedule some quiet time and reboot the remaining OSSes with 150+ days uptime, and reboot them every 100 days or so until you can gather some extra debug or upgrade to a newer version on those nodes (so that reboots are controllable vs random).&lt;/p&gt;

&lt;p&gt;Just do nothing and the servers will fail when they fail and recover all by themselves.&lt;/p&gt;</comment>
                            <comment id="71896" author="kitwestneat" created="Tue, 19 Nov 2013 16:05:10 +0000"  >&lt;p&gt;Ok, that makes sense.&lt;/p&gt;

&lt;p&gt;Is the vmcore useful at all in identifying what is occupying the memory?&lt;/p&gt;</comment>
                            <comment id="71917" author="jay" created="Tue, 19 Nov 2013 19:16:30 +0000"  >&lt;p&gt;no, vmcore won&apos;t help this case.&lt;/p&gt;</comment>
                            <comment id="74749" author="kitwestneat" created="Fri, 10 Jan 2014 20:24:06 +0000"  >&lt;p&gt;I think I found the memory leak; it looks like a Mellanox patch was incorrectly backported. I created a Red Hat ticket here:&lt;br/&gt;
&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1049601&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1049601&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&apos;ll attach the original 2.6.18-308.11.1 version and the broken 2.6.18-348.1.1. Can you tell me if my analysis makes sense?&lt;/p&gt;</comment>
                            <comment id="74751" author="kitwestneat" created="Fri, 10 Jan 2014 20:39:48 +0000"  >&lt;p&gt;Here&apos;s a link to the kernel.org git version:&lt;br/&gt;
&lt;a href=&quot;https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/hw/mlx4/main.c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/hw/mlx4/main.c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ib_link_query_port is the function with the memory leak.&lt;/p&gt;</comment>
                            <comment id="75190" author="jpdionne" created="Fri, 17 Jan 2014 16:21:47 +0000"  >&lt;p&gt;I have a similar problem here: same kernel and slow slab usage increase over time. I do not have access to the Bugzilla link, and by looking at the main.c differences I can&apos;t pinpoint the leak. Can you provide a patch or a link to the commit that introduced the problem?&lt;/p&gt;
</comment>
                            <comment id="75193" author="kitwestneat" created="Fri, 17 Jan 2014 16:53:24 +0000"  >&lt;p&gt;oh weird, I wonder why it&apos;s private. I&apos;ll attach the patch, and try to get it into Gerrit.&lt;/p&gt;</comment>
                            <comment id="75350" author="kitwestneat" created="Tue, 21 Jan 2014 16:40:12 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/8942/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8942/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="207206" author="orentas" created="Fri, 1 Sep 2017 14:54:46 +0000"  >&lt;p&gt;This is resolved. Please close.&lt;/p&gt;</comment>
                            <comment id="207208" author="pjones" created="Fri, 1 Sep 2017 14:55:31 +0000"  >&lt;p&gt;ok thanks&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13994" name="DE11223-fix-mlx4-leak.patch" size="1029" author="kitwestneat" created="Fri, 17 Jan 2014 16:53:45 +0000"/>
                            <attachment id="13839" name="collectl.out" size="1207435" author="kitwestneat" created="Mon, 18 Nov 2013 14:58:31 +0000"/>
                            <attachment id="13840" name="dk1.gz" size="8689145" author="kitwestneat" created="Mon, 18 Nov 2013 16:32:39 +0000"/>
                            <attachment id="13969" name="main.c-308.11.1.el5" size="27896" author="kitwestneat" created="Fri, 10 Jan 2014 20:39:48 +0000"/>
                            <attachment id="13970" name="main.c-348.1.1.el5" size="42833" author="kitwestneat" created="Fri, 10 Jan 2014 20:39:48 +0000"/>
                            <attachment id="13838" name="vmcore.log" size="507572" author="kitwestneat" created="Mon, 18 Nov 2013 14:58:31 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 9 May 2014 14:28:05 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw9kv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11714</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10020"><![CDATA[1]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 18 Nov 2013 14:28:05 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>