<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:09:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7517] oom killer active after failback of MDS resources</title>
                <link>https://jira.whamcloud.com/browse/LU-7517</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error below happened during soak testing of change 16838 patch set #31 (no Wiki entry for the build exists yet) on cluster lola. DNE is enabled and the MDSes are configured in an active-active HA failover configuration.&lt;/p&gt;

&lt;p&gt;    Primary resources of MDT lola-11 were failed back on Dec 3 at 20:18.&lt;br/&gt;
    The slab allocation increased continuously to about 31 GB until the crash.&lt;br/&gt;
    MDS node lola-11 crashed with oom-killer on Dec 4 at 00:21 (local time). (see also &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7432&quot; title=&quot;oom-killer started on MDSes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7432&quot;&gt;&lt;del&gt;LU-7432&lt;/del&gt;&lt;/a&gt;)&lt;br/&gt;
    ptlrpc_cache seems to be the biggest consumer.&lt;br/&gt;
    Attached are lola-11&apos;s messages, console log, vmcore-dmesg file, and collectl (version V4.0.2-1) files (for the time interval specified above).&lt;br/&gt;
    Also attached are files containing extracted counters for memory, slab totals and per-slab allocation.&lt;/p&gt;
</description>
                <environment>lola:&lt;br/&gt;
build: tip of master + #31 of change 16383</environment>
        <key id="33457">LU-7517</key>
            <summary>oom killer active after failback of MDS resources</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Fri, 4 Dec 2015 15:14:30 +0000</created>
                <updated>Tue, 24 Jan 2017 22:40:05 +0000</updated>
                            <resolved>Tue, 24 Jan 2017 22:40:05 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="135227" author="heckes" created="Fri, 4 Dec 2015 15:16:08 +0000"  >&lt;p&gt;The crash dump has been saved to lola-1:/scratch/crashdumps/lu-7517/127.0.0.1-2015-12-04-00\:22\:36.&lt;br/&gt;
It turned out that the collectl raw files are too big to be uploaded to Jira, so I saved them to lola-1:/scratch/crashdumps/lu-7517.&lt;/p&gt;</comment>
                            <comment id="135228" author="heckes" created="Fri, 4 Dec 2015 15:17:13 +0000"  >&lt;p&gt;No debug log files have been written.&lt;/p&gt;</comment>
                            <comment id="135273" author="di.wang" created="Fri, 4 Dec 2015 19:38:59 +0000"  >&lt;p&gt;It looks like lola-8 and lola-9 hit OOM as well.&lt;/p&gt;</comment>
                            <comment id="135274" author="di.wang" created="Fri, 4 Dec 2015 19:40:39 +0000"  >&lt;p&gt;It looks like most of the memory is held by the 1 MB size slab (size-1048576):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;20151204 00:21:00 size-1048576 29758 31203524608 29758 31203524608 29758 31203524608 29758 31203524608 0 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="136310" author="heckes" created="Tue, 15 Dec 2015 11:49:42 +0000"  >&lt;p&gt;The error showed up on all soak MDSes (lola-8 not reported in detail) running &lt;tt&gt;soak&lt;/tt&gt; for build &lt;a href=&quot;https://build.hpdd.intel.com/job/lustre-reviews/36192/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://build.hpdd.intel.com/job/lustre-reviews/36192/&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lola-9
	&lt;ul&gt;
		&lt;li&gt;20151213 09:52:40 failback MDTs to lola-9 finished&lt;/li&gt;
		&lt;li&gt;Dec 13 16:05:01 lola-9  oom-killer started&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;lola-10
	&lt;ul&gt;
		&lt;li&gt;20151213 08:01:40 failback MDTs to lola-10 finished&lt;/li&gt;
		&lt;li&gt;Dec 13 11:40:02 lola-10  oom-killer started&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;lola-11
	&lt;ul&gt;
		&lt;li&gt;20151214 02:15:00  failback of MDTs to lola-11 finished&lt;/li&gt;
		&lt;li&gt;Dec 14 12:05:01 lola-11  oom-killer started&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;From failback until the oom-killer started, the &lt;tt&gt;size-1048576&lt;/tt&gt; slabs increased continuously and were the biggest memory consumers:&lt;br/&gt;
&lt;b&gt;lola-9&lt;/b&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct
slab-details/size-1048576.dat:20151213 16:05:40 size-1048576 23122 24245174272 23122 24245174272 23122 24245174272 23122 24245174272 0 0
slab-details/size-512.dat:20151213 16:05:40 size-512 11151621 5709629952 11152824 5710245888 1394094 5710209024 1394103 5710245888 7835648 0
slab-details/size-128.dat:20151213 16:05:40 size-128 7076064 905736192 7077360 905902080 235911 966291456 235912 966295552 1376256 0
slab-details/size-262144.dat:20151213 16:05:40 size-262144 1673 438566912 1673 438566912 1673 438566912 1673 438566912 0 0
slab-details/ptlrpc_cache.dat:20151213 16:05:40 ptlrpc_cache 376940 289489920 376940 289489920 75388 308789248 75388 308789248 204800 0
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;b&gt;lola-10&lt;/b&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct
slab-details/size-1048576.dat:20151213 11:40:40 size-1048576 29494 30926700544 29494 30926700544 29494 30926700544 29494 30926700544 0 0
slab-details/size-262144.dat:20151213 11:40:40 size-262144 1015 266076160 1015 266076160 1015 266076160 1015 266076160 0 0
slab-details/ptlrpc_cache.dat:20151213 11:40:40 ptlrpc_cache 195920 150466560 195920 150466560 39184 160497664 39184 160497664 167936 0
slab-details/size-1024.dat:20151213 11:40:40 size-1024 133537 136741888 133540 136744960 33385 136744960 33385 136744960 73728 0
slab-details/size-512.dat:20151213 11:40:40 size-512 150577 77095424 153896 78794752 19237 78794752 19237 78794752 0 0
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;b&gt;lola-11&lt;/b&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct
slab-details/size-1048576.dat:20151214 12:05:40 size-1048576 29392 30819745792 29392 30819745792 29392 30819745792 29392 30819745792 0 0
slab-details/size-262144.dat:20151214 12:05:40 size-262144 1345 352583680 1345 352583680 1345 352583680 1345 352583680 0 0
slab-details/ptlrpc_cache.dat:20151214 12:05:40 ptlrpc_cache 224232 172210176 224290 172254720 44858 183738368 44858 183738368 57344 0
slab-details/size-1024.dat:20151214 12:05:40 size-1024 150612 154226688 150632 154247168 37655 154234880 37658 154247168 -20480 0
slab-details/size-8192.dat:20151214 12:05:40 size-8192 8859 72572928 8859 72572928 8859 72572928 8859 72572928 0 0
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Attached are the messages, console log files and extracted &lt;tt&gt;collectl&lt;/tt&gt; counters for memory, slab-global and slab-details for each node.&lt;/p&gt;</comment>
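<!--
Editorial sketch, not part of the original ticket. The slab tables above are easier to
compare once the byte counters are converted to GiB. The snippet below is a minimal
example under stated assumptions: the column names are copied from the
"#Date Time SlabName ..." header row shown in the ticket, the "path:" prefix on each row
is assumed to come from grep-style output, and the helper names parse_slab_line and
slab_in_use_gib are hypothetical (they are not part of collectl or Lustre).

```python
# Column names copied from the "#Date Time SlabName ..." header row in the ticket.
COLUMNS = ["Date", "Time", "SlabName", "ObjInUse", "ObjInUseB", "ObjAll",
           "ObjAllB", "SlabInUse", "SlabInUseB", "SlabAll", "SlabAllB",
           "SlabChg", "SlabPct"]

GIB = 1024 ** 3


def parse_slab_line(line):
    """Parse one slab detail row into a dict keyed by the header names."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    # Drop a grep-style "path:" prefix glued to the date, if one is present.
    fields[0] = fields[0].rsplit(":", 1)[-1]
    row = dict(zip(COLUMNS, fields))
    # Every column after SlabName is an integer counter.
    for name in COLUMNS[3:]:
        row[name] = int(row[name])
    return row


def slab_in_use_gib(row):
    """Return the memory held by in-use slabs, in GiB."""
    return row["SlabInUseB"] / GIB


# Row for lola-9 at the time the oom-killer started, copied from the ticket.
sample = ("slab-details/size-1048576.dat:20151213 16:05:40 size-1048576 "
          "23122 24245174272 23122 24245174272 23122 24245174272 "
          "23122 24245174272 0 0")
row = parse_slab_line(sample)
print(row["SlabName"], round(slab_in_use_gib(row), 2))  # prints: size-1048576 22.58
```

For this row, ObjInUse multiplied by the 1048576-byte object size equals ObjInUseB, so
all of the roughly 22.58 GiB is in live 1 MiB objects rather than in partially empty slabs.
-->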
                            <comment id="137034" author="heckes" created="Mon, 21 Dec 2015 14:37:49 +0000"  >&lt;p&gt;The error appeared again on an MDT for build &apos;20151214&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151214&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151214&lt;/a&gt;)&lt;/p&gt;</comment>
                            <comment id="182017" author="cliffw" created="Tue, 24 Jan 2017 22:40:05 +0000"  >&lt;p&gt;Old issue; not reproduced on recent builds.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="19920" name="console-lola-10.log-20151213.gz" size="580209" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19921" name="console-lola-11.log.bz2" size="123156" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19937" name="console-lola-9.log-20151213.gz" size="935259" author="heckes" created="Tue, 15 Dec 2015 12:22:35 +0000"/>
                            <attachment id="19809" name="console.log.bz2" size="194520" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                            <attachment id="19926" name="lola-10-memory-counter-20151213.dat.bz2" size="21649" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19927" name="lola-10-one-file-per-slab.tar.bz2" size="518040" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19928" name="lola-10-slab-detail-counter-20151213.dat.bz2" size="738550" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19929" name="lola-10-slab-global-counter-20151213.dat.bz2" size="25735" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19930" name="lola-11-memory-counter-20151213.dat.bz2" size="61698" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19931" name="lola-11-one-file-per-slab.tar.bz2" size="1273646" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19932" name="lola-11-slab-detail-counter-20151213.dat.bz2" size="2066526" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19933" name="lola-11-slab-global-counter-20151213.dat.bz2" size="70920" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19922" name="lola-9-memory-counter-20151213.dat.bz2" size="38820" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19923" name="lola-9-one-file-per-slab.tar.bz2" size="832286" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19924" name="lola-9-slab-details-counter-20151213.dat.bz2" size="1259791" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19925" name="lola-9-slab-global-counter-20151213.dat.bz2" size="45097" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19810" name="memory-counter-lola-11.dat.bz2" size="25254" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                            <attachment id="19935" name="messages-lola-10.log-20151213.bz2" size="793082" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19936" name="messages-lola-11.log.bz2" size="179176" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19811" name="messages-lola-11.log.bz2" size="309631" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                            <attachment id="19934" name="messages-lola-9.log-20151213.bz2" size="502044" author="heckes" created="Tue, 15 Dec 2015 12:17:38 +0000"/>
                            <attachment id="19812" name="slab-details-lola-11.dat.bz2" size="894137" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                            <attachment id="19813" name="slab-details-one-file-per-slab.tar.bz2" size="631494" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                            <attachment id="19814" name="slab-total-lola-11.dat.bz2" size="28753" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                            <attachment id="19815" name="vmcore-dmesg.txt.bz2" size="28332" author="heckes" created="Fri, 4 Dec 2015 15:19:54 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxuxj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>