<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:08:51 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
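
A minimal sketch of the request described above, assuming Python and a hypothetical export URL (jira.example.com and the helper name restricted_url are illustrative, not part of JIRA). It only builds the URL, repeating the 'field' parameter once per requested field:

```python
from urllib.parse import urlencode


def restricted_url(base_url, fields):
    """Append one 'field' parameter per requested field to base_url."""
    query = urlencode([("field", name) for name in fields])
    # Use '&' if the URL already carries a query string, '?' otherwise.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


# Hypothetical export URL, used for illustration only.
BASE = "https://jira.example.com/issue.xml"
print(restricted_url(BASE, ["key", "summary"]))
```

Fetching the resulting URL should return this document with only the key and summary fields, as stated above.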
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7432] oom-killer started on MDSes</title>
                <link>https://jira.whamcloud.com/browse/LU-7432</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error occurred during soak testing of build &apos;20151113&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;spaceKey=Releases#SoakTestingonLola-20151113&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;spaceKey=Releases#SoakTestingonLola-20151113&lt;/a&gt;) and had already occurred earlier when testing build &apos;20151109&apos;.&lt;br/&gt;
DNE is enabled. OSTs were formatted using &lt;em&gt;zfs&lt;/em&gt;, MDTs using &lt;em&gt;ldiskfs&lt;/em&gt;. MDS nodes are set up in an HA active-active failover configuration.&lt;/p&gt;

&lt;p&gt;At the following times:&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; date &lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; node &lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; build ID &lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; soak event &lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; Nov  9 18:10:01 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; &lt;tt&gt;lola-9&lt;/tt&gt; &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; build 20151109 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; no fault; only job execution &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; Nov 13 14:30:02 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; &lt;tt&gt;lola-10&lt;/tt&gt; &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; build 20151113&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; during stopping of soak &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; Nov 14 05:35:01 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; &lt;tt&gt;lola-11&lt;/tt&gt; &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; build 20151113 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; no fault; only job execution &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; Nov 14 05:45:01 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; &lt;tt&gt;lola-9&lt;/tt&gt; &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; build 20151113 &lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; no fault; only job execution &lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;the oom-killer was invoked on the nodes listed above. (All events happened at times when &lt;em&gt;no&lt;/em&gt; fault was injected.)&lt;/p&gt;

&lt;p&gt;Attached files: console and syslog of nodes affected.&lt;/p&gt;

&lt;p&gt;Unfortunately, &lt;tt&gt;collectl&lt;/tt&gt; wasn&apos;t running to gather performance counters.&lt;br/&gt;
The tool has now been enabled on all soak nodes so that memory stats, especially slab stats, can be collected during one of the next sessions.&lt;/p&gt;
</description>
                <environment>lola&lt;br/&gt;
build: tip of master(df6cf859bbb29392064e6ddb701f3357e01b3a13) + patches</environment>
        <key id="33141">LU-7432</key>
            <summary>oom-killer started on MDSes</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Mon, 16 Nov 2015 15:18:48 +0000</created>
                <updated>Tue, 24 Nov 2015 23:06:10 +0000</updated>
                            <resolved>Tue, 24 Nov 2015 23:06:10 +0000</resolved>
                                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="133616" author="green" created="Mon, 16 Nov 2015 18:21:28 +0000"  >&lt;p&gt;We need /proc/slabinfo output here to see who&apos;s using the RAM.&lt;/p&gt;</comment>
                            <comment id="134105" author="pjones" created="Fri, 20 Nov 2015 18:53:20 +0000"  >&lt;p&gt;Frank&lt;/p&gt;

&lt;p&gt;When do you think that you will be able to provide the info that Oleg has requested?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="134203" author="heckes" created="Mon, 23 Nov 2015 09:36:43 +0000"  >&lt;p&gt;Sorry, I overlooked Oleg&apos;s request.&lt;br/&gt;
For the incident described above we don&apos;t have these counters. I enabled &lt;tt&gt;collectl&lt;/tt&gt; on all soak nodes&lt;br/&gt;
last week to gather performance counters, especially the slab details. We should be prepared to &lt;br/&gt;
replay the stats in case the incident happens again.&lt;/p&gt;</comment>
                            <comment id="134466" author="di.wang" created="Tue, 24 Nov 2015 23:05:53 +0000"  >&lt;p&gt;I checked the log and was also monitoring the MDS when the OOM was about to happen. It seems to be caused by endless recovery on some MDTs, i.e. once the recovery abort problem is fixed, this problem should go away. Since the endless recovery will be fixed by the patch in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7039&quot; title=&quot;llog_osd.c:778:llog_osd_next_block()) ASSERTION( last_rec-&amp;gt;lrh_index == tail-&amp;gt;lrt_index ) failed:&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7039&quot;&gt;&lt;del&gt;LU-7039&lt;/del&gt;&lt;/a&gt; and other related patches under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7455&quot; title=&quot;Tracking tickets to make DNE pass soak-test.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7455&quot;&gt;&lt;del&gt;LU-7455&lt;/del&gt;&lt;/a&gt;, I will close this ticket.&lt;/p&gt;

&lt;p&gt;Frank, if you see something different, please re-open this one. Thanks.&lt;/p&gt;</comment>
                            <comment id="134467" author="di.wang" created="Tue, 24 Nov 2015 23:06:10 +0000"  >&lt;p&gt;Duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7039&quot; title=&quot;llog_osd.c:778:llog_osd_next_block()) ASSERTION( last_rec-&amp;gt;lrh_index == tail-&amp;gt;lrt_index ) failed:&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7039&quot;&gt;&lt;del&gt;LU-7039&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="33262">LU-7455</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="19627" name="console-lola-10.log.gz" size="414270" author="heckes" created="Mon, 16 Nov 2015 18:18:23 +0000"/>
                            <attachment id="19628" name="console-lola-11.log.gz" size="634205" author="heckes" created="Mon, 16 Nov 2015 18:18:23 +0000"/>
                            <attachment id="19626" name="console-lola-9.log.gz" size="901480" author="heckes" created="Mon, 16 Nov 2015 18:18:23 +0000"/>
                            <attachment id="19630" name="messages-lola-10.log.bz2" size="809447" author="heckes" created="Mon, 16 Nov 2015 18:18:23 +0000"/>
                            <attachment id="19631" name="messages-lola-11.log.bz2" size="824746" author="heckes" created="Mon, 16 Nov 2015 18:18:23 +0000"/>
                            <attachment id="19629" name="messages-lola-9.log.bz2" size="674621" author="heckes" created="Mon, 16 Nov 2015 18:18:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxt3j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>