<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:24:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16146] after dropping mgs/mdt0 for test: mdt_handler.c:7522:mdt_postrecov()) lflood-MDT0000: auto trigger paused LFSCK failed: rc = -6</title>
                <link>https://jira.whamcloud.com/browse/LU-16146</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After doing a test that involved turning off the mgs/mdt0 node for the test cluster garter, the mdts fail to come back up.&#160; They are stuck &quot;WAITING&quot;.&lt;/p&gt;

&lt;p&gt;When the mgs node was powered off, an ior test was in progress.&lt;/p&gt;

&lt;p&gt;Failover seemed to work, but the ior job was unable to finish. The cluster was eventually rebooted.&lt;/p&gt;

&lt;p&gt;After the reboot, the logs show that all 4 mdts are failing an lfsck_start with a line such as&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2022-09-07 14:35:12 [ 5791.188613] Lustre: 32474:0:(mdt_handler.c:7522:mdt_postrecov()) lflood-MDT0000: auto trigger paused LFSCK failed: rc = -6&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I believe that -6 (-ENXIO) is coming from lfsck_start():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&#160; &#160; &#160; &#160; lfsck = lfsck_instance_find(key, true, false);
&#160; &#160; &#160; &#160; if (unlikely(lfsck == NULL))
&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; RETURN(-ENXIO);

&#160; &#160; &#160; &#160; if (unlikely(lfsck-&amp;gt;li_stopping))
&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; GOTO(put, rc = -ENXIO);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For our reference, our local ticket is TOSS5875&lt;/p&gt;</description>
                <environment>TOSS 4.4-4.1&lt;br/&gt;
4.18.0-372.19.1.1toss.t4.x86_64&lt;br/&gt;
lustre 2.15.0_3.llnl</environment>
        <key id="72279">LU-16146</key>
            <summary>after dropping mgs/mdt0 for test: mdt_handler.c:7522:mdt_postrecov()) lflood-MDT0000: auto trigger paused LFSCK failed: rc = -6</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="defazio">Gian-Carlo Defazio</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Thu, 8 Sep 2022 21:26:19 +0000</created>
                <updated>Thu, 26 Jan 2023 00:28:07 +0000</updated>
                            <resolved>Thu, 26 Jan 2023 00:28:07 +0000</resolved>
                                    <version>Lustre 2.15.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="346201" author="pjones" created="Fri, 9 Sep 2022 16:15:29 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="346401" author="ofaaland" created="Mon, 12 Sep 2022 17:40:06 +0000"  >&lt;p&gt;Gian-Carlo,&lt;/p&gt;

&lt;p&gt;Can you describe what you saw that gave the impression that &quot;Failover seemed to work&quot;? I.e., did the MGS and MDTs mount but never exit recovery? Also, please attach the console logs for the garter nodes that were hosting the MGS and MDT0000 at any point during this sequence of events.&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="346429" author="defazio" created="Tue, 13 Sep 2022 00:07:42 +0000"  >&lt;p&gt;For the failover, I could see pacemaker moving MGS and MDT0 from garter1 to garter2 after garter1 was turned off, then ltop showed MDT1 running on garter2.&lt;/p&gt;</comment>
                            <comment id="346430" author="defazio" created="Tue, 13 Sep 2022 00:09:29 +0000"  >&lt;p&gt;I&apos;ve added the console logs for garter&lt;span class=&quot;error&quot;&gt;&amp;#91;1,2&amp;#93;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="349519" author="laisiyao" created="Thu, 13 Oct 2022 14:20:47 +0000"  >&lt;p&gt;I&apos;ll review the related code. It would help if you could enable tracing with &quot;lctl set_param debug=+trace&quot; and dump the debug log on the MDS.&lt;/p&gt;</comment>
                            <comment id="350550" author="pjones" created="Fri, 21 Oct 2022 23:29:14 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=defazio&quot; class=&quot;user-hover&quot; rel=&quot;defazio&quot;&gt;defazio&lt;/a&gt;&#160;will you be able to gather this additional debug info?&lt;/p&gt;</comment>
                            <comment id="350611" author="defazio" created="Mon, 24 Oct 2022 16:13:21 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=pjones&quot; class=&quot;user-hover&quot; rel=&quot;pjones&quot;&gt;pjones&lt;/a&gt; &lt;br/&gt;
Sorry for the delay.&lt;br/&gt;
The system I&apos;d need to generate those logs on, garter, is currently being used for higher-priority work and is running with new MDTs and OSTs. I saved the old MDTs that failed on lfsck_start, but I can&apos;t switch back to them right now.&lt;/p&gt;

&lt;p&gt;So yes I can get logs for trying to start up the MDTs, but I&apos;m not sure when.&lt;/p&gt;</comment>
                            <comment id="360428" author="defazio" created="Thu, 26 Jan 2023 00:28:07 +0000"  >&lt;p&gt;We were unable to reproduce this when bringing up our file system, either manually or using pacemaker.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="45659" name="2022-09-07_console.garter1.gz" size="208476" author="defazio" created="Tue, 13 Sep 2022 00:08:39 +0000"/>
                            <attachment id="45660" name="2022-09-07_console.garter2.gz" size="121537" author="defazio" created="Tue, 13 Sep 2022 00:09:01 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02zpb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>