<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4311] Mount sometimes fails with EIO on OSS with several mounts in parallel</title>
                <link>https://jira.whamcloud.com/browse/LU-4311</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On one of our test clusters, installed with Lustre 2.4.1, we sometimes saw the following error message in the output of the &quot;shine&quot; command-line tool when starting a Lustre file system; the corresponding OST was then not mounted:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;mount.lustre: mount /dev/mapper/mpathj at /mnt/fs1/ost/6 failed: Input/output error
Is the MGS running?
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The test file system is composed of 6 servers: one MDS (one MDT), 4 OSS (3 with 2 OSTs and one with 1 OST) and a separate MGS.&lt;br/&gt;
Configuration (see attached config_parameters file for details):&lt;br/&gt;
    MGS: lama5 (failover lama6)&lt;br/&gt;
    MDS: lama6 (failover lama5)&lt;br/&gt;
    OSS: lama7 (failover lama8, lama9 and lama10) to lama10 (failover lama7, lama8 and lama9)&lt;/p&gt;

&lt;p&gt;When the error occurs, we see the following Lustre kernel traces on the MGS:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;MGS: Client &amp;lt;client_name&amp;gt; seen on new nid &amp;lt;nid2&amp;gt; when existing nid &amp;lt;nid1&amp;gt; is already connected
...
@@@ MGS fail to handle opc = 250: rc = -114
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and on OSS:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r0:or0:NEW
...
InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r1:or0:CONNECTING
...
recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-5)
...
MGS: recovery started, waiting 100000 seconds
...
MGC10.3.0.10@o2ib: Communicating with 10.4.0.10@o2ib1, operation mgs_connect failed with -114
...
recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-114)
MGS: recovery finished
...
fs1-OST0005: cannot register this server with the MGS: rc = -5. Is the MGS running?
...
Unable to start targets: -5
...
Unable to mount  (-5)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I was able to reproduce the error without shine and with only one OSS, with the script below.&lt;br/&gt;
The MGS (lama5) and MDS (lama6) are started/mounted, and the script is started on lama10.&lt;br/&gt;
If the tunefs.lustre or the lustre_rmmod step is removed, or if the first mount is started in the foreground, the error does not occur.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;N=1
rm -f error stop
while true; do
        tunefs.lustre --erase-params --quiet &quot;--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1&quot; \
             &quot;--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1&quot; &quot;--failnode=lama7-ic1@o2ib0&quot; \
             &quot;--failnode=lama8-ic1@o2ib0&quot; &quot;--failnode=lama9-ic1@o2ib0&quot; \
              --network=o2ib0 --writeconf /dev/ldn.cook.ost3 &amp;gt; /dev/null

        tunefs.lustre --erase-params --quiet &quot;--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1&quot; \
             &quot;--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1&quot; &quot;--failnode=lama7-ic2@o2ib1&quot; \
             &quot;--failnode=lama8-ic2@o2ib1&quot; &quot;--failnode=lama9-ic2@o2ib1&quot; \
             --network=o2ib1 --writeconf /dev/ldn.cook.ost6 &amp;gt; /dev/null

        modprobe fsfilt_ldiskfs
        modprobe lustre
        ssh lama5 lctl clear
        dmesg -c &amp;gt; /dev/null
        ssh lama5 dmesg -c &amp;gt; /dev/null
        (/bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost3 /mnt/fs1/ost/5 || touch error) &amp;amp;
        /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost6 /mnt/fs1/ost/6 || touch error
        wait
        if [ -f error ]; then
                lctl dk &amp;gt; oss.lustre.dk.bad
                ssh lama5 lctl dk &amp;gt; mgs.lustre.dk.bad
                dmesg &amp;gt; oss.dmesg.bad
                ssh lama5 dmesg &amp;gt; mgs.dmesg.bad
        else
                lctl dk &amp;gt; oss.lustre.dk.good
                ssh lama5 lctl dk &amp;gt; mgs.lustre.dk.good
                dmesg &amp;gt; oss.dmesg.good
                ssh lama5 dmesg &amp;gt; mgs.dmesg.good
        fi
        umount /mnt/fs1/ost/5
        umount /mnt/fs1/ost/6
        lustre_rmmod
        [ -f stop -o -f error ] &amp;amp;&amp;amp; break
        [ $N -ge 25 ] &amp;amp;&amp;amp; break
        echo &quot;============================&amp;gt; loop $N&quot;
        N=$((N+1))
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
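
&lt;p&gt;Since the failure disappears when the mounts run sequentially, a hypothetical client-side mitigation is to serialise them explicitly with flock(1). A minimal sketch (the lock path is made up; the device and mountpoint names are the ones from the reproducer):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch only: serialise the two concurrent mounts with flock(1) so their
# MGS registrations take turns, matching the observation above that the
# error does not occur when the first mount runs in the foreground.
# The lock path is hypothetical.
LOCK=/tmp/lu4311-mount.lock

run_serialised() {
    # Blocks on LOCK until the previous holder finishes, so the two
    # registrations cannot overlap even when both jobs start at once.
    flock "$LOCK" "$@"
}

# Usage in the reproducer (each line in its own background job, as before):
#   run_serialised /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost3 /mnt/fs1/ost/5
#   run_serialised /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost6 /mnt/fs1/ost/6
```

This only sidesteps the race from the client side; it does not address the double connection seen on the MGS.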

&lt;p&gt;I have attached a tarball containing the config parameters, the reproducer, and the files produced by the reproducer:&lt;br/&gt;
    reproducer&lt;br/&gt;
    config_parameters&lt;br/&gt;
    mgs.dmesg.good, mgs.lustre.dk.good, oss.dmesg.good, oss.lustre.dk.good&lt;br/&gt;
    mgs.dmesg.bad,  mgs.lustre.dk.bad,  oss.dmesg.bad,  oss.lustre.dk.bad&lt;/p&gt;

&lt;p&gt;I have tried the following patch, which skips the reconnection at INIT_RECOV_BACKUP if a connection attempt is already in progress.&lt;br/&gt;
With this patch the &quot;mount&quot; no longer fails, but it is only a workaround: it does not solve the underlying problem of the double connection on the MGS. Some serialisation/synchronisation is probably missing.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;--- a/lustre/mgc/mgc_request.c
+++ b/lustre/mgc/mgc_request.c
@@ -1029,6 +1029,7 @@ int mgc_set_info_async(const struct lu_e
                        ptlrpc_import_state_name(imp-&amp;gt;imp_state));
                 /* Resurrect if we previously died */
                 if ((imp-&amp;gt;imp_state != LUSTRE_IMP_FULL &amp;amp;&amp;amp;
+                     imp-&amp;gt;imp_state != LUSTRE_IMP_CONNECTING &amp;amp;&amp;amp;
                      imp-&amp;gt;imp_state != LUSTRE_IMP_NEW) || value &amp;gt; 1)
                         ptlrpc_reconnect_import(imp);
                 RETURN(0);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>Lustre 2.4.1</environment>
        <key id="22237">LU-4311</key>
            <summary>Mount sometimes fails with EIO on OSS with several mounts in parallel</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="patrick.valentin">Patrick Valentin</reporter>
                        <labels>
                    </labels>
                <created>Tue, 26 Nov 2013 13:49:54 +0000</created>
                <updated>Wed, 27 Nov 2013 19:13:40 +0000</updated>
                            <resolved>Wed, 27 Nov 2013 19:13:40 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="72305" author="bfaccini" created="Tue, 26 Nov 2013 14:46:09 +0000"  >&lt;p&gt;Hello Patrick, is it just me, or did we also hit this kind of issue in the past, related to the parallel (//) operations launched by Shine during Lustre start/mount?&lt;/p&gt;

&lt;p&gt;Also, did you really mean that &quot;the error does not occur&quot; when &quot;the first mount is started in foreground&quot;, or when it is not?&lt;/p&gt;</comment>
                            <comment id="72318" author="patrick.valentin" created="Tue, 26 Nov 2013 16:11:17 +0000"  >&lt;p&gt;&amp;gt; &lt;br/&gt;
&amp;gt; Hello Patrick, is it just me, or did we also hit this kind of issue in the past, related to the parallel (//) operations launched by Shine during Lustre start/mount?&lt;br/&gt;
&amp;gt;&lt;br/&gt;
Hi Bruno,&lt;br/&gt;
I discussed this with S&#233;bastien, and he confirms that such a problem was already seen on &quot;tera100&quot; with shine. So it is not a new issue introduced by Lustre 2.4.1, but rather a problem of commands running in parallel.&lt;br/&gt;
&amp;gt;&lt;br/&gt;
&amp;gt; Also, did you really mean that &quot;the error does not occur&quot; when &quot;the first mount is started in foreground&quot;, or when it is not?&lt;br/&gt;
&amp;gt;&lt;br/&gt;
As far as I remember, the issue only occurs when the two mount commands are executed in parallel (the first one started in the background). If the first one is started in the foreground (sequential execution), there is no mount error. And if there is only one mount command in the script, there is no error either.&lt;br/&gt;
I will run a new test to confirm this.&lt;br/&gt;
As suggested by S&#233;bastien, I will also try reducing the number of &quot;mgsnode&quot; and &quot;failnode&quot; entries in the &quot;tunefs&quot; command, to see if this has any effect.&lt;/p&gt;
</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="13847">LU-1279</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13876" name="reproduce.tarces.tar" size="133120" author="patrick.valentin" created="Tue, 26 Nov 2013 13:52:25 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwa33:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11805</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>