<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:44:01 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4582] After failing over Lustre MGS node to the secondary, client mount fails with -5</title>
                <link>https://jira.whamcloud.com/browse/LU-4582</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Following are steps to reproduce the issue reliably:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;1. adjust obd_timeout from &lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt; 100 to 300
    lctl conf_param &amp;lt;fsname&amp;gt;.sys.timeout=300
2. mount and umount the client
    mount -t lustre &amp;lt;primary MGS ip&amp;gt;:&amp;lt;secondary MGS ip&amp;gt;:/&amp;lt;fsname&amp;gt; /mnt/lustre
3. failover the MGS node to the secondary
4. mount the client again using the same command as in step 2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then step 4 will fail with EIO.&lt;/p&gt;</description>
                <environment></environment>
        <key id="23003">LU-4582</key>
            <summary>After failing over Lustre MGS node to the secondary, client mount fails with -5</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="cliffw">Cliff White</assignee>
                                    <reporter username="cheng_shao">Cheng Shao</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Tue, 4 Feb 2014 20:12:19 +0000</created>
                <updated>Wed, 14 Mar 2018 13:53:33 +0000</updated>
                            <resolved>Mon, 23 Jun 2014 15:59:08 +0000</resolved>
                                    <version>Lustre 2.1.0</version>
                    <version>Lustre 2.2.0</version>
                    <version>Lustre 2.3.0</version>
                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="76220" author="cheng_shao" created="Tue, 4 Feb 2014 20:15:22 +0000"  >&lt;p&gt;Here is the timeline of events according to the Lustre debug log. The leading number on each line is the time in seconds relative to the start of the mount operation.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;+0 Client sent MGS_CONNECT req to primary MGS node with timeout set to (obd_timeout/20 + adaptive_timeout), which was 20 seconds in our test &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt;.
+0 Client sent LDLM_ENQUEUE req to MGS node with rq_delay_limit set to 5 seconds. This is &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; sptlrpc. The send will be delayed because the &lt;span class=&quot;code-keyword&quot;&gt;import&lt;/span&gt; is still in connecting state. 
+5 The above req failed after the delayed send expired, but &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; is not fatal.
+5 Client sent another LDLM_ENQUEUE req to MGS node with rq_delay_limit set to MGC_ENQUEUE_LIMIT, which is hard coded to 50 seconds.
+20 MGS_CONNECT timed out.
+55 The second LDLM_ENQUEUE req failed after the delayed send expired. This fails the whole client mount with error -5.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The problem is that after MGS_CONNECT fails to connect to the primary MGS, the client does not get a chance to connect to the secondary before the mount fails. Selecting a different MGS node is triggered by the pinger, which runs at an interval of (obd_timeout/4). Since we increased obd_timeout to 300, that interval is now 75 seconds, so the connection to the secondary does not happen before the second LDLM_ENQUEUE req fails at +55.&lt;/p&gt;
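
&lt;p&gt;As a rough check of the arithmetic, the timeline above can be modeled in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the pinger first acts one full interval after the mount starts.&lt;/p&gt;

```python
# Simplified model of the timeline above; all times are seconds from mount start.
FIRST_ENQUEUE_DELAY_LIMIT = 5   # rq_delay_limit of the first LDLM_ENQUEUE
MGC_ENQUEUE_LIMIT = 50          # hard-coded rq_delay_limit of the second LDLM_ENQUEUE


def mount_timeline(obd_timeout):
    """Return (enqueue_deadline, pinger_interval) for a given obd_timeout."""
    # The second enqueue is sent when the first one expires at +5.
    enqueue_deadline = FIRST_ENQUEUE_DELAY_LIMIT + MGC_ENQUEUE_LIMIT
    # The pinger, which triggers selecting another MGS node, runs every obd_timeout/4.
    pinger_interval = obd_timeout // 4
    return enqueue_deadline, pinger_interval


def mount_fails(obd_timeout):
    """True when the enqueue expires before the pinger can trigger the switch (assumed model)."""
    enqueue_deadline, pinger_interval = mount_timeline(obd_timeout)
    return pinger_interval > enqueue_deadline


for timeout in (100, 300):
    deadline, interval = mount_timeline(timeout)
    print(timeout, deadline, interval, mount_fails(timeout))
```

&lt;p&gt;In this model, obd_timeout=100 gives a pinger interval of 25 seconds, which falls inside the 55 second enqueue window. That would be consistent with the failure only showing up once the timeout is raised to 300.&lt;/p&gt;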

&lt;p&gt;The solution we propose is to redefine MGC_ENQUEUE_LIMIT relative to obd_timeout instead of using a hard-coded value. That way, the second LDLM_ENQUEUE waits long enough to go through once the connection to the secondary MGS node is established.&lt;/p&gt;
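
&lt;p&gt;One possible shape for such a relative limit is sketched below in Python. The formula (connect timeout plus one pinger interval) is an assumption chosen for illustration and is not taken from the patch; the 5 second adaptive-timeout share is inferred from the 20 second connect timeout observed at obd_timeout=300.&lt;/p&gt;

```python
# Hypothetical sketch: derive the enqueue limit from obd_timeout instead of hard-coding 50.
ADAPTIVE_TIMEOUT = 5  # assumed: 300/20 + 5 gives the 20 s connect timeout seen in the test


def connect_timeout(obd_timeout):
    """MGS_CONNECT timeout as described in the timeline: obd_timeout/20 + adaptive_timeout."""
    return obd_timeout // 20 + ADAPTIVE_TIMEOUT


def pinger_interval(obd_timeout):
    """Interval at which the pinger runs: obd_timeout/4."""
    return obd_timeout // 4


def relative_enqueue_limit(obd_timeout):
    """Illustrative limit (not the patch's formula): connect timeout plus one pinger interval."""
    return connect_timeout(obd_timeout) + pinger_interval(obd_timeout)


for timeout in (100, 300, 600):
    limit = relative_enqueue_limit(timeout)
    # With this formula the limit always exceeds the pinger interval.
    print(timeout, limit, limit > pinger_interval(timeout))
```

&lt;p&gt;Because the limit scales with obd_timeout, raising the timeout no longer lets the pinger interval overtake the enqueue window in this simplified model.&lt;/p&gt;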
</comment>
                            <comment id="76264" author="haasken" created="Wed, 5 Feb 2014 15:19:59 +0000"  >&lt;p&gt;Cheng, have you uploaded your patch to the whamcloud gerrit review site?  If so, please post a link here.  Thanks.&lt;/p&gt;</comment>
                            <comment id="76285" author="cheng_shao" created="Wed, 5 Feb 2014 18:39:53 +0000"  >&lt;p&gt;Patch is up for review at &lt;a href=&quot;http://review.whamcloud.com/#/c/9141/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9141/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="76486" author="green" created="Fri, 7 Feb 2014 18:28:47 +0000"  >&lt;p&gt;Why is your patch against b2_5 and not master? Is master not affected?&lt;br/&gt;
We generally prefer to land things to master first.&lt;/p&gt;</comment>
                            <comment id="76633" author="cheng_shao" created="Mon, 10 Feb 2014 19:02:54 +0000"  >&lt;p&gt;Master is definitely affected as well. Will abandon this patch and submit a new one against master.&lt;/p&gt;</comment>
                            <comment id="76694" author="cheng_shao" created="Tue, 11 Feb 2014 03:46:51 +0000"  >&lt;p&gt;New patch is at &lt;a href=&quot;http://review.whamcloud.com/#/c/9217/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9217/&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="82450" author="denis_kondratenko" created="Fri, 25 Apr 2014 08:22:23 +0000"  >&lt;p&gt;Unfortunately, Cheng left Xyratex, but we still need to get this landed.&lt;/p&gt;

&lt;p&gt;Could someone review Cheng&apos;s patch?&lt;/p&gt;</comment>
                            <comment id="83675" author="cliffw" created="Fri, 9 May 2014 20:15:30 +0000"  >&lt;p&gt;Reviewers have been assigned.&lt;/p&gt;</comment>
                            <comment id="85192" author="haasken" created="Thu, 29 May 2014 22:54:24 +0000"  >&lt;p&gt;It has been a while since there has been any activity on this bug.  Who is reviewing Cheng&apos;s patch?&lt;/p&gt;</comment>
                            <comment id="85276" author="cliffw" created="Fri, 30 May 2014 17:25:57 +0000"  >&lt;p&gt;Very sorry about the delay; I will investigate.&lt;/p&gt;</comment>
                            <comment id="85771" author="haasken" created="Wed, 4 Jun 2014 23:28:17 +0000"  >&lt;p&gt;Thanks, Cliff.  &lt;a href=&quot;http://review.whamcloud.com/#/c/9217/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9217/&lt;/a&gt; has landed.&lt;/p&gt;</comment>
                            <comment id="86131" author="cliffw" created="Mon, 9 Jun 2014 18:06:25 +0000"  >&lt;p&gt;Is it okay to close this issue?&lt;/p&gt;</comment>
                            <comment id="86165" author="haasken" created="Mon, 9 Jun 2014 22:01:45 +0000"  >&lt;p&gt;Yes.&lt;/p&gt;</comment>
                            <comment id="106475" author="gerrit" created="Tue, 10 Feb 2015 17:33:28 +0000"  >&lt;p&gt;Jian Yu (jian.yu@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/13718&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/13718&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4582&quot; title=&quot;After failing over Lustre MGS node to the secondary, client mount fails with -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4582&quot;&gt;&lt;del&gt;LU-4582&lt;/del&gt;&lt;/a&gt; mgc: replace hard-coded MGC_ENQUEUE_LIMIT value&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 6c815c12a2d6dfabf55b43563b4f1062c123db99&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwe9z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12524</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>