<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:19:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1716] Race in setting connection flags and using them on 2.x client connect</title>
                <link>https://jira.whamcloud.com/browse/LU-1716</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Lustre 2.1 client fails to connect to Lustre 2.2 server&lt;/p&gt;

&lt;p&gt;&amp;gt; c0-0c2s6n1 LustreError: 11-0: an error occurred while communicating with 10.149.3.5@o2ib. The mgs_config_read operation failed with -524&lt;br/&gt;
&amp;gt; c0-0c2s6n1 LustreError: 4645:0:(mgc_request.c:1917:mgc_process_config()) Cannot process recover llog -524&lt;br/&gt;
&amp;gt; c0-0c2s6n1 LustreError: 15c-8: MGC10.149.3.5@o2ib: The configuration from log &apos;snxs2-client&apos; failed (-524). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.&lt;br/&gt;
&amp;gt; c0-0c2s6n1 LustreError: 4645:0:(llite_lib.c:983:ll_fill_super()) Unable to process log: -524&lt;/p&gt;

&lt;p&gt;The race can be reproduced with the following patch:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;diff --git a/lustre/ptlrpc/import.c b/lustre/ptlrpc/import.c
index 2953352..a69e6b9 100644
--- a/lustre/ptlrpc/import.c
+++ b/lustre/ptlrpc/import.c
@@ -805,6 +805,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
                 } else {
                         IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL);
                         ptlrpc_activate_import(imp);
+                        OBD_FAIL_TIMEOUT(0x5555, 2);
                 }
 
                 GOTO(finish, rc = 0);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="15429">LU-1716</key>
            <summary>Race in setting connection flags and using them on 2.x client connect</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bogl">Bob Glossman</assignee>
                                    <reporter username="askulysh">Andriy Skulysh</reporter>
                        <labels>
                    </labels>
                <created>Tue, 7 Aug 2012 11:00:35 +0000</created>
                <updated>Sat, 22 Dec 2012 10:47:36 +0000</updated>
                            <resolved>Sun, 26 Aug 2012 11:29:51 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                                    <fixVersion>Lustre 2.3.0</fixVersion>
                    <fixVersion>Lustre 2.4.0</fixVersion>
                    <fixVersion>Lustre 2.1.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="42812" author="adilger" created="Tue, 7 Aug 2012 11:11:49 +0000"  >&lt;p&gt;How is this different than &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-887&quot; title=&quot;master (2.2) clients incompatible with 2.1 servers?&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-887&quot;&gt;&lt;del&gt;LU-887&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;</comment>
                            <comment id="42834" author="askulysh" created="Tue, 7 Aug 2012 16:08:17 +0000"  >&lt;p&gt;It looks like the answer is no. My case addresses a race in setting the connect flags: we should set the connect flags before setting the import to FULL state. &lt;br/&gt;
Patch &lt;a href=&quot;http://review.whamcloud.com/#change,3555&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3555&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="42844" author="bogl" created="Tue, 7 Aug 2012 17:48:19 +0000"  >&lt;p&gt;I tried to reproduce the reported problem with just repeated mount/unmount cycles of a 2.3 filesystem on a 2.1 client.   Couldn&apos;t make it happen.  Didn&apos;t add in your suggested OBD_FAIL_* hook to try to force it to happen.   Should I have expected to see it or is there something else that needs to be done to cause it?&lt;/p&gt;
</comment>
                            <comment id="42966" author="adilger" created="Thu, 9 Aug 2012 16:24:00 +0000"  >&lt;p&gt;Bob/Andriy, it would be useful to add a proper OBD_FAIL value for this test, and write a recovery-small.sh test to trigger the timeout, and submit it as a patch to autotest with the following directives  in the commit comment (see &lt;a href=&quot;http://wiki.whamcloud.com/display/PUB/Changing+Test+Parameters+with+Gerrit+Commit+Messages&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://wiki.whamcloud.com/display/PUB/Changing+Test+Parameters+with+Gerrit+Commit+Messages&lt;/a&gt; for details):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Test-Parameters: fortestonly clientbuildno=??? list=recovery-small,recovery-small,recovery-small
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Where clientbuildno=&amp;lt;2.1.2 release build number&amp;gt;. This is without the actual fix applied, just the test case. That would indicate how easily this problem can be hit. If the test fails during testing, the test case should be added to the fix patch, which will presumably allow the test to pass.&lt;/p&gt;</comment>
                            <comment id="43242" author="askulysh" created="Wed, 15 Aug 2012 09:26:17 +0000"  >&lt;p&gt;added test &lt;a href=&quot;http://review.whamcloud.com/#change,3654&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3654&lt;/a&gt; for b2_1&lt;br/&gt;
In fact it reproduces the bug in another place:&lt;br/&gt;
00000100:00000001:0.0:1344768393.872519:0:4160:0:(client.c:617:__ptlrpc_request_bufs_pack()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000000:00040000:0.0:1344768393.872521:0:4160:0:(mdc_request.c:239:mdc_getattr()) ASSERTION(client_is_remote(exp)) failed&lt;br/&gt;
00000000:00040000:0.0:1344768393.872527:0:4160:0:(mdc_request.c:239:mdc_getattr()) LBUG&lt;/p&gt;

&lt;p&gt;but it is also a connection flags race. The LBUG disappears with my fix.&lt;/p&gt;</comment>
                            <comment id="43256" author="bogl" created="Wed, 15 Aug 2012 10:35:00 +0000"  >&lt;p&gt;Andreas, I can incorporate the test from &lt;a href=&quot;http://review.whamcloud.com/#change,3654&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3654&lt;/a&gt; into the fix for master.  Since this bug is an interop problem it won&apos;t prove anything with the test only in 2.3 client &amp;amp; server, but should be good to prove the problem doesn&apos;t come back for testing interop of 2.3 with future versions.  Will also rework the patch to address your style and cleanup comments and resubmit.&lt;/p&gt;</comment>
                            <comment id="43257" author="bogl" created="Wed, 15 Aug 2012 10:42:54 +0000"  >&lt;p&gt;Never mind.  I see I am playing catch up here.  Looks like Andriy has already addressed the style and cleanup issues but not the test.  That being the case should I grab the 2.1 test into master even though it doesn&apos;t prove anything in this release?&lt;/p&gt;</comment>
                            <comment id="43258" author="askulysh" created="Wed, 15 Aug 2012 11:00:17 +0000"  >&lt;p&gt;I haven&apos;t added the test to the master branch because the fix changes the order of assigning connect flags and setting the import to FULL state, so OBD_FAIL_TIMEOUT() will not catch anything.&lt;/p&gt;</comment>
                            <comment id="43762" author="green" created="Sat, 25 Aug 2012 11:06:46 +0000"  >&lt;p&gt;Hm, actually I now believe the issue is more serious than interop. Since flags like CAPA, GSS and remote client are set on a client by default and are reset only after the server rejects them, this could also occur without any interop, though it is strange that we have not seen it before (does it need a lot of superfast cores on the client to show up the race?)&lt;/p&gt;

&lt;p&gt;I will change the title to reflect this.&lt;/p&gt;</comment>
                            <comment id="43764" author="askulysh" created="Sat, 25 Aug 2012 11:34:33 +0000"  >&lt;p&gt;b2_1 patch &lt;a href=&quot;http://review.whamcloud.com/3783&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3783&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="43772" author="pjones" created="Sun, 26 Aug 2012 11:29:51 +0000"  >&lt;p&gt;Landed for 2.3 and 2.4&lt;/p&gt;</comment>
                            <comment id="48244" author="nrutman" created="Wed, 21 Nov 2012 18:29:01 +0000"  >&lt;p&gt;Xyratex-bug-id: &lt;a href=&quot;http://jira-nss.xy01.xyratex.com:8080/browse/MRP-577&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MRP-577&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv5t3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4475</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>