<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:40:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4243] multiple servicenodes or failnids: wrong client llog registration</title>
                <link>https://jira.whamcloud.com/browse/LU-4243</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Since Lustre 2.4.0 we have had problems with clients that could not connect after, e.g., an MGS failover. In most of our experiments the client ran on the standby MGS server; the symptom was that the client only worked from the active MGS/MDS node, not from the passive one.&lt;/p&gt;

&lt;p&gt;The reason for the problem seems to be commit d9d27cad and the following hunk of the patch:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;@@ -1447,13 +1481,11 @@ static int mgs_write_log_failnids(const struct lu_env *env,
                                failnodeuuid, cliname);
                        rc = record_add_uuid(env, llh, nid, failnodeuuid);
                 }
-                if (failnodeuuid) {
+               if (failnodeuuid)
                        rc = record_add_conn(env, llh, cliname, failnodeuuid);
-                        name_destroy(&amp;amp;failnodeuuid);
-                        failnodeuuid = NULL;
-                }
         }
 
+       name_destroy(&amp;amp;failnodeuuid);
         return rc;
 }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This leads to a wrong Lustre client llog when the Lustre block devices are formatted with multiple --servicenode options! Here is an example:&lt;/p&gt;

&lt;p&gt;#09 (224)marker   6 (flags=0x01, v2.5.0.0) lnec-MDT0000    &apos;add mdc&apos; Mon Nov 11 17:03:21 2013-&lt;br/&gt;
#10 (080)add_uuid  nid=10.3.0.34@o2ib(0x500000a030022)  0:  1:10.3.0.34@o2ib  &lt;br/&gt;
#11 (128)attach    0:lnec-MDT0000-mdc  1:mdc  2:lnec-clilmv_UUID  &lt;br/&gt;
#12 (136)setup     0:lnec-MDT0000-mdc  1:lnec-MDT0000_UUID  2:10.3.0.34@o2ib  &lt;br/&gt;
#13 (080)add_uuid  nid=10.3.0.34@o2ib(0x500000a030022)  0:  1:10.3.0.34@o2ib  &lt;br/&gt;
#14 (104)add_conn  0:lnec-MDT0000-mdc  1:10.3.0.34@o2ib  &lt;br/&gt;
#15 (080)add_uuid  nid=10.3.0.35@o2ib(0x500000a030023)  0:  1:10.3.0.34@o2ib  &lt;br/&gt;
#16 (104)add_conn  0:lnec-MDT0000-mdc  1:10.3.0.34@o2ib  &lt;br/&gt;
#17 (160)modify_mdc_tgts add 0:lnec-clilmv  1:lnec-MDT0000_UUID  2:0  3:1  4:lnec-MDT0000-mdc_UUID  &lt;br/&gt;
#18 (224)marker   6 (flags=0x02, v2.5.0.0) lnec-MDT0000    &apos;add mdc&apos; Mon Nov 11 17:03:21 2013-&lt;/p&gt;

&lt;p&gt;The last add_uuid should have 1:10.3.0.35@o2ib instead of 1:10.3.0.34@o2ib.&lt;/p&gt;

&lt;p&gt;And the reason is that only the first NID of the first --servicenode (a.k.a. --failnode) entry is considered.&lt;/p&gt;

&lt;p&gt;Please revert that little patch to mgs_llog.c.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Erich&lt;/p&gt;</description>
                <environment>failover MDS/MGS, failover OSTs&lt;br/&gt;
</environment>
        <key id="21972">LU-4243</key>
            <summary>multiple servicenodes or failnids: wrong client llog registration</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="efocht">Erich Focht</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 Nov 2013 12:52:46 +0000</created>
                <updated>Thu, 13 Mar 2014 18:39:18 +0000</updated>
                            <resolved>Fri, 20 Dec 2013 15:12:49 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.4.2</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="71311" author="efocht" created="Tue, 12 Nov 2013 12:56:35 +0000"  >&lt;p&gt;The patch again, with hopefully proper formatting:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;@@ -1447,13 +1481,11 @@ static int mgs_write_log_failnids(const struct lu_env *env,
                                failnodeuuid, cliname);
                        rc = record_add_uuid(env, llh, nid, failnodeuuid);
                 }
-                if (failnodeuuid) {
+               if (failnodeuuid)
                        rc = record_add_conn(env, llh, cliname, failnodeuuid);
-                        name_destroy(&amp;amp;failnodeuuid);
-                        failnodeuuid = NULL;
-                }
         }
 
+       name_destroy(&amp;amp;failnodeuuid);
         return rc;
 }
 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="71314" author="pjones" created="Tue, 12 Nov 2013 14:01:47 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please help with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="71697" author="adilger" created="Fri, 15 Nov 2013 23:38:26 +0000"  >&lt;p&gt;Erich, to clarify, this problem only happens when you are trying to mount a client from the backup MGS node?&lt;/p&gt;</comment>
                            <comment id="71699" author="green" created="Fri, 15 Nov 2013 23:39:03 +0000"  >&lt;p&gt;I wonder if it&apos;s also related somehow to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3829&quot; title=&quot;MDT mount fails if mkfs.lustre is run with multiple mgsnode arguments on MDSs where MGS is not running&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3829&quot;&gt;&lt;del&gt;LU-3829&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="71710" author="efocht" created="Sat, 16 Nov 2013 00:26:35 +0000"  >&lt;p&gt;Andreas, the problem also occurs for normal clients, IIRC. The peculiarity of the setup is that we formatted the Lustre devices with two &lt;tt&gt;--servicenode&lt;/tt&gt; arguments (which are transformed into failover.node options). The first mount of the devices was on the first of the service nodes. This used to work fine under 2.1.X. The other (maybe most widely used) way of formatting is with just one &lt;tt&gt;--failnode&lt;/tt&gt; option, say &lt;tt&gt;--failnode B&lt;/tt&gt;, and a first mount of the device on node A.&lt;/p&gt;

&lt;p&gt;The patch mentioned in my first comment leads to the problem that failnodeuuid is set only once in the entire registration process, i.e. it is set to the first failover.node argument (or --servicenode) that was used for formatting. Instead, failnodeuuid should be set again for each failover.node option that appears. We verified that reverting that patch fixes the problem in 2.5.0.&lt;/p&gt;

&lt;p&gt;Oleg, we&apos;ve seen the issue with two --mgsnode options, too, but that&apos;s different. Seems fixed in 2.5.0.&lt;/p&gt;</comment>
                            <comment id="72115" author="hongchao.zhang" created="Fri, 22 Nov 2013 09:31:25 +0000"  >&lt;p&gt;the problem here is that LNet doesn&apos;t use the other NIDs with the same &quot;distance&quot; and &quot;order&quot; contained in the same UUID, which is &quot;10.3.0.34@o2ib&quot; in this case (see &quot;ptlrpc_uuid_to_connection&quot; and &quot;ptlrpc_uuid_to_peer&quot; for details).&lt;/p&gt;

&lt;p&gt;this issue will still exist even if the patch mentioned is reverted: if the MDT is formatted with &quot;--servicenode 10.3.0.34,10.3.0.35&quot;, these two NIDs will use the same UUID, &quot;10.3.0.34@o2ib&quot;.&lt;/p&gt;</comment>
                            <comment id="72116" author="hongchao.zhang" created="Fri, 22 Nov 2013 09:55:01 +0000"  >&lt;p&gt;the patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#/c/8372/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8372/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="72364" author="bobijam" created="Wed, 27 Nov 2013 03:13:08 +0000"  >&lt;p&gt;Hi Hongchao,&lt;/p&gt;

&lt;p&gt;Why would these two NIDs use the same UUID &quot;10.3.0.34@o2ib&quot; when the MDT is formatted with &quot;--servicenode 10.3.0.34,10.3.0.35&quot;? I think these two service nodes will be parsed into different failnodeuuid strings.&lt;/p&gt;</comment>
                            <comment id="72517" author="hongchao.zhang" created="Fri, 29 Nov 2013 11:01:45 +0000"  >&lt;p&gt;in mkfs_lustre.c&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; parse_opts(&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; argc, &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; argv[], struct mkfs_opts *mop,
               &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; **mountopts)
{
     ...
     &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;code-quote&quot;&gt;&apos;s&apos;&lt;/span&gt;: {
         ...
         nids = convert_hostnames(optarg);
         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!nids)
             &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 1;
         rc = add_param(mop-&amp;gt;mo_ldd.ldd_params, PARAM_FAILNODE,
                        nids);
         free(nids);
         ...
     }
     ...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&quot;convert_hostnames&quot; leaves &quot;10.3.0.34,10.3.0.35&quot; largely unchanged, and &quot;add_param&quot; will add a single &quot;failover.node&quot; param.&lt;/p&gt;</comment>
                            <comment id="72518" author="bobijam" created="Fri, 29 Nov 2013 11:25:16 +0000"  >&lt;p&gt;add_param() would separate the string into two parameters, as it shows on my VM:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# mkfs.lustre --mgs --mdt --fsname=lustre --index=0 --servicenode 10.3.0.34,10.3.0.35 --reformat /dev/sdb
   Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x1065
              (MDT MGS first_time update no_primnode )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.3.0.34@tcp failover.node=10.3.0.35@tcp
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="73920" author="hongchao.zhang" created="Fri, 20 Dec 2013 09:55:40 +0000"  >&lt;p&gt;oh, yes, it has been fixed in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3445&quot; title=&quot;Specifying multiple networks in NIDs does no longer work&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3445&quot;&gt;&lt;del&gt;LU-3445&lt;/del&gt;&lt;/a&gt; (&lt;a href=&quot;http://review.whamcloud.com/#/c/6686/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/6686/&lt;/a&gt;), which landed on b2_4_1 and master; b2_4_0 still has this problem.&lt;/p&gt;</comment>
                            <comment id="73937" author="pjones" created="Fri, 20 Dec 2013 15:12:49 +0000"  >&lt;p&gt;Landed for 2.4.2 and 2.6. Will be landed for 2.5.1 shortly.&lt;/p&gt;</comment>
                            <comment id="77524" author="jfc" created="Thu, 20 Feb 2014 21:14:12 +0000"  >&lt;p&gt;Hello Erich &amp;#8211; Do you have what you need on this issue? If so, can I go ahead and mark it as resolved? Thanks, ~ jfc.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="21228">LU-4043</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw8nr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11555</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>