<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:02:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13548] LNet: b2_12 discovery of non-MR peers may yield unreachable peer NIs</title>
                <link>https://jira.whamcloud.com/browse/LU-13548</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When a 2.12 MR peer discovers a non-MR peer (2.10.8), the following problem may occur: if the non-MR peer has LNets that are not defined on the MR peer, a NID on one of those undefined LNets may be listed as the primary NID. This later causes communication problems when mounting.&lt;/p&gt;

&lt;p&gt;Here&apos;s an example of the buggy discovery:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;lnetctl discover 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;discover:&lt;/p&gt;

&lt;p&gt;&#160; &#160; - primary nid: 192.168.1.123@o2ib&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; Multi-Rail: False&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; peer ni:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib&lt;/p&gt;

&lt;p&gt;lnetctl peer show&lt;/p&gt;

&lt;p&gt;peer:&lt;/p&gt;

&lt;p&gt;&#160; &#160; - primary nid: 192.168.1.123@o2ib&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; Multi-Rail: False&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; peer ni:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; state: NA&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; state: NA&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In the example above, the peer running the discovery has its only NID on o2ib4, so assigning the discovered peer a primary NID on o2ib, a net the discovering peer is not on, is a problem.&lt;/p&gt;
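
&lt;p&gt;The mismatch can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not LNet code: the helper names (net_of, reachable_nids, check_primary) are hypothetical, the NIDs and net names are copied from the example above, and LNet routes are ignored, so a net counts as usable only if it is configured locally.&lt;/p&gt;

```python
def net_of(nid):
    """Return the network part of a NID, e.g. 'o2ib4' for '192.168.1.123@o2ib4'."""
    return nid.rsplit("@", 1)[1]


def reachable_nids(peer_nids, local_nets):
    """Keep only the peer NIDs whose network is also configured locally."""
    return [nid for nid in peer_nids if net_of(nid) in local_nets]


def check_primary(primary_nid, peer_nids, local_nets):
    """Report whether the primary NID is on a local net and list usable NIDs."""
    return {
        "primary_on_local_net": net_of(primary_nid) in local_nets,
        "reachable": reachable_nids(peer_nids, local_nets),
    }


# Values copied from the example: the discovering node only has lo and o2ib4.
local_nets = {"lo", "o2ib4"}
peer_nids = ["192.168.1.123@o2ib4", "192.168.1.123@o2ib"]

# The buggy discovery reported the o2ib NID as primary.
result = check_primary("192.168.1.123@o2ib", peer_nids, local_nets)
print(result)
```

&lt;p&gt;With the values from the example, the reported primary nid on o2ib is flagged as not being on a local net, and the o2ib4 nid is the only usable candidate.&lt;/p&gt;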


&lt;p&gt;Here&apos;s the lnet config on the MR peer (the peer running discovery):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;lnetctl net show&lt;/p&gt;

&lt;p&gt;net:&lt;/p&gt;

&lt;p&gt;&#160; &#160; - net type: lo&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; local NI(s):&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 0@lo&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; status: up&lt;/p&gt;

&lt;p&gt;&#160; &#160; - net type: o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; local NI(s):&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.105@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; status: up&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; interfaces:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; 0: ib0&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Here&apos;s the lnet config on the non-MR peer (the peer being discovered):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;lnetctl net show&lt;/p&gt;

&lt;p&gt;net:&lt;/p&gt;

&lt;p&gt;&#160; &#160; - net type: lo&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; local NI(s):&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 0@lo&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; status: up&lt;/p&gt;

&lt;p&gt;&#160; &#160; - net type: o2ib&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; local NI(s):&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; status: up&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; interfaces:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; 0: ib0&lt;/p&gt;

&lt;p&gt;&#160; &#160; - net type: o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; local NI(s):&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; status: up&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; interfaces:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; 0: ib0&lt;/p&gt;&lt;/blockquote&gt;</description>
                <environment></environment>
        <key id="59153">LU-13548</key>
            <summary>LNet: b2_12 discovery of non-MR peers may yield unreachable peer NIs</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="ssmirnov">Serguei Smirnov</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 May 2020 22:05:57 +0000</created>
                <updated>Tue, 15 Mar 2022 14:58:36 +0000</updated>
                                            <version>Lustre 2.12.4</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>16</watches>
                                                                            <comments>
                            <comment id="269990" author="ssmirnov" created="Tue, 12 May 2020 23:07:51 +0000"  >&lt;p&gt;Porting the changes from&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11641&quot; title=&quot;LNet Router: handle discovery off case&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11641&quot;&gt;&lt;del&gt;LU-11641&lt;/del&gt;&lt;/a&gt; to 2.12 was found to change the discovery behaviour as follows:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;lnetctl discover 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;discover:&lt;/p&gt;

&lt;p&gt;&#160; &#160; - primary nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; Multi-Rail: False&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; peer ni:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;lnetctl peer show&lt;/p&gt;

&lt;p&gt;peer:&lt;/p&gt;

&lt;p&gt;&#160; &#160; - primary nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; Multi-Rail: False&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; peer ni:&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; - nid: 192.168.1.123@o2ib4&lt;/p&gt;

&lt;p&gt;&#160; &#160; &#160; &#160; &#160; state: NA&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This is the correct behaviour. The same is observed on 2.13 (a 2.13 peer discovering a 2.10.8 peer, same configuration).&lt;/p&gt;

</comment>
                            <comment id="286410" author="charr" created="Tue, 1 Dec 2020 21:03:58 +0000"  >&lt;p&gt;Serguei, You mention a port to 2.12 above. Which 2.12 minor version has the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11641&quot; title=&quot;LNet Router: handle discovery off case&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11641&quot;&gt;&lt;del&gt;LU-11641&lt;/del&gt;&lt;/a&gt; patch? Is this ticket still waiting on additional work?&lt;/p&gt;</comment>
                            <comment id="286422" author="ssmirnov" created="Tue, 1 Dec 2020 22:16:53 +0000"  >&lt;p&gt;Hi Cameron,&lt;/p&gt;

&lt;p&gt;The earlier comment was about a test that I ran at the time, based on 2.12 plus the ported changes. It was just a proof of concept, as it broke something else. The actual patch with the proper fix went into a private branch, but it still needs to be ported to 2.12. I guess I thought that the MRR feature was going to get ported to 2.12, but that was wrong. I&apos;ll add porting this fix to 2.12 to my list of things to do.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Serguei.&lt;/p&gt;</comment>
                            <comment id="286423" author="charr" created="Tue, 1 Dec 2020 22:21:24 +0000"  >&lt;p&gt;Thank you Serguei!&lt;/p&gt;</comment>
                            <comment id="286461" author="degremoa" created="Wed, 2 Dec 2020 09:13:00 +0000"  >&lt;p&gt;Serguei, I don&apos;t know what your current workload is or when you will be able to port this fix, but I would appreciate it if you could push a partially ported patch somewhere so that I can look at it and see whether I can finish the backport.&lt;/p&gt;</comment>
                            <comment id="286689" author="gerrit" created="Thu, 3 Dec 2020 23:44:24 +0000"  >&lt;p&gt;Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/40857&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40857&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13548&quot; title=&quot;LNet: b2_12 discovery of non-MR peers may yield unreachable peer NIs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13548&quot;&gt;LU-13548&lt;/a&gt; lnet: backport fix for discovery of non-MR peers&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: d20018597825ba5ad85ffec2bbd148ae4bc8ccb1&lt;/p&gt;</comment>
                            <comment id="287959" author="ofaaland" created="Fri, 18 Dec 2020 00:09:54 +0000"  >&lt;p&gt;Hi Serguei and Aurelien,&lt;br/&gt;
I put 2.12.6 + &lt;a href=&quot;https://review.whamcloud.com/40857&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40857&lt;/a&gt; on a system and had issues related to routers (and therefore LNet). I&apos;m working on building 2.12.6 and will then check whether I can still re-create the problem.&lt;/p&gt;</comment>
                            <comment id="290509" author="ssmirnov" created="Wed, 27 Jan 2021 22:58:32 +0000"  >&lt;p&gt;Hi Olaf,&lt;/p&gt;

&lt;p&gt;Are you still having routing issues with this patch?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Serguei.&lt;/p&gt;</comment>
                            <comment id="290600" author="degremoa" created="Thu, 28 Jan 2021 16:10:42 +0000"  >&lt;p&gt;My 2 cents: our minimal testing confirmed this patch is working. But I didn&apos;t test with routers.&lt;/p&gt;</comment>
                            <comment id="290624" author="ssmirnov" created="Thu, 28 Jan 2021 18:27:47 +0000"  >&lt;p&gt;FYI, with routing the following still results in a problem:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
NodeA --tcp0-- GW --tcp1-- NodeB &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
------ NodeA ------
lnetctl net show
    - net type: tcp9
      local NI(s):
        - nid: 192.168.122.10@tcp9
    - net type: tcp
      local NI(s):
        - nid: 192.168.122.142@tcp&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
------ NodeB ------
lnetctl net show
net:
    - net type: tcp1
      local NI(s):
        - nid: 192.168.122.40@tcp1&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
------ NodeB ------
lnetctl peer show
peer:
    - primary nid: 192.168.122.10@tcp9
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.142@tcp
          state: NA
        - nid: 192.168.122.10@tcp9
          state: NA&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that NodeB lists NodeA under the unreachable tcp9 primary nid. Even though NodeB is aware of the reachable nid for NodeA, it gets confused when the primary nid is used:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
------ NodeB ------
lnetctl ping 192.168.122.10@tcp9
manage:
    - ping:
          errno: -1
          descr: failed to ping 192.168.122.10@tcp9: Input/output error&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is being tracked in&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14386&quot; title=&quot;LNet: select reachable remote peer nid &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14386&quot;&gt;LU-14386&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="54452">LU-11840</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62591">LU-14386</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01067:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>