<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:49:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12049] Multirail - server trying to connect unconfigured nid</title>
                <link>https://jira.whamcloud.com/browse/LU-12049</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I had set up 2 servers with multirail (ib0 and ib1) like this:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
srv1
10.151.26.196@o2ib (ib0)
10.151.26.195@o2ib (ib1)

srv2
10.151.26.197@o2ib (ib1)
10.151.26.198@o2ib (ib0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;srv1 was rebooted and it came up with 2 interfaces.&lt;br/&gt;
Then srv2 was rebooted and it came up with 1 interface.&lt;/p&gt;

&lt;p&gt;AFTER REBOOT CONFIG:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
srv1 ~ # lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.151.26.196@o2ib
          status: up
          interfaces:
              0: ib0
        - nid: 10.151.26.195@o2ib
          status: up
          interfaces:
              0: ib1
---------------------------------
srv2 ~ # lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.151.26.197@o2ib
          status: up
          interfaces:
              0: ib1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But srv1 still thinks srv2 should have 2 interfaces.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
srv1 # lnetctl peer show
...
    - primary nid: 10.151.26.197@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.151.26.197@o2ib
          state: NA
        - nid: 10.151.26.198@o2ib
          state: NA
....

srv1 ~ # lnetctl discover 10.151.26.197@o2ib
manage:
    - discover:
          errno: -1
          descr: failed to discover 10.151.26.197@o2ib: Connection timed out
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[ 2623.243967] LNet: 270:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.151.26.198@o2ib - queue depth reduced from 63 to 42  to allow for qp creation
[ 2623.283462] LNet: 270:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 1813 previous similar messages
[ 2741.589327] Lustre: 17563:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1551901661/real 1551901663]  req@ffff882ba16f9500 x1627284088955520/t0(0) o13-&amp;gt;nbp16-OST000d-osc-MDT0000@10.151.26.197@o2ib:7/4 lens 224/368 e 0 to 1 dl 1551902116 ref 1 fl Rpc:eX/2/ffffffff rc -11/-1
[ 2741.676417] Lustre: 17563:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 114 previous similar messages
[ 2741.706242] Lustre: nbp16-OST000d-osc-MDT0000: Connection to nbp16-OST000d (at 10.151.26.197@o2ib) was lost; in progress operations using this service will wait for recovery to complete
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So srv1 keeps trying to connect to the alternate nid on srv2, even though that nid is not configured.&lt;/p&gt;</description>
                <environment></environment>
        <key id="55083">LU-12049</key>
            <summary>Multirail - server trying to connect unconfigured nid</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="mhanafi">Mahmoud Hanafi</reporter>
                        <labels>
                    </labels>
                <created>Wed, 6 Mar 2019 20:32:30 +0000</created>
                <updated>Wed, 6 Jan 2021 13:15:44 +0000</updated>
                            <resolved>Wed, 6 Jan 2021 13:15:44 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="243459" author="pjones" created="Thu, 7 Mar 2019 12:48:39 +0000"  >&lt;p&gt;Mahmoud&lt;/p&gt;

&lt;p&gt;Could you please clarify which Lustre version you are using?&lt;/p&gt;

&lt;p&gt;Amir&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="243478" author="mhanafi" created="Thu, 7 Mar 2019 17:20:40 +0000"  >&lt;p&gt;This happens once a peer is discovered as having 2 nids and that peer is then restarted with only a single nid. Clients and servers that had discovered it with 2 nids are able to rediscover that it only has one nid now.&lt;/p&gt;</comment>
                            <comment id="243483" author="ashehata" created="Thu, 7 Mar 2019 17:44:54 +0000"  >&lt;p&gt;Yes, there is a current issue with the way reboots are handled. Discovery uses a sequence number to check whether the information it&apos;s getting is out of date. That algorithm, however, doesn&apos;t work if the node reboots, changes, and comes back up. The sequence number gets reset, so all updates are deemed out of date. I have a fix for that on the multi-rail branch as part of the MR Routing/UDSP work.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
4965bc886f792067046e7c25ec7b3c80888093eb LU-11478 lnet: misleading discovery seqno.&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
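<!--
Illustrative sketch only, not taken from the LNet source: a minimal Python
model of the failure mode described in the comment above, where a cached peer
entry accepts an update only when its sequence number is newer. The class
name PeerEntry, the method naive_update, and the sequence values 7 and 1 are
hypothetical and chosen purely for illustration. The NIDs are the ones from
this report.

```python
from dataclasses import dataclass


@dataclass
class PeerEntry:
    """Simplified, hypothetical cached view of a remote peer."""
    nids: list
    seqno: int = 0

    def naive_update(self, nids, seqno):
        """Accept an update only if its sequence number is newer."""
        if seqno > self.seqno:
            self.nids = list(nids)
            self.seqno = seqno
            return True
        # Anything with an equal or lower number is treated as out of date.
        return False


# srv1's cached view of srv2 before the reboot: two NIDs, counter advanced.
cached = PeerEntry(
    nids=["10.151.26.197@o2ib", "10.151.26.198@o2ib"],
    seqno=7,  # hypothetical value
)

# srv2 reboots with a single interface and its counter starts over.
accepted = cached.naive_update(["10.151.26.197@o2ib"], seqno=1)

print(accepted)     # False: the fresh update is rejected as stale
print(cached.nids)  # still lists both NIDs, including the unconfigured one
```

Under these assumptions the stale entry for 10.151.26.198@o2ib survives the
reboot, which matches the behaviour seen in the lnetctl peer show output.
-->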
                            <comment id="288754" author="mhanafi" created="Wed, 6 Jan 2021 01:37:14 +0000"  >&lt;p&gt;Please close; we have picked up &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11478&quot; title=&quot;LNet: discovery sequence numbers could be misleading&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11478&quot;&gt;&lt;del&gt;LU-11478&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="32187" name="srv1.debug.gz" size="6222669" author="mhanafi" created="Wed, 6 Mar 2019 20:25:34 +0000"/>
                            <attachment id="32188" name="srv2.debug.gz" size="374927" author="mhanafi" created="Wed, 6 Mar 2019 20:25:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00cv3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>