<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:03:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13712] Flaw in MR Routing Algorithm</title>
                <link>https://jira.whamcloud.com/browse/LU-13712</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing the fix for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13708&quot; title=&quot;lnet_notify can set route aliveness incorrectly&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13708&quot;&gt;&lt;del&gt;LU-13708&lt;/del&gt;&lt;/a&gt;, we found that communication was still severely disrupted.&lt;/p&gt;

&lt;p&gt;I ran some ping tests between a client and a server. From the logs, we could see the router was attempting to forward a message using an interface that had been disabled:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000400:00000200:0.0:1593005930.168691:0:7991:0:(lib-move.c:4323:lnet_parse()) TRACE: 10.16.100.56@o2ib10(605@gni) &amp;lt;- 610@gni : GET - routed
00000800:00000200:0.0:1593005930.168698:0:7991:0:(gnilnd_cb.c:2450:kgnilnd_recv()) $$ conn ffff88082b37c800, rxmsg ffffc900201321c8, lntmsg ffff880716ef6040 niov=0 kiov=          (null) iov=          (null) offset=0 mlen=0 rlen=0 from 610@gni  msg@0xffffc900201321c8 m/v/ty/ck/pck/pl b00fbabe/8/2/0/0/0 x2156869:GNILND_MSG_IMMEDIATE
00000800:00000200:0.0:1593005930.168707:0:7991:0:(gnilnd_cb.c:2092:kgnilnd_consume_rx()) $$ rx ffff880829491d80 processed from 610@gni  msg@0xffffc900201321c8 m/v/ty/ck/pck/pl b00fbabe/8/2/0/0/0 x2156869:GNILND_MSG_IMMEDIATE
00000800:00000200:0.0:1593005930.168711:0:7991:0:(gnilnd_cb.c:2058:kgnilnd_release_msg()) consuming ffff88082b37c800
00000400:00000200:0.0:1593005930.168718:0:7991:0:(lib-msg.c:996:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0
00000400:00000200:0.0:1593005930.168722:0:7991:0:(lib-msg.c:825:lnet_health_check()) health check: 605@gni-&amp;gt;610@gni: GET: OK
00000400:00000200:0.0:1593005930.168727:0:7991:0:(lib-move.c:2624:lnet_handle_send_case_locked()) Source ANY to NMR:  10.16.100.56@o2ib10 local destination
00000400:00000200:0.0:1593005930.168737:0:7991:0:(lib-move.c:1853:lnet_handle_send()) TRACE: 610@gni(10.16.100.14@o2ib10:&amp;lt;?&amp;gt;) -&amp;gt; 10.16.100.56@o2ib10(10.16.100.56@o2ib10:10.16.100.56@o2ib10) &amp;lt;?&amp;gt; : GET try# 0 &amp;lt;&amp;lt;&amp;lt;&amp;lt; 10.16.100.14@o2ib10 is disabled interface
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is a flaw in the logic used to forward the message. The path selection code treats a forwarded message like a local send to a non-multi-rail peer, because we don&apos;t want the router to modify the destination interface. However, this code path also sets a &quot;preferred NI&quot; that is reused for future sends. In this case, the first time the router forwarded a message to 10.16.100.56@o2ib10, it set 10.16.100.14@o2ib10 as the &quot;preferred NI&quot; for communicating with that node. Now, even though that interface is down, the router still selects it to forward messages because of its preferred status.&lt;/p&gt;</description>
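<!--
Editorial sketch of the behavior described in the issue text. It is a toy
model in Python, not LNet code: the names Router, LocalNI, select_sticky and
select_health_aware are hypothetical, and 10.16.100.15@o2ib10 is an invented
second interface used only for illustration. The health-aware variant is one
possible remedy under these assumptions and may not match what the landed
patch does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LocalNI:
    """A local network interface on the router (hypothetical model)."""
    nid: str
    up: bool = True


@dataclass
class Router:
    nis: List[LocalNI]
    # peer NID mapped to the local NID remembered as "preferred" for that peer
    preferred: Dict[str, str] = field(default_factory=dict)

    def _first_up(self) -> Optional[LocalNI]:
        # Pick the first interface that is currently up, if any.
        return next((ni for ni in self.nis if ni.up), None)

    def select_sticky(self, peer_nid: str) -> Optional[LocalNI]:
        """Flawed policy: a remembered preference wins even if that NI is down."""
        by_nid = {ni.nid: ni for ni in self.nis}
        if peer_nid in self.preferred:
            return by_nid[self.preferred[peer_nid]]
        choice = self._first_up()
        if choice is not None:
            self.preferred[peer_nid] = choice.nid
        return choice

    def select_health_aware(self, peer_nid: str) -> Optional[LocalNI]:
        """Assumed alternative: honor the preference only while the NI is up."""
        by_nid = {ni.nid: ni for ni in self.nis}
        pref = by_nid.get(self.preferred.get(peer_nid, ""))
        if pref is not None and pref.up:
            return pref
        return self._first_up()


router = Router([LocalNI("10.16.100.14@o2ib10"), LocalNI("10.16.100.15@o2ib10")])
peer = "10.16.100.56@o2ib10"

first = router.select_sticky(peer)        # records 10.16.100.14@o2ib10 as preferred
router.nis[0].up = False                  # the preferred interface is then disabled
stale = router.select_sticky(peer)        # still returns the disabled interface
fresh = router.select_health_aware(peer)  # falls back to an interface that is up

print(stale.nid, stale.up)
print(fresh.nid, fresh.up)
```

The sticky selection keeps returning the disabled interface, which matches the
symptom in the trace; the health-aware selection is shown only to contrast the
two policies.
-->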
                <environment></environment>
        <key id="59704">LU-13712</key>
            <summary>Flaw in MR Routing Algorithm</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hornc">Chris Horn</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                    </labels>
                <created>Wed, 24 Jun 2020 16:30:31 +0000</created>
                <updated>Fri, 7 Aug 2020 14:11:26 +0000</updated>
                            <resolved>Fri, 7 Aug 2020 14:11:26 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="273662" author="gerrit" created="Wed, 24 Jun 2020 16:32:21 +0000"  >&lt;p&gt;Chris Horn (chris.horn@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39168&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39168&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13712&quot; title=&quot;Flaw in MR Routing Algorithm&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13712&quot;&gt;&lt;del&gt;LU-13712&lt;/del&gt;&lt;/a&gt; lnet: Preferred NI logic breaks MR routing&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 19efba0448bb3244200b24b92debc6fe1eac26a2&lt;/p&gt;</comment>
                            <comment id="276894" author="gerrit" created="Fri, 7 Aug 2020 04:58:07 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39168/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39168/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13712&quot; title=&quot;Flaw in MR Routing Algorithm&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13712&quot;&gt;&lt;del&gt;LU-13712&lt;/del&gt;&lt;/a&gt; lnet: Preferred NI logic breaks MR routing&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ef6c35877b96c11a83a6cb823bf66e44bf355ed3&lt;/p&gt;</comment>
                            <comment id="276936" author="pjones" created="Fri, 7 Aug 2020 14:11:26 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i013k7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>