<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:35:59 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3679] /proc/sys/lnet/routes should accurately reflect routing with ARF when LNet router has one or more down NIs</title>
                <link>https://jira.whamcloud.com/browse/LU-3679</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On a system where an LNet router has more than one NI, ARF is configured on clients and servers, and one or more of the LNet router&apos;s NIs goes &quot;down&quot;, /proc/sys/lnet/routes on clients/servers should show routes for that router as &quot;down&quot; rather than &quot;up&quot;.&lt;/p&gt;

&lt;p&gt;The story: A site was doing some tests of FGR where LNet routers had two IB interfaces. After seeing wide variations in packet counts between ib0 and ib1, they noticed that some NIs were down on the routers&lt;/p&gt;

&lt;p&gt;&amp;gt; lnet6: nid                      status alive refs peer  rtr   max    tx   min&lt;br/&gt;
&amp;gt; lnet6: 0@lo                         up     0    2    0    0     0     0     0&lt;br/&gt;
&amp;gt; lnet6: 454@gni                      up     0  679   16    0  2048  2048  1664&lt;br/&gt;
&amp;gt; lnet6: 10.100.100.160@o2ib1000      up    18    3   63  128  2048  2048  2047&lt;br/&gt;
&amp;gt; lnet6: 10.100.100.160@o2ib1002      up    12    4   63  128  2048  2048  2047&lt;br/&gt;
&amp;gt; lnet6: 10.100.100.160@o2ib1004      up     0    4   63  128  2048  2048  1859&lt;br/&gt;
&amp;gt; lnet6: 10.100.100.161@o2ib1006    down 66420    1   63  128  2048  2048  2048&lt;br/&gt;
&amp;gt; lnet6: 10.100.100.161@o2ib1007    down 66420    1   63  128  2048  2048  2048&lt;/p&gt;

&lt;p&gt;but were up for IPoIB. This caused some confusion, and was compounded by the fact that clients show these routes as still functional:&lt;/p&gt;

&lt;p&gt;cat /proc/sys/lnet/routes | grep 454&lt;br/&gt;
o2ib1000    2      up 454@gni&lt;br/&gt;
o2ib1002    2      up 454@gni&lt;br/&gt;
o2ib1004    1      up 454@gni&lt;br/&gt;
o2ib1006    1      up 454@gni&lt;br/&gt;
o2ib1007    2      up 454@gni&lt;/p&gt;

&lt;p&gt;This led people to believe that clients were still trying to use routes that were actually down, resulting in performance problems. Since ARF was configured, we know this wasn&apos;t actually the case. Clients will not use a router if that router has one or more down NIs. This should be reflected in the output of /proc/sys/lnet/routes.&lt;/p&gt;</description>
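<!--
Editorial sketch, not part of the original report. The description above
inspects /proc/sys/lnet/routes by hand with grep. The snippet below shows one
way that output could be parsed and filtered by router NID. It assumes the four
whitespace-separated columns seen in the report are network, hop count, state,
and router NID; the column meaning, the Route type, and the parse_routes and
routes_via helpers are assumptions of this sketch, not Lustre tooling.

```python
from typing import List, NamedTuple


class Route(NamedTuple):
    net: str      # remote network, e.g. "o2ib1000"
    hops: int     # assumed meaning of the second column
    state: str    # "up" or "down" as printed by the kernel
    router: str   # NID of the next-hop router, e.g. "454@gni"


def parse_routes(text: str) -> List[Route]:
    """Parse lines shaped like 'o2ib1000    2      up 454@gni'."""
    routes = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, headers, and anything that is not four columns
        # with a numeric second column.
        if len(parts) != 4 or not parts[1].isdigit():
            continue
        routes.append(Route(parts[0], int(parts[1]), parts[2], parts[3]))
    return routes


def routes_via(routes: List[Route], router_nid: str) -> List[Route]:
    """Return only the routes whose next hop is the given router NID."""
    return [r for r in routes if r.router == router_nid]


# Sample copied from the client output quoted in the description.
SAMPLE = """\
o2ib1000    2      up 454@gni
o2ib1002    2      up 454@gni
o2ib1004    1      up 454@gni
o2ib1006    1      up 454@gni
o2ib1007    2      up 454@gni
"""

if __name__ == "__main__":
    for route in routes_via(parse_routes(SAMPLE), "454@gni"):
        print(route.net, route.state)
```

Run against the sample, every route via 454@gni is reported as up, which is the
behaviour this issue asks to change for o2ib1006 and o2ib1007.
-->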
                <environment></environment>
        <key id="20135">LU-3679</key>
            <summary>/proc/sys/lnet/routes should accurately reflect routing with ARF when LNet router has one or more down NIs</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="dmiter">Dmitry Eremin</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Wed, 31 Jul 2013 20:56:44 +0000</created>
                <updated>Tue, 4 Feb 2014 08:11:25 +0000</updated>
                            <resolved>Tue, 4 Feb 2014 08:11:25 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="63730" author="isaac" created="Tue, 6 Aug 2013 21:46:58 +0000"  >&lt;p&gt;Yes, a route should be considered &quot;down&quot; if the router is down or the router NI for the target network is down.&lt;/p&gt;</comment>
                            <comment id="63780" author="hornc" created="Wed, 7 Aug 2013 16:08:26 +0000"  >&lt;p&gt;Correct me if I&apos;m wrong, but if the NI for the target network is up and an NI for a &lt;em&gt;different&lt;/em&gt; target network is down the router still won&apos;t be used due to ARF, right?&lt;/p&gt;</comment>
                            <comment id="63829" author="isaac" created="Wed, 7 Aug 2013 22:06:50 +0000"  >&lt;p&gt;In that case the route will still be used. For example, if router 454@gni has @o2ib1000 NI down but @o2ib1002 NI up, there is no reason why 454@gni can&apos;t be used as a route to @o2ib1002. Note that route != router (a router can serve as next hop in multiple routes); in the example, the route to @o2ib1000 via 454@gni is down, but the route to @o2ib1002 via 454@gni is up.&lt;/p&gt;</comment>
                            <comment id="63886" author="hornc" created="Thu, 8 Aug 2013 16:15:01 +0000"  >&lt;p&gt;Ah right. I had missed the bit of code in lnet_parse_rc_info() that ignored other down NIs on a router if the NI for the destination network was up.&lt;/p&gt;</comment>
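<!--
Editorial sketch, not part of the original thread. The rule agreed on in the
comments above is that a route is down if the router itself is down or if the
router's NI on the target network is down; down NIs on other networks do not
affect that route. The snippet below is a minimal model of that rule. It is not
the code in lnet_parse_rc_info(); the Router type, the ni_up mapping, and the
route_state helper are hypothetical names, and treating a missing NI entry as
down is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Router:
    nid: str
    alive: bool = True
    # Maps a network name (e.g. "o2ib1000") to True when the router's NI on
    # that network is up. Hypothetical structure for illustration only.
    ni_up: Dict[str, bool] = field(default_factory=dict)


def route_state(router: Router, target_net: str) -> str:
    """Return "up" or "down" for the route to target_net via router."""
    if not router.alive:
        return "down"
    # Only the NI on the target network matters. A missing entry is treated
    # as down here, which is an assumption of this sketch.
    if not router.ni_up.get(target_net, False):
        return "down"
    return "up"


if __name__ == "__main__":
    # NI states taken from the router table quoted in the description.
    router = Router(
        nid="454@gni",
        ni_up={
            "o2ib1000": True,
            "o2ib1002": True,
            "o2ib1004": True,
            "o2ib1006": False,
            "o2ib1007": False,
        },
    )
    for net in sorted(router.ni_up):
        print(net, route_state(router, net))
```

Under this model the routes to o2ib1006 and o2ib1007 via 454@gni are down while
the other three stay up, since route != router.
-->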
                            <comment id="68116" author="hornc" created="Tue, 1 Oct 2013 22:21:59 +0000"  >&lt;p&gt;FYI, I have a patch for this awaiting testing and a push into Gerrit for review. Just don&apos;t want anyone to duplicate effort here.&lt;/p&gt;</comment>
                            <comment id="68418" author="hornc" created="Fri, 4 Oct 2013 21:38:50 +0000"  >&lt;p&gt;For your review: &lt;a href=&quot;http://review.whamcloud.com/#/c/7857/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7857/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="69980" author="pjones" created="Sun, 27 Oct 2013 04:16:12 +0000"  >&lt;p&gt;Landed for 2.6&lt;/p&gt;</comment>
                            <comment id="70867" author="hornc" created="Wed, 6 Nov 2013 16:57:35 +0000"  >&lt;p&gt;Can we get this on b2_5?&lt;/p&gt;</comment>
                            <comment id="70874" author="dmiter" created="Wed, 6 Nov 2013 17:26:45 +0000"  >&lt;p&gt;patch for b2_5 is &lt;a href=&quot;http://review.whamcloud.com/8195&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8195&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="76168" author="dmiter" created="Tue, 4 Feb 2014 08:11:25 +0000"  >&lt;p&gt;Landed to b2_5&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvwy7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9498</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>