<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:56:52 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6060] ARF doesn&apos;t detect lack of interface on a router</title>
                <link>https://jira.whamcloud.com/browse/LU-6060</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When using asymmetric router failure (ARF) detection, clients appear unable to detect that an expected interface is missing on a router. A defined but non-functional interface is detected, but the clients do not seem to notice when they have a route to a network via a router that has no means of getting the traffic there.&lt;/p&gt;

&lt;p&gt;Take, for example, four nodes: login1, rtr5, rtr6, and mgs. This was demonstrated on live hardware, although addresses and names in the following example have been changed.&lt;/p&gt;

&lt;p&gt;Host: interfaces (routes)&lt;br/&gt;
login1: 30@gni1 (o2ib1 via 27@gni1, o2ib1 via 31@gni1)&lt;br/&gt;
rtr5: 27@gni1 10.1.1.5@o2ib1 ()&lt;br/&gt;
rtr6: 31@gni1 10.1.1.6@o2ib1 ()&lt;br/&gt;
mgs: 10.1.1.1@o2ib1 (gni1 via 10.1.1.5@o2ib1 and gni1 via 10.1.1.6@o2ib1)&lt;/p&gt;

&lt;p&gt;In other words, we have two routers with two interfaces each sitting between o2ib1 and gni1.&lt;/p&gt;

&lt;p&gt;Reproduction steps:&lt;br/&gt;
Enable ARF in the configuration and confirm it is active.&lt;br/&gt;
Configure interface ib0 on rtr5 to not start on boot.&lt;br/&gt;
Reboot rtr5 (ifconfig ib0 then shows ib0 down with no IP address).&lt;br/&gt;
Start LNet (lctl net up).&lt;/p&gt;

&lt;p&gt;Show the missing interface on rtr5 via lctl list_nids:&lt;br/&gt;
rtr5:~ # lctl list_nids&lt;br/&gt;
27@gni1&lt;br/&gt;
rtr5:~ # &lt;/p&gt;
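
&lt;p&gt;A minimal Python sketch of the check a client would ideally make here: given the NIDs a router actually reports (as in the lctl list_nids output above), decide whether that router has any interface on the network a route targets. The helper names nid_net and gateway_serves_net are hypothetical and not part of Lustre or lctl; NIDs are assumed to have the form address@network.&lt;/p&gt;

```python
def nid_net(nid):
    """Return the network part of a NID such as '27@gni1' (assumed 'addr@net' form)."""
    return nid.rsplit("@", 1)[1]


def gateway_serves_net(gateway_nids, target_net):
    """True if any NID reported by the gateway is on target_net."""
    return any(nid_net(nid) == target_net for nid in gateway_nids)


# NIDs taken from the example: rtr5 came up without ib0, rtr6 has both interfaces.
rtr5_nids = ["27@gni1"]
rtr6_nids = ["31@gni1", "10.1.1.6@o2ib1"]

print(gateway_serves_net(rtr5_nids, "o2ib1"))  # False
print(gateway_serves_net(rtr6_nids, "o2ib1"))  # True
```

&lt;p&gt;With the NIDs from this example, the route o2ib1 via 27@gni1 points at a gateway with no o2ib1 interface, while the route via 31@gni1 is usable.&lt;/p&gt;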

&lt;p&gt;On login1, ping mgs:&lt;br/&gt;
lctl ping 10.1.1.1@o2ib1 (the result is 50% success, 50% I/O error)&lt;/p&gt;

&lt;p&gt;Show routes:&lt;br/&gt;
login1:~ # lctl show_route&lt;br/&gt;
net              o2ib1 hops 1 gw                          27@gni1 up pri 0&lt;br/&gt;
net              o2ib1 hops 1 gw                          31@gni1 up pri 0&lt;/p&gt;

&lt;p&gt;Look for down_ni:&lt;br/&gt;
login1:~ # cat /proc/sys/lnet/routers&lt;br/&gt;
ref  rtr_ref alive_cnt  state    last_ping ping_sent deadline down_ni router&lt;br/&gt;
4          1         1     up           28         1       NA       0 27@gni1&lt;br/&gt;
4          1         1     up           28         1       NA       0 31@gni1&lt;/p&gt;
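
&lt;p&gt;To make the reading of this table explicit, here is a small Python sketch that parses output in the format shown above and lists routers reporting a non-zero down_ni. The function name parse_routers is hypothetical, the column layout is assumed to match the header above, and the sample text is copied from this report rather than read from /proc.&lt;/p&gt;

```python
SAMPLE = """\
ref  rtr_ref alive_cnt  state    last_ping ping_sent deadline down_ni router
4          1         1     up           28         1       NA       0 27@gni1
4          1         1     up           28         1       NA       0 31@gni1
"""


def parse_routers(text):
    """Parse whitespace-separated router rows into dicts keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


routers = parse_routers(SAMPLE)
# Routers that the table itself marks as having a down interface.
flagged = [r["router"] for r in routers if int(r["down_ni"]) != 0]

for r in routers:
    print(r["router"], r["state"], "down_ni=" + r["down_ni"])
print("routers flagged by down_ni:", flagged)
```

&lt;p&gt;On the data from this report the flagged list is empty: 27@gni1 is reported as up with down_ni 0 even though rtr5 has no o2ib1 interface.&lt;/p&gt;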

&lt;p&gt;In other words, there is no way to get to o2ib1 via rtr5, but ARF does not detect this. Presumably, at least in a non-multihop configuration, clients should be concerned not with whether the router has defined routes that aren&apos;t working, but whether the client has a defined route that a router can&apos;t handle due to a down or missing interface.&lt;/p&gt;</description>
                <environment></environment>
        <key id="27994">LU-6060</key>
            <summary>ARF doesn&apos;t detect lack of interface on a router</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="lewisj">John Lewis</reporter>
                        <labels>
                    </labels>
                <created>Fri, 19 Dec 2014 19:54:56 +0000</created>
                <updated>Tue, 14 Jul 2015 21:43:38 +0000</updated>
                            <resolved>Tue, 20 Jan 2015 14:46:29 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="102137" author="liang" created="Sat, 20 Dec 2014 02:03:40 +0000"  >&lt;p&gt;Hi John, do you have the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5485&quot; title=&quot;first mount always fail with avoid_asym_router_failure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5485&quot;&gt;&lt;del&gt;LU-5485&lt;/del&gt;&lt;/a&gt; in your environment?&lt;/p&gt;</comment>
                            <comment id="102138" author="simmonsja" created="Sat, 20 Dec 2014 02:17:31 +0000"  >&lt;p&gt;Yes, the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5485&quot; title=&quot;first mount always fail with avoid_asym_router_failure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5485&quot;&gt;&lt;del&gt;LU-5485&lt;/del&gt;&lt;/a&gt; is included. Without the patch we can&apos;t mount a file system with ARF enabled.&lt;/p&gt;</comment>
                            <comment id="102153" author="gerrit" created="Sun, 21 Dec 2014 03:26:29 +0000"  >&lt;p&gt;Liang Zhen (liang.zhen@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/13162&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/13162&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6060&quot; title=&quot;ARF doesn&amp;#39;t detect lack of interface on a router&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6060&quot;&gt;&lt;del&gt;LU-6060&lt;/del&gt;&lt;/a&gt; lnet: set downis to 1 if there&apos;s no NI for remote net&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 5ed16f284871b6b898591815cb7d5468ae2c3fca&lt;/p&gt;</comment>
                            <comment id="102154" author="liang" created="Sun, 21 Dec 2014 03:28:43 +0000"  >&lt;p&gt;James, I think the issue here is that we do not record downis if there is no NI for the target network; the patch above should fix this problem. Also, I&apos;m wondering whether this is the same problem as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5758&quot; title=&quot;enabling avoid_asym_router_failure prvents the bring up of ORNL production systems&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5758&quot;&gt;&lt;del&gt;LU-5758&lt;/del&gt;&lt;/a&gt;. Could you please comment on LU-5758?&lt;/p&gt;</comment>
                            <comment id="103596" author="simmonsja" created="Thu, 15 Jan 2015 15:15:48 +0000"  >&lt;p&gt;Can you make a patch for master as well? Testing looks good for the patch you provided.&lt;/p&gt;</comment>
                            <comment id="103620" author="gerrit" created="Thu, 15 Jan 2015 17:35:58 +0000"  >&lt;p&gt;Jian Yu (jian.yu@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/13417&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/13417&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6060&quot; title=&quot;ARF doesn&amp;#39;t detect lack of interface on a router&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6060&quot;&gt;&lt;del&gt;LU-6060&lt;/del&gt;&lt;/a&gt; lnet: set downis to 1 if there&apos;s no NI for remote net&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 826353849a9a51a4d4c53accd449e9427386f57c&lt;/p&gt;</comment>
                            <comment id="103924" author="gerrit" created="Mon, 19 Jan 2015 23:43:09 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/13417/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/13417/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6060&quot; title=&quot;ARF doesn&amp;#39;t detect lack of interface on a router&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6060&quot;&gt;&lt;del&gt;LU-6060&lt;/del&gt;&lt;/a&gt; lnet: set downis to 1 if there&apos;s no NI for remote net&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 749dc54622b2c3267c6c97eb770702b437a7897d&lt;/p&gt;</comment>
                            <comment id="103991" author="pjones" created="Tue, 20 Jan 2015 14:46:29 +0000"  >&lt;p&gt;Landed for 2.7&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="31086">LU-6851</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="27053">LU-5758</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="26010">LU-5485</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx2xr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>16875</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>