<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:10:47 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14555] lnet_check_route_inconsistency() complains when hops == -1</title>
                <link>https://jira.whamcloud.com/browse/LU-14555</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have the following configuration:&lt;/p&gt;

&lt;p&gt;2.14_servers == o2ib100 == 2.12_routers == tcp129 == 2.12_routers == o2ib18 == 2.12_clients&lt;/p&gt;

&lt;p&gt;Discovery is disabled, and the routes are configured statically, on all the systems.&lt;/p&gt;

&lt;p&gt;This causes LNet to complain vociferously on the console from lnet_check_route_inconsistency():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LNet: 29144:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If LNet is configured so that there is only one route to any given endpoint, even on a multi-hop network, there is no value in spending sysadmin time determining and setting the hop counts, as far as I can tell.&#160; And setting hops is optional according to the Lustre Operations Manual.&lt;/p&gt;

&lt;p&gt;Is hop count actually required in 2.14 due to&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13029&quot; title=&quot;LNet Routing: asym routing is not working for multi-hop routing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13029&quot;&gt;&lt;del&gt;LU-13029&lt;/del&gt;&lt;/a&gt;&#160;and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13785&quot; title=&quot;router ib interface was not configured on boot. gni clients mis-classified the router as multi-hop leading to evictions&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13785&quot;&gt;&lt;del&gt;LU-13785&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;</description>
                <environment>RHEL 8&lt;br/&gt;
multi-hop network&lt;br/&gt;
hops not set</environment>
        <key id="63524">LU-14555</key>
            <summary>lnet_check_route_inconsistency() complains when hops == -1</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="defazio">Gian-Carlo Defazio</assignee>
                                    <reporter username="ofaaland">Olaf Faaland</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Fri, 26 Mar 2021 01:02:40 +0000</created>
                <updated>Fri, 13 Jan 2023 14:57:21 +0000</updated>
                            <resolved>Fri, 13 Jan 2023 14:57:21 +0000</resolved>
                                    <version>Lustre 2.14.0</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="296800" author="gerrit" created="Fri, 26 Mar 2021 01:54:32 +0000"  >&lt;p&gt;Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/43127&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43127&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14555&quot; title=&quot;lnet_check_route_inconsistency() complains when hops == -1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14555&quot;&gt;&lt;del&gt;LU-14555&lt;/del&gt;&lt;/a&gt; lnet: do not complain if hops == -1&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 4ebbbb068cf9d2f53b6923ffe7744dc562bc94fe&lt;/p&gt;</comment>
                            <comment id="296801" author="ofaaland" created="Fri, 26 Mar 2021 01:56:31 +0000"  >&lt;p&gt;I don&apos;t actually know if hops == 0 is either valid or possible, so in the patch I checked for that and reported it as well as reporting if hops == 1.&lt;/p&gt;</comment>
                            <comment id="296903" author="ofaaland" created="Fri, 26 Mar 2021 19:27:40 +0000"  >&lt;p&gt;Peter,&lt;/p&gt;

&lt;p&gt;There appears to be more to hop count than I realized.&#160; Please assign this to an engineer.&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="296906" author="pjones" created="Fri, 26 Mar 2021 19:45:52 +0000"  >&lt;p&gt;Serguei&lt;/p&gt;

&lt;p&gt;Could you please assist?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="296932" author="ashehata" created="Sat, 27 Mar 2021 00:04:54 +0000"  >&lt;p&gt;Olaf, the check for inconsistency should happen only when we ping the route for aliveness.&lt;/p&gt;

&lt;p&gt;Do you see it happening more often?&lt;/p&gt;

&lt;p&gt;We can probably reduce the severity of the debug message, but I would like to make sure that it&apos;s not being printed more frequently than it should be.&lt;/p&gt;</comment>
                            <comment id="296934" author="ofaaland" created="Sat, 27 Mar 2021 00:38:34 +0000"  >&lt;p&gt;Hi Amir,&lt;/p&gt;

&lt;p&gt;I&apos;m not certain if this is only occurring when the route is pinged for aliveness.&#160; I&apos;ll look.&#160; But if setting the hop count is not required, then a console message is inappropriate.&#160; And if setting the hop count is required, then shouldn&apos;t that be enforced at the time routes are created?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@garteri:~]# pdsh -N -w e1 &apos;dmesg -T | grep -w hop | fgrep 1.54 | tail&apos; | sed &apos;s/is detected to be multi-hop.*$//&apos;
[Thu Mar 25 15:21:37 2021] LNet: 29145:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:23:07 2021] LNet: 29145:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:23:37 2021] LNet: 29143:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:24:08 2021] LNet: 29146:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:24:37 2021] LNet: 29144:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:25:38 2021] LNet: 29146:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:27:07 2021] LNet: 29146:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:29:37 2021] LNet: 29144:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
[Thu Mar 25 15:34:08 2021] LNet: 29146:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="296937" author="ofaaland" created="Sat, 27 Mar 2021 01:29:38 +0000"  >&lt;p&gt;Amir,&lt;br/&gt;
 The timing of the message does not seem to correlate well with the timing of the call to lnet_check_routers(). If I&apos;m looking at the wrong code, let me know.&lt;/p&gt;

&lt;p&gt;The debug log &quot;discover&quot; messages from lnet_check_routers for one router, .1.54:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2021-03-26 18:12:19.043797 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:12:49.798562 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:13:19.494570 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:13:49.190567 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:14:19.910568 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:14:49.606563 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:15:19.302567 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:15:50.022572 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:16:20.742564 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:16:50.438577 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:17:20.134570 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:17:50.854558 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:18:20.550570 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:18:50.246560 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:19:20.966560 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:19:50.662570 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2
2021-03-26 18:20:20.358566 00000400:00000200:3.0::0:30243:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and the console log multi-hop messages from that period:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[Fri Mar 26 18:12:18 2021] LNet: 30237:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:12:48 2021] LNet: 30240:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:13:18 2021] LNet: 30238:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:13:48 2021] LNet: 30238:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:14:19 2021] LNet: 30238:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:14:48 2021] LNet: 30237:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:15:18 2021] LNet: 30240:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:16:19 2021] LNet: 30240:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:17:50 2021] LNet: 30240:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1
[Fri Mar 26 18:20:19 2021] LNet: 30237:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18-&amp;gt;172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="297465" author="ashehata" created="Wed, 31 Mar 2021 21:17:58 +0000"  >&lt;p&gt;Looks like it&apos;s getting the keepalive every 30 seconds, and that&apos;s when we do the route consistency check. I think it&apos;ll be enough to reduce the message severity to just &quot;net&quot;. It is not mandatory to set the hop count; the reason we have the check is to verify configuration consistency. However, if it&apos;s not standard to explicitly specify the hop count when configuring the route, then the check becomes less effective.&lt;/p&gt;</comment>
                            <comment id="330041" author="gerrit" created="Wed, 23 Mar 2022 22:03:19 +0000"  >&lt;p&gt;&quot;Gian-Carlo DeFazio &amp;lt;defazio1@llnl.gov&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/46918&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46918&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14555&quot; title=&quot;lnet_check_route_inconsistency() complains when hops == -1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14555&quot;&gt;&lt;del&gt;LU-14555&lt;/del&gt;&lt;/a&gt; lnet: change route inconsistency warnings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 572b52a488ac7a186be2f808668e163e9ac850b2&lt;/p&gt;</comment>
                            <comment id="340017" author="gerrit" created="Mon, 11 Jul 2022 06:50:33 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/46918/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46918/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14555&quot; title=&quot;lnet_check_route_inconsistency() complains when hops == -1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14555&quot;&gt;&lt;del&gt;LU-14555&lt;/del&gt;&lt;/a&gt; lnet: asym route inconsistency warning&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 6ab060e58e6b3f38b0c8d57b56fec887c6fe9fb6&lt;/p&gt;</comment>
                            <comment id="340047" author="pjones" created="Mon, 11 Jul 2022 13:04:20 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                            <comment id="355783" author="defazio" created="Thu, 8 Dec 2022 23:58:38 +0000"  >&lt;p&gt;Just adding the requirement that &lt;b&gt;avoid_asym_router_failure&lt;/b&gt; be true doesn&apos;t prevent the warning from being logged, because we have &lt;b&gt;avoid_asym_router_failure=1&lt;/b&gt;.&lt;/p&gt;

&lt;p&gt;However, due to changes in the code since the ticket was made, I&apos;ve made a change which further reduces the conditions in which the warning is logged: having hops undefined for a multi-hop route is no longer considered an inconsistent configuration.&lt;/p&gt;</comment>
                            <comment id="355784" author="gerrit" created="Thu, 8 Dec 2022 23:59:25 +0000"  >&lt;p&gt;&quot;Gian-Carlo DeFazio &amp;lt;defazio1@llnl.gov&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49352&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49352&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14555&quot; title=&quot;lnet_check_route_inconsistency() complains when hops == -1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14555&quot;&gt;&lt;del&gt;LU-14555&lt;/del&gt;&lt;/a&gt; lnet: asym route inconsistency warning&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: af572e46f883f8736af8055a2030eb640792130c&lt;/p&gt;</comment>
                            <comment id="358919" author="gerrit" created="Fri, 13 Jan 2023 07:22:44 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49352/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49352/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14555&quot; title=&quot;lnet_check_route_inconsistency() complains when hops == -1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14555&quot;&gt;&lt;del&gt;LU-14555&lt;/del&gt;&lt;/a&gt; lnet: asym route inconsistency warning&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 6aed5df1771c299b527251b0e18ff9f6cb95dd75&lt;/p&gt;</comment>
                            <comment id="358976" author="pjones" created="Fri, 13 Jan 2023 14:57:21 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01qjz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>