<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:09:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14454] LNET routers added - then access issues with Lustre storage</title>
                <link>https://jira.whamcloud.com/browse/LU-14454</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I built 2 new LNET routers and added them to our LNET env. The OS, LNET, and MLNX OFED versions are exactly the same as on the 2 other existing lnet routers in this location. I added lnet routes on the 2 Lustre filesystems we have in this physical location to point to the 2 new lnet routers. I tested one client in another data center we have by adding the 2 lnet routes on the client to point to the new lnet routers. The client could read and write fine. The next day, various clients were having issues accessing the 2 Lustre FS I had set LNET routes on previously. We ended up removing all the lnet routes to the 2 new lnet routers on the Lustre filesystems, and things started working again. So we removed the 2 new lnet routers from our LNET env.&lt;/p&gt;

&lt;p&gt;LNET routers are running lnet 2.12.4, Lustre FS are lustre 2.12.3 and a very old version &lt;/p&gt;

&lt;p&gt;We have not experienced this before and were wondering if there is a specific procedure we have to follow to add new lnet routers in our environment?&lt;/p&gt;

&lt;p&gt;The messages we were seeing on the lustre FS were for example:&lt;br/&gt;
Feb 19 09:03:17 boslfs02mds01 kernel: LNetError: 6413:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 9 previous similar messages&lt;br/&gt;
Feb 19 09:14:32 boslfs02mds01 kernel: LNetError: 6413:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.242.46.216@o2ib1 added to recovery queue. Health = 900&lt;br/&gt;
Feb 19 09:14:32 boslfs02mds01 kernel: LNetError: 6413:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 9 previous similar messages&lt;br/&gt;
Feb 19 09:25:47 boslfs02mds01 kernel: LNetError: 6413:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.242.46.217@o2ib1 added to recovery queue. Health = 900&lt;/p&gt;

&lt;p&gt;We were getting messages like the above for all 4 of the lnet routers, both the existing and the 2 new ones that were added.&lt;/p&gt;

&lt;p&gt;Also, the hardware configuration of the 2 new LNET routers is different. They have a dual-port ConnectX-4 card running in Ethernet mode at 10G with the 2 ports LACP bonded, plus a CX5 card for rate-100 IB. The older LNET routers have a ConnectX-4 IB card at IB rate 100 and a traditional 10G Ethernet card whose 2 10G ports are LACP bonded. Not sure if this matters, but I wanted to mention it.&lt;/p&gt;
</description>
                <environment>All Centos 7.x. Hardware is either Dell or Lenovo. IB infrastructure is EDR IB with a MSB7800 switch. MLNX OFED is 4.7-1.0.0.1 for lnet routers</environment>
        <key id="62961">LU-14454</key>
            <summary>LNET routers added - then access issues with Lustre storage</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="4" iconUrl="https://jira.whamcloud.com/images/icons/statuses/reopened.png" description="This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.">Reopened</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="ssmirnov">Serguei Smirnov</assignee>
                                    <reporter username="mre64">Michael Ethier</reporter>
                        <labels>
                    </labels>
                <created>Fri, 19 Feb 2021 16:25:29 +0000</created>
                <updated>Wed, 5 May 2021 17:03:47 +0000</updated>
                                            <version>Lustre 2.12.3</version>
                    <version>Lustre 2.12.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="292455" author="mre64" created="Fri, 19 Feb 2021 16:48:19 +0000"  >&lt;p&gt;Actually, those recoveryq messages I mentioned above may not be an issue in regards to the access problem I described. I can see those messages in /var/log/messages on one of the lustre FS - at much earlier times, like weeks ago - before we experienced the access issue.&lt;/p&gt;</comment>
                            <comment id="292465" author="pjones" created="Fri, 19 Feb 2021 18:06:16 +0000"  >&lt;p&gt;Cyril&lt;/p&gt;

&lt;p&gt;Could you please assist with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="292816" author="mre64" created="Wed, 24 Feb 2021 00:40:14 +0000"  >&lt;p&gt;Hello,&lt;br/&gt;
Anything else you need ?&lt;br/&gt;
Thanks,&lt;br/&gt;
Mike&lt;/p&gt;</comment>
                            <comment id="292847" author="cbordage" created="Wed, 24 Feb 2021 09:25:43 +0000"  >&lt;p&gt;Hello Michael,&lt;/p&gt;

&lt;p&gt;sorry for the late answer, I had to take unexpected leave.&lt;/p&gt;

&lt;p&gt;Could you provide the outputs of the following commands from the servers, the routers and several clients?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lnetctl peer show -v 4
lnetctl net show -v 4
lnetctl global show
lnetctl route show&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thank you.&lt;/p&gt;

&lt;p&gt;Cyril.&lt;/p&gt;</comment>
                            <comment id="293075" author="mre64" created="Thu, 25 Feb 2021 19:20:25 +0000"  >&lt;p&gt;Hi Cyril, no worries. So those 2 lnet routers with issues do not have their lnet active to I can&apos;t run the commands. If we add routes to the Lustre storage and to some clients that&apos;s when we run into issues with accessing the Lustre storage afterwards.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@boslnet03 ~]# lnetctl peer show -v 4
^C
[root@boslnet03 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp1
      local NI(s):
        - nid: 10.242.62.213@tcp1
          status: down
          interfaces:
              0: bond0
    - net type: o2ib1
      local NI(s):
        - nid: 10.242.46.227@o2ib1
          status: down
          interfaces:
              0: ib0&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;Thanks,&lt;br/&gt;
Mike&lt;/p&gt;</comment>
                            <comment id="293166" author="cbordage" created="Fri, 26 Feb 2021 09:33:48 +0000"  >&lt;p&gt;Hello Michael,&lt;/p&gt;

&lt;p&gt;I do not get your &quot;those 2 lnet routers with issues do not have their lnet active to I can&apos;t run the commands.&quot;. You mean you cannot mess up your working configuration by enabling them?&lt;/p&gt;

&lt;p&gt;It will be difficult to diagnose with so little information&#8230; To see what is going on, I need all the requested details, gathered with the exact commands I provided. If you cannot disturb your production environment, could you provide the commands you used to configure everything, the details of your network (NIDs of the servers, the routers, the clients), and all available information for the servers and clients?&lt;/p&gt;

&lt;p&gt;Thank you.&lt;/p&gt;

&lt;p&gt;Cyril.&lt;/p&gt;</comment>
                            <comment id="293879" author="mre64" created="Thu, 4 Mar 2021 04:13:36 +0000"  >&lt;p&gt;Hi Cyril,&lt;br/&gt;
Sorry for the delay. Yes, when we added the 2 new lnet routers to some of our lustre storage via lnet routes and set a few clients with routes to access the storage via the lnet routers, we had problems accessing the storage from other clients. I will gather the details you have asked for and reply back.&lt;br/&gt;
Thanks,&lt;br/&gt;
Mike&lt;/p&gt;</comment>
                            <comment id="294723" author="mre64" created="Thu, 11 Mar 2021 17:14:02 +0000"  >&lt;p&gt;Hi Cyril,&lt;/p&gt;

&lt;p&gt;Your requested commands on a working lnet router:&lt;br/&gt;
lnetctl peer show -v 4 pegs the cpu to 100% and never returns on a lnet router.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@boslnet01 ~]# lnetctl net show -v 4
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          sent_stats:
              put: 0
              get: 0
              reply: 0
              ack: 0
              hello: 0
          received_stats:
              put: 0
              get: 0
              reply: 0
              ack: 0
              hello: 0
          dropped_stats:
              put: 0
              get: 0
              reply: 0
              ack: 0
              hello: 0
          health stats:
              health value: 0
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 0
              error: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          dev cpt: 0
          tcp bonding: 0
          CPT: &quot;[0,1]&quot;
    - net type: tcp1
      local NI(s):
        - nid: 10.242.62.227@tcp1
          status: up
          interfaces:
              0: bond0
          statistics:
              send_count: 1263198586
              recv_count: 446196203
              drop_count: 827
          sent_stats:
              put: 1211118273
              get: 52080311
              reply: 2
              ack: 0
              hello: 0
          received_stats:
              put: 380078000
              get: 48000898
              reply: 3709225
              ack: 14408080
              hello: 0
          dropped_stats:
              put: 707
              get: 117
              reply: 3
              ack: 0
              hello: 0
          health stats:
              health value: 1000
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 454
              error: 5397
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          dev cpt: -1
          tcp bonding: 0
          CPT: &quot;[0,1]&quot;
    - net type: tcp2
      local NI(s):
        - nid: 10.242.107.16@tcp2
          status: up
          interfaces:
              0: p3p2
          statistics:
              send_count: 61007347
              recv_count: 60503869
              drop_count: 5
          sent_stats:
              put: 56550435
              get: 4456912
              reply: 0
              ack: 0
              hello: 0
          received_stats:
              put: 12820596
              get: 52672
              reply: 4359334
              ack: 43271267
              hello: 0
          dropped_stats:
              put: 4
              get: 0
              reply: 1
              ack: 0
              hello: 0
          health stats:
              health value: 1000
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 0
              error: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          dev cpt: 0
          tcp bonding: 0
          CPT: &quot;[0,1]&quot;
    - net type: o2ib1
      local NI(s):
        - nid: 10.242.46.216@o2ib1
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 459118144
              recv_count: 1276624005
              drop_count: 457262
          sent_stats:
              put: 392898596
              get: 471643
              reply: 8068558
              ack: 57679347
              hello: 0
          received_stats:
              put: 1267668708
              get: 8955285
              reply: 12
              ack: 0
              hello: 0
          dropped_stats:
              put: 457250
              get: 0
              reply: 12
              ack: 0
              hello: 0
          health stats:
              health value: 1000
              interrupts: 0
              dropped: 62
              aborted: 0
              no route: 0
              timeouts: 350
              error: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 1
          tcp bonding: 0
          CPT: &quot;[0,1]&quot;
[root@boslnet01 ~]# lnetctl global show
global:
    numa_range: 0
    max_intf: 200
    discovery: 0
    drop_asym_route: 0
    retry_count: 0
    transaction_timeout: 50
    health_sensitivity: 0
    recovery_interval: 1
[root@boslnet01 ~]# lnetctl route show
[root@boslnet01 ~]#&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;Your commands on a non-working lnet router:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@boslnet03 ~]# lnetctl net show -v 4
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          sent_stats:
              put: 0
              get: 0
              reply: 0
              ack: 0
              hello: 0
          received_stats:
              put: 0
              get: 0
              reply: 0
              ack: 0
              hello: 0
          dropped_stats:
              put: 0
              get: 0
              reply: 0
              ack: 0
              hello: 0
          health stats:
              health value: 0
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 0
              error: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          dev cpt: 0
          tcp bonding: 0
          CPT: &quot;[0]&quot;
    - net type: tcp1
      local NI(s):
        - nid: 10.242.62.213@tcp1
          status: down
          interfaces:
              0: bond0
          statistics:
              send_count: 9937720
              recv_count: 13722
              drop_count: 276772
          sent_stats:
              put: 9913812
              get: 23902
              reply: 6
              ack: 0
              hello: 0
          received_stats:
              put: 7145
              get: 1445
              reply: 5121
              ack: 11
              hello: 0
          dropped_stats:
              put: 276766
              get: 0
              reply: 6
              ack: 0
              hello: 0
          health stats:
              health value: 1000
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 738
              error: 3
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          dev cpt: -1
          tcp bonding: 0
          CPT: &quot;[0]&quot;
    - net type: o2ib1
      local NI(s):
        - nid: 10.242.46.227@o2ib1
          status: down
          interfaces:
              0: ib0
          statistics:
              send_count: 217189
              recv_count: 10141187
              drop_count: 10
          sent_stats:
              put: 7145
              get: 204912
              reply: 5121
              ack: 11
              hello: 0
          received_stats:
              put: 9913812
              get: 227369
              reply: 6
              ack: 0
              hello: 0
          dropped_stats:
              put: 10
              get: 0
              reply: 0
              ack: 0
              hello: 0
          health stats:
              health value: 1000
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 0
              error: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 0
          tcp bonding: 0
          CPT: &quot;[0]&quot;
[root@boslnet03 ~]# lnetctl global show
global:
    numa_range: 0
    max_intf: 200
    discovery: 0
    drop_asym_route: 0
    retry_count: 0
    transaction_timeout: 50
    health_sensitivity: 0
    recovery_interval: 1
[root@boslnet03 ~]# lnetctl route show
[root@boslnet03 ~]#&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;Your commands on a Lustre storage node (MDS):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@boslfs02mds01 ~]# lnetctl net show -v 4
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 2888768
              recv_count: 2888764
              drop_count: 4
          sent_stats:
              put: 2888768
              get: 0
              reply: 0
              ack: 0
              hello: 0
          received_stats:
              put: 2190548
              get: 0
              reply: 0
              ack: 698216
              hello: 0
          dropped_stats:
              put: 4
              get: 0
              reply: 0
              ack: 0
              hello: 0
          health stats:
              health value: 0
              interrupts: 0
              dropped: 0
              aborted: 0
              no route: 0
              timeouts: 0
              error: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          dev cpt: 0
          tcp bonding: 0
          CPT: &quot;[0,1,2,3]&quot;
    - net type: o2ib1
      local NI(s):
        - nid: 10.242.46.233@o2ib1
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 120142564
              recv_count: 120852294
              drop_count: 22158
          sent_stats:
              put: 120091641
              get: 50923
              reply: 0
              ack: 0
              hello: 0
          received_stats:
              put: 119667645
              get: 13605
              reply: 37318
              ack: 1133726
              hello: 0
          dropped_stats:
              put: 18448
              get: 0
              reply: 0
              ack: 3710
              hello: 0
          health stats:
              health value: 1000
              interrupts: 0
              dropped: 1804948
              aborted: 0
              no route: 0
              timeouts: 0
              error: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 3
          tcp bonding: 0
          CPT: &quot;[0,1,2,3]&quot;
[root@boslfs02mds01 ~]# lnetctl global show
global:
    numa_range: 0
    max_intf: 200
    discovery: 0
    drop_asym_route: 0
    retry_count: 0
    transaction_timeout: 50
    health_sensitivity: 100
    recovery_interval: 1
[root@boslfs02mds01 ~]# lnetctl route show
route:
    - net: tcp1
      gateway: 10.242.46.216@o2ib1
    - net: tcp1
      gateway: 10.242.46.217@o2ib1
    - net: tcp2
      gateway: 10.242.46.217@o2ib1
    - net: tcp2
      gateway: 10.242.46.216@o2ib1&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="294724" author="mre64" created="Thu, 11 Mar 2021 17:15:13 +0000"  >&lt;p&gt;Details of existing working lnet router:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@boslnet01 ~]# dkms status
lustre-client, 2.12.4, 3.10.0-1062.18.1.el7.x86_64, x86_64: installed
[root@boslnet01 ~]# more /etc/modprobe.d/lustre.conf
options lnet networks=&quot;tcp1(bond0), tcp2(p3p2), o2ib1(ib0)&quot;
options lnet forwarding=&quot;enabled&quot;
options lnet lnet_peer_discovery_disabled=1
options lnet lnet_health_sensitivity=0
[root@boslnet01 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp1
      local NI(s):
        - nid: 10.242.62.227@tcp1
          status: up
          interfaces:
              0: bond0
    - net type: tcp2
      local NI(s):
        - nid: 10.242.107.16@tcp2
          status: up
          interfaces:
              0: p3p2
    - net type: o2ib1
      local NI(s):
        - nid: 10.242.46.216@o2ib1
          status: up
          interfaces:
              0: ib0
[root@boslnet01 ~]# more /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@boslnet01 ~]# lspci |grep -i mell
81:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[root@boslnet01 ~]# ibstat
CA &apos;mlx5_0&apos;
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.17.2052
	Hardware version: 0
	Node GUID: 0x98039b0300bec2c8
	System image GUID: 0x98039b0300bec2c8
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 44
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x98039b0300bec2c8
		Link layer: InfiniBand&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;[root@boslnet01 ~]# ifconfig&lt;br/&gt;
bond0: flags=5187&amp;lt;UP,BROADCAST,RUNNING,MASTER,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 10.242.62.227  netmask 255.255.255.0  broadcast 10.242.62.255&lt;br/&gt;
        inet6 fe80::a236:9fff:fe8d:9774  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether a0:36:9f:8d:97:74  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 5311208385  bytes 4185414327252 (3.8 TiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 11038342648  bytes 14227372363039 (12.9 TiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;


&lt;p&gt;ib0: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 2044&lt;br/&gt;
        inet 10.242.46.216  netmask 255.255.252.0  broadcast 10.242.47.255&lt;br/&gt;
        inet6 fe80::9a03:9b03:be:c2c8  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).&lt;br/&gt;
        infiniband 20:00:00:68:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)&lt;br/&gt;
        RX packets 29806  bytes 4627169 (4.4 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 837  bytes 50408 (49.2 KiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;lo: flags=73&amp;lt;UP,LOOPBACK,RUNNING&amp;gt;  mtu 65536&lt;br/&gt;
        inet 127.0.0.1  netmask 255.0.0.0&lt;br/&gt;
        inet6 ::1  prefixlen 128  scopeid 0x10&amp;lt;host&amp;gt;&lt;br/&gt;
        loop  txqueuelen 1000  (Local Loopback)&lt;br/&gt;
        RX packets 743598  bytes 71604808 (68.2 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 743598  bytes 71604808 (68.2 MiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;p1p1: flags=6211&amp;lt;UP,BROADCAST,RUNNING,SLAVE,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether a0:36:9f:8d:97:74  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 2983098310  bytes 2267051063818 (2.0 TiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 6235321147  bytes 8498451034677 (7.7 TiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;p1p2: flags=6211&amp;lt;UP,BROADCAST,RUNNING,SLAVE,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether a0:36:9f:8d:97:74  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 2328110076  bytes 1918363263500 (1.7 TiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 4803021510  bytes 5728921330120 (5.2 TiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;p3p1: flags=4099&amp;lt;UP,BROADCAST,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether 3c:fd:fe:16:01:88  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 0  bytes 0 (0.0 B)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 0  bytes 0 (0.0 B)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;p3p2: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 10.242.107.16  netmask 255.255.255.192  broadcast 10.242.107.63&lt;br/&gt;
        inet6 fe80::3efd:feff:fe16:18a  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether 3c:fd:fe:16:01:8a  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 3292939462  bytes 4737754838918 (4.3 TiB)&lt;br/&gt;
        RX errors 0  dropped 667  overruns 0  frame 0&lt;br/&gt;
        TX packets 31792664489  bytes 47939135623196 (43.6 TiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;



&lt;p&gt;Details of one of our lustre storage servers:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# rpm -qa |grep lustre&lt;br/&gt;
kernel-3.10.0-1062.1.1.el7_lustre.x86_64&lt;br/&gt;
kmod-lustre-2.12.3-1.el7.x86_64&lt;br/&gt;
kmod-lustre-osd-ldiskfs-2.12.3-1.el7.x86_64&lt;br/&gt;
kernel-devel-3.10.0-1062.1.1.el7_lustre.x86_64&lt;br/&gt;
lustre-osd-zfs-mount-2.12.3-1.el7.x86_64&lt;br/&gt;
lustre-2.12.3-1.el7.x86_64&lt;br/&gt;
lustre-zfs-dkms-2.12.3-1.el7.noarch&lt;br/&gt;
lustre-resource-agents-2.12.3-1.el7.x86_64&lt;br/&gt;
lustre-ldiskfs-zfs-5.0.0-1.el7.x86_64&lt;br/&gt;
lustre-osd-ldiskfs-mount-2.12.3-1.el7.x86_64&lt;br/&gt;
kmod-spl-3.10.0-1062.1.1.el7_lustre.x86_64-0.7.13-1.el7.x86_64&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# more /etc/redhat-release &lt;br/&gt;
CentOS Linux release 7.7.1908 (Core)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# more /etc/modprobe.d/lustre.conf &lt;br/&gt;
options lnet networks=&quot;o2ib1(ib0)&quot; routes=&quot;tcp1 10.242.46.&lt;span class=&quot;error&quot;&gt;&amp;#91;216-217&amp;#93;&lt;/span&gt;@o2ib1; tcp2 10.242.46.&lt;span class=&quot;error&quot;&gt;&amp;#91;216-217&amp;#93;&lt;/span&gt;@o2ib1&quot;&lt;br/&gt;
options lnet lnet_transaction_timeout=50 lnet_retry_count=0 lnet_peer_discovery_disabled=1&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# lnetctl route show&lt;br/&gt;
route:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;net: tcp1&lt;br/&gt;
      gateway: 10.242.46.216@o2ib1&lt;/li&gt;
	&lt;li&gt;net: tcp1&lt;br/&gt;
      gateway: 10.242.46.217@o2ib1&lt;/li&gt;
	&lt;li&gt;net: tcp2&lt;br/&gt;
      gateway: 10.242.46.217@o2ib1&lt;/li&gt;
	&lt;li&gt;net: tcp2&lt;br/&gt;
      gateway: 10.242.46.216@o2ib1&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# lspci |grep -i mell&lt;br/&gt;
d8:00.0 Infiniband controller: Mellanox Technologies MT27800 Family &lt;span class=&quot;error&quot;&gt;&amp;#91;ConnectX-5&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# lspci |grep -i eth&lt;br/&gt;
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)&lt;br/&gt;
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)&lt;br/&gt;
19:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)&lt;br/&gt;
19:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# ibstat&lt;br/&gt;
CA &apos;mlx5_0&apos;&lt;br/&gt;
	CA type: MT4119&lt;br/&gt;
	Number of ports: 1&lt;br/&gt;
	Firmware version: 16.25.1020&lt;br/&gt;
	Hardware version: 0&lt;br/&gt;
	Node GUID: 0xb8599f030005a4ec&lt;br/&gt;
	System image GUID: 0xb8599f030005a4ec&lt;br/&gt;
	Port 1:&lt;br/&gt;
		State: Active&lt;br/&gt;
		Physical state: LinkUp&lt;br/&gt;
		Rate: 100&lt;br/&gt;
		Base lid: 20&lt;br/&gt;
		LMC: 0&lt;br/&gt;
		SM lid: 1&lt;br/&gt;
		Capability mask: 0x2651e848&lt;br/&gt;
		Port GUID: 0xb8599f030005a4ec&lt;br/&gt;
		Link layer: InfiniBand&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# ifconfig&lt;br/&gt;
bond0: flags=5187&amp;lt;UP,BROADCAST,RUNNING,MASTER,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 10.242.62.139  netmask 255.255.255.0  broadcast 10.242.62.255&lt;br/&gt;
        inet6 fe80::e643:4bff:fe21:6750  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether e4:43:4b:21:67:50  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 244991196  bytes 40288519866 (37.5 GiB)&lt;br/&gt;
        RX errors 0  dropped 689357  overruns 0  frame 0&lt;br/&gt;
        TX packets 243506307  bytes 25816873448 (24.0 GiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;em1: flags=6211&amp;lt;UP,BROADCAST,RUNNING,SLAVE,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether e4:43:4b:21:67:50  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 124720954  bytes 20746393497 (19.3 GiB)&lt;br/&gt;
        RX errors 0  dropped 19702  overruns 0  frame 0&lt;br/&gt;
        TX packets 121422150  bytes 12632871392 (11.7 GiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;em2: flags=6211&amp;lt;UP,BROADCAST,RUNNING,SLAVE,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether e4:43:4b:21:67:50  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 120270244  bytes 19542126677 (18.2 GiB)&lt;br/&gt;
        RX errors 0  dropped 315129  overruns 0  frame 0&lt;br/&gt;
        TX packets 122084164  bytes 13184003240 (12.2 GiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;em3: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 10.0.0.139  netmask 255.255.255.0  broadcast 10.0.0.255&lt;br/&gt;
        inet6 fe80::e643:4bff:fe21:6754  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether e4:43:4b:21:67:54  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 228132  bytes 31718009 (30.2 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 328790  bytes 52526191 (50.0 MiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;br/&gt;
        device memory 0x92980000-929fffff  &lt;/p&gt;

&lt;p&gt;em4: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 10.242.112.176  netmask 255.255.255.128  broadcast 10.242.112.255&lt;br/&gt;
        inet6 fe80::e643:4bff:fe21:6755  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether e4:43:4b:21:67:55  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 26539  bytes 1600620 (1.5 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 210  bytes 20138 (19.6 KiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;br/&gt;
        device memory 0x92900000-9297ffff  &lt;/p&gt;

&lt;p&gt;ib0: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 2044&lt;br/&gt;
        inet 10.242.46.233  netmask 255.255.252.0  broadcast 10.242.47.255&lt;br/&gt;
        inet6 fe80::ba59:9f03:5:a4ec  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).&lt;br/&gt;
        infiniband 20:00:07:EB:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)&lt;br/&gt;
        RX packets 10537  bytes 1677139 (1.5 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 14  bytes 984 (984.0 B)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;idrac: flags=67&amp;lt;UP,BROADCAST,RUNNING&amp;gt;  mtu 1500&lt;br/&gt;
        inet6 fe80::4ed9:8fff:fe7c:c104  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether 4c:d9:8f:7c:c1:04  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 690330  bytes 56738080 (54.1 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 808171  bytes 76845763 (73.2 MiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;lo: flags=73&amp;lt;UP,LOOPBACK,RUNNING&amp;gt;  mtu 65536&lt;br/&gt;
        inet 127.0.0.1  netmask 255.0.0.0&lt;br/&gt;
        inet6 ::1  prefixlen 128  scopeid 0x10&amp;lt;host&amp;gt;&lt;br/&gt;
        loop  txqueuelen 1000  (Local Loopback)&lt;br/&gt;
        RX packets 20976  bytes 4281290 (4.0 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 20976  bytes 4281290 (4.0 MiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;


&lt;p&gt;Details of an lnet router that doesn&apos;t work:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# dkms status&lt;br/&gt;
lustre-client, 2.12.4, 3.10.0-1062.18.1.el7.x86_64, x86_64: installed&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# more /etc/modprobe.d/lustre.conf &lt;br/&gt;
options lnet networks=&quot;tcp1(bond0), o2ib1(ib0)&quot;&lt;br/&gt;
options lnet forwarding=&quot;enabled&quot;&lt;br/&gt;
options lnet lnet_peer_discovery_disabled=1&lt;br/&gt;
options lnet lnet_health_sensitivity=0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# lnetctl net show&lt;br/&gt;
net:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;net type: lo&lt;br/&gt;
      local NI(s):&lt;/li&gt;
	&lt;li&gt;nid: 0@lo&lt;br/&gt;
          status: up&lt;/li&gt;
	&lt;li&gt;net type: tcp1&lt;br/&gt;
      local NI(s):&lt;/li&gt;
	&lt;li&gt;nid: 10.242.62.213@tcp1&lt;br/&gt;
          status: down&lt;br/&gt;
          interfaces:&lt;br/&gt;
              0: bond0&lt;/li&gt;
	&lt;li&gt;net type: o2ib1&lt;br/&gt;
      local NI(s):&lt;/li&gt;
	&lt;li&gt;nid: 10.242.46.227@o2ib1&lt;br/&gt;
          status: down&lt;br/&gt;
          interfaces:&lt;br/&gt;
              0: ib0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# more /etc/redhat-release &lt;br/&gt;
CentOS Linux release 7.7.1908 (Core)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# lspci |grep -i mell&lt;br/&gt;
08:00.0 Ethernet controller: Mellanox Technologies MT27710 Family &lt;span class=&quot;error&quot;&gt;&amp;#91;ConnectX-4 Lx&amp;#93;&lt;/span&gt;&lt;br/&gt;
08:00.1 Ethernet controller: Mellanox Technologies MT27710 Family &lt;span class=&quot;error&quot;&gt;&amp;#91;ConnectX-4 Lx&amp;#93;&lt;/span&gt;&lt;br/&gt;
5b:00.0 Infiniband controller: Mellanox Technologies MT27800 Family &lt;span class=&quot;error&quot;&gt;&amp;#91;ConnectX-5&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslfs02mds01 ~&amp;#93;&lt;/span&gt;# lspci |grep -i eth&lt;br/&gt;
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)&lt;br/&gt;
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)&lt;br/&gt;
19:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)&lt;br/&gt;
19:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# ibstat&lt;br/&gt;
CA &apos;mlx5_2&apos;&lt;br/&gt;
	CA type: MT4119&lt;br/&gt;
	Number of ports: 1&lt;br/&gt;
	Firmware version: 16.29.1016&lt;br/&gt;
	Hardware version: 0&lt;br/&gt;
	Node GUID: 0x98039b0300b4146e&lt;br/&gt;
	System image GUID: 0x98039b0300b4146e&lt;br/&gt;
	Port 1:&lt;br/&gt;
		State: Active&lt;br/&gt;
		Physical state: LinkUp&lt;br/&gt;
		Rate: 100&lt;br/&gt;
		Base lid: 26&lt;br/&gt;
		LMC: 0&lt;br/&gt;
		SM lid: 1&lt;br/&gt;
		Capability mask: 0x2651e848&lt;br/&gt;
		Port GUID: 0x98039b0300b4146e&lt;br/&gt;
		Link layer: InfiniBand&lt;br/&gt;
CA &apos;mlx5_bond_0&apos;&lt;br/&gt;
	CA type: MT4117&lt;br/&gt;
	Number of ports: 1&lt;br/&gt;
	Firmware version: 14.29.1016&lt;br/&gt;
	Hardware version: 0&lt;br/&gt;
	Node GUID: 0x98039b0300cbcb34&lt;br/&gt;
	System image GUID: 0x98039b0300cbcb34&lt;br/&gt;
	Port 1:&lt;br/&gt;
		State: Active&lt;br/&gt;
		Physical state: LinkUp&lt;br/&gt;
		Rate: 10&lt;br/&gt;
		Base lid: 0&lt;br/&gt;
		LMC: 0&lt;br/&gt;
		SM lid: 0&lt;br/&gt;
		Capability mask: 0x00010000&lt;br/&gt;
		Port GUID: 0x9a039bfffecbcb34&lt;br/&gt;
		Link layer: Ethernet&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@boslnet03 ~&amp;#93;&lt;/span&gt;# ifconfig&lt;br/&gt;
bond0: flags=5187&amp;lt;UP,BROADCAST,RUNNING,MASTER,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 10.242.62.213  netmask 255.255.255.0  broadcast 10.242.62.255&lt;br/&gt;
        inet6 fe80::9a03:9bff:fecb:cb34  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether 98:03:9b:cb:cb:34  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 493962473  bytes 39340657521 (36.6 GiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 515219003  bytes 85671938800 (79.7 GiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;enp0s20f0u1u6: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        inet 169.254.95.120  netmask 255.255.255.0  broadcast 169.254.95.255&lt;br/&gt;
        inet6 fe80::3868:ddff:fe0c:29f7  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
        ether 3a:68:dd:0c:29:f7  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 2505736  bytes 240636075 (229.4 MiB)&lt;br/&gt;
        RX errors 1  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 2501477  bytes 274965254 (262.2 MiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;ib0: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 2044&lt;br/&gt;
        inet 10.242.46.227  netmask 255.255.252.0  broadcast 10.242.47.255&lt;br/&gt;
        inet6 fe80::9a03:9b03:b4:146e  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).&lt;br/&gt;
        infiniband 20:00:10:29:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)&lt;br/&gt;
        RX packets 39137  bytes 6016144 (5.7 MiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 50  bytes 3144 (3.0 KiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;lo: flags=73&amp;lt;UP,LOOPBACK,RUNNING&amp;gt;  mtu 65536&lt;br/&gt;
        inet 127.0.0.1  netmask 255.0.0.0&lt;br/&gt;
        inet6 ::1  prefixlen 128  scopeid 0x10&amp;lt;host&amp;gt;&lt;br/&gt;
        loop  txqueuelen 1000  (Local Loopback)&lt;br/&gt;
        RX packets 260  bytes 21314 (20.8 KiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 260  bytes 21314 (20.8 KiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;p1p1: flags=6211&amp;lt;UP,BROADCAST,RUNNING,SLAVE,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether 98:03:9b:cb:cb:34  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 276852635  bytes 24386554325 (22.7 GiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 295567994  bytes 70653344774 (65.8 GiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;

&lt;p&gt;p1p2: flags=6211&amp;lt;UP,BROADCAST,RUNNING,SLAVE,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
        ether 98:03:9b:cb:cb:34  txqueuelen 1000  (Ethernet)&lt;br/&gt;
        RX packets 217109839  bytes 14954103262 (13.9 GiB)&lt;br/&gt;
        RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
        TX packets 219651019  bytes 15018595778 (13.9 GiB)&lt;br/&gt;
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/p&gt;</comment>
                            <comment id="295393" author="ssmirnov" created="Thu, 18 Mar 2021 22:48:44 +0000"  >&lt;p&gt;Michael,&lt;/p&gt;

&lt;p&gt;Would it be possible to set up a live debugging session so that we can go through the procedure of adding a new router together?&lt;/p&gt;

&lt;p&gt;I haven&apos;t seen anything wrong in the logs you provided, but the errors mentioned in the description could be explained by misconfiguration.&lt;/p&gt;

&lt;p&gt;The procedure should be roughly as follows:&lt;/p&gt;

&lt;p&gt;1) Set up the router node: configure lnet, start lnet, and verify by dumping &quot;lnetctl net show&quot; and by running &quot;lnetctl ping&quot; to and from peers on both nets (tcp1, o2ib1) multiple times.&lt;/p&gt;

&lt;p&gt;2) Add the route on the server and on the client, listing the new router as the gateway to use to reach the respective nets.&lt;/p&gt;

&lt;p&gt;3) Verify by running &quot;lnetctl ping&quot; multiple times across the new router (from a tcp1 client to the server and back).&lt;/p&gt;

&lt;p&gt;4) Verify by running &quot;lnetctl ping&quot; multiple times across the new router using a client on tcp2 (from the logs provided, the new router handles only tcp1, so let&apos;s verify that tcp2 is still good).&lt;/p&gt;

&lt;p&gt;5) If all looks good so far, try mounting the filesystem.&lt;/p&gt;</comment>
                            <comment id="295823" author="mre64" created="Tue, 23 Mar 2021 15:44:16 +0000"  >&lt;p&gt;Hi, I&apos;m attaching 2 log files per Serguei&apos;s request. Thanks, Mike. &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/38051/38051_log.txt&quot; title=&quot;log.txt attached to LU-14454&quot;&gt;log.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;  &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/38052/38052_log1.txt&quot; title=&quot;log1.txt attached to LU-14454&quot;&gt;log1.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;</comment>
                            <comment id="297529" author="ssmirnov" created="Thu, 1 Apr 2021 14:38:17 +0000"  >&lt;p&gt;During today&apos;s call we found out that the new router&apos;s IP may need to be added to the access control list for a group of clients. A regular ping from the client to the router was going through, but lnetctl ping was not. Because lnetctl ping was part of the procedure we used earlier, the failed lnetctl ping we&apos;re seeing now may not explain the behaviour we were seeing before. We&apos;ll proceed once the ACL issue is out of the way.&lt;/p&gt;</comment>
                            <comment id="297550" author="mre64" created="Thu, 1 Apr 2021 15:51:03 +0000"  >&lt;p&gt;I verified that from the client we can lnet ping the 2 new lnet routers now. There were ACLs blocking access.&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@holylogin01 ~&amp;#93;&lt;/span&gt;# lnetctl ping 10.242.62.214@tcp1&lt;br/&gt;
ping:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;primary nid: 10.242.62.214@tcp1&lt;br/&gt;
      Multi-Rail: False&lt;br/&gt;
      peer ni:&lt;/li&gt;
	&lt;li&gt;nid: 10.242.62.214@tcp1&lt;/li&gt;
	&lt;li&gt;nid: 10.242.46.228@o2ib1&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@holylogin01 ~&amp;#93;&lt;/span&gt;# lnetctl ping 10.242.62.213@tcp1&lt;br/&gt;
ping:&lt;/li&gt;
	&lt;li&gt;primary nid: 10.242.62.213@tcp1&lt;br/&gt;
      Multi-Rail: False&lt;br/&gt;
      peer ni:&lt;/li&gt;
	&lt;li&gt;nid: 10.242.62.213@tcp1&lt;/li&gt;
	&lt;li&gt;nid: 10.242.46.227@o2ib1&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="297727" author="mre64" created="Fri, 2 Apr 2021 19:35:22 +0000"  >&lt;p&gt;It has been verified that network ACLs were causing the issue. We have successfully added a new LNET router and Lustre FS access seems to be fine now. This ticket can be closed. Thanks to Serguei for all his help, much appreciated.&lt;/p&gt;</comment>
                            <comment id="297748" author="pjones" created="Sat, 3 Apr 2021 15:15:18 +0000"  >&lt;p&gt;Great - thanks for the update&lt;/p&gt;</comment>
                            <comment id="297758" author="ssmirnov" created="Sat, 3 Apr 2021 17:31:13 +0000"  >&lt;p&gt;Michael reported later yesterday via e-mail that the clients which didn&apos;t get the route to the new gateway set up had issues accessing the FS. This should not have happened unless asymmetric routes are configured to be dropped on the clients.&lt;/p&gt;</comment>
                            <comment id="300571" author="ssmirnov" created="Wed, 5 May 2021 17:03:47 +0000"  >&lt;p&gt;Michael recently reported via email:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&quot;I wanted to let you know we finally added those 2 lnet routers we were working previously, globally to our cluster in Holyoke/boston and they are now running in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It appears the procedure requires that you add the lnet routes across all the clients that are mounting the storage on the other side of the routers and then add lnet routes to the storage side after that.&quot;&lt;/em&gt;&lt;/p&gt;

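&lt;p&gt;A minimal dry-run sketch of that client-first ordering, assuming the tcp1 and o2ib1 NIDs of the two new routers shown in the lnetctl ping output earlier in this ticket. The rollout_plan helper is hypothetical and only prints the commands; nothing is executed, and the exact steps at a given site may differ.&lt;/p&gt;

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for a client-first rollout.
# NIDs and net names are taken from this ticket; rollout_plan is a hypothetical helper.

ROUTER_TCP_NIDS="10.242.62.213@tcp1 10.242.62.214@tcp1"    # new routers as seen by tcp1 clients
ROUTER_IB_NIDS="10.242.46.227@o2ib1 10.242.46.228@o2ib1"   # new routers as seen by o2ib1 servers

rollout_plan() {
    echo "# 1. on every tcp1 client: check reachability, then add routes to o2ib1"
    for gw in $ROUTER_TCP_NIDS; do
        echo "lnetctl ping $gw"
        echo "lnetctl route add --net o2ib1 --gateway $gw"
    done
    echo "# 2. on the servers, only after all clients are done: add routes to tcp1"
    for gw in $ROUTER_IB_NIDS; do
        echo "lnetctl ping $gw"
        echo "lnetctl route add --net tcp1 --gateway $gw"
    done
    echo "# 3. on both sides: confirm the routing table"
    echo "lnetctl route show"
}

rollout_plan
```

&lt;p&gt;Printing the plan first makes it easy to review the order (clients, then storage) before running anything by hand; an lnetctl ping that fails at step 1 would point at something like the ACL problem found earlier.&lt;/p&gt;
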
&lt;p&gt;Michael believes that the ticket can be closed.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="38051" name="log.txt" size="76519645" author="mre64" created="Tue, 23 Mar 2021 15:44:42 +0000"/>
                            <attachment id="38052" name="log1.txt" size="22939356" author="mre64" created="Tue, 23 Mar 2021 15:44:32 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01n4f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>