<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:04:16 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-153] Clients cannot connect to servers with 2 IB cards until &quot;lctl ping&quot; is done from server to clients</title>
                <link>https://jira.whamcloud.com/browse/LU-153</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Clients are not able to connect to server interfaces when there are two IB cards (and two lnets) configured on servers. We have a workaround that consists of running &quot;lctl ping&quot; from the servers to both lnets on every client. After that, clients are able to connect to the servers.&lt;/p&gt;

&lt;p&gt;Once the clients are mounted, we see the problem when we run the &quot;df -h /lustre&quot; command on them (as expected, because this command requires the client to contact the OSSs).&lt;/p&gt;

&lt;p&gt;First we try to ping every server interface from a client:&lt;/p&gt;

&lt;p&gt;client&amp;gt; lctl ping 10.50.0.7@o2ib0 =&amp;gt; No response&lt;br/&gt;
client&amp;gt; lctl ping 10.50.1.7@o2ib1 =&amp;gt; No response&lt;/p&gt;

&lt;p&gt;client&amp;gt;dmesg&lt;br/&gt;
00000400:00000100:3.0F:1297255885.268873:0:2998:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping message for&lt;br/&gt;
12345-10.50.0.7@o2ib1: peer not alive&lt;br/&gt;
00000400:00020000:3.0:1297255885.279758:0:2998:0:(lib-move.c:2628:LNetGet()) error sending GET to 12345-10.50.0.7@o2ib1:&lt;br/&gt;
-113&lt;br/&gt;
00000800:00000100:0.0F:1297255885.284181:0:2435:0:(o2iblnd_cb.c:462:kiblnd_rx_complete()) Rx from 10.50.0.7@o2ib1 failed:&lt;br/&gt;
5&lt;/p&gt;

&lt;p&gt;Then we ping the client&apos;s interface (the client has only one interface) on both lnets from the server:&lt;/p&gt;

&lt;p&gt;server&amp;gt; lctl ping 10.50.0.50@o2ib0 =&amp;gt; OK&lt;br/&gt;
server&amp;gt; lctl ping 10.50.0.50@o2ib1 =&amp;gt; OK&lt;/p&gt;

&lt;p&gt;After that the problem is solved: &quot;df -h /lustre&quot; runs correctly and every &quot;lctl ping&quot; from the client to the server&apos;s interfaces works.&lt;/p&gt;
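
&lt;p&gt;A minimal sketch of how this workaround could be scripted, assuming a list of client IPoIB addresses is available. The helper name ping_cmds is hypothetical; the network names o2ib0 and o2ib1 and the address 10.50.0.50 are taken from the example above. It only prints the &quot;lctl ping&quot; commands so they can be reviewed before being run on a server:&lt;/p&gt;

```shell
# Hypothetical helper: print one "lctl ping" command per client address
# and per lnet. Nothing is executed here; the output is plain text.
ping_cmds() {
    for ip in "$@"; do
        for net in o2ib0 o2ib1; do
            printf 'lctl ping %s@%s\n' "$ip" "$net"
        done
    done
}

# Example with the single client address used in this report.
ping_cmds 10.50.0.50
```

&lt;p&gt;Piping the output to sh on a server would run the pings.&lt;/p&gt;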

&lt;p&gt;The IPoIB ping command works fine, we don&apos;t have DDR InfiniBand drivers running on our machines, and we have already tried a network configuration using ip2nets.&lt;/p&gt;

&lt;p&gt;Here is our ip2nets config (note that all machines in the [7-10] range are servers with two IB cards and one lnet on each card; all the other machines are clients with a single IB interface and two lnets on it):&lt;/p&gt;

&lt;p&gt;[root@berlin5 ~]# cat /sys/module/lnet/parameters/ip2nets &lt;br/&gt;
o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 10.50.*.* ; o2ib1(ib0) 10.50.*.*&lt;/p&gt;

&lt;p&gt;So it seems that clients are not able to choose between the interfaces on the servers, but once a server has &apos;pinged&apos; the clients, they are able to choose the right interface.&lt;/p&gt;

&lt;p&gt;Do you think this could be an OFED bug? Maybe an lnet bug?&lt;/p&gt;</description>
                <environment>RHEL 6.0 GA, ofed1.5.2, Lustre 2.0.0.1, Mellanox QDR Ib cards</environment>
        <key id="10489">LU-153</key>
            <summary>Clients cannot connect to servers with 2 IB cards until &quot;lctl ping&quot; is done from server to clients</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="liang">Liang Zhen</assignee>
                                    <reporter username="dmoreno">Diego Moreno</reporter>
                        <labels>
                    </labels>
                <created>Thu, 24 Mar 2011 06:39:02 +0000</created>
                <updated>Mon, 1 Apr 2019 19:02:40 +0000</updated>
                            <resolved>Thu, 21 Jul 2011 10:33:11 +0000</resolved>
                                    <version>Lustre 2.0.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="11323" author="liang" created="Thu, 24 Mar 2011 08:01:36 +0000"  >&lt;p&gt;It looks like an LNet bug to me; I will look into it very soon.&lt;/p&gt;</comment>
                            <comment id="13209" author="sebastien.buisson" created="Fri, 22 Apr 2011 02:09:53 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Bad news on this: CEA is now experiencing this issue on one of their clusters, and the workaround of &apos;lctl pinging&apos; all interfaces does not scale. So they are blocked, which is why I would like to change this ticket&apos;s priority to Blocker.&lt;/p&gt;

&lt;p&gt;Could you please have a look at this very soon?&lt;/p&gt;

&lt;p&gt;TIA,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="13211" author="liang" created="Fri, 22 Apr 2011 05:24:18 +0000"  >&lt;p&gt;Could you please provide ifconfig output and routing table of the server? Also, which OFED version and kernel version are you using? &lt;/p&gt;

&lt;p&gt;Thanks&lt;br/&gt;
Liang&lt;/p&gt;</comment>
                            <comment id="13212" author="dmoreno" created="Fri, 22 Apr 2011 06:25:41 +0000"  >&lt;p&gt;Hi Liang,&lt;/p&gt;

&lt;p&gt;On server&apos;s side:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;OFED: 1.5.2 (kernel-ib-1.5-2.6.32_71.14.1.el6.Bull.20.x86_64.ofed1.5.2.Bull.4.el6.x86_64)&lt;/li&gt;
	&lt;li&gt;lnet config:&lt;br/&gt;
export LNET_MULTIRAIL_OPTIONS=&quot;networks=o2ib0(ib0),o2ib1(ib1)&quot;&lt;/li&gt;
	&lt;li&gt;kernel version: kernel-2.6.32-71.14.1.el6.Bull.20.x86_64&lt;/li&gt;
	&lt;li&gt;ifconfig:&lt;br/&gt;
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  &lt;br/&gt;
          inet addr:60.64.0.32  Bcast:60.64.255.255  Mask:255.255.0.0&lt;br/&gt;
          inet6 addr: fe80::202:c903:a:b73f/64 Scope:Link&lt;br/&gt;
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1&lt;br/&gt;
          RX packets:659 errors:0 dropped:0 overruns:0 frame:0&lt;br/&gt;
          TX packets:105 errors:0 dropped:5 overruns:0 carrier:0&lt;br/&gt;
          collisions:0 txqueuelen:256 &lt;br/&gt;
          RX bytes:48041 (46.9 KiB)  TX bytes:18021 (17.5 KiB)&lt;br/&gt;
ib1       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  &lt;br/&gt;
          inet addr:60.64.1.32  Bcast:60.64.255.255  Mask:255.255.0.0&lt;br/&gt;
          inet6 addr: fe80::202:c903:4:89b1/64 Scope:Link&lt;br/&gt;
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1&lt;br/&gt;
          RX packets:557 errors:0 dropped:0 overruns:0 frame:0&lt;br/&gt;
          TX packets:0 errors:0 dropped:5 overruns:0 carrier:0&lt;br/&gt;
          collisions:0 txqueuelen:256 &lt;br/&gt;
          RX bytes:31192 (30.4 KiB)  TX bytes:0 (0.0 b)&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Routing:&lt;br/&gt;
[root@berlin8 ~]# route&lt;br/&gt;
Kernel IP routing table&lt;br/&gt;
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface&lt;br/&gt;
60.64.0.0       *               255.255.0.0     U     0      0        0 ib0&lt;br/&gt;
60.64.0.0       *               255.255.0.0     U     0      0        0 ib1&lt;br/&gt;
60.0.0.0        *               255.248.0.0     U     0      0        0 eth0&lt;br/&gt;
default         berlin32.echi.l 0.0.0.0         UG    0      0        0 eth0&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;On client&apos;s side:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;OFED: 1.5.2 (kernel-ib-1.5-2.6.32_71.14.1.el6.Bull.20.x86_64.ofed1.5.2.Bull.4.el6.x86_64)&lt;/li&gt;
	&lt;li&gt;lnet config:&lt;br/&gt;
export LNET_MULTIRAIL_OPTIONS=&quot;networks=o2ib0(ib0),o2ib1(ib0)&quot; &lt;/li&gt;
	&lt;li&gt;kernel version: kernel-2.6.32-71.14.1.el6.Bull.20.x86_64&lt;/li&gt;
	&lt;li&gt;ifconfig:&lt;br/&gt;
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  &lt;br/&gt;
          inet addr:60.64.2.57  Bcast:60.64.255.255  Mask:255.255.0.0&lt;br/&gt;
          inet6 addr: fe80::230:48ff:fff4:ca15/64 Scope:Link&lt;br/&gt;
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1&lt;br/&gt;
          RX packets:1188 errors:0 dropped:0 overruns:0 frame:0&lt;br/&gt;
          TX packets:86 errors:0 dropped:5 overruns:0 carrier:0&lt;br/&gt;
          collisions:0 txqueuelen:256 &lt;br/&gt;
          RX bytes:83356 (81.4 KiB)  TX bytes:14580 (14.2 KiB)&lt;/li&gt;
	&lt;li&gt;Routing:&lt;br/&gt;
[root@berlin71 ~]# route&lt;br/&gt;
Kernel IP routing table&lt;br/&gt;
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface&lt;br/&gt;
60.64.0.0       *               255.255.0.0     U     0      0        0 ib0&lt;br/&gt;
60.0.0.0        *               255.248.0.0     U     0      0        0 eth2&lt;br/&gt;
default         berlin32.echi.l 0.0.0.0         UG    0      0        0 eth2&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Do you need any other information?&lt;/p&gt;</comment>
                            <comment id="13213" author="pjones" created="Fri, 22 Apr 2011 07:40:15 +0000"  >&lt;p&gt;Sorry, but, while this is undoubtedly an important issue for CEA, this is not a general enough issue to be a 2.1 blocker.&lt;/p&gt;</comment>
                            <comment id="13214" author="liang" created="Fri, 22 Apr 2011 08:22:04 +0000"  >&lt;p&gt;It would be helpful if you could run &quot;echo +neterror &amp;gt; /proc/sys/lnet/printk&quot; on both the server and the client, then reproduce the problem to capture the o2iblnd error messages on both sides (very likely, you will see nothing on the server side).&lt;/p&gt;

&lt;p&gt;Actually, I remember other people running into similar trouble with multiple interfaces that had the same netmask and were in the same subnet, and the problem disappeared after they were moved to different subnets. Would it be possible to change one of those addresses so that the two interfaces are in separate subnets with different netmasks? If this helps, it could be an OFED issue, although I&apos;m not sure.&lt;/p&gt;</comment>
                            <comment id="13314" author="dmoreno" created="Tue, 26 Apr 2011 06:33:51 +0000"  >&lt;p&gt;Hi Liang,&lt;/p&gt;

&lt;p&gt;I&apos;ll try everything you proposed in your last comment. However, putting the two interfaces in separate subnets won&apos;t be possible in our configuration: our clients, which have only one interface, need to access both interfaces on the servers, which is why they are all in the same subnet.&lt;/p&gt;

&lt;p&gt;One possibility could be to configure an alias interface on every client in the new subnet, but I don&apos;t know if this works with Lustre...&lt;/p&gt;</comment>
                            <comment id="13463" author="dmoreno" created="Fri, 29 Apr 2011 03:47:00 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Putting both interfaces in different subnets did the trick. If I want my clients to connect to both interfaces, I need to create an IP alias on every client, because my clients have only one IB interface. Is this a problem in Lustre?&lt;/p&gt;

&lt;p&gt;I also collected the traces you asked for with both cards in the same subnet. dmesg showed only the following two lines:&lt;/p&gt;

&lt;p&gt;Lustre: 40750:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping message for 12345-60.64.1.32@o2ib1: peer not alive&lt;br/&gt;
LustreError: 40750:0:(lib-move.c:2628:LNetGet()) error sending GET to 12345-60.64.1.32@o2ib1: -113&lt;/p&gt;

&lt;p&gt;So I also captured debug daemon output. The enabled debug flags are: info, neterror, net, warning, nettrace, error and emerg. See the lctl_ping_debug attachment.&lt;/p&gt;

&lt;p&gt;Then do you think this is an issue with OFED and not with lnet?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;</comment>
                            <comment id="13464" author="dmoreno" created="Fri, 29 Apr 2011 03:48:18 +0000"  >&lt;p&gt;Debug traces when lctl ping doesn&apos;t work in a multirail context with both interfaces in the same subnet.&lt;/p&gt;</comment>
                            <comment id="13636" author="sebastien.buisson" created="Wed, 4 May 2011 02:05:44 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I tried two alternatives to the client IP alias (servers still have their 2 IB interfaces on 2 different subnets):&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;create a route on the client node, telling it to use its ib0 interface to reach the network of the servers&apos; second IB interface;&lt;/li&gt;
	&lt;li&gt;create a route on the client node, telling it to use the ib0 IPoIB address of one of the servers as a gateway to reach the network of the servers&apos; second IB interface.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Both work at the IP level, but fail at the LNET level (i.e. &apos;ping&apos; is OK, but &apos;lctl ping&apos; is not).&lt;/p&gt;

&lt;p&gt;So it seems to work only when the client has an IPoIB address in the same subnet as the OSTs it wants to reach.&lt;br/&gt;
Is this particular issue due to an OFED problem again? or could it be something fixable in Lustre?&lt;/p&gt;

&lt;p&gt;Moreover, can you confirm it is safe to run Lustre clients with IP aliases?&lt;/p&gt;

&lt;p&gt;TIA,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="13662" author="liang" created="Wed, 4 May 2011 07:56:44 +0000"  >&lt;p&gt;From the log, the client got IB_CM_REJ_INVALID_SERVICE_ID from the server, which means it believed there was no listener on the server. I suspect this is caused by ARP flux. Could you please set the following and reload o2iblnd?&lt;/p&gt;

&lt;p&gt;   sysctl -w net.ipv4.conf.ib0.arp_ignore=1&lt;br/&gt;
   sysctl -w net.ipv4.conf.ib1.arp_ignore=1&lt;/p&gt;

&lt;p&gt;If this doesn&apos;t resolve the problem, could you check whether the client can reach the server when the server has only one NI (i.e. start LNet with only o2ib0(ib0), then shut it down and try with only o2ib1(ib1))?&lt;/p&gt;

&lt;p&gt;I don&apos;t know whether it&apos;s safe to use IP aliases with o2iblnd, but I can try to find out if none of the steps above gives us more hints.&lt;/p&gt;

&lt;p&gt;Thanks&lt;br/&gt;
Liang&lt;/p&gt;</comment>
                            <comment id="13727" author="sebastien.buisson" created="Thu, 5 May 2011 04:23:59 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I tested the two proposals from Liang.&lt;/p&gt;

&lt;p&gt;My first test consisted of tuning the ARP parameters, but it had no effect.&lt;/p&gt;

&lt;p&gt;My second test was to configure one IPoIB address on the client (60.64.x.x/16) and only one IPoIB address on the server (61.64.1.x/16), and to add a route in the client&apos;s IP routing table so that TCP connections work. In this configuration, &apos;lctl ping&apos; did not work.&lt;/p&gt;

&lt;p&gt;So, unless the client has an IPoIB address in the same subnet as the OST to reach, the connection seems to be blocked by a bug.&lt;/p&gt;

&lt;p&gt;Sebastien.&lt;/p&gt;</comment>
                            <comment id="13865" author="liang" created="Fri, 6 May 2011 05:38:26 +0000"  >&lt;p&gt;Sebastien, &lt;/p&gt;

&lt;p&gt;It is very strange that it does not work even when the server has only one IPoIB interface. Have you tried both interfaces on the server (i.e. only ib0, then only ib1)?&lt;br/&gt;
I checked our code again: we don&apos;t do anything special when using rdma_resolve_addr and rdma_connect in o2iblnd, so I still suspect ARP flux. Is it possible that the ARP cache was not refreshed, so the previous solution still did not work?&lt;/p&gt;

&lt;p&gt;The OFED release notes describe this:&lt;/p&gt;


&lt;p&gt;2. Known Issues&lt;br/&gt;
===============================================================================&lt;br/&gt;
1. If a host has multiple interfaces and (a) each interface belongs to a&lt;br/&gt;
   different IP subnet, (b) they all use the same InfiniBand Partition, and (c)&lt;br/&gt;
   they are connected to the same IB Switch, then the host violates the IP rule&lt;br/&gt;
   requiring different broadcast domains. Consequently, the host may build an&lt;br/&gt;
   incorrect ARP table.&lt;/p&gt;

&lt;p&gt;   The correct setting of a multi-homed IPoIB host is achieved by using a&lt;br/&gt;
   different PKEY for each IP subnet. If a host has multiple interfaces on the&lt;br/&gt;
   same IP subnet, then to prevent a peer from building an incorrect ARP entry&lt;br/&gt;
   (neighbor) set the net.ipv4.conf.X.arp_ignore value to 1 or 2, where X&lt;br/&gt;
   stands for the IPoIB (non-child) interfaces (e.g., ib0, ib1, etc). This&lt;br/&gt;
   causes the network stack to send ARP replies only on the interface with the&lt;br/&gt;
   IP address specified in the ARP request:&lt;/p&gt;

&lt;p&gt;   sysctl -w net.ipv4.conf.ib0.arp_ignore=1&lt;br/&gt;
   sysctl -w net.ipv4.conf.ib1.arp_ignore=1&lt;/p&gt;

&lt;p&gt;   Or, globally,&lt;/p&gt;

&lt;p&gt;   sysctl -w net.ipv4.conf.all.arp_ignore=1&lt;/p&gt;

&lt;p&gt;   To learn more about the arp_ignore parameter, see&lt;br/&gt;
   Documentation/networking/ip-sysctl.txt.&lt;br/&gt;
   Note that distributions have the means to make kernel parameters persistent.&lt;/p&gt;


&lt;p&gt;Could you please try a few more things:&lt;br/&gt;
0. ifconfig output from both client and server&lt;br/&gt;
1. regular ping both interfaces of server from client, and collect &quot;arp -na&quot; on client&lt;br/&gt;
2. please collect &quot;arp -na&quot; and &quot;ip neighbor show&quot; from the client&lt;br/&gt;
3. assume the server has ib0(10.50.0.7), ib1(10.50.1.7), please run this and collect output:&lt;/p&gt;

&lt;p&gt;   client $ tcpdump -c 3 -nni ib0 arp&lt;br/&gt;
   server $ tcpdump -c 3 -nni ib0 arp&lt;br/&gt;
   client $ arping -c 3 -D -I ib0 10.50.0.7&lt;/p&gt;

&lt;p&gt;4. collect output of:&lt;/p&gt;

&lt;p&gt;   client $ tcpdump -c 3 -nni ib0 arp&lt;br/&gt;
   server $ tcpdump -c 3 -nni ib1 arp&lt;br/&gt;
   client $ arping -c 3 -D -I ib0 10.50.1.7&lt;/p&gt;

&lt;p&gt;5. please clear arp cache on client by &quot;arp -d servername 10.50.0.7&quot; and &quot;arp -d servername 10.50.1.7&quot; (or you have a better way)&lt;/p&gt;

&lt;p&gt;6. ping both interfaces of server and collect &quot;arp -na&quot; again on client&lt;/p&gt;

&lt;p&gt;7. reload o2iblnd on both client and server to see whether this can work&lt;/p&gt;

&lt;p&gt;8. if that still doesn&apos;t help, please add those sysctl settings to /etc/sysctl.conf and reboot the client and the server to see how it behaves&lt;/p&gt;

&lt;p&gt;If these steps help, I don&apos;t need the logs and information above; if they don&apos;t, we will need that information for the next stage of the investigation.&lt;/p&gt;
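
&lt;p&gt;A minimal sketch for step 8, assuming the IPoIB interfaces are named ib0 and ib1 as above. The helper name arp_ignore_conf is hypothetical; it only prints the settings, which could then be appended to /etc/sysctl.conf:&lt;/p&gt;

```shell
# Hypothetical helper: print one arp_ignore line per interface name,
# in the "key = value" form that /etc/sysctl.conf accepts.
arp_ignore_conf() {
    for ifname in "$@"; do
        printf 'net.ipv4.conf.%s.arp_ignore = 1\n' "$ifname"
    done
}

# Interface names are the ones used in this ticket.
arp_ignore_conf ib0 ib1
```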

&lt;p&gt;Thanks&lt;br/&gt;
Liang&lt;/p&gt;</comment>
                            <comment id="15032" author="dmoreno" created="Wed, 25 May 2011 08:06:37 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I was finally able to run the steps from your previous comment:&lt;/p&gt;

&lt;p&gt;We are indeed having problems with the ARP table. When we run &apos;arping&apos; from the client to the server, we get two ARP replies with different hardware addresses instead of one, so the client cannot build a correct ARP table.&lt;/p&gt;

&lt;p&gt;15:04:36.258634 ARP, Request who-has 60.64.0.31 (00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff) tell 0.0.0.0, length 56&lt;br/&gt;
15:04:36.258751 ARP, Reply 60.64.0.31 is-at 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:04:89:c5, length 56&lt;br/&gt;
15:04:36.258773 ARP, Reply 60.64.0.31 is-at 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0a:ba:57, length 56&lt;/p&gt;
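
&lt;p&gt;A small sketch for checking such a capture, assuming the tcpdump text format shown above. The function name arp_reply_owners is hypothetical. It reads tcpdump ARP output on stdin and prints, for each IP address, how many distinct hardware addresses replied for it; a count above 1 would be consistent with the duplicate replies shown here:&lt;/p&gt;

```shell
# Hypothetical helper: count distinct hardware addresses per IP address
# among the "ARP, Reply" lines of tcpdump text output read from stdin.
arp_reply_owners() {
    awk '
        $3 == "Reply" {
            # $4 is the IP address, $6 the hardware address after "is-at".
            key = $4 " " $6
            if (!(key in seen)) {
                seen[key] = 1
                owners[$4]++
            }
        }
        END {
            for (ip in owners) print ip, owners[ip]
        }
    '
}
```

&lt;p&gt;Fed the three lines above, it would print &quot;60.64.0.31 2&quot;.&lt;/p&gt;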

&lt;p&gt;If we set &apos;arp_ignore=1&apos;, then one of the interfaces doesn&apos;t reply to standard ping requests (it&apos;s not always the same interface), so we still have the problem. It is as if only one interface can reply.&lt;/p&gt;

&lt;p&gt;As a consequence, traffic between server and client is not routed correctly, because the client connects to the wrong interface instead of the right one. It&apos;s definitely an OFED bug, do you agree?&lt;/p&gt;

&lt;p&gt;So, for the moment, the only available solution is an IP alias on each client plus two different subnets, but this is not easy to do in a cluster with more than 4000 nodes...&lt;/p&gt;</comment>
                            <comment id="16305" author="liang" created="Tue, 14 Jun 2011 12:40:07 +0000"  >&lt;p&gt;Sorry, I didn&apos;t see the last comment. I agree that it&apos;s not something we can fix inside o2iblnd, and I don&apos;t have a better suggestion to make it easier for you. :(&lt;/p&gt;

&lt;p&gt;Liang&lt;/p&gt;</comment>
                            <comment id="16450" author="dmoreno" created="Thu, 16 Jun 2011 09:59:05 +0000"  >&lt;p&gt;I think we agree this is a bug in the OFED stack. We can close this ticket; until OFED resolves it, we&apos;ll use an IB alias device on each client.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;</comment>
                            <comment id="18046" author="pjones" created="Thu, 21 Jul 2011 10:33:11 +0000"  >&lt;p&gt;As per Bull, they have a workaround for this OFED bug&lt;/p&gt;</comment>
                            <comment id="87364" author="spitzcor" created="Tue, 24 Jun 2014 14:16:44 +0000"  >&lt;p&gt;Does anyone know in which OFED release the bug was fixed, if ever?  Diego, did you or anyone else report this bug to OFA?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="55300">LU-12132</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="10189" name="lctl_ping_debug" size="670965" author="dmoreno" created="Fri, 29 Apr 2011 03:48:18 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>lnet</label>
            <label>lustre-2_0</label>
            <label>multirail</label>
            <label>ping</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvsm7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8540</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>