Lustre / LU-153

Clients cannot connect to servers with 2 IB cards until "lctl ping" is done from server to clients

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • None
    • Lustre 2.0.0
    • None
    • RHEL 6.0 GA, OFED 1.5.2, Lustre 2.0.0.1, Mellanox QDR IB cards

    Description

      Clients are not able to connect to the server interfaces when two IB cards (and two LNets) are configured on the servers. We have a workaround consisting of running "lctl ping" from the servers to both LNets on every client; after that, the clients are able to connect to the servers.

      Once clients are mounted, we see the problem when we run the "df -h /lustre" command on a client (obviously, because running this command the client needs to contact the OSSs).

      First, we try to ping every server interface from a client:

      client> lctl ping 10.50.0.7@o2ib0 => No response
      client> lctl ping 10.50.1.7@o2ib1 => No response

      client>dmesg
      00000400:00000100:3.0F:1297255885.268873:0:2998:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping message for 12345-10.50.0.7@o2ib1: peer not alive
      00000400:00020000:3.0:1297255885.279758:0:2998:0:(lib-move.c:2628:LNetGet()) error sending GET to 12345-10.50.0.7@o2ib1: -113
      00000800:00000100:0.0F:1297255885.284181:0:2435:0:(o2iblnd_cb.c:462:kiblnd_rx_complete()) Rx from 10.50.0.7@o2ib1 failed: 5

      Then, from a server, we ping the client's interface (the client has only one IB interface) on both LNets:

      server> lctl ping 10.50.0.50@o2ib0 => OK
      server> lctl ping 10.50.0.50@o2ib1 => OK

      And the problem is solved: "df -h /lustre" runs correctly and every "lctl ping" from the client to the server's interfaces works fine.
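
      For reference, a minimal sketch of how the workaround could be scripted from a server (the 10.50.0.[50-80] client range below is illustrative, not our real node list):

      # run on each server; 10.50.0.[50-80] is an illustrative client range
      for i in $(seq 50 80); do
        lctl ping 10.50.0.$i@o2ib0    # client NID on the first LNet
        lctl ping 10.50.0.$i@o2ib1    # same client IP on the second LNet
      done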

      The IPoIB ping command works fine, we do not have DDR InfiniBand drivers running on our machines, and we already tried a network configuration using ip2nets.

      Here is our ip2nets config (note that all machines in the [7-10] range are servers, each with two IB cards and one LNet per card; all the other machines are clients, each with only one IB interface and two LNets):

      [root@berlin5 ~]# cat /sys/module/lnet/parameters/ip2nets
      o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 10.50.*.* ; o2ib1(ib0) 10.50.*.*
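
      For clarity, this is how we read that ip2nets line in terms of per-node "networks=" options (a sketch only; the modprobe config file path is illustrative, and the same networks= strings appear in the LNET_MULTIRAIL_OPTIONS quoted later in this ticket):

      # /etc/modprobe.d/lustre.conf (illustrative path)
      # servers 10.50.0.[7-10] / 10.50.1.[7-10]: one LNet per IB card
      options lnet networks="o2ib0(ib0),o2ib1(ib1)"

      # clients (any other 10.50.*.* address): both LNets on the single ib0
      options lnet networks="o2ib0(ib0),o2ib1(ib0)"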

      So, it seems that clients are not able to choose between the interfaces on the servers, but once a server has 'pinged' the clients, they are then able to choose the right interface.

      Do you think this could be an OFED bug? Or maybe an LNet bug?

          Activity

            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            I tested the two proposals from Liang.

            My first test consisted in tuning the ARP parameters, but it had no effect.

            My second test was to configure a single IPoIB interface on the client (60.64.x.x/16), a single IPoIB interface on the server (61.64.1.x/16), and to add a route in the IP routing table of the client so that the TCP connection works. In this configuration, 'lctl ping' was not working.

            So, unless the client has an IPoIB address in the same subnet as the OST to reach, the connection seems to be blocked by a bug.

            Sebastien.

            liang Liang Zhen (Inactive) added a comment - - edited

            From the log, the client got IB_CM_REJ_INVALID_SERVICE_ID from the server, which means it thought there was no listener on the server. I suspect this is caused by ARP flux; could you please try setting the following and reloading o2iblnd?

            sysctl -w net.ipv4.conf.ib0.arp_ignore=1
            sysctl -w net.ipv4.conf.ib1.arp_ignore=1

            If this does not resolve the problem, could you check whether the client can reach the server when the server has only one NI (i.e. start LNet with only o2ib0(ib0), then shut LNet down and try again with only o2ib1(ib1))?
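
            A sketch of how that single-NI test could look on the server (assuming the LNet modules can be unloaded between the two runs; module option syntax as quoted elsewhere in this ticket):

            lustre_rmmod                           # unload Lustre/LNet modules
            modprobe lnet networks="o2ib0(ib0)"    # first run: only the first IB interface
            lctl network up
            # ...from the client: lctl ping <server-ib0-address>@o2ib0...
            lctl network down
            lustre_rmmod
            modprobe lnet networks="o2ib1(ib1)"    # second run: only the second interface
            lctl network up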

            I don't know whether it is safe to use IP aliases with o2iblnd, but I can try to find out, if none of the previous approaches gives us more hints.

            Thanks
            Liang


            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            I tried two alternatives to the client IP alias (servers still have their 2 IB interfaces on 2 different subnets):

            • create a route on the client node telling it to use its ib0 interface to reach the network of the servers' second IB interface;
            • create a route on the client node telling it to use, as a gateway, the ib0 IPoIB address of one of the servers to reach the network of the servers' second IB interface.

            Both work at the IP level but fail at the LNet level (i.e. 'ping' is OK, but 'lctl ping' is not).
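
            A sketch of what such routes could look like on the client (the 61.64.0.0/16 destination and the 60.64.0.32 gateway reuse addresses quoted elsewhere in this ticket and are purely illustrative):

            # alternative 1: reach the servers' second IB subnet through the client's ib0
            ip route add 61.64.0.0/16 dev ib0

            # alternative 2: same destination, via the ib0 IPoIB address of one server
            ip route add 61.64.0.0/16 via 60.64.0.32 dev ib0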

            So it seems to work only when the client has an IPoIB address in the same subnet as the OSTs it wants to reach.
            Is this particular issue due to an OFED problem again, or could it be something fixable in Lustre?

            Moreover, can you confirm it is safe to run Lustre clients with IP aliases?

            TIA,
            Sebastien.


            dmoreno Diego Moreno (Inactive) added a comment -

            Debug traces when 'lctl ping' does not work in a multi-rail context with both interfaces in the same subnet.

            dmoreno Diego Moreno (Inactive) added a comment - - edited

            Hi,

            Putting both interfaces in different subnets did the trick. If I want my clients to connect to both server interfaces, I then need to create an IP alias on every client, because my clients have only one IB interface. Is this a problem for Lustre?

            I also captured the traces you asked for when both cards are in the same subnet. In dmesg I only obtained the following two lines:

            Lustre: 40750:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping message for 12345-60.64.1.32@o2ib1: peer not alive
            LustreError: 40750:0:(lib-move.c:2628:LNetGet()) error sending GET to 12345-60.64.1.32@o2ib1: -113

            I also obtained the debug daemon traces. The enabled debug flags are: info, neterror, net, warning, nettrace, error and emerg. See the lctl_ping_debug attachment.
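
            For completeness, a sketch of one way such traces can be captured (the output path and the pinged NID are illustrative; the flag list is the one quoted above):

            echo "info neterror net warning nettrace error emerg" > /proc/sys/lnet/debug
            lctl debug_daemon start /tmp/lctl_ping_debug 100    # 100 MB trace file
            lctl ping 60.64.1.32@o2ib1                          # reproduce the failure
            lctl debug_daemon stop
            lctl debug_file /tmp/lctl_ping_debug /tmp/lctl_ping_debug.txt   # convert to text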

            So, do you think this is an issue with OFED rather than with LNet?

            Thanks,


            dmoreno Diego Moreno (Inactive) added a comment -

            Hi Liang,

            I'll try everything you propose in your last comment. However, having the two interfaces in separate subnets will not be possible in our configuration: our clients, which have only one interface, need to access both interfaces on the servers, which is why they are all in the same subnet.

            One possibility could be to configure an alias interface on every client with the new subnet, but I don't know whether this works with Lustre...
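
            At the IP level such an alias would look roughly like this (the 60.65.2.57/24 address is purely illustrative; whether o2iblnd accepts an aliased IPoIB interface is exactly the open question):

            # hypothetical IPoIB alias on a client that has a single physical IB port
            ifconfig ib0:1 60.65.2.57 netmask 255.255.255.0 up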

            liang Liang Zhen (Inactive) added a comment - - edited

            It would be a little helpful if you could "echo +neterror > /proc/sys/lnet/printk" on both the server and the client, and reproduce the problem to get the o2iblnd error messages on both sides (very likely, you will see nothing on the server side).

            Actually, I remember other people running into similar trouble while having multiple interfaces with the same netmask in the same subnet, and the problem disappeared after changing them to different subnets. Would it be possible to change one of those addresses so that the two interfaces are in separate subnets with different netmasks? If that helps, it could be an OFED issue, although I'm not sure.
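
            As an illustration of that suggestion (addresses and netmasks below are purely illustrative, loosely based on the berlin8 ifconfig output quoted later in this ticket):

            # keep ib0 in its original /16, move ib1 to a separate /24
            ifconfig ib0 60.64.0.32 netmask 255.255.0.0
            ifconfig ib1 60.65.1.32 netmask 255.255.255.0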

            pjones Peter Jones added a comment -

            Sorry, but while this is undoubtedly an important issue for CEA, it is not a general enough issue to be a 2.1 blocker.


            dmoreno Diego Moreno (Inactive) added a comment -

            Hi Liang,

            On the server side:

            • OFED: 1.5.2 (kernel-ib-1.5-2.6.32_71.14.1.el6.Bull.20.x86_64.ofed1.5.2.Bull.4.el6.x86_64)
            • lnet config:
              export LNET_MULTIRAIL_OPTIONS="networks=o2ib0(ib0),o2ib1(ib1)"
            • kernel version: kernel-2.6.32-71.14.1.el6.Bull.20.x86_64
            • ifconfig:
              ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:60.64.0.32 Bcast:60.64.255.255 Mask:255.255.0.0
              inet6 addr: fe80::202:c903:a:b73f/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
              RX packets:659 errors:0 dropped:0 overruns:0 frame:0
              TX packets:105 errors:0 dropped:5 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:48041 (46.9 KiB) TX bytes:18021 (17.5 KiB)
              ib1 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:60.64.1.32 Bcast:60.64.255.255 Mask:255.255.0.0
              inet6 addr: fe80::202:c903:4:89b1/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
              RX packets:557 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:5 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:31192 (30.4 KiB) TX bytes:0 (0.0 b)
            • Routing:
              [root@berlin8 ~]# route
              Kernel IP routing table
              Destination Gateway Genmask Flags Metric Ref Use Iface
              60.64.0.0 * 255.255.0.0 U 0 0 0 ib0
              60.64.0.0 * 255.255.0.0 U 0 0 0 ib1
              60.0.0.0 * 255.248.0.0 U 0 0 0 eth0
              default berlin32.echi.l 0.0.0.0 UG 0 0 0 eth0

            On the client side:

            • OFED: 1.5.2 (kernel-ib-1.5-2.6.32_71.14.1.el6.Bull.20.x86_64.ofed1.5.2.Bull.4.el6.x86_64)
            • lnet config:
              export LNET_MULTIRAIL_OPTIONS="networks=o2ib0(ib0),o2ib1(ib0)"
            • kernel version: kernel-2.6.32-71.14.1.el6.Bull.20.x86_64
            • ifconfig:
              ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:60.64.2.57 Bcast:60.64.255.255 Mask:255.255.0.0
              inet6 addr: fe80::230:48ff:fff4:ca15/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
              RX packets:1188 errors:0 dropped:0 overruns:0 frame:0
              TX packets:86 errors:0 dropped:5 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:83356 (81.4 KiB) TX bytes:14580 (14.2 KiB)
            • Routing:
              [root@berlin71 ~]# route
              Kernel IP routing table
              Destination Gateway Genmask Flags Metric Ref Use Iface
              60.64.0.0 * 255.255.0.0 U 0 0 0 ib0
              60.0.0.0 * 255.248.0.0 U 0 0 0 eth2
              default berlin32.echi.l 0.0.0.0 UG 0 0 0 eth2

            Do you need any other information?

            liang Liang Zhen (Inactive) added a comment - - edited

            Could you please provide the ifconfig output and the routing table of the server? Also, which OFED version and kernel version are you using?

            Thanks
            Liang


            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            Bad news on this: CEA is now experiencing this issue on one of their clusters, and the workaround that consists in 'lctl pinging' all interfaces does not scale. So they are blocked, and this is why I would like to change this ticket's priority to Blocker.

            Could you please have a look at this very soon?

            TIA,
            Sebastien.


            People

              liang Liang Zhen (Inactive)
              dmoreno Diego Moreno (Inactive)
              Votes: 0
              Watchers: 5
