Lustre / LU-3679

/proc/sys/lnet/routes should accurately reflect routing with ARF when LNet router has one or more down NIs

Details

    • 9498

    Description

      On a system where an LNet router has more than one NI, ARF is configured on clients and servers, and one or more of the LNet router's NIs goes "down", /proc/sys/lnet/routes on clients/servers should show routes for that router as "down" rather than "up".

      The story: A site was running FGR tests in which the LNet routers each had two IB interfaces. After seeing wide variations in packet counts between ib0 and ib1, they noticed that some NIs were down on the routers:

      > lnet6: nid                     status alive refs peer rtr  max   tx  min
      > lnet6: 0@lo                    up         0    2    0   0    0    0    0
      > lnet6: 454@gni                 up         0  679   16   0 2048 2048 1664
      > lnet6: 10.100.100.160@o2ib1000 up        18    3   63 128 2048 2048 2047
      > lnet6: 10.100.100.160@o2ib1002 up        12    4   63 128 2048 2048 2047
      > lnet6: 10.100.100.160@o2ib1004 up         0    4   63 128 2048 2048 1859
      > lnet6: 10.100.100.161@o2ib1006 down   66420    1   63 128 2048 2048 2048
      > lnet6: 10.100.100.161@o2ib1007 down   66420    1   63 128 2048 2048 2048

      but were up for IPoIB. This caused some confusion, which was compounded by the fact that clients showed these routes as still functional:

      cat /proc/sys/lnet/routes | grep 454
      o2ib1000 2 up 454@gni
      o2ib1002 2 up 454@gni
      o2ib1004 1 up 454@gni
      o2ib1006 1 up 454@gni
      o2ib1007 2 up 454@gni

      This led people to believe that clients were still trying to use routes that were actually down, resulting in performance problems. Since ARF was configured, we know this wasn't actually the case: clients will not use a router if that router has one or more down NIs. This should be reflected in the output of /proc/sys/lnet/routes.
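      To make the requested behaviour concrete, here is a minimal Python sketch of how the displayed state could be derived. It is not LNet code and does not claim to show how ARF is implemented. The names `Route`, `parse_routes`, `router_has_down_ni`, and `effective_routes` are hypothetical helpers invented for illustration. It assumes the four-column `net hops state router` layout shown in the routes output above, and router NI lines of the form `nid status ...` with the `> lnet6:` prefix already stripped. The sample data is copied from this report.

```python
from typing import List, NamedTuple, Set


class Route(NamedTuple):
    net: str      # remote network, e.g. "o2ib1006"
    hops: int
    state: str    # "up" or "down" as displayed
    router: str   # NID of the router, e.g. "454@gni"


def parse_routes(text: str) -> List[Route]:
    """Parse lines shaped like 'o2ib1000 2 up 454@gni'; skip anything else."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: exactly four columns, with a numeric hop count.
        if len(fields) != 4 or not fields[1].isdigit():
            continue
        net, hops, state, router = fields
        routes.append(Route(net, int(hops), state, router))
    return routes


def router_has_down_ni(nis_text: str) -> bool:
    """Return True if any NI line on the router reports status 'down'."""
    for line in nis_text.splitlines():
        fields = line.split()
        # The header line has no '@' in its first field, so it is skipped.
        if len(fields) >= 2 and "@" in fields[0] and fields[1] == "down":
            return True
    return False


def effective_routes(routes: List[Route], degraded: Set[str]) -> List[Route]:
    """Show every route through a router with a down NI as 'down'."""
    return [
        r._replace(state="down") if r.router in degraded else r
        for r in routes
    ]


# Sample data taken from the outputs quoted in this report.
ROUTES_SAMPLE = """\
o2ib1000 2 up 454@gni
o2ib1002 2 up 454@gni
o2ib1004 1 up 454@gni
o2ib1006 1 up 454@gni
o2ib1007 2 up 454@gni
"""

NIS_SAMPLE = """\
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
454@gni up 0 679 16 0 2048 2048 1664
10.100.100.160@o2ib1000 up 18 3 63 128 2048 2048 2047
10.100.100.161@o2ib1006 down 66420 1 63 128 2048 2048 2048
10.100.100.161@o2ib1007 down 66420 1 63 128 2048 2048 2048
"""

if __name__ == "__main__":
    degraded = {"454@gni"} if router_has_down_ni(NIS_SAMPLE) else set()
    for route in effective_routes(parse_routes(ROUTES_SAMPLE), degraded):
        print(route.net, route.hops, route.state, route.router)
```

      In this sketch every route through 454@gni is reported as down, including the routes to o2ib1000, o2ib1002, and o2ib1004, whose own NIs are up. That matches the statement above that clients avoid the whole router once any of its NIs is down.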

          People

            dmiter Dmitry Eremin (Inactive)
            hornc Chris Horn
