  1. Lustre
  2. LU-6060

ARF doesn't detect lack of interface on a router


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.5.3
    • Fix Version/s: Lustre 2.7.0


      When using asymmetric router failure (ARF) detection, the system appears unable to detect that a router lacks an expected interface. A defined but non-functional interface is detected. However, clients do not seem to detect the case where they have a route to a network via a router, but that router has no means of getting the traffic there.

      Take for example a few nodes: login1, rtr5, rtr6, and mgs. This was demonstrated on live hardware, although the following example is abstracted, with addresses and names changed.

      Host: interfaces (routes)
      login1: 30@gni1 (o2ib1 via 27@gni1, o2ib1 via 31@gni1)
      rtr5: 27@gni1 ()
      rtr6: 31@gni1 ()
      mgs: (gni1 via and gni1 via

      In other words, we have two routers with two interfaces each sitting between LNET1 and GNI1.

      Reproduction steps:
      Enable ARF via configs, ensure running
      Configure interface ib0 on rtr5 to not start on boot.
      Reboot rtr5 (afterwards, ifconfig ib0 shows ib0 down with no IP)
      start lnet (lctl net up)

      show missing interface on rtr5 via lctl list_nids
      rtr5:~ # lctl list_nids
      rtr5:~ #

      on login1 ping mgs
      lctl ping (result is 50% success, 50% I/O error)

      show routes
      login1:~ # lctl show_route
      net o2ib1 hops 1 gw 27@gni1 up pri 0
      net o2ib1 hops 1 gw 31@gni1 up pri 0

      look for down_ni
      login1:~ # cat /proc/sys/lnet/routers
      ref rtr_ref alive_cnt state last_ping ping_sent deadline down_ni router
      4 1 1 up 28 1 NA 0 27@gni1
      4 1 1 up 28 1 NA 0 31@gni1
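
The check above can also be scripted. The following is a minimal sketch, assuming the column layout shown in the header line of this report; the function names are hypothetical, and the sample text is copied from the output above rather than read from a live node. It flags any router whose state is not "up" or whose down_ni count is nonzero.

```python
from typing import Dict, List

# Sample copied from this report. On a live node the text would be read
# from /proc/sys/lnet/routers instead.
SAMPLE = """\
ref rtr_ref alive_cnt state last_ping ping_sent deadline down_ni router
4 1 1 up 28 1 NA 0 27@gni1
4 1 1 up 28 1 NA 0 31@gni1
"""


def parse_routers(text: str) -> List[Dict[str, str]]:
    """Split the whitespace-separated table into one dict per router row."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Skip rows that do not match the header width (assumed malformed).
    return [dict(zip(header, row)) for row in rows if len(row) == len(header)]


def suspect_routers(rows: List[Dict[str, str]]) -> List[str]:
    """Return routers reported as not up, or with at least one down NI."""
    return [
        r["router"]
        for r in rows
        if r["state"] != "up" or int(r["down_ni"]) > 0
    ]


rows = parse_routers(SAMPLE)
print(suspect_routers(rows))  # prints [] for the sample above
```

For the output captured here the list is empty: both gateways are reported up with down_ni 0, which is exactly the symptom being reported.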

      In other words, there is no way to get to o2ib1 via rtr5, but ARF does not detect this. Presumably, at least in a non-multihop configuration, clients should be concerned not with whether the router has defined routes that aren't working, but with whether the client has a defined route that a router can't handle due to a down interface or a lack of an interface.
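
The check proposed above can be sketched as follows. This is an illustration only, not LNet code: the function names are hypothetical, the per-router NID lists are assumed to be whatever each router reports about itself, and the o2ib1 NID for rtr6 is a placeholder because the real addresses are not given in this report. It applies only to the single-hop case, where a gateway is expected to have an interface directly on the remote network.

```python
from typing import Dict, List, Tuple


def net_of(nid: str) -> str:
    """Return the network part of a NID, e.g. '27@gni1' -> 'gni1'."""
    return nid.split("@", 1)[1]


def unusable_routes(
    routes: List[Tuple[str, str]],
    router_nids: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (remote_net, gateway) routes whose gateway has no NI on remote_net.

    routes:      the client's routes as (remote_net, gateway_nid) pairs.
    router_nids: gateway_nid -> NIDs that router reports for itself.
    """
    bad = []
    for remote_net, gateway in routes:
        nets = {net_of(n) for n in router_nids.get(gateway, [])}
        if remote_net not in nets:
            bad.append((remote_net, gateway))
    return bad


# login1's routes, as listed by lctl show_route in this report.
routes = [("o2ib1", "27@gni1"), ("o2ib1", "31@gni1")]

# Assumed router state: rtr5 came up without ib0, so it has only its gni1 NID.
# "RTR6_IB@o2ib1" is a placeholder, not a real address.
router_nids = {
    "27@gni1": ["27@gni1"],
    "31@gni1": ["31@gni1", "RTR6_IB@o2ib1"],
}

print(unusable_routes(routes, router_nids))  # prints [('o2ib1', '27@gni1')]
```

Under these assumptions only the route through rtr5 is flagged, which matches the behavior requested here: the decision is driven by the client's own route table rather than by the routes defined on the router.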


              • Assignee:
                wc-triage WC Triage
                lewisj John Lewis