  1. Lustre
  2. LU-13714

LNet Router: NI status flip-flopping unnecessarily

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      Each peer that has a route configured discovers the gateway specified in that route. The gateway sets an NI's status to UP when it receives a message on that NI. If it receives no message on one of its NIs for a timeout period, it sets that NI's status to DOWN. Other peers use this status to decide whether to use the route when their asym_router is set to 1.
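
      The status tracking described above can be sketched as a small toy model. This is only an illustration, not the LNet implementation: the class name GatewayNI, the methods on_message and tick, and the 10-second timeout are all hypothetical.

```python
from dataclasses import dataclass

UP, DOWN = "up", "down"


@dataclass
class GatewayNI:
    """Toy model of one network interface (NI) on a gateway (hypothetical)."""
    timeout: float          # seconds of silence before the NI is marked DOWN
    last_rx: float = 0.0    # time the last message was received
    status: str = DOWN

    def on_message(self, now: float) -> None:
        # Current behaviour per the description: any received message sets UP.
        self.last_rx = now
        self.status = UP

    def tick(self, now: float) -> None:
        # Periodic check: no message within the timeout period sets DOWN.
        if now - self.last_rx > self.timeout:
            self.status = DOWN


ni = GatewayNI(timeout=10.0)    # assumed timeout value, for illustration only
ni.on_message(now=1.0)
status_after_rx = ni.status
ni.tick(now=5.0)                # 4 s of silence, still within the timeout
status_while_fresh = ni.status
ni.tick(now=12.0)               # 11 s of silence, timeout exceeded
status_after_silence = ni.status
```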

      However, there is a flaw in the logic. The NI status is set to UP on any message received. This is problematic because the gateway pushes an update when its NI status goes down. This PUSH message gets an ACK from the peer without the route, and that ACK sets the NI status back to UP. So we end up toggling the NI status.

      Because of this flaw, the other peers see the route go down and come back up almost immediately, even though the other side has removed its route.
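
      A rough simulation of the toggling, under simplifying assumptions: time advances in 1-second steps, the peer sends nothing except the ACK to the gateway's PUSH, and the ACK arrives in the same step as the PUSH. The function name simulate and the timeout value are hypothetical.

```python
def simulate(timeout: float, end_time: float):
    """Return the (time, status) history of one gateway NI under the flawed rule."""
    status = "up"
    last_rx = 0.0
    history = [(0.0, "up")]
    now = 0.0
    while now < end_time:
        now += 1.0
        if status == "up" and now - last_rx > timeout:
            # No traffic for the timeout period: the NI goes DOWN.
            status = "down"
            history.append((now, "down"))
            # The gateway pushes the status change and the peer ACKs the PUSH.
            # Flawed rule: the ACK counts as "a message", so the NI goes UP again.
            last_rx = now
            status = "up"
            history.append((now, "up"))
    return history


history = simulate(timeout=3.0, end_time=10.0)
for when, state in history:
    print(f"t={when:>4}: {state}")
```

      With these assumed values the NI goes down and straight back up every time the timeout expires, even though the peer never uses the route.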

      The gateway's NI status should be set to UP only when it receives a discovery GET message, that is, a GET on the RESERVED portal.
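
      A minimal sketch of the proposed check, assuming a simplified message with only a type and a portal index. The Msg class, the marks_ni_up helper, and the portal value 0 used for RESERVED_PORTAL are illustrative assumptions, not the actual LNet code.

```python
from dataclasses import dataclass

RESERVED_PORTAL = 0  # assumed index of the reserved portal, for illustration


@dataclass(frozen=True)
class Msg:
    kind: str    # "GET", "PUT", "ACK" or "REPLY"
    portal: int  # portal index the message targets


def marks_ni_up(msg: Msg) -> bool:
    """Proposed rule: only a discovery GET on the reserved portal sets the NI UP."""
    return msg.kind == "GET" and msg.portal == RESERVED_PORTAL


discovery_get = Msg(kind="GET", portal=RESERVED_PORTAL)
push_ack = Msg(kind="ACK", portal=RESERVED_PORTAL)
other_get = Msg(kind="GET", portal=5)  # hypothetical non-reserved portal

print(marks_ni_up(discovery_get), marks_ni_up(push_ack), marks_ni_up(other_get))
```

      Under this rule the ACK to the gateway's PUSH no longer changes the NI status, so the toggling described above would not occur.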

      This solution is not entirely foolproof. A manually triggered discovery from the peer to the gateway will still bring the NI status UP, because the gateway cannot distinguish between a manual discovery and a gateway-alive discovery. However, this is the best we can do.

            People

            • Assignee:
              ashehata Amir Shehata
              Reporter:
              ashehata Amir Shehata
            • Votes:
              0
              Watchers:
              2
