Lustre / LU-13714

LNet Router: ni status flip flopping unnecessarily


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.15.0

    Description

      Each peer with a route discovers the gateway specified in that route. The gateway sets an NI's status to UP when it receives a message on that NI. If an NI receives no message for a timeout period, the gateway brings that NI's status down. Peers that have asym_router set to 1 use this status to decide whether to use the route.
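
      The timeout behaviour described above can be sketched as a small model. This is only an illustration: struct gw_ni, its fields, and both function names are hypothetical and are not taken from the LNet source, and the time unit (seconds) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one gateway NI; not the real LNet structure. */
struct gw_ni {
    bool up;              /* true = status UP, false = status DOWN */
    uint64_t last_alive;  /* time (assumed seconds) of the last message */
};

/* A message arrived on this NI: remember when, and mark the NI UP. */
void gw_ni_message_received(struct gw_ni *ni, uint64_t now)
{
    ni->last_alive = now;
    ni->up = true;
}

/* Periodic check: mark the NI DOWN once it has been silent for
 * longer than the timeout. Assumes now >= ni->last_alive. */
void gw_ni_check_timeout(struct gw_ni *ni, uint64_t now, uint64_t timeout)
{
    if (ni->up && now - ni->last_alive > timeout)
        ni->up = false;
}
```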

      However, there is a flaw in this logic: the NI status is set to UP on any received message. This is problematic because the gateway pushes an update when its NI status goes down. This PUSH message gets an ACK from the peer without the route, and that ACK sets the NI status back to UP. So we end up toggling the NI status.
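
      The toggle can be reproduced in a toy model. The sketch below is not LNet code: the enum, struct, and function names are hypothetical, and the PUSH/ACK exchange is reduced to a function that returns the type of the message the gateway gets back.

```c
#include <stdbool.h>

/* Hypothetical message types for illustration only. */
enum msg_type { MSG_ACK, MSG_PUT, MSG_GET, MSG_REPLY };

/* Hypothetical gateway NI that counts its status changes. */
struct gw_ni {
    bool up;
    int transitions;
};

/* Change the status and count the change if it differs. */
void set_status(struct gw_ni *ni, bool up)
{
    if (ni->up != up) {
        ni->up = up;
        ni->transitions++;
    }
}

/* Flawed rule from the description: any received message sets UP. */
void old_on_receive(struct gw_ni *ni, enum msg_type type)
{
    (void)type; /* the type is ignored, which is the flaw */
    set_status(ni, true);
}

/* The NI times out and goes DOWN, the gateway pushes an update, and
 * the peer answers the PUSH with an ACK. Returns that reply's type. */
enum msg_type timeout_and_push(struct gw_ni *ni)
{
    set_status(ni, false);
    return MSG_ACK;
}
```

      Feeding the result of timeout_and_push() into old_on_receive() leaves the NI UP again after two status changes, which is the flip-flop the other peers observe.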

      Because of this flaw, the other peers see the route go down and up almost immediately, even though the other side has removed its route.

      The gateway's NI status should be set to UP only when it receives a discovery GET message, that is, a GET on the RESERVED portal.
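
      A minimal sketch of the proposed rule, assuming a message can be reduced to its type and destination portal. All names here are hypothetical, and RESERVED_PORTAL being 0 is an assumption made for the example, not a statement about the actual patch.

```c
#include <stdbool.h>

/* Hypothetical message types for illustration only. */
enum msg_type { MSG_ACK, MSG_PUT, MSG_GET, MSG_REPLY };

/* Assumed portal index for the example. */
#define RESERVED_PORTAL 0u

/* Hypothetical gateway NI state. */
struct gw_ni {
    bool up;
};

/* A discovery GET is a GET addressed to the reserved portal. */
bool is_discovery_get(enum msg_type type, unsigned int portal)
{
    return type == MSG_GET && portal == RESERVED_PORTAL;
}

/* Proposed rule: only a discovery GET sets the NI status to UP.
 * ACKs, such as the ACK to the gateway's own PUSH, change nothing. */
void new_on_receive(struct gw_ni *ni, enum msg_type type,
                    unsigned int portal)
{
    if (is_discovery_get(type, portal))
        ni->up = true;
}
```

      With this rule, the ACK that follows the gateway's PUSH no longer brings the NI back up; only a peer that still discovers the gateway does.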

      This solution is not entirely foolproof. A manually triggered discovery from the peer to the gateway will still bring the NI status UP, because the gateway cannot distinguish a manual discovery from a gateway-alive discovery. However, this is the best we can do.

          People

            Assignee: ashehata Amir Shehata (Inactive)
            Reporter: ashehata Amir Shehata (Inactive)