Lustre / LU-13780

Leverage peer aliveness more efficiently

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      When an LNet router is revived after going down, remote peers may
      discover it is alive before we do. Thus, remote peers may use it
      as a next-hop, and we may start receiving messages from it while we
      still consider it to be dead. We should mark router peers as alive
      when we receive a message from them.
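
      A minimal sketch of that receive-side check, assuming a simplified
      peer record. The struct and function names here are hypothetical
      and are not taken from the LNet source:

```c
#include <stdbool.h>

/* Hypothetical, simplified peer record; not the real LNet structure. */
struct sketch_peer {
	bool is_router;	/* peer is configured as a gateway */
	bool alive;	/* local view of the peer's aliveness */
};

/* Called for every message received directly from `from`. */
void sketch_on_receive(struct sketch_peer *from)
{
	/* A message arriving from a router shows that it is back in service,
	 * so there is no need to wait for the next discovery ping reply. */
	if (from->is_router && !from->alive)
		from->alive = true;
}
```

      Non-router peers are left untouched in this sketch because the
      description only calls for marking router peers alive.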

      If an LNet router does not respond to a discovery ping, then we
      currently mark all of its NIs as DOWN. This can actually slow down
      the process of returning a route to service. If we receive a message
      from a router, in the manner described above, then we can safely
      return the router to service. We already set the status of the router
      NI we received the message from to UP, but the remote NIs will still
      be DOWN and thus the route will be considered down until we get a
      reply to the next discovery ping.

      When selecting a route, we only consider the aliveness of a gateway's
      remote NIs if avoid_asym_router_failure is enabled and the route is
      single-hop. In this case, as long as the gateway has at least one
      alive NI on the remote network then the route is considered UP. In
      the situation described above, we know the router has at least one
      NI alive because it was used to forward a message from a remote peer.
      Thus, when we receive a forwarded message from a router, we can
      reasonably set the NI status of all of its NIs that are on the same
      peer net as the message originator to UP. This does not impact the
      route status of any multi-hop routes because we do not consider the
      aliveness of remote NIs for multi-hop routes.
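
      The selection rule and the proposed update can be sketched as
      follows. All type and function names are hypothetical; only
      avoid_asym_router_failure comes from the description. The check
      that a dead gateway never yields an UP route is an assumption of
      this sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_ni_status { SKETCH_NI_DOWN = 0, SKETCH_NI_UP = 1 };

/* One network interface of a gateway, as seen by the local node. */
struct sketch_gw_ni {
	uint32_t net;			/* network this NI is attached to */
	enum sketch_ni_status status;
};

struct sketch_gateway {
	bool alive;			/* local view of gateway aliveness */
	struct sketch_gw_ni *nis;
	size_t nr_nis;
};

struct sketch_route {
	struct sketch_gateway *gw;
	uint32_t remote_net;		/* network reached through the gateway */
	unsigned int hops;
};

/* Route is UP if the gateway is alive and, for single-hop routes with
 * avoid_asym_router_failure enabled, at least one gateway NI on the
 * remote network is UP. */
bool sketch_route_is_up(const struct sketch_route *rt,
			bool avoid_asym_router_failure)
{
	size_t i;

	if (!rt->gw->alive)	/* assumption: dead gateway means route down */
		return false;
	if (!avoid_asym_router_failure || rt->hops != 1)
		return true;	/* remote NIs are not consulted */
	for (i = 0; i < rt->gw->nr_nis; i++)
		if (rt->gw->nis[i].net == rt->remote_net &&
		    rt->gw->nis[i].status == SKETCH_NI_UP)
			return true;
	return false;
}

/* On a forwarded message: mark every gateway NI on the originator's
 * network as UP. */
void sketch_mark_nis_up(struct sketch_gateway *gw, uint32_t origin_net)
{
	size_t i;

	for (i = 0; i < gw->nr_nis; i++)
		if (gw->nis[i].net == origin_net)
			gw->nis[i].status = SKETCH_NI_UP;
}
```

      In this sketch a multi-hop route never inspects the NI status
      array, so marking NIs UP cannot change its result.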

      Similarly, we can set the cached lr_alive value to up for any routes
      whose lr_net matches the net ID of the message originator's NID.
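
      A sketch of that cache refresh, assuming the routes through the
      gateway are available as a flat array. Only the field names
      lr_net and lr_alive come from the description; the struct, the
      function, and the array layout are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry carrying just the two fields named above. */
struct sketch_route_entry {
	uint32_t lr_net;	/* remote network this route leads to */
	bool lr_alive;		/* cached route aliveness */
};

/* Set lr_alive on every route whose lr_net equals the originator's net.
 * Returns how many cached values actually changed. */
size_t sketch_refresh_lr_alive(struct sketch_route_entry *routes,
			       size_t nr_routes, uint32_t origin_net)
{
	size_t i, changed = 0;

	for (i = 0; i < nr_routes; i++) {
		if (routes[i].lr_net == origin_net && !routes[i].lr_alive) {
			routes[i].lr_alive = true;
			changed++;
		}
	}
	return changed;
}
```

      Routes to other networks keep their cached value, since a
      forwarded message says nothing about the gateway's NIs there.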

          People

            hornc Chris Horn