  1. Lustre
  2. LU-15909

Track Peer/Network credits at peer net/net level - use for path selection


Details

    • Improvement
    • Resolution: Unresolved
    • Minor

    Description

      If an LNet peer has multiple networks configured, LNet currently round-robins between them regardless of the relative capabilities and capacities of each network. As a result, 1/(# nets) of the traffic can be sent via a slow protocol (like tcp) when it would be better to use a faster one.

      In Lustre 2.15 this can be remedied by defining network selection rules with the User Defined Selection Policy (UDSP) feature. Such rules tell LNet to prioritize some nets over others. But this may still not be ideal.

      Suppose we have a peer with two Cassini interfaces and two Ethernet interfaces:

          - net type: tcp
            local NI(s):
              - nid: 172.18.2.3@tcp
                status: up
                interfaces:
                    0: enp65s0
              - nid: 172.18.2.4@tcp
                status: up
                interfaces:
                    0: enp65s1
          - net type: gni
            local NI(s):
              - nid: 17@gni
                status: up
              - nid: 18@gni
                status: up
      

      Without UDSP, half of all traffic will be sent over the tcp network, which is much slower than the gni network.

      With UDSP, we can add a rule so that all traffic is sent over the gni network, unless there is a problem and the tcp interfaces have a higher health value than the gni interfaces.

      This may seem ideal, but all available resources on the gni interfaces could be consumed. In that case, LNet queues messages on the gni interfaces until a resource becomes available, while the tcp interfaces may sit completely idle.

      I propose adding resource (credit) tracking at the local net/peer net level. This will allow LNet to choose a network (local or peer) based on the resources available in that network, which are simply the sum of the resources available to the NIs belonging to the network.

      This should allow us to get most of the benefit of the UDSP network selection rule but also enable us to fully leverage all network capacity in a more intelligent manner than round-robin across all nets.
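
      The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not LNet code: `struct sketch_net`, `sketch_net_credits()` and `sketch_pick_net()` are hypothetical names, each NI is reduced to a single integer count of available credits, and the credit values used in the note after the block are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a net. This is NOT the real LNet
 * structure; it only carries what the sketch needs. */
struct sketch_net {
	const char *name;       /* e.g. "gni" or "tcp" */
	const int *ni_credits;  /* available credits per NI (assumption:
	                         * a value <= 0 means none available) */
	size_t nnis;            /* number of NIs on this net */
};

/* Credits available to a net: the sum over the NIs belonging to it. */
static long sketch_net_credits(const struct sketch_net *net)
{
	long sum = 0;

	for (size_t i = 0; i < net->nnis; i++)
		sum += net->ni_credits[i];
	return sum;
}

/* Return the index of the net with the most available credits.
 * Ties go to the lowest index. */
static size_t sketch_pick_net(const struct sketch_net *nets, size_t nnets)
{
	size_t best = 0;

	assert(nnets > 0);
	for (size_t i = 1; i < nnets; i++)
		if (sketch_net_credits(&nets[i]) > sketch_net_credits(&nets[best]))
			best = i;
	return best;
}
```

      With two gni NIs at 64 credits each and two tcp NIs at 8 credits each, this sketch picks gni. Once both gni NIs drop to 0 credits, it picks tcp instead of queueing on gni. A real implementation would presumably also have to account for health values and UDSP priorities, which this sketch ignores.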

      A side benefit is that we can get rid of the round-robin behavior altogether. Round robin relies on sequence numbers, and there is a scenario in which it can break.

      On every send to a peer we increment a sequence number for the source interface and a sequence number for the peer. The sequence numbers are unsigned 32-bit integers, so if a sequence number wraps in just the right way, we can end up with one NI's sequence number at UINT_MAX while the next send sets another NI's to 0. All subsequent sends then get funneled to the NI with the lower sequence number.
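
      The wraparound problem can be reproduced with a deliberately simplified model. The sketch below is an assumption about the general shape of the selection (pick the NI with the lowest sequence number, then increment it), not the actual LNet selection code; `sketch_rr_pick()` and `sketch_rr_run()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the entry with the lowest sequence number (ties go to the
 * lowest index), then increment its sequence number. */
static size_t sketch_rr_pick(uint32_t *seq, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (seq[i] < seq[best])
			best = i;
	seq[best]++;  /* unsigned arithmetic: UINT32_MAX + 1 wraps to 0 */
	return best;
}

/* Run `sends` selections and count how many each NI received. */
static void sketch_rr_run(uint32_t *seq, unsigned *count, size_t n,
			  unsigned sends)
{
	for (size_t i = 0; i < n; i++)
		count[i] = 0;
	for (unsigned s = 0; s < sends; s++)
		count[sketch_rr_pick(seq, n)]++;
}
```

      Starting two NIs at sequence number 0, ten sends split evenly, five to each. Starting both at UINT32_MAX - 1, the third send wraps the first NI to 0 while the second sits at UINT32_MAX, and nine of the ten sends go to the first NI. In this model the imbalance persists until the wrapped sequence number climbs back up to UINT32_MAX.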

          People

            hornc Chris Horn