LU-13575

LNet should ensure round-robin interface selection when interfaces are healthy

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: None
    • Labels: None

    Description

      When an interface fails, stays out of service for some time, and is then brought back, the sequence number of the interface that remained in use is far larger than that of the newly restored interface. As a result, the restored interface is used exclusively until its sequence number catches up with that of the interface that was in use. This is not ideal: the system has two available interfaces, but only one carries traffic, purely because of the sequence number, whose purpose is to enable round robin. Ideally, once an interface comes back into service, traffic should immediately be spread across all healthy interfaces.

      A similar thing happens when there are many source-specified sends. One NI accumulates a large number of sequence increments, and it then takes a while for the other NIs to "catch up".
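
      The behavior described above can be reproduced with a small model. This is only a sketch, not LNet code: `struct demo_ni`, `demo_select` and `demo_send` are hypothetical names, and the selection rule (pick the healthy NI with the lowest sequence number, then increment it) is an assumption based on the description in this ticket.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a local NI (not the LNet struct). */
struct demo_ni {
    unsigned long seq;    /* incremented each time the NI is selected */
    int healthy;          /* nonzero if the NI may be used */
    unsigned long sends;  /* messages sent since the counter was last reset */
};

/*
 * Assumed selection rule: pick the healthy NI with the lowest sequence
 * number (ties go to the lowest index), then increment its sequence.
 * Returns the index of the chosen NI, or -1 if no NI is healthy.
 */
static int demo_select(struct demo_ni *nis, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!nis[i].healthy)
            continue;
        if (best < 0 || nis[i].seq < nis[best].seq)
            best = (int)i;
    }
    if (best >= 0) {
        nis[best].seq++;
        nis[best].sends++;
    }
    return best;
}

/* Send 'count' messages, selecting an NI for each one. */
static void demo_send(struct demo_ni *nis, size_t n, unsigned long count)
{
    for (unsigned long i = 0; i < count; i++)
        demo_select(nis, n);
}
```

      In this model, if NI 1 is unhealthy for 1000 sends and then recovers, the next 1000 sends all go to NI 1, and NI 0 sits idle until the two sequence numbers meet.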

      We should modify how the sequence number is manipulated so that we actually round robin when desired, or otherwise modify the relevant code.
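
      One possible way to manipulate the sequence number is sketched below. It is an illustration only and not necessarily the change that landed for this ticket: instead of incrementing a per-NI counter, each use stamps the NI with the next value of a shared counter, so the lowest sequence number always belongs to the least recently used NI. All names (`struct rr_ni`, `struct rr_state`, `rr_mark_used`, `rr_select`) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a local NI (not the LNet struct). */
struct rr_ni {
    unsigned long seq;    /* value of the shared counter at last use */
    int healthy;          /* nonzero if the NI may be used */
    unsigned long sends;  /* messages sent since the counter was last reset */
};

/* Shared state: the most recently issued sequence number. */
struct rr_state {
    unsigned long last_seq;
};

/*
 * Record a use of 'ni'. Called for round-robin picks and, in this sketch,
 * also for source-specified sends, so a burst of the latter only marks the
 * NI as most recently used instead of building up a large lead.
 */
static void rr_mark_used(struct rr_state *st, struct rr_ni *ni)
{
    ni->seq = ++st->last_seq;
    ni->sends++;
}

/*
 * Pick the healthy NI with the lowest sequence number, i.e. the least
 * recently used one. Returns its index, or -1 if no NI is healthy.
 */
static int rr_select(struct rr_state *st, struct rr_ni *nis, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!nis[i].healthy)
            continue;
        if (best < 0 || nis[i].seq < nis[best].seq)
            best = (int)i;
    }
    if (best >= 0)
        rr_mark_used(st, &nis[best]);
    return best;
}
```

      With this scheme, an NI that returns after a long outage is picked once, is stamped with the newest sequence number, and selection then alternates between the healthy NIs without a catch-up period.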

          People

            hornc Chris Horn