Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5570

Better router selection in LNet

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Rank (Obsolete):
      15533

      Description

      LNet chooses routers based on queued bytes on routers, at the meanwhile, it normally takes tens of seconds to detect dead routers (we see failed completion event of outstanding tx/rx, then close connection and notify LNet peer is dead) , which means it is still possible to queue more messages to a potentially dead router if all other alive routers have long message queue.

      we may need to check aliveness timestamp as part of router evaluation, and avoid to choose those routers that are inactive for certain number of seconds as long as there are other active routers (it takes pretty long to mark a router as dead, we might prefer not to choose it before marking it as dead)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                liang Liang Zhen (Inactive)
                Reporter:
                liang Liang Zhen (Inactive)
              • Votes:
                0 Vote for this issue
                Watchers:
                12 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: