Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5570

Better router selection in LNet

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • None
    • None
    • None
    • 15533

    Description

      LNet chooses routers based on queued bytes on routers, at the meanwhile, it normally takes tens of seconds to detect dead routers (we see failed completion event of outstanding tx/rx, then close connection and notify LNet peer is dead) , which means it is still possible to queue more messages to a potentially dead router if all other alive routers have long message queue.

      we may need to check aliveness timestamp as part of router evaluation, and avoid to choose those routers that are inactive for certain number of seconds as long as there are other active routers (it takes pretty long to mark a router as dead, we might prefer not to choose it before marking it as dead)

      Attachments

        Issue Links

          Activity

            People

              liang Liang Zhen (Inactive)
              liang Liang Zhen (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: