Lustre / LU-17505

socklnd: return LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet on ETIMEDOUT

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • 3

    Description

      Currently, socklnd returns LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet when an ETIMEDOUT error occurs. This causes LNet to decrement only the local NI health score, even though the problem may actually be with the remote NI. As a result, the peer NI health score is not decremented, and LNet continues to treat that peer NI as being as good a choice for sending as the other options.

      Returning LNET_MSG_STATUS_NETWORK_TIMEOUT instead would cause LNet to decrement both the local NI and the peer NI health scores. If the local NI is healthy, it will recover its score quickly, while the peer NI score would stay properly lowered until the peer recovers.
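
      The difference between the two statuses can be illustrated with a small standalone model. This is only a sketch of the behaviour described above, not the socklnd or LNet source: the enum, the ni_health struct, the HEALTH_MAX and HEALTH_PENALTY values, and the function names are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only; not the Lustre definitions. */
enum hstatus {
	HSTATUS_LOCAL_TIMEOUT,   /* models LNET_MSG_STATUS_LOCAL_TIMEOUT */
	HSTATUS_NETWORK_TIMEOUT  /* models LNET_MSG_STATUS_NETWORK_TIMEOUT */
};

#define HEALTH_MAX     1000  /* hypothetical maximum health score */
#define HEALTH_PENALTY 100   /* hypothetical per-failure decrement */

struct ni_health {
	int score;
};

/* Lower a score by one penalty step, never going below zero. */
static void penalize(struct ni_health *ni)
{
	assert(ni != NULL);
	ni->score -= HEALTH_PENALTY;
	if (ni->score < 0)
		ni->score = 0;
}

/*
 * Status reported for an ETIMEDOUT failure.
 * proposed == 0: behaviour described as current (local timeout).
 * proposed != 0: behaviour proposed in this ticket (network timeout).
 */
static enum hstatus timeout_hstatus(int proposed)
{
	return proposed ? HSTATUS_NETWORK_TIMEOUT : HSTATUS_LOCAL_TIMEOUT;
}

/* Apply the health effect the ticket attributes to each status. */
static void apply_hstatus(enum hstatus st, struct ni_health *local,
			  struct ni_health *peer)
{
	switch (st) {
	case HSTATUS_LOCAL_TIMEOUT:
		penalize(local);        /* only the local NI is blamed */
		break;
	case HSTATUS_NETWORK_TIMEOUT:
		penalize(local);        /* both ends are blamed */
		penalize(peer);
		break;
	}
}
```

      In this model, starting both scores at HEALTH_MAX and applying timeout_hstatus(0) lowers only the local score, so the peer still looks fully healthy. Applying timeout_hstatus(1) lowers both, which is the effect the proposed change is after.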

            People

              ssmirnov Serguei Smirnov
