Details
- Bug
- Resolution: Fixed
- Minor
Description
Currently socklnd returns LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet when an ETIMEDOUT error occurs. This causes LNet to decrement only the local NI health score, even though the problem may actually be with the remote NI. As a result, peer NI health is not decremented, and LNet continues to treat that peer NI as being as good a choice for sending as the other options.
Returning LNET_MSG_STATUS_NETWORK_TIMEOUT instead would cause LNet to decrement both local NI and peer NI health. If the local NI is healthy, it will recover its score quickly, while the proposed change would allow the peer NI score to stay properly lowered until the peer NI recovers.
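The difference between the two statuses can be illustrated with a small standalone model. This is only a sketch of the behaviour described above, not the socklnd or LNet code: the enum is a stand-in that reuses the two status names from this ticket, and `health_sketch`, `apply_hstatus`, `etimedout_hstatus`, the maximum score of 1000 and the decrement step of 100 are all hypothetical names and values chosen for illustration.

```c
/* Stand-in for the two health statuses named in this ticket.
 * The numeric values are illustrative, not the real LNet ones. */
enum hstatus_sketch {
	LNET_MSG_STATUS_LOCAL_TIMEOUT,
	LNET_MSG_STATUS_NETWORK_TIMEOUT,
};

/* Hypothetical health scores for one local NI / peer NI pair. */
struct health_sketch {
	int local_ni;
	int peer_ni;
};

#define HEALTH_MAX_SKETCH  1000	/* assumed maximum score */
#define HEALTH_STEP_SKETCH 100	/* assumed decrement per failure */

/* Lower a score by one step, never going below zero. */
static void health_dec(int *score)
{
	*score -= HEALTH_STEP_SKETCH;
	if (*score < 0)
		*score = 0;
}

/* Model of the effect described in the ticket: a local timeout
 * touches only the local NI, a network timeout touches both. */
static void apply_hstatus(struct health_sketch *h, enum hstatus_sketch st)
{
	switch (st) {
	case LNET_MSG_STATUS_LOCAL_TIMEOUT:
		health_dec(&h->local_ni);
		break;
	case LNET_MSG_STATUS_NETWORK_TIMEOUT:
		health_dec(&h->local_ni);
		health_dec(&h->peer_ni);
		break;
	}
}

/* Status reported for ETIMEDOUT: the current behaviour when
 * `proposed` is 0, the behaviour suggested here otherwise. */
static enum hstatus_sketch etimedout_hstatus(int proposed)
{
	return proposed ? LNET_MSG_STATUS_NETWORK_TIMEOUT
			: LNET_MSG_STATUS_LOCAL_TIMEOUT;
}
```

In this model, starting from both scores at `HEALTH_MAX_SKETCH`, applying `etimedout_hstatus(0)` leaves the peer NI score untouched, whereas `etimedout_hstatus(1)` lowers both scores, which is what lets path selection stop preferring a peer NI that keeps timing out.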
Issue Links
- is related to LU-17379 try MGS NIDs more quickly at initial mount (Resolved)