Details
Type: Bug
Resolution: Fixed
Priority: Minor
Description
When LNET is restarted on a target, the kfilnd peer is marked as stale. The next incoming message from the initiator peer is then silently dropped, and a peer handshake exchange is started.
While this works as designed, it is not optimal: because kfilnd is connectionless, kfilnd clients must rely on their error handling to detect that a message was dropped and then retry.
The purpose of this story is to determine how to minimize the occurrence of dropped kfilnd messages.
One thought is to have kfilnd proactively perform a hello handshake on a send if no message has been received from the peer for some period of time, similar to what it already does when it knows the peer is stale.