[LU-16214] Minimize dropping kfilnd messages at target due to stale peer Created: 05/Oct/22 Updated: 19/Jan/23 Resolved: 19/Jan/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When LNet is restarted at a target, the kfilnd peer is marked as stale. An incoming message from the initiator peer is then silently dropped and a peer handshake exchange is started. This works as designed, since kfilnd is connectionless, but it is not optimal: kfilnd clients must rely on their error handling to detect that a message was dropped and then retry. The purpose of this story is to determine how to minimize the occurrence of dropped kfilnd messages. One thought is to have kfilnd proactively do a hello handshake on a send if a message hasn't been received from a peer for some period of time, similar to what it does when it knows the peer is stale. |
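The proactive-handshake idea in the description can be illustrated with a small sketch of the send-side decision. This is not the kfilnd implementation (which is kernel C) and is not taken from the linked patches; `Peer`, `PeerState`, `should_handshake`, and `idle_timeout` are hypothetical names used only to show the logic: handshake first if the peer is known to be stale, or if nothing has been received from it within some idle period.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PeerState(Enum):
    """Hypothetical peer states; the real kfilnd state names may differ."""
    UP_TO_DATE = auto()
    STALE = auto()


@dataclass
class Peer:
    state: PeerState
    last_rx: float  # time (in seconds) the last message from this peer arrived


def should_handshake(peer: Peer, now: float, idle_timeout: float) -> bool:
    """Return True if a hello handshake should precede the next send.

    Assumed policy, following the ticket description:
    - a peer already known to be stale always gets a handshake;
    - otherwise, handshake only if nothing has been received from the
      peer for longer than idle_timeout seconds.
    """
    if peer.state is PeerState.STALE:
        return True
    return (now - peer.last_rx) > idle_timeout
```

For example, with `idle_timeout=10.0`, a peer last heard from at `t=100.0` would be sent to directly at `t=105.0` but would get a handshake first at `t=111.0`. The choice of timeout is a trade-off: a short value adds handshake traffic, while a long value leaves a wider window in which a restarted target can silently drop a message.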
| Comments |
| Comment by Gerrit Updater [ 05/Oct/22 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48785 |
| Comment by Gerrit Updater [ 05/Oct/22 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48786 |
| Comment by Gerrit Updater [ 19/Jan/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48785/ |
| Comment by Gerrit Updater [ 19/Jan/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48786/ |
| Comment by Peter Jones [ 19/Jan/23 ] |
|
Landed for 2.16 |