[LU-16999] Revert caf6095ade LU-15595 lnet: LNet peer aliveness broken Created: 27/Jul/23 Updated: 24/Aug/23 Resolved: 24/Aug/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The patch for LU-15595 (commit caf6095ade) restored the historic behavior of the LNet router peer health feature, but it did not account for the fact that the old LNet router checker behaved differently from the current implementation, which leverages LNet discovery to perform the router checker pings. Because discovery is now used for these pings, we can no longer guarantee that each router endpoint will be pinged within the peer aliveness window, and as a result the router may incorrectly determine that some peer NIs are not alive. Just revert this for now. |
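
The failure mode described above can be illustrated with a toy model. This is only a sketch and not LNet code: the function names, the endpoint labels, and the interval and window values are all hypothetical, and the assumption that a ping round refreshes only a subset of endpoints is a simplification made for illustration. It shows that an endpoint which is not pinged within the aliveness window is reported as not alive even though nothing is wrong with it.

```python
def stale_endpoints(last_ping, now, window):
    """Return endpoints whose most recent ping is older than the aliveness window."""
    return sorted(ni for ni, t in last_ping.items() if now - t > window)


def simulate(endpoints, pinged_in_round, interval, window, rounds):
    """Advance time by `interval` per round and refresh only the endpoints
    returned by pinged_in_round(round_index). All values are illustrative."""
    last_ping = {ni: 0 for ni in endpoints}
    now = 0
    for r in range(rounds):
        now += interval
        for ni in pinged_in_round(r):
            last_ping[ni] = now
    return stale_endpoints(last_ping, now, window)


endpoints = ["ni0", "ni1", "ni2"]  # hypothetical peer NIs

# Every endpoint is pinged each round: nothing ages out of the window.
all_pinged = simulate(endpoints, lambda r: endpoints, 60, 180, 5)

# Only one endpoint is refreshed per round: the others exceed the window
# and would be treated as not alive.
one_pinged = simulate(endpoints, lambda r: ["ni0"], 60, 180, 5)

print(all_pinged)
print(one_pinged)
```

In this model the second run flags `ni1` and `ni2` purely because of ping coverage, which is the kind of false "not alive" determination the description attributes to the restored behavior.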
| Comments |
| Comment by Gerrit Updater [ 27/Jul/23 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51791 |
| Comment by Gerrit Updater [ 24/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51791/ |
| Comment by Peter Jones [ 24/Aug/23 ] |
|
Landed for 2.16 |