[LU-14978] LNet: balance peer NI selection if peer NI is added late Created: 01/Sep/21 Updated: 17/Jan/24 Resolved: 17/Jan/24 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Serguei Smirnov | Assignee: | Serguei Smirnov |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | Multi-Rail, lnet | ||
| Description |
|
Currently, if a peer NI is added late, after communication with the peer has already begun, and the peer NIs are otherwise equal, LNet switches to the newly discovered NI and ignores the others until the "sequence count" (i.e. the count of packets sent to the NI) on the new peer NI catches up with the counts on the peer NIs that were available previously. The same issue can be seen when a peer NI comes back from an "unhealthy" state. This creates an imbalance which can last for quite a while. |
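The behaviour described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not LNet code: the names `PeerNI`, `Peer`, `add_ni`, `send` and `simulate` are hypothetical, selection is reduced to "pick the NI with the lowest sequence count", and the `level` option shows one possible way to avoid the imbalance (starting a late NI at the lowest count already in use). It is not necessarily the approach taken in LNet.

```python
from dataclasses import dataclass, field


@dataclass
class PeerNI:
    name: str
    seq: int = 0  # "sequence count": messages sent to this NI so far


@dataclass
class Peer:
    nis: list = field(default_factory=list)

    def add_ni(self, name, level=False):
        # level=True is a hypothetical mitigation: start the new NI at the
        # lowest count currently in use instead of at zero.
        start = min((ni.seq for ni in self.nis), default=0) if level else 0
        self.nis.append(PeerNI(name, start))

    def send(self):
        # Simplified selection: lowest sequence count wins, ties go to the
        # NI that was added first.
        ni = min(self.nis, key=lambda n: n.seq)
        ni.seq += 1
        return ni.name


def simulate(level):
    peer = Peer()
    peer.add_ni("ni0")
    peer.add_ni("ni1")
    for _ in range(1000):  # traffic before the late NI shows up
        peer.send()
    peer.add_ni("ni2", level=level)  # NI added late
    picks = [peer.send() for _ in range(300)]
    return {name: picks.count(name) for name in ("ni0", "ni1", "ni2")}


if __name__ == "__main__":
    print("new NI starts at zero:", simulate(level=False))
    print("new NI starts levelled:", simulate(level=True))
```

In this model, with the new NI starting at zero, all 300 messages sent after it is added go to `ni2`, because its count stays below the 500 already accumulated on each of the other two. With the levelled start, the same 300 messages are split 100 per NI.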
| Comments |
| Comment by Chris Horn [ 17/May/22 ] |
|
I believe this is a duplicate of https://jira.whamcloud.com/browse/LU-13575 and https://jira.whamcloud.com/browse/LU-15731 |