[LU-14978] LNet: balance peer NI selection if peer NI is added late Created: 01/Sep/21  Updated: 17/Jan/24  Resolved: 17/Jan/24

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Serguei Smirnov Assignee: Serguei Smirnov
Resolution: Duplicate Votes: 0
Labels: Multi-Rail, lnet

Issue Links:
Related
Rank (Obsolete): 9223372036854775807

 Description   

Currently, if a peer NI is added late, after communication with the peer has already started, and the peer NIs are otherwise equal, LNet switches to the newly discovered NI and ignores the others until the "sequence count" (i.e. the count of packets sent to the NI) on the new peer NI catches up with the counts on the previously available peer NIs. The same issue can be seen when a peer NI comes back from an "unhealthy" state.

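This creates an imbalance that can last for quite a while, because the new NI has to receive as many packets as the older NIs have already accumulated before traffic is spread across all of them again.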
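
The behavior can be illustrated with a toy model. This is only a sketch under stated assumptions, not LNet's actual selection code: it assumes that, among otherwise equal peer NIs, the one with the lowest sequence count is always chosen, and that a late-added NI starts with a count of zero. The function names and the NI names `ni0`, `ni1`, `ni2` are hypothetical.

```python
def pick_peer_ni(seq_counts):
    """Pick the peer NI with the lowest sequence count (ties broken by name).

    Assumption: this stands in for the selection among otherwise equal peer NIs.
    """
    return min(seq_counts, key=lambda name: (seq_counts[name], name))


def send(seq_counts, n):
    """Send n messages and return how many went to each peer NI."""
    sent = {name: 0 for name in seq_counts}
    for _ in range(n):
        ni = pick_peer_ni(seq_counts)
        seq_counts[ni] += 1
        sent[ni] += 1
    return sent


# Two peer NIs share 1000 messages evenly.
counts = {"ni0": 0, "ni1": 0}
before = send(counts, 1000)

# A third peer NI is added late, starting from a sequence count of zero.
counts["ni2"] = 0
after = send(counts, 500)

print(before)  # {'ni0': 500, 'ni1': 500}
print(after)   # {'ni0': 0, 'ni1': 0, 'ni2': 500}
```

In this model the next 500 messages all go to the new NI, and the older NIs only see traffic again once its count has caught up. The more traffic the peer has already carried, the longer that window lasts.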



 Comments   
Comment by Chris Horn [ 17/May/22 ]

I believe this is a duplicate of https://jira.whamcloud.com/browse/LU-13575 and https://jira.whamcloud.com/browse/LU-15731

Generated at Sat Feb 10 03:14:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.