[LU-13778] LNet Router: bug in routing selection algorithm Created: 11/Jul/20  Updated: 16/Oct/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: lnet

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

If a source NID has already been selected for a send, the routed (remote) network we choose must be reachable from that source NID. The current logic only considers the src NID after the remote net has been selected. Because the src NID was not taken into account ahead of time, the subsequent lookup for a gateway to that remote net can fail to find one on the same net as the src NID, and the send then fails with EHOSTUNREACH.

This can happen in a setup as follows:

src NID A -> GATEWAY A -> remote Net 1
src NID B -> GATEWAY B -> remote Net 2

Remote Nets 1 and 2 are each reachable via a different gateway. However, we want to restrict the send to src NID A. The current algorithm could still give us Gateway B, which would result in an EHOSTUNREACH.
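
The ordering problem can be illustrated with a minimal sketch. This is not the LNet implementation: `Route`, `net_of`, `select_net_first` and `select_route` are hypothetical names, and a route table is assumed to be a simple list of (remote net, gateway NID) pairs. The only LNet convention relied on is that a NID is written as `address@net`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Route:
    """Hypothetical route entry: a remote net and the gateway that reaches it."""
    remote_net: str
    gateway_nid: str


def net_of(nid: str) -> str:
    """Return the network part of a NID written as 'address@net'."""
    return nid.split("@", 1)[1]


def select_net_first(routes: Iterable[Route], remote_nets: Set[str],
                     src_nid: str) -> Optional[Route]:
    """Mimic the problematic order: pick a remote net, then check the src NID."""
    routes = list(routes)
    chosen_net = next(
        (r.remote_net for r in routes if r.remote_net in remote_nets), None)
    for r in routes:
        if r.remote_net == chosen_net and net_of(r.gateway_nid) == net_of(src_nid):
            return r
    # No gateway to the chosen net on the src NID's net:
    # a caller would fail the send with EHOSTUNREACH here.
    return None


def select_route(routes: Iterable[Route], remote_nets: Set[str],
                 src_nid: Optional[str] = None) -> Optional[Route]:
    """Assumed fix: filter by src NID while choosing the remote net."""
    for r in routes:
        if r.remote_net not in remote_nets:
            continue
        if src_nid is not None and net_of(r.gateway_nid) != net_of(src_nid):
            continue  # gateway is not on the same net as the src NID
        return r
    return None
```

With a hypothetical table in which Net 2 is listed first via a gateway on `tcp1` and Net 1 via a gateway on `tcp0`, and a src NID on `tcp0`, `select_net_first` returns `None` while `select_route` returns the Net 1 route through the `tcp0` gateway.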

Another issue here is that when a source NID is specified, we want to end up sending to the same destination NID. This ensures that we keep the original NI selection made by the initiator, which could be NUMA optimal.

