LNet MultiRail selects a route by choosing the local device first and the remote device second.
The local device is determined in round-robin fashion,
so even if the health value of remote device (A) is lower than that of remote device (B),
local device (A), which is the peer of remote device (A), may still be selected.
If device (A) and device (B) are on different subnets, the failed route is then selected.
          Subnet1  |  Subnet2
           (DOWN)  |
REMOTE     IB(A)   |   IB(B)
             ↑     |
             x     |
             |     |
LOCAL      IB(A)   |   IB(B)
The local device is selected in round-robin fashion, regardless of the status of the remote-side device.
We modify the algorithm that finds the best local device as follows.
For each candidate local device:
- get the maximum health value of its remote devices
- if that value is smaller than the best health value so far, do not use this device
- if that value is larger than the best health value, make this device the best device
- if that value is equal to the best health value, choose the best device in the conventional way
- update the best health value
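
The steps above can be sketched in C roughly as follows. This is a minimal illustration, not the actual LNet code: `struct local_ni`, `max_peer_health()` and `select_local_ni()` are hypothetical names, and the "conventional way" is assumed here to be a plain round-robin sequence comparison (lower `seq` wins).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a local network interface. */
struct local_ni {
	const char *name;        /* label, e.g. "IB(A)" */
	unsigned int seq;        /* round-robin sequence; lower = used less recently */
	const int *peer_health;  /* health of the remote devices on the same subnet */
	size_t n_peers;
};

/* Maximum health among the remote devices reachable from ni; -1 if none. */
static int max_peer_health(const struct local_ni *ni)
{
	int best = -1;

	for (size_t i = 0; i < ni->n_peers; i++)
		if (ni->peer_health[i] > best)
			best = ni->peer_health[i];
	return best;
}

/* Pick the local device whose remote side is healthiest. */
static const struct local_ni *
select_local_ni(const struct local_ni *nis, size_t n)
{
	const struct local_ni *best = NULL;
	int best_health = -1;

	assert(nis != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct local_ni *ni = &nis[i];
		int health = max_peer_health(ni);

		if (best != NULL) {
			/* Remote side is less healthy: do not use this device. */
			if (health < best_health)
				continue;
			/* Same health: fall back to the assumed round-robin rule. */
			if (health == best_health && ni->seq >= best->seq)
				continue;
		}
		/* Healthier remote side, or tie won: update the best device. */
		best = ni;
		best_health = health;
	}
	return best;
}
```

With the topology in the diagram, a local IB(A) whose only remote device has health 0 loses to a local IB(B) whose remote device is healthy, even when round-robin alone would have picked IB(A).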