Details
Bug
Resolution: Unresolved
Minor
Lustre 2.12.1
Description
LNet MultiRail selects a route by choosing the local device first and the remote device second.
The local device is determined in round-robin fashion, without regard to the health of the remote side,
so even if the health value of remote device (A) is lower than that of remote device (B),
local device (A), which is the peer of remote device (A), may still be selected.
If device (A) and device (B) are on different subnets, a failed route will be selected.
            Subnet1     Subnet2
            (DOWN)   |
    REMOTE  IB(A)    |  IB(B)
              ↑      |
              x      |
              |      |
    LOCAL   IB(A)    |  IB(B)

The local device is selected in round-robin fashion regardless of the status of the remote side device.
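The failure mode described above can be illustrated with a small sketch. This is a toy model, not LNet code: the `remote_health` table, the one-to-one pairing of local and remote devices per subnet, and the health values 0 and 1000 are all assumptions made for illustration.

```python
from itertools import cycle

# Hypothetical model: each local device can only reach the remote device
# on its own subnet, so the table is keyed by local device name.
remote_health = {"IB(A)": 0, "IB(B)": 1000}  # remote IB(A) is down


def round_robin(local_devices):
    """Yield local devices in turn, ignoring the health of the remote side."""
    return cycle(local_devices)


picker = round_robin(["IB(A)", "IB(B)"])
picks = [next(picker) for _ in range(4)]

# Every pick of local IB(A) leads to the remote device that is down.
failed = [dev for dev in picks if remote_health[dev] == 0]
```

In this model half of the selections land on the subnet whose remote device is down, which is the behaviour the report complains about.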
We modify the algorithm that finds the best local device as follows. For each candidate local device:
- get the maximum health value of its remote devices
- if the value is smaller than the best health value so far, do not use this device
- if the value is larger than the best health value, make this device the best device
- if the value is equal to the best health value, update the best device in the conventional way
- update the best health value
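The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the actual LNet patch: the `LocalNI` class, its `peer_health` list and the `seq` counter are hypothetical names, and the "conventional way" of breaking ties is assumed here to be a simple round-robin on a per-device sequence counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocalNI:
    """Hypothetical stand-in for a local network interface."""
    name: str
    peer_health: List[int]  # health of the remote devices reachable from here
    seq: int = 0            # assumed round-robin counter; lower = used less


def select_best_local(candidates: List[LocalNI]) -> Optional[LocalNI]:
    best = None
    best_health = -1
    for ni in candidates:
        if not ni.peer_health:
            continue  # no reachable remote device at all
        # Step 1: maximum health value among this device's remote devices.
        health = max(ni.peer_health)
        # Step 2: worse than the best seen so far, so skip this device.
        if health < best_health:
            continue
        # Step 4: on a tie, fall back to the assumed round-robin tie-break.
        if health == best_health and best is not None and ni.seq >= best.seq:
            continue
        # Steps 3 and 5: record the new best device and best health value.
        best = ni
        best_health = health
    if best is not None:
        best.seq += 1  # mark the chosen device as used
    return best
```

With one remote device down, a call such as `select_best_local([LocalNI("IB(A)", [0]), LocalNI("IB(B)", [1000])])` returns the `IB(B)` entry every time, while two equally healthy candidates still alternate as before.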