Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
Typically, LNet peers do not perform discovery on themselves, so there is often a non-MR (non-Multi-Rail) peer entry for each local interface. For example:
[root@kjcf01n05 ~]# lctl list_nids
10.253.100.9@o2ib
10.253.100.10@o2ib
[root@kjcf01n05 ~]# lnetctl peer show --nid 10.253.100.9@o2ib
peer:
    - primary nid: 10.253.100.9@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 10.253.100.9@o2ib
          state: NA
[root@kjcf01n05 ~]# lnetctl peer show --nid 10.253.100.10@o2ib
peer:
    - primary nid: 10.253.100.10@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 10.253.100.10@o2ib
          state: NA
Because of this, LNet sets a "preferred" local NI to use when sending traffic to these non-MR peers, which prevents LNet recovery pings from exercising other paths. For example, consider a peer with two local interfaces, heth0 and heth1. We have the following paths for sending to heth0:
heth0 -> heth0
heth1 -> heth0
And paths for sending to heth1:
heth0 -> heth1
heth1 -> heth1
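The four paths listed above are simply every (source, destination) pairing of the node's own interfaces. As a purely illustrative sketch (the helper name `enumerate_paths` is hypothetical and is not part of LNet), they can be enumerated like this:

```python
from itertools import product

def enumerate_paths(local_nis):
    """Return every (source, destination) pair among a node's own interfaces."""
    return list(product(local_nis, repeat=2))

paths = enumerate_paths(["heth0", "heth1"])
for src, dst in paths:
    print(f"{src} -> {dst}")
```

With N local interfaces there are N * N such paths, so the gap between what is possible and what is exercised grows with the number of interfaces.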
Because of the preferred-NI logic for non-MR peers, whichever path is chosen first is then used for every future send to that NI (unless the peer entry is deleted, in which case a new path may be chosen). It is not clear whether these local recovery pings are particularly useful for ascertaining the health of local interfaces, but if they are, then LNet ought to be allowed to exercise all possible paths.
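The effect of pinning can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not reproduce LNet's actual selection code, the function `simulate_pings` is hypothetical, and the rotating alternative is just one possible way of exercising every path, not necessarily what the fix implements.

```python
import random

def simulate_pings(local_nis, rounds, pin_preferred, seed=0):
    """Toy model: return the set of (src, dst) paths used by repeated self-pings."""
    rng = random.Random(seed)
    preferred = {}  # dst -> pinned source NI (models the non-MR preferred NI)
    used = set()
    for i in range(rounds):
        for dst in local_nis:
            if pin_preferred:
                # The first choice is sticky for as long as the entry exists.
                src = preferred.setdefault(dst, rng.choice(local_nis))
            else:
                # Assumed alternative: rotate through the source interfaces.
                src = local_nis[i % len(local_nis)]
            used.add((src, dst))
    return used

nis = ["heth0", "heth1"]
pinned = simulate_pings(nis, rounds=10, pin_preferred=True)
rotating = simulate_pings(nis, rounds=10, pin_preferred=False)
print(len(pinned), len(rotating))  # prints "2 4"
```

In the pinned case only one path per destination is ever used, so half of the four possible paths are never tested, no matter how many pings are sent.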
Activity
Field | Original | New
Fix Version/s | | Lustre 2.15.0 [ 14791 ]
Resolution | | Fixed [ 1 ]
Status | Open [ 1 ] | Resolved [ 5 ]