Details
Type: Bug
Resolution: Fixed
Priority: Minor
Description
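A multi-rail peer can have multiple local NIDs, but LNetDist() only identifies a NID as local if it is the first NID on its net returned by lnet_get_next_ni_locked(). For any other local NID, the loop below hits the same-net check on an earlier NI and returns before it ever reaches the exact match.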
Here's the code:
	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
		if (ni->ni_nid == dstnid) {
			if (srcnidp != NULL)
				*srcnidp = dstnid;
			if (orderp != NULL) {
				if (dstnid == LNET_NID_LO_0)
					*orderp = 0;
				else
					*orderp = 1;
			}
			lnet_net_unlock(cpt);
			return local_nid_dist_zero ? 0 : 1;
		}
		if (LNET_NIDNET(ni->ni_nid) == dstnet) {
			/* Check if ni was originally created in
			 * current net namespace.
			 * If not, assign order above 0xffff0000,
			 * to make this ni not a priority. */
			if (current->nsproxy &&
			    !net_eq(ni->ni_net_ns, current->nsproxy->net_ns))
				order += 0xffff0000;
			if (srcnidp != NULL)
				*srcnidp = ni->ni_nid;
			if (orderp != NULL)
				*orderp = order;
			lnet_net_unlock(cpt);
			return 1;
		}
		order++;
	}
If a peer has two NIDs on the same net, x@o2ib and y@o2ib, then LNetDist() returns 0 for one NID and 1 for the other, even though both NIDs are local.
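The early return can be reproduced outside the kernel with a simplified model of the loop quoted above. This is only a sketch: `nid_t`, `MK_NID`, `NID_NET` and `model_dist` are hypothetical stand-ins rather than LNet definitions, the NID encoding is invented for illustration, `local_nid_dist_zero` is assumed to be set, and locking, the loopback case, and the namespace check are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LNet NID: net number in the upper
 * 32 bits, address in the lower 32 bits. Not the real encoding. */
typedef uint64_t nid_t;
#define MK_NID(net, addr) (((nid_t)(net) << 32) | (uint32_t)(addr))
#define NID_NET(nid)      ((uint32_t)((nid) >> 32))

/*
 * Model of the single-pass loop: walk the local NIs in order and
 * return at the first NI that either equals dst or shares its net.
 * Returns 0 for an exact match, 1 for a same-net match, and -1 if
 * no local NI qualifies (a placeholder for the code after the loop).
 */
static int model_dist(const nid_t *local, size_t n, nid_t dst, nid_t *src)
{
	assert(local != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (local[i] == dst) {
			if (src != NULL)
				*src = dst;
			return 0;
		}
		/* An earlier NI on the same net wins here, so a later
		 * exact match is never reached. */
		if (NID_NET(local[i]) == NID_NET(dst)) {
			if (src != NULL)
				*src = local[i];
			return 1;
		}
	}
	return -1;
}
```

With local NIs `MK_NID(1, 38)` and `MK_NID(1, 39)` in that order, `model_dist()` returns 0 for the first and 1 for the second, and reports the first NI as the source in both cases.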
This is evidenced by lctl which_nid always returning the first configured NI, regardless of the order of its arguments:
sles15c01:~ # lctl list_nids
192.168.2.38@tcp
192.168.2.39@tcp
sles15c01:~ # lctl which_nid 192.168.2.38@tcp 192.168.2.39@tcp
192.168.2.38@tcp
sles15c01:~ # lctl which_nid 192.168.2.39@tcp 192.168.2.38@tcp
192.168.2.38@tcp
sles15c01:~ #
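One way to avoid the early return would be to check every local NI for an exact match before falling back to a same-net match. The sketch below illustrates that idea on a simplified model and is not necessarily the change that resolved this ticket. `nid_t`, `MK_NID`, `NID_NET` and `two_pass_dist` are hypothetical names with an invented NID encoding, and locking, the order counter, the loopback case, and the namespace check are all omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LNet NID: net number in the upper
 * 32 bits, address in the lower 32 bits. Not the real encoding. */
typedef uint64_t nid_t;
#define MK_NID(net, addr) (((nid_t)(net) << 32) | (uint32_t)(addr))
#define NID_NET(nid)      ((uint32_t)((nid) >> 32))

/*
 * Two-pass model: returns 0 if dst is any local NID, 1 if dst is
 * only on the same net as some local NI, and -1 otherwise.
 */
static int two_pass_dist(const nid_t *local, size_t n, nid_t dst, nid_t *src)
{
	assert(local != NULL || n == 0);

	/* Pass 1: an exact match anywhere in the list means dst is local. */
	for (size_t i = 0; i < n; i++) {
		if (local[i] == dst) {
			if (src != NULL)
				*src = dst;
			return 0;
		}
	}

	/* Pass 2: otherwise use the first local NI on the same net. */
	for (size_t i = 0; i < n; i++) {
		if (NID_NET(local[i]) == NID_NET(dst)) {
			if (src != NULL)
				*src = local[i];
			return 1;
		}
	}
	return -1;
}
```

In this model both `MK_NID(1, 38)` and `MK_NID(1, 39)` are reported as local with themselves as the source, while a non-local NID on the same net still resolves to the first local NI on that net.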