Lustre / LU-14649

LNetDist() may not return 0 for local NID


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • None
    • None
    • 3

    Description

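      A multi-rail peer can have multiple local NIDs on the same network, but LNetDist() only identifies a NID as local if it is the first NI on that network returned by lnet_get_next_ni_locked(). Any earlier NI on the same network takes the "same net" branch and returns 1 before the matching NI is ever reached.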

      Here's the code:

              while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
                      if (ni->ni_nid == dstnid) {
                              if (srcnidp != NULL)
                                      *srcnidp = dstnid;
                              if (orderp != NULL) {
                                      if (dstnid == LNET_NID_LO_0)
                                              *orderp = 0;
                                      else
                                              *orderp = 1;
                              }
                              lnet_net_unlock(cpt);

                              return local_nid_dist_zero ? 0 : 1;
                      }

                      if (LNET_NIDNET(ni->ni_nid) == dstnet) {
                              /* Check if ni was originally created in
                               * current net namespace.
                               * If not, assign order above 0xffff0000,
                               * to make this ni not a priority. */
                              if (current->nsproxy &&
                                  !net_eq(ni->ni_net_ns, current->nsproxy->net_ns))
                                      order += 0xffff0000;
                              if (srcnidp != NULL)
                                      *srcnidp = ni->ni_nid;
                              if (orderp != NULL)
                                      *orderp = order;
                              lnet_net_unlock(cpt);
                              /* Returns on the first NI on dstnet, so a later
                               * local NI whose ni_nid == dstnid is never checked. */
                              return 1;
                      }

                      order++;
              }
      

      If a peer has two NIDs on the same net, x@o2ib and y@o2ib, then LNetDist() returns 0 for one of them and 1 for the other, even though both are local.

      This is evidenced by lctl which_nid, which always returns the first configured NI regardless of the order of its arguments:

      sles15c01:~ # lctl list_nids
      192.168.2.38@tcp
      192.168.2.39@tcp
      sles15c01:~ # lctl which_nid 192.168.2.38@tcp 192.168.2.39@tcp
      192.168.2.38@tcp
      sles15c01:~ # lctl which_nid 192.168.2.39@tcp 192.168.2.38@tcp
      192.168.2.38@tcp
      sles15c01:~ #
      


          People

            Assignee: Chris Horn (hornc)
            Reporter: Chris Horn (hornc)
