Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13306

allow clients to accept mgs_nidtbl_entry with IPv6 NIDs

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • None
    • 9223372036854775807

    Description

      Currently, if a client receives an mgs_nidtbl_entry with mne_nid_type != 0 or mne_nid_size != sizeof(lnet_nid_t)=8 the client will stop processing all later records, essentially cutting the client off from OST/MDT IR logs that have even one IPv6 addresses configured.

      To add IPv6 NID support, but preserve compatibility with older clients, there should be a new connect flag sent to the MGS at connect time:

      #define OBD_CONNECT2_MNE_TYPE         0x1000000ULL /* mne_nid_type flags */
      

      that should be saved in the client export on the MGS and used by mgs_nidtbl_read() to decide what type of NIDs should be sent to the client. The client should also send the flags that it understands MNE_NID_SUPPORTED = (MNE_NID_TYPE_* | (MNR_REC_FLAG_* << 8)) in struct obd_connect_data.padding0 renamed to ocd_mne_type so that the MGS can also avoid sending future record types to a client that it doesn't understand. The MGS reply should mask out any flags in ocd_mne_type that it doesn't support itself.  If a client export does not have OBD_CONNECT2_MNE_TYPE set, the MGS should only send compatible lnet_nid_t records with mne_nid_type=0 and mne_nid_size=sizeof(lnet_nid_t)=8.

      To allow future expansion, the mne_rec_type field should be treated as a bitmask of features present in the mgs_nidtbl_entry instead of a enumerated type. The first bit set in mne_nid_type=MNE_NID_TYPE_RECORD=0x01 would be used for expanding the record size, with the larger per-NID records as follows, with mne_nid_size=24, and a per-record mnr_rec_flags=MNR_REC_FLAG_NIDv6=0x01 to allow positively identifying the record content rather than depending on "all zero" fields to detect the difference:

      struct mne_nid_record {         /* when MNE_NID_TYPE_RECORD is set */
              __u8                    mnr_rec_flags;  /* MNR_REC_FLAG_* flags */
              __u8                    mnr_padding1;
              __u16                   mnr_padding2;
              struct lnet_nid_v6      mnr_nid_v6;
              /* other fields depending on mne_nid_type, mne_nid_size, mnr_rec_flags */
      };
      
      struct mgs_nidtbl_entry {
              __u64           mne_version;    /* table version of this entry */
              __u32           mne_instance;   /* target instance # */
              __u32           mne_index;      /* target index */
              __u32           mne_length;     /* length of this entry - by bytes */
              __u8            mne_type;       /* target type LDD_F_SV_TYPE_OST/MDT */
              __u8            mne_nid_type;   /* MNE_NID_TYPE_* flags */
              __u8            mne_nid_size;   /* size of each NID, by bytes */
              __u8            mne_nid_count;  /* # of NIDs in buffer */
              union {         /* don't walk the union array directly, use mne_nid_size */
                      lnet_nid_t             mne_nids[];     /* when mne_nid_type == 0 */
                      struct mnr_nid_record mne_records[];  /* if MNE_NID_TYPE_RECORD */
              };
      };
      

      The 4 extra bytes of fields at the start plus the 20-byte mnr_nid_v6 field keep all records with 64-bit size and alignment. Since lnet_nid_v6 can also encapsulate an IPv4 NID (based on .nid_type) there is no need to handle lnet_nid_t directly within mnr_nid_record. However, MNR_REC_FLAG_NIDv6 should be verified in case there is some other kind of NID in the record.

      When processing mne_nids or mne_records, it should be using mne_nid_size to walk the records, rather than having a loop that processes mne_nids[i] as an array, otherwise this will break if mne_nid_size changes in the future (ignoring any potential changes from LU-10360):

                      void *ptr;
      
                      /* iterate all nids to find one UUID by NID */
                      rc = -ENOENT;
      
                      for (i = 0, ptr = entry->mne_records; i < entry->mne_nid_count; i++, ptr += entry->mne_nid_size) {
                              lnet_nid_v6 nid_v6;
      
                              if (entry->mne_nid_type & MNE_NID_TYPE_RECORD) {
                                      struct mnr_nid_record *rec = ptr;
        
                                      if (!(rec->mnr_rec_flags & MNR_REC_FLAG_NIDv6))
                                              continue;
                                      nid_v6 = rec->mnr_nid_v6;
                              } else {
                                      lnet_nid2nidv6(*(lnet_nid_t *)ptr, &nid_v6);
                              }
                  
                              rc = client_import_find_conn(obd->u.cli.cl_import, nid_v6,
                                                           (struct obd_uuid *)uuid);
                              if (rc == 0)
                                      break;
                      }
      
      

      Attachments

        Issue Links

          Activity

            People

              simmonsja James A Simmons
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: