Details
-
Improvement
-
Resolution: Fixed
-
Major
-
None
-
9223372036854775807
Description
Currently, if a client receives an mgs_nidtbl_entry with mne_nid_type != 0 or mne_nid_size != sizeof(lnet_nid_t)=8 the client will stop processing all later records, essentially cutting the client off from OST/MDT IR logs that have even one IPv6 addresses configured.
To add IPv6 NID support, but preserve compatibility with older clients, there should be a new connect flag sent to the MGS at connect time:
#define OBD_CONNECT2_MNE_TYPE 0x1000000ULL /* mne_nid_type flags */
that should be saved in the client export on the MGS and used by mgs_nidtbl_read() to decide what type of NIDs should be sent to the client. The client should also send the flags that it understands MNE_NID_SUPPORTED = (MNE_NID_TYPE_* | (MNR_REC_FLAG_* << 8)) in struct obd_connect_data.padding0 renamed to ocd_mne_type so that the MGS can also avoid sending future record types to a client that it doesn't understand. The MGS reply should mask out any flags in ocd_mne_type that it doesn't support itself. If a client export does not have OBD_CONNECT2_MNE_TYPE set, the MGS should only send compatible lnet_nid_t records with mne_nid_type=0 and mne_nid_size=sizeof(lnet_nid_t)=8.
To allow future expansion, the mne_rec_type field should be treated as a bitmask of features present in the mgs_nidtbl_entry instead of a enumerated type. The first bit set in mne_nid_type=MNE_NID_TYPE_RECORD=0x01 would be used for expanding the record size, with the larger per-NID records as follows, with mne_nid_size=24, and a per-record mnr_rec_flags=MNR_REC_FLAG_NIDv6=0x01 to allow positively identifying the record content rather than depending on "all zero" fields to detect the difference:
struct mne_nid_record { /* when MNE_NID_TYPE_RECORD is set */ __u8 mnr_rec_flags; /* MNR_REC_FLAG_* flags */ __u8 mnr_padding1; __u16 mnr_padding2; struct lnet_nid_v6 mnr_nid_v6; /* other fields depending on mne_nid_type, mne_nid_size, mnr_rec_flags */ }; struct mgs_nidtbl_entry { __u64 mne_version; /* table version of this entry */ __u32 mne_instance; /* target instance # */ __u32 mne_index; /* target index */ __u32 mne_length; /* length of this entry - by bytes */ __u8 mne_type; /* target type LDD_F_SV_TYPE_OST/MDT */ __u8 mne_nid_type; /* MNE_NID_TYPE_* flags */ __u8 mne_nid_size; /* size of each NID, by bytes */ __u8 mne_nid_count; /* # of NIDs in buffer */ union { /* don't walk the union array directly, use mne_nid_size */ lnet_nid_t mne_nids[]; /* when mne_nid_type == 0 */ struct mnr_nid_record mne_records[]; /* if MNE_NID_TYPE_RECORD */ }; };
The 4 extra bytes of fields at the start plus the 20-byte mnr_nid_v6 field keep all records with 64-bit size and alignment. Since lnet_nid_v6 can also encapsulate an IPv4 NID (based on .nid_type) there is no need to handle lnet_nid_t directly within mnr_nid_record. However, MNR_REC_FLAG_NIDv6 should be verified in case there is some other kind of NID in the record.
When processing mne_nids or mne_records, it should be using mne_nid_size to walk the records, rather than having a loop that processes mne_nids[i] as an array, otherwise this will break if mne_nid_size changes in the future (ignoring any potential changes from LU-10360):
void *ptr; /* iterate all nids to find one UUID by NID */ rc = -ENOENT; for (i = 0, ptr = entry->mne_records; i < entry->mne_nid_count; i++, ptr += entry->mne_nid_size) { lnet_nid_v6 nid_v6; if (entry->mne_nid_type & MNE_NID_TYPE_RECORD) { struct mnr_nid_record *rec = ptr; if (!(rec->mnr_rec_flags & MNR_REC_FLAG_NIDv6)) continue; nid_v6 = rec->mnr_nid_v6; } else { lnet_nid2nidv6(*(lnet_nid_t *)ptr, &nid_v6); } rc = client_import_find_conn(obd->u.cli.cl_import, nid_v6, (struct obd_uuid *)uuid); if (rc == 0) break; }
Attachments
Issue Links
- is related to
-
LU-17060 conf-sanity test_21c: Unable to start OST2 - MGS refuses to allow this server to start
- Open
-
LU-18049 ost-pools test_25, sanity-sec test_31: crash in ext4_htree_store_dirent kmalloc
- Resolved
-
LU-17962 conf-sanity test_32a: failed with replace_nids operation already in progress
- Resolved
-
LU-16722 LMR1a: MGS config log restructuring and redundancy
- Open
-
LU-16823 add LNet and OBD connect flags for IPv6 peers
- Resolved
- is related to
-
LU-10360 use Imperative Recovery logs for client->MDT/OST connections
- Open
-
LU-19 imperative recovery
- Resolved
-
LU-10391 LNET: Support IPv6
- Resolved