[LU-13306] allow clients to accept mgs_nidtbl_entry with IPv6 NIDs Created: 28/Feb/20 Updated: 07/Jan/24 Resolved: 23/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | IPv6 | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||||||||||
| Description |
|
Currently, if a client receives an mgs_nidtbl_entry with mne_nid_type != 0 or mne_nid_size != sizeof(lnet_nid_t)=8 the client will stop processing all later records, essentially cutting the client off from OST/MDT IR logs that have even one IPv6 addresses configured. To add IPv6 NID support, but preserve compatibility with older clients, there should be a new connect flag sent to the MGS at connect time: #define OBD_CONNECT2_MNE_TYPE 0x1000000ULL /* mne_nid_type flags */ that should be saved in the client export on the MGS and used by mgs_nidtbl_read() to decide what type of NIDs should be sent to the client. The client should also send the flags that it understands MNE_NID_SUPPORTED = (MNE_NID_TYPE_* | (MNR_REC_FLAG_* << 8)) in struct obd_connect_data.padding0 renamed to ocd_mne_type so that the MGS can also avoid sending future record types to a client that it doesn't understand. The MGS reply should mask out any flags in ocd_mne_type that it doesn't support itself. If a client export does not have OBD_CONNECT2_MNE_TYPE set, the MGS should only send compatible lnet_nid_t records with mne_nid_type=0 and mne_nid_size=sizeof(lnet_nid_t)=8. To allow future expansion, the mne_rec_type field should be treated as a bitmask of features present in the mgs_nidtbl_entry instead of a enumerated type. The first bit set in mne_nid_type=MNE_NID_TYPE_RECORD=0x01 would be used for expanding the record size, with the larger per-NID records as follows, with mne_nid_size=24, and a per-record mnr_rec_flags=MNR_REC_FLAG_NIDv6=0x01 to allow positively identifying the record content rather than depending on "all zero" fields to detect the difference:
struct mne_nid_record { /* when MNE_NID_TYPE_RECORD is set */
__u8 mnr_rec_flags; /* MNR_REC_FLAG_* flags */
__u8 mnr_padding1;
__u16 mnr_padding2;
struct lnet_nid_v6 mnr_nid_v6;
/* other fields depending on mne_nid_type, mne_nid_size, mnr_rec_flags */
};
struct mgs_nidtbl_entry {
__u64 mne_version; /* table version of this entry */
__u32 mne_instance; /* target instance # */
__u32 mne_index; /* target index */
__u32 mne_length; /* length of this entry - by bytes */
__u8 mne_type; /* target type LDD_F_SV_TYPE_OST/MDT */
__u8 mne_nid_type; /* MNE_NID_TYPE_* flags */
__u8 mne_nid_size; /* size of each NID, by bytes */
__u8 mne_nid_count; /* # of NIDs in buffer */
union { /* don't walk the union array directly, use mne_nid_size */
lnet_nid_t mne_nids[]; /* when mne_nid_type == 0 */
struct mnr_nid_record mne_records[]; /* if MNE_NID_TYPE_RECORD */
};
};
The 4 extra bytes of fields at the start plus the 20-byte mnr_nid_v6 field keep all records with 64-bit size and alignment. Since lnet_nid_v6 can also encapsulate an IPv4 NID (based on .nid_type) there is no need to handle lnet_nid_t directly within mnr_nid_record. However, MNR_REC_FLAG_NIDv6 should be verified in case there is some other kind of NID in the record. When processing mne_nids or mne_records, it should be using mne_nid_size to walk the records, rather than having a loop that processes mne_nids[i] as an array, otherwise this will break if mne_nid_size changes in the future (ignoring any potential changes from LU-10360):
void *ptr;
/* iterate all nids to find one UUID by NID */
rc = -ENOENT;
for (i = 0, ptr = entry->mne_records; i < entry->mne_nid_count; i++, ptr += entry->mne_nid_size) {
lnet_nid_v6 nid_v6;
if (entry->mne_nid_type & MNE_NID_TYPE_RECORD) {
struct mnr_nid_record *rec = ptr;
if (!(rec->mnr_rec_flags & MNR_REC_FLAG_NIDv6))
continue;
nid_v6 = rec->mnr_nid_v6;
} else {
lnet_nid2nidv6(*(lnet_nid_t *)ptr, &nid_v6);
}
rc = client_import_find_conn(obd->u.cli.cl_import, nid_v6,
(struct obd_uuid *)uuid);
if (rc == 0)
break;
}
|
| Comments |
| Comment by Andreas Dilger [ 28/Feb/20 ] |
|
The MGS could also check for a new OBD_CONNECT_MNE_TYPE flag at connection time, and drop IPv6 records from the mgs_nidtbl_entry records for those clients. Since the MGS generates the nidtbl list for every client, this does not increase overhead significantly. |
| Comment by Andreas Dilger [ 20/Jul/20 ] |
|
It would be very useful to get this code fixed to skip "unknown" NIDs in mgs_nidtbl_entry, so that we have some kind of interop with older clients. Otherwise, as soon as any IPv6 NID is listed in the IR table the client will no longer be able to use the IR list throwing a big wrench into LU-10360. |
| Comment by Andreas Dilger [ 21/Jul/20 ] |
|
The patch https://review.whamcloud.com/39373 LU-10391 lnet: extended nids introduces struct lnet_nid_x (IMHO, lnet_nid_v6) to encapsulate larger NIDs: static inline int nidv6_is_legacy(const struct lnet_nid_v6 *nid_v6) { return nid_v6->nid_type < 256; } struct lnet_nid_v6 { __u16 nid_type; __u16 nid_num; union { __u32 nid_addr0; __be32 nid_addr[4]; }; }; The added mnr_rec_flags and padding fields are used since it only needs 4-byte alignment and is 20 bytes in size. The nid_is_legacy() check can be used to encapsulate IPv4 addresses without the need for external identification. |
| Comment by Gerrit Updater [ 26/Apr/23 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50750 |
| Comment by Gerrit Updater [ 09/May/23 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50896 |
| Comment by Gerrit Updater [ 01/Jun/23 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51185 |
| Comment by Gerrit Updater [ 19/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50896/ |
| Comment by Gerrit Updater [ 23/Aug/23 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52053 |
| Comment by Gerrit Updater [ 31/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52053/ |
| Comment by Gerrit Updater [ 23/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50750/ |
| Comment by Peter Jones [ 23/Sep/23 ] |
|
All patches seem to be merged for 2.16 |