
add LCFG_NODEMAP_ADD_RANGEv6 records for IPv6

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • 9223372036854775807

    Description

      Add LCFG_NODEMAP_ADD_RANGEv6 and LCFG_NODEMAP_DEL_RANGEv6 records to hold IPv6 addresses. There may need to be some better semantics for what constitutes a "range" for IPv6 addresses, as we definitely do not want enumeration of addresses involved.

          Activity

            [LU-13307] add LCFG_NODEMAP_ADD_RANGEv6 records for IPv6

            Got https://review.whamcloud.com/c/fs/lustre-release/+/53135 fully working for IPv6 and IPv4. It's ready for review. Once this lands we can put Lustre into a feature freeze.

            simmonsja James A Simmons added a comment

            Currently working on https://review.whamcloud.com/c/fs/lustre-release/+/53135.

            It works for IPv4 and mostly works for IPv6. There are still some issues with the final deletion of the nodemap, because the record support is currently missing. Also, with this patch only one large NID is stored for each nodemap.

            simmonsja James A Simmons added a comment

            This patch allows a single nodemap entry to be specified, but not a range of NIDs. I don't think this ticket should be closed when 53135 lands, since we really need a way to group multiple IPv6 NIDs into a range efficiently instead of listing 5000 NIDs individually for multiple nodemaps.

            adilger Andreas Dilger added a comment

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53135
            Subject: LU-13307 nodemap: have nodemap_add_member support large NIDs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4aa8457f95ee21d72a35d596082f34f6fbee1ace

            gerrit Gerrit Updater added a comment

            It would seem reasonable to extend nodemap_idx_type along the lines of my previous suggestion, but instead of (or in addition to) nr6_count, add an nr6_netmask_bits field so that the record can encode e.g. 192.168.10.1/24 or ::1/52 (or whatever is used for IPv6 addresses):

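            enum nodemap_idx_type {
                    /* ... existing index types elided ... */
                    NODEMAP_RANGE6_IDX = 6
            };

            struct nodemap_range6_rec {
                    __u64 nr6_count;
                    struct lnet_nid nr6_start_nid;  /* 20-byte large NID */
                    __u32 nr6_netmask_bits;         /* prefix length, e.g. 52 */
            };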
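
As a rough illustration of what a netmask-style record would ask of the lookup code, here is a minimal sketch of a prefix match on a raw 16-byte IPv6 address. It is not Lustre code: `ipv6_prefix_match` and `ipv6_str_in_prefix` are hypothetical helpers, and the `bits` argument stands in for the proposed nr6_netmask_bits field.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not a Lustre API): true if addr lies inside net/bits. */
static bool ipv6_prefix_match(const uint8_t addr[16], const uint8_t net[16],
                              unsigned int bits)
{
        unsigned int full = bits / 8;   /* whole bytes covered by the prefix */
        unsigned int rem = bits % 8;    /* leftover bits in the following byte */

        assert(bits <= 128);
        if (memcmp(addr, net, full) != 0)
                return false;
        if (rem == 0)
                return true;

        /* Compare only the top 'rem' bits of the partially covered byte. */
        uint8_t mask = (uint8_t)(0xff << (8 - rem));
        return (addr[full] & mask) == (net[full] & mask);
}

/* Convenience wrapper for text addresses; returns false on a parse error. */
static bool ipv6_str_in_prefix(const char *addr, const char *net,
                               unsigned int bits)
{
        struct in6_addr a, n;

        if (inet_pton(AF_INET6, addr, &a) != 1 ||
            inet_pton(AF_INET6, net, &n) != 1)
                return false;
        return ipv6_prefix_match(a.s6_addr, n.s6_addr, bits);
}
```

With this shape a single record covers a whole subnet regardless of how the interface identifiers inside it were assigned, which is the case that a start NID plus count cannot express compactly.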
            
            adilger Andreas Dilger added a comment

            Neil, nodemaps are used for managing client-specific configuration.

            Originally nodemaps were developed for UID/GID mapping for clients connecting from different "workgroup" clusters that are/were not managed with the same central user administration. They are also used to direct clients to mount only a subdirectory of the filesystem, and to limit root access/admin permissions to the rest of the filesystem (root squash, and more).

            Because nodemaps can control admin access on clients, the clients can also use strong authentication (shared secret key or Kerberos) in addition to the IP address to ensure that clients are not spoofing their address.

            Definitely the clients will need to use consistent source addresses, and/or advertise all of their addresses at connect time to the servers, so that the server knows that later RPCs are from the same client, or they will be rejected.

            adilger Andreas Dilger added a comment

            It is important to be aware that a network interface can (and usually does) have multiple IPv6 addresses.

            There is a link-local address which cannot be routed, there may be an address based on the hardware MAC address, and there might be one or more temporary addresses that are assigned randomly, used as the source of outgoing connections, and discarded after a time when not in use. There might be an address assigned by DHCPv6. And there could be addresses that are manually assigned.

            Servers typically have a manually assigned address which is published in the DNS. Hosts that only act as clients might have only randomly assigned addresses: one permanent address for the rare case when an incoming connection is needed, and one or two temporary addresses which are recycled and used for all outgoing connections.

            NFS and Lustre both set IPV6_PREFER_SRC_PUBLIC, so when connecting to a server the client will advertise a stable IPv6 address that will not expire. However, if there are multiple possible public source addresses for the chosen interface, there is currently no code to choose between them.

            With this context: what are nodemaps used for? Grouping servers? Grouping clients? Both?

            Can we just assume that people who want to use nodemaps will configure appropriate IPv6 addresses and NIDs to make the mapping convenient?

            I would MUCH rather limit support to net-masks, avoiding any suggestion of numeric ranges which are not power-of-2 in size and alignment.
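
To make the "power-of-2 in size and alignment" condition concrete, here is a minimal sketch that tests whether a run of `count` interface identifiers starting at `start` could be written as a single netmask. The helper names are hypothetical and the 64-bit interface identifier is an assumption taken from the nr6_count discussion in this ticket.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: if the run of 'count' identifiers starting at 'start'
 * is a power of two in size and aligned to that size, return the number of
 * host bits it spans (0..63); otherwise return -1.
 */
static int range_host_bits(uint64_t start, uint64_t count)
{
        if (count == 0 || (count & (count - 1)) != 0)
                return -1;      /* size is not a power of two */
        if ((start & (count - 1)) != 0)
                return -1;      /* start is not aligned to the size */

        int bits = 0;
        while ((count >> bits) != 1)
                bits++;
        assert(bits <= 63);
        return bits;
}

/* IPv6 prefix length for such a range, or -1 if no single netmask fits. */
static int range_prefix_len(uint64_t start, uint64_t count)
{
        int bits = range_host_bits(start, count);

        return bits < 0 ? -1 : 128 - bits;
}
```

For example, 256 identifiers starting at 0x100 map to a /120, while 200 identifiers starting at the same point, or 256 starting at 0x80, have no single-netmask form and would need to be split into several records.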

            neilb Neil Brown added a comment

            James, do you have any clusters at ORNL that are configured with IPv6 addresses? It would be useful to see what the address assignments look like in that case.

            adilger Andreas Dilger added a comment

            After recently digging into the nodemap code, I'm trying to figure out how to configure nodemaps with IPv6 addresses.

            My understanding of IPv6 is superficial, so it isn't clear to me how the nodes of an HPC cluster would be addressed. In some documentation, the interface identifier is randomly assigned (formerly based on the Ethernet MAC address), so no "NID range" is possible, and the nodemap might need thousands of individual records to describe the nodes, yuck. It also seems possible that the interface identifier part of the IPv6 address would be sequentially assigned to the client nodes, which would map well to the existing nodemap functionality.

            For IPv4 addresses, nodemap stores the nrr_start_nid and nrr_end_nid range in a 32-byte struct nodemap_range_rec using the NODEMAP_RANGE_IDX keys. The nodemap index contains only 32-byte records, so it would be possible to store one, but not two, 20-byte IPv6 NIDs in a single record.

            To distinguish IPv6 NIDs in the nodemap config index, a new NODEMAP_RANGE6_IDX with nodemap_range6_rec records could be used:

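            struct nodemap_range6_rec {
                    __u64 nr6_count;                /* number of NIDs in the range */
                    struct lnet_nid nr6_start_nid;  /* first NID, 20 bytes */
                    __u32 nr6_padding;              /* pads the record to 32 bytes */
            };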
            

            Since the start and end of a nodemap range do not need different nid_size, nid_type, nid_num or IPv6 subnet values, it would be possible to store the 20-byte struct lnet_nid plus a __u64 count of NIDs that are in this range. The __u64 nr6_count is as wide as the interface identifier part of the IPv6 address, so it is large enough to contain all NIDs in a given subnet, if necessary.
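
A minimal sketch of the start-plus-count idea, assuming a stand-in for the 20-byte struct lnet_nid built from the field names mentioned above. `struct demo_nid`, `struct demo_range6_rec` and `range6_contains` are hypothetical names, and the address is held as raw bytes for simplicity rather than in whatever form the real structure uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 20-byte struct lnet_nid (layout is an assumption). */
struct demo_nid {
        uint8_t  nid_size;
        uint8_t  nid_type;
        uint16_t nid_num;
        uint8_t  nid_addr[16];  /* IPv6 address in network byte order */
};

/* Mirrors the proposed record: 8 + 20 + 4 = 32 bytes. */
struct demo_range6_rec {
        uint64_t        nr6_count;
        struct demo_nid nr6_start_nid;
        uint32_t        nr6_padding;
};

static_assert(sizeof(struct demo_nid) == 20, "NID must be 20 bytes");
static_assert(sizeof(struct demo_range6_rec) == 32, "record must be 32 bytes");

/* Lower 64 bits of the address (the interface identifier) as an integer. */
static uint64_t iface_id(const struct demo_nid *nid)
{
        uint64_t id = 0;

        for (int i = 8; i < 16; i++)
                id = (id << 8) | nid->nid_addr[i];
        return id;
}

/* True if nid shares the start NID's header and subnet and lies in the run. */
static bool range6_contains(const struct demo_range6_rec *rec,
                            const struct demo_nid *nid)
{
        const struct demo_nid *start = &rec->nr6_start_nid;

        if (nid->nid_size != start->nid_size ||
            nid->nid_type != start->nid_type ||
            nid->nid_num != start->nid_num)
                return false;
        if (memcmp(nid->nid_addr, start->nid_addr, 8) != 0)
                return false;   /* different 64-bit subnet prefix */

        uint64_t id = iface_id(nid), first = iface_id(start);

        /* Subtract before comparing so start + count cannot overflow. */
        return id >= first && id - first < rec->nr6_count;
}
```

The size assertions only confirm that the arithmetic in the comment holds for this stand-in layout; the on-disk format would still need explicit endianness handling.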

            This works for inclusive ranges (i.e. nodes X-Y belong to a nodemap), but it is not a good way to specify exclusive ranges (i.e. "all other NIDs belong to this nodemap"). Any thoughts?

            adilger Andreas Dilger added a comment

            People

              simmonsja James A Simmons
              adilger Andreas Dilger