Lustre / LU-17922

nodemap should allow declaring ID ranges

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version: Lustre 2.16.0
    • Fix Version: Lustre 2.16.0

    Description

      It is currently not possible to specify an ID range when using "lctl nodemap_add_idmap" to define a UID/GID/PROJID mapping for a nodemap; a separate record must be specified for each "CLID:FSID" ID mapping. That means an environment with a large number of mapped IDs (e.g. tenants in a virtualized environment) could potentially need hundreds or thousands of nodemap records, unless they are all squashed to a single FSID.

      The command-line interface could accept a range of values, like "lctl nodemap_add_idmap --name MAP --idtype TYPE --idmap CLSTART-CLEND:FSSTART[-FSEND]". The "FSEND" parameter is redundant in this case and would not need to be specified, but it may be worth accepting as a double-check and/or a visual reminder to the admin of which range of CLID and FSID values is used by this nodemap. Alternately, or additionally, the number of IDs being mapped could be specified, as in "lctl nodemap_add_idmap --name NODEMAP --idtype TYPE --idmap CLSTART:FSSTART --count=NUMBER", to avoid the chance of an inconsistency in the *END values, but this would need to be converted to a range due to internal implementation details (see further below).
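      As a sketch, the range parsing and the redundant-FSEND double-check described above could look like the following. This is hypothetical userspace code, not the actual jt_nodemap_add_idmap() implementation; the function and parameter names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse "CLSTART[-CLEND]:FSSTART[-FSEND]" (e.g. "1000-1999:500000" or
 * "1000-1999:500000-500999") into start/end pairs.  Returns 0 on success,
 * -EINVAL on malformed input or if the (redundant) FSEND disagrees with
 * the CLSTART-CLEND range.
 */
static int parse_idmap_range(const char *str,
			     unsigned long *cl_start, unsigned long *cl_end,
			     unsigned long *fs_start, unsigned long *fs_end)
{
	char *end;

	*cl_start = strtoul(str, &end, 10);
	if (*end == '-')
		*cl_end = strtoul(end + 1, &end, 10);
	else
		*cl_end = *cl_start;		/* single ID, no range */
	if (*end != ':')
		return -EINVAL;
	*fs_start = strtoul(end + 1, &end, 10);
	if (*end == '-')
		*fs_end = strtoul(end + 1, &end, 10);
	else if (*end == '\0')
		*fs_end = *fs_start + (*cl_end - *cl_start); /* FSEND implied */
	else
		return -EINVAL;
	/* FSEND is redundant, so use it as a consistency double-check */
	if (*cl_end < *cl_start ||
	    *fs_end - *fs_start != *cl_end - *cl_start)
		return -EINVAL;
	return 0;
}
```

      With this shape, "1000-1999:500000" would imply FSEND=500999, while an inconsistent "1000-1999:500000-600000" would be rejected.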

      The "expedient" solution would be to loop over the specified range/count internally in jt_nodemap_add_idmap() and generate a series of commands to the kernel that map one ID at a time. However, I worry that expanding the range into individual records could be accidentally misused, with an admin thinking "let's map a range of 1M UIDs and 1M GIDs for each nodemap just in case they are needed", which would generate many millions of configuration records that every target needs to process at mount time, if that many records can even be stored in the config logs in the first place.

      The better solution would be to store the ID range in the configuration record, and then expand it when it is processed. This could still produce 1M separate records in memory, but since the nodemap code uses an rbtree structure for lookup this wouldn't be fatal. The data structure holding the ID mapping in the NODEMAP_UIDMAP_IDX, NODEMAP_GIDMAP_IDX, and NODEMAP_PROJIDMAP_IDX records is:

      struct nodemap_id_rec {
              __u32   nir_id_fs;      /* FSID that the client ID maps to */
              __u32   nir_padding1;   /* unused */
              __u64   nir_padding2;
              __u64   nir_padding3;
              __u64   nir_padding4;
      };


      We could replace nir_padding1 with nir_id_fs_end (or nir_id_count, whichever is more convenient for the implementation), which would optionally specify the extent of the mapped range if it is non-zero.
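      A minimal sketch of that change, using uint32_t/uint64_t as stand-ins for the kernel __u32/__u64 types; only the nir_id_fs_end semantics are new, and a value of 0 preserves compatibility with existing single-ID records:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only: nir_padding1 repurposed as nir_id_fs_end,
 * as suggested above.  The on-disk record size stays 32 bytes.
 */
struct nodemap_id_rec {
	uint32_t nir_id_fs;	/* FSID that the client ID maps to */
	uint32_t nir_id_fs_end;	/* proposed: end of the mapped FSID range,
				 * or 0 for a single-ID mapping */
	uint64_t nir_padding2;	/* still unused, zeroed */
	uint64_t nir_padding3;
	uint64_t nir_padding4;
};
```

      Since old records have nir_padding1 zeroed, old and new modules would interpret single-ID records identically.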

      The best solution would be to insert the range of mappings directly into the rbtree structure as a single node (which I believe can be done efficiently in an rbtree). In this case, if an ID lookup hits a node that maps a range, care would be needed to properly "offset" the returned value so that it correctly maps the ID within the range.
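      The offset arithmetic for a range hit could be sketched as follows; the types and names are hypothetical, not the actual nodemap rbtree code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative range-mapping node (the real code keys on the client ID). */
struct idmap_range {
	uint32_t cl_start;	/* first client ID in the range */
	uint32_t cl_end;	/* last client ID in the range */
	uint32_t fs_start;	/* FSID that cl_start maps to */
};

/* Does a lookup for @cl_id land inside this range node? */
static int range_contains(const struct idmap_range *range, uint32_t cl_id)
{
	return cl_id >= range->cl_start && cl_id <= range->cl_end;
}

/*
 * Offset the returned FSID so the ID keeps its position within the range,
 * rather than every ID in the range collapsing to fs_start.
 */
static uint32_t range_map_id(const struct idmap_range *range, uint32_t cl_id)
{
	return range->fs_start + (cl_id - range->cl_start);
}
```

      For example, a node mapping client IDs 1014600-1014999 to FSIDs starting at 14600 would map client ID 1014622 to 14622.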

      This will involve code changes in several areas:

      • lctl nodemap_add_idmap and jt_nodemap_add_idmap() would need to allow specifying the *END values (and/or COUNT) and pass them on to the kernel
      • the interface to the kernel that sets the idmap looks like a string-based lustre_cfg, so it should be possible to pass the range directly through to the kernel. I see that the "CLID:FSID" string is passed straight through to nodemap_parse_idmap() in the kernel, which just parses two numbers separated by ':' directly. It is a tossup whether it is more expedient to pass the "range" values like "CLSTART-CLEND:FSSTART[-FSEND]" directly, compared to a count like "CLSTART:FSSTART+COUNT" (which is not as convenient for users, but less prone to errors IMHO; either format could be generated by jt_nodemap_add_idmap() internally). Either format should cause kstrtoul() to fail ID parsing on older modules, which is probably desirable so that an ID range is not accidentally treated as a single value.
      • the code in cfg_nodemap_cmd() and nodemap_parse_idmap() that processes the configuration record would need to handle the extra range values, maybe just by growing the idmap array to three elements and passing back idmap[2] = COUNT to the caller
      • the rbtree management for the range of IDs is probably the most complex part. This is straightforward (if least efficient) if nodemap_add_idmap() just loops and inserts idmap[2]=COUNT separate records into the rbtree. That could be done for an initial implementation to avoid the need for more extensive changes elsewhere in the ID mapping code. The changes become more complex if the ID range is inserted into the rbtree directly, but that could be done as a separate patch once the initial parsing/passing/processing code is working properly.
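      The expedient loop-and-insert variant from the last bullet could be sketched as follows. This is a self-contained illustration: add_single_idmap() and the demo array are hypothetical stand-ins for the real per-ID rbtree insertion helper.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEMO_MAPS 16
static uint32_t demo_cl[MAX_DEMO_MAPS], demo_fs[MAX_DEMO_MAPS];
static int demo_nmaps;

/* Stand-in for inserting one CLID->FSID record into the rbtree. */
static int add_single_idmap(uint32_t cl_id, uint32_t fs_id)
{
	if (demo_nmaps >= MAX_DEMO_MAPS)
		return -1;
	demo_cl[demo_nmaps] = cl_id;
	demo_fs[demo_nmaps] = fs_id;
	demo_nmaps++;
	return 0;
}

/* Expand an idmap[2]=COUNT range into COUNT separate single-ID records. */
static int add_idmap_range(uint32_t cl_start, uint32_t fs_start,
			   uint32_t count)
{
	uint32_t i;
	int rc = 0;

	for (i = 0; i < count; i++) {
		rc = add_single_idmap(cl_start + i, fs_start + i);
		if (rc)
			break;	/* caller would need to undo partial inserts */
	}
	return rc;
}
```

      This keeps the rest of the ID mapping code unchanged, at the cost of the per-ID memory overhead discussed above; inserting a range node directly would avoid that.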

      Attachments

        Issue Links

          Activity

            [LU-17922] nodemap should allow declaring ID ranges

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56551/
            Subject: LU-17922 tests: skip sanity-sec/27aa for old MDS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 50b779a87bee4ae6456d2de4a8668efe3a09b1b2

            gerrit Gerrit Updater added a comment -
            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56551
            Subject: LU-17922 tests: skip sanity-sec/27aa for old MDS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 48f975bf0e368391fe3e934afd68a9dc819d3770
            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55502/
            Subject: LU-17922 utils: added idmap range functionality
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8445f7b92f067346919069bcf9a268d41c52c5b3

            gerrit Gerrit Updater added a comment -
            "Maximilian Dilger <mdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55502
            Subject: LU-17922 utils: added idmap range functionality
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 427297f4371dbe402bf3196486e161f27c98da13

            sthiell Stephane Thiell added a comment -

            @adilger yes, I think this could be quite convenient!

            Example 1: For GID mapping from Oak (storage system) to Sherlock (cluster), with map_mode=gid, we add 1000000 to each GID to avoid any collision with already existing GIDs on the cluster. It's a convention we adopted between Oak and Sherlock.
            Currently, the mapping looks like this:

            idtype: gid, client_id: 1014600, fs_id: 14600
            idtype: gid, client_id: 1014622, fs_id: 14622
            idtype: gid, client_id: 1014633, fs_id: 14633

            We have 378 GIDs mapped like this. We could definitely replace these with a single ID range map.

            Example 2: For another cluster that is root squashed (admin_nodemap=0), we do a GID-to-same-GID mapping (with map_mode=gid):

            idtype: gid, client_id: 13734, fs_id: 13736
            idtype: gid, client_id: 13765, fs_id: 13765
            idtype: gid, client_id: 13766, fs_id: 13766

            We have 684 GIDs mapped like this. Those are the only allowed GIDs. We would have to double-check whether it would be ok to replace these with a single ID mapping range in terms of security, but I think that possibly it could.

            We used to have another cluster with GID mapping to "random" cluster GIDs: there, ID ranges would not have been useful, but we have stopped doing that as of today.

            People

              Assignee: mdilger Max Dilger
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 8
