[LU-17922] nodemap should allow declaring ID ranges


Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.16.0
    • Fix Version/s: Lustre 2.16.0
    • Severity: 3
    Description

      It is currently not possible to specify an ID range with "lctl nodemap_add_idmap" when declaring a UID/GID/PROJID mapping for a nodemap; a separate record must be specified for each "CLID:FSID" ID mapping. That means an environment with a large number of mapped IDs (e.g. tenants in a virtualized environment) could potentially need hundreds or thousands of nodemap records, if the IDs are not all squashed to a single FSID.

      The command line interface could accept a range of values, like "lctl nodemap_add_idmap --name MAP --idtype TYPE --idmap CLSTART-CLEND:FSSTART[-FSEND]". The "FSEND" parameter is redundant in this case and would not need to be specified, but it may be good to accept it as a double-check and/or a visual reminder to the admin of which range of CLID and FSID values is used by this nodemap. Alternatively, or additionally, it would be possible to specify the number of IDs being mapped with "lctl nodemap_add_idmap --name NODEMAP --idtype TYPE --idmap CLSTART:FSSTART --count=NUMBER" to avoid the chance of an inconsistency in the *END values, but the count would need to be converted to a range due to internal implementation details (see further below).

      The "expedient" solution would be to loop over the specified range/count internally in jt_nodemap_add_idmap() and generate a series of commands to the kernel that maps one ID at a time. However, I worry that expanding the range into individual records could be accidentally misused, with an admin thinking "let's map a range of 1M UIDs and 1M GIDs for each nodemap just in case they are needed" and that generates many millions of configuration records that every target needs to process at mount time, if this many records can even be stored in the config logs in the first place.

      The better solution would be to store the ID range in the configuration record and expand it when the record is processed. That could still produce 1M separate records in memory, but since the nodemap code uses an rbtree structure for lookup, this would not be fatal. The data structure holding the ID mapping in the NODEMAP_UIDMAP_IDX, NODEMAP_GIDMAP_IDX, and NODEMAP_PROJIDMAP_IDX records is:

      struct nodemap_id_rec {
              __u32   nir_id_fs;
              __u32   nir_padding1;
              __u64   nir_padding2;
              __u64   nir_padding3;
              __u64   nir_padding4;
      };

      We could replace nir_padding1 with nir_id_fs_end (or nir_id_count, whichever is more convenient for the implementation) which would optionally specify the number of IDs to map if it is non-zero.
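
      A minimal sketch of how that record could look with the padding field repurposed; the field name and the "zero means single ID" convention below are assumptions for illustration, not a finalized on-disk format:

      /* hypothetical layout: nir_padding1 repurposed as an optional range end */
      struct nodemap_id_rec {
              __u32   nir_id_fs;      /* start of filesystem-side ID range */
              __u32   nir_id_fs_end;  /* 0 = single-ID mapping (old format),
                                       * else last filesystem ID in the range */
              __u64   nir_padding2;
              __u64   nir_padding3;
              __u64   nir_padding4;
      };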

      The best solution would be to insert the range of mappings directly into the rbtree structure as a single node (which I believe can be done efficiently in an rbtree). In that case, if a lookup returns a record that maps a range of IDs, care would be needed to "offset" the returned value so that the looked-up ID is mapped to the correct position within the range.
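
      As a rough illustration of the offset handling, assuming a hypothetical range node (this is not the existing nodemap idmap structure, just a sketch of the arithmetic):

      #include <linux/types.h>

      /* hypothetical range node; not the existing nodemap rbtree entry */
      struct idmap_range {
              __u32   ir_client_start;  /* first client-side ID in the range */
              __u32   ir_fs_start;      /* first filesystem-side ID in the range */
              __u32   ir_count;         /* number of IDs covered */
      };

      /* Map a client ID that falls inside a range node returned by the
       * rbtree lookup, preserving its offset within the range. */
      static __u32 idmap_range_map_id(const struct idmap_range *r, __u32 client_id)
      {
              return r->ir_fs_start + (client_id - r->ir_client_start);
      }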

      This will involve code changes in several areas:

      • lctl nodemap_add_idmap and jt_nodemap_add_idmap() to allow specifying the *END values (and/or COUNT) and pass this on to the kernel
      • the interface to the kernel to set the idmap range looks like it is a string-based lustre_cfg, so it should be possible to pass the range directly through to the kernel. I see that the "CLID:FSID" string is passed straight through to nodemap_parse_idmap() in the kernel, which just parses two numbers separated by ':'. It is a tossup whether it is more expedient to pass the range values like "CLSTART-CLEND:FSSTART[-FSEND]" directly, or a count like "CLSTART:FSSTART+COUNT" (which is not as convenient for users, but less prone to errors IMHO; either could be generated by jt_nodemap_add_idmap() internally). Either format should cause kstrtoul() to fail ID parsing for older modules, which is probably desirable so that an ID range is not accidentally treated as a single value.
      • the code in cfg_nodemap_cmd() and nodemap_parse_idmap() that processes the configuration record would need to handle the extra range values, maybe just growing the idmap array to three elements so that idmap[2] = COUNT is calculated and passed back to the caller (a parsing sketch follows this list)
      • the rbtree management for the range of IDs is probably the most complex part. It is pretty straightforward (if least efficient) if nodemap_add_idmap() just loops and inserts idmap[2] = COUNT separate records into the rbtree (see the second sketch after this list). That could be done for an initial implementation to avoid the need for more extensive changes elsewhere in the ID mapping code. The changes become more complex if the ID range is inserted into the rbtree as a single node, but that could be done as a separate patch once the initial parsing/passing/processing code is working properly.
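
      A minimal userspace-style sketch of parsing a "CLSTART[-CLEND]:FSSTART" string into a three-element idmap array, as suggested above; this is illustrative only and not the current nodemap_parse_idmap() code (error handling and the optional FSEND cross-check are omitted):

      #include <errno.h>
      #include <stdlib.h>

      /* Parse "CLSTART[-CLEND]:FSSTART" into idmap[0] = CLSTART,
       * idmap[1] = FSSTART, idmap[2] = COUNT (1 when no CLEND is given). */
      static int parse_idmap_range(const char *str, unsigned long idmap[3])
      {
              char *end;

              idmap[0] = strtoul(str, &end, 10);
              if (end == str)
                      return -EINVAL;

              if (*end == '-') {              /* optional CLEND */
                      const char *clend_str = end + 1;
                      unsigned long clend = strtoul(clend_str, &end, 10);

                      if (end == clend_str || clend < idmap[0])
                              return -EINVAL;
                      idmap[2] = clend - idmap[0] + 1;
              } else {
                      idmap[2] = 1;           /* single-ID mapping */
              }

              if (*end != ':')
                      return -EINVAL;
              str = end + 1;
              idmap[1] = strtoul(str, &end, 10);
              if (end == str || *end != '\0')
                      return -EINVAL;

              return 0;
      }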
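
      And a sketch of the "expedient" expansion mentioned in the last item, turning a COUNT-long range into COUNT individual insertions. The insert_one_idmap() helper and the exact function signature are placeholders, not existing Lustre code:

      /* Expand a COUNT-long range into COUNT single-ID insertions.
       * insert_one_idmap() stands in for whatever routine adds a single
       * CLID:FSID record to the nodemap rbtree. */
      static int nodemap_add_idmap_range(struct lu_nodemap *nodemap,
                                         enum nodemap_id_type id_type,
                                         const __u32 idmap[3])
      {
              __u32 count = idmap[2] ? idmap[2] : 1;
              __u32 i;
              int rc = 0;

              for (i = 0; i < count && rc == 0; i++)
                      rc = insert_one_idmap(nodemap, id_type,
                                            idmap[0] + i, idmap[1] + i);
              return rc;
      }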

            People

              Assignee: Max Dilger (mdilger)
              Reporter: Andreas Dilger (adilger)
              Votes: 0
              Watchers: 8
