Lustre / LU-17434

DNE3: add exclude list for remote subdirectory creation

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      In some cases, subdirectories created in a directory with "--max-inherit-rr" and/or DNE3 MDT space balance enabled (LU-11213) should not be created on a remote MDT, but instead still be created on the local MDT.

      This was seen recently with Apache Spark creating a _temporary subdirectory for staging files while they are being transferred between clients (for filesystems that don't have coherent distributed IO). The files are then renamed out of _temporary and into the parent directory. Rename can cause high overhead and contention if the parent and _temporary directories are on different MDTs because parallel rename (LU-12125) cannot be used. Locking improvements are proposed in LU-17426 that would allow parallel rename between directories on the same MDT, but something is needed to ensure the directories actually are on the same MDT.

      It should be possible to specify a list of names that the LMV ignores when creating a subdirectory of a parent with round-robin or space balancing enabled. Hard coding the _temporary name into the client is inflexible, since there may be a need for other similar directories to avoid in the future, and there may be a need to disable this functionality for some reason. Also, the MAPREDUCE-7331 ticket proposes to allow _temporary to be changed in the future, though hopefully it is still something deterministic (e.g. "_temporary.XXXXXX").
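
      As an illustration only, the prefix check described above might look like the following sketch. The names is_excluded, choose_mdt, and pick_balanced_mdt are hypothetical and are not taken from the Lustre LMV code; MDT indices are modeled as plain integers, and the round-robin or space-balanced choice is passed in as a callback.

```python
from typing import Callable, Iterable


def is_excluded(name: str, exclude_prefixes: Iterable[str]) -> bool:
    """Return True if the new subdirectory name starts with an excluded prefix.

    Empty prefixes are skipped, since an empty string would match every name.
    """
    return any(name.startswith(prefix) for prefix in exclude_prefixes if prefix)


def choose_mdt(
    name: str,
    parent_mdt: int,
    exclude_prefixes: Iterable[str],
    pick_balanced_mdt: Callable[[], int],
) -> int:
    """Pick the MDT index for a new subdirectory (hypothetical model).

    Excluded names stay on the parent's MDT; all other names are handed to
    the balancing policy supplied by the caller.
    """
    if is_excluded(name, exclude_prefixes):
        return parent_mdt
    return pick_balanced_mdt()
```

      With an exclude list of ["_temporary"], both "_temporary" and a deterministic variant such as "_temporary.XXXXXX" would stay on the parent's MDT, while any other name would still go through the balancing policy.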

      The exclude list for subdirectory name prefixes can be specified via "lctl set_param -P lmv.FSNAME*.qos_exclude_prefixes-". It should be possible to modify the list incrementally, the same way debug masks are set and cleared, with "+" adding a prefix and "-" removing a prefix, instead of having to specify the full list each time. Using "++" could embed a "+" into the name and "--" should embed "-" into the name? It isn't possible to use NUL-separated strings for the output, but it would be possible to use a slash '/' to separate strings and/or as an escape character ("/+" and "/-"), since it can never be part of a valid name. That would have the unfortunate side effect of looking like a pathname, which would be confusing. Also, it doesn't allow adding to and removing from the exclusion list incrementally like "+" and "-" do. I'm open to suggestions here. Using a backslash '\' as the escape character would also be possible, but has the drawback that it may be eaten by various levels of parsing (shell, glibc input and output), so it would never be clear how many '\' should be used at any time.
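
      A minimal sketch of the incremental update semantics, modeled on how debug masks are set and cleared, is shown below. The function update_prefixes is hypothetical, and the sketch assumes a single prefix per update: a leading "+" adds the prefix, a leading "-" removes it, and any other value replaces the whole list. It does not attempt to settle the separator or escaping questions raised above.

```python
from typing import List


def update_prefixes(current: List[str], value: str) -> List[str]:
    """Return a new exclude list after applying one update string.

    "+name" adds name if it is not already present, "-name" removes it,
    and any other value replaces the whole list with that single prefix.
    An empty value clears the list.
    """
    if value.startswith("+"):
        prefix = value[1:]
        if prefix and prefix not in current:
            return current + [prefix]
        return list(current)
    if value.startswith("-"):
        prefix = value[1:]
        return [p for p in current if p != prefix]
    return [value] if value else []
```

      Under these assumed rules, a prefix that itself begins with "+" or "-" cannot be added incrementally, which is the ambiguity that the "++" and "--" escapes discussed above are meant to address.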

      If some subdirectory named "_temporary" (or another name in the exclude list) is created that is not related to Apache Spark, then the only effect is that the directory is created on the same MDT as the parent. That is no worse than before DNE MDT balancing was implemented, and would only be a problem if the entire filesystem tree were being created by Apache Spark using only intermediate "_temporary" directories, which seems unlikely.

            People

              laisiyao Lai Siyao
              adilger Andreas Dilger
              Votes: 0
              Watchers: 6
