Lustre / LU-13417

DNE3: mkdir() automatically create remote directory on MDS which has more space

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.13.0
    • Severity: 3

    Description

      Since the patches from LU-11213 landed for 2.13.0, I thought "lfs setdirstripe -i -1 /mnt/lustre" on the root or another existing directory would allow plain "mkdir" commands to create remote directories on other MDTs. This is different from "lfs setdirstripe -i -1 -c N /mnt/lustre" selecting stripes on less-full MDTs, which landed via patch https://review.whamcloud.com/35825 "LU-12624 lod: alloc dir stripes by QoS". That patch also removed the "space" hash, so shouldn't a regular mkdir now be able to balance directories across MDTs?

      However, I can't seem to get this to work. On current master (2.13.52-259, just before 2.13.53) I'm not able to use "lfs setdirstripe -i -1 /path/to/dir" on an existing directory. It does seem to select the less-full MDT if I explicitly run "lfs mkdir -i -1" for a new directory, but that was also true in 2.12 with patch https://review.whamcloud.com/30598 "LU-10277 utils: 'lfs mkdir -i -1' pick the less full MDTs", so how is the LU-11213 functionality for balancing directories across MDTs supposed to be enabled?

      There should be a way for "mkdir(2)" from a normal application (not "lfs mkdir -i -1") to create remote (1-stripe) directories in the filesystem, and it should be possible to set this by default on the root directory (per LU-11213). This is critical for using multiple MDTs effectively without users needing to know how to configure striped/remote directories manually, or being forced to make all directories striped (unwelcome due to the performance overhead).

      The default mdt_qos_threshold_rr value should be reduced significantly (e.g. to 1-2%) and/or the selection modified so that some MDT balancing is active even when the filesystem is balanced, at least in the root directory by default. Otherwise, unless users understand the details of DNE, MDT0000 will hold all of the inodes, when it would be better for the top 1 or 2 levels of directories to be distributed across MDTs.
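      A minimal Python sketch of the threshold check under discussion, to show why its value matters. It assumes imbalance is measured as the spread between the most-free and least-free MDT, as a percentage of the most-free one; that formula and the name mdts_imbalanced are illustrative assumptions, not Lustre's actual LMV/LOD computation.

```python
def mdts_imbalanced(free: list[int], threshold_pct: float) -> bool:
    """True if MDT free space is spread by more than threshold_pct.

    Assumed (illustrative) formula: (max_free - min_free) / max_free * 100.
    """
    if not free or max(free) == 0:
        return False
    spread_pct = (max(free) - min(free)) * 100.0 / max(free)
    return spread_pct > threshold_pct


# MDT0000 has absorbed most new directories; MDT0001 is nearly untouched.
free_inodes = [900_000, 1_000_000]  # a 10% spread

print(mdts_imbalanced(free_inodes, threshold_pct=2))   # True: balancing engages
print(mdts_imbalanced(free_inodes, threshold_pct=20))  # False: the gap is ignored
```

      With a 10% spread, a 1-2% threshold already switches to space-weighted selection, while a larger threshold keeps ignoring the gap.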

      Maybe this is mostly a documentation issue, and the "lfs-setdirstripe.1" man page needs to explain more clearly what must be done to enable this? (Also, the usage message for setdirstripe/mkdir should drop the "This can only be done on MDT0 with the right of administrator" text.)

          Activity

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43491/
            Subject: LU-13417 test: use mkdir_on_mdt0() in misc tests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: de62c8c7ef5d627da872260686d9279cbb60736e

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43489/
            Subject: LU-13417 test: add mkdir_on_mdt0()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 54fb8458db0bff4fdfe42ba7476de3129d7606cd

            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43492
            Subject: LU-13417 test: use mkdir_on_mdt0() in replay-dual
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4196abb88570a008839c9f249b80bb1c357c2723

            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43491
            Subject: LU-13417 test: use mkdir_on_mdt0() in misc tests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8ea0f5eb5fe0e9e08ed44a2e805c704e4f40d581

            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43489
            Subject: LU-13417 test: add mkdir_on_mdt0()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b84ff3e11e3f3c064d5e257b13d21ce80053e4df

            Lai Siyao added a comment:

            It's strange that replay-dual 22d always fails, although it passes in autotest for other patches.

            When I ran replay-dual alone on the master code, it also failed in autotest.

            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40925
            Subject: LU-13417 test: dump replay-dual 22d debug log
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fedcbc7741ad5928b3c2f40a910c2126f37f060c

            Andreas Dilger added a comment:

            I pushed the above patch as a starting point for getting this working, but it needs some additional work to finish it off. Hongchao or Lai, can you please finish off that patch so that we can get it included in 2.14?

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38553
            Subject: LU-13417 mdd: default DNE MDT balance on new filesystems
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 46feb12b9d64b366ba8cc0b5b842824add5a23c2

            Andreas Dilger added a comment (edited):

            I looked at the LMV code and verified by testing that the LU-11213 implementation of "lmv_create()" already does round-robin allocation across MDTs in a directory with "-D -c 1 -i -1" set, and only uses QoS weight-balanced MDT selection if the MDT space imbalance is over the qos_threshold_rr limit. This is enabled only when a default directory layout is set on the root directory. The default layout is also only inherited one level down, because lum_stripe_count=1 default layouts are changed to lum_stripe_count=0 in lod_ah_init(), which is no longer considered an inheritable default.
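            The one-level behaviour described in this comment can be modelled in a few lines. This is a simplified Python sketch, not the LMV/LOD code: DirDefault, Dir, Allocator and mkdir are hypothetical names, only the "-D -c 1 -i -1" case is modelled, and the MDTs are assumed balanced so that lmv_create() would use round-robin rather than QoS selection.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class DirDefault:
    """Hypothetical model of a default layout set with "lfs setdirstripe -D"."""
    stripe_count: int  # 1: single-stripe (possibly remote) subdirectories
    stripe_index: int  # -1: let the client pick the MDT


@dataclass
class Dir:
    name: str
    mdt: int
    default: Optional[DirDefault]


class Allocator:
    """Round-robin MDT choice (the balanced-MDT path of lmv_create())."""

    def __init__(self, mdt_count: int):
        self.mdt_count = mdt_count
        self._next = count()

    def pick(self) -> int:
        return next(self._next) % self.mdt_count


def mkdir(parent: Dir, name: str, alloc: Allocator) -> Dir:
    d = parent.default
    if d is not None and d.stripe_count == 1 and d.stripe_index == -1:
        mdt = alloc.pick()   # new directory goes to the next MDT in turn
    else:
        # No usable default: stay on the parent's MDT
        # (other defaults, e.g. a fixed -i N, are not modelled).
        mdt = parent.mdt
    child_default = None
    if d is not None and d.stripe_count == 1:
        # Mimic lod_ah_init(): the inherited stripe_count=1 default becomes
        # stripe_count=0, which is not treated as inheritable any further.
        child_default = DirDefault(stripe_count=0, stripe_index=d.stripe_index)
    return Dir(name, mdt, child_default)


alloc = Allocator(mdt_count=4)
root = Dir("ROOT", mdt=0, default=DirDefault(stripe_count=1, stripe_index=-1))
top = [mkdir(root, f"proj{i}", alloc) for i in range(4)]
sub = [mkdir(t, "data", alloc) for t in top]
print([t.mdt for t in top])  # [0, 1, 2, 3]: spread across MDTs
print([s.mdt for s in sub])  # [0, 1, 2, 3]: each stays on its parent's MDT
```

            Only the first level below ROOT is spread out; everything deeper follows its parent, which is why MDT balancing stops after one level.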

            I think that means there is a (hopefully simple) change that can be done to make this functionality more useful for filesystems:

            • explicitly set "trusted.dmv = .lum_magic=LMV_USER_MAGIC, .lum_stripe_count=1, .lum_stripe_index=-1" xattr in mdd_prepare() for newly formatted filesystems
            • one disadvantage is that this will only be inherited for the top-level directory, and would need LU-13440 to be inherited for 2-3 levels
            • another disadvantage is that this needs to be enabled manually by the user on "ROOT/" for existing filesystems
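            To illustrate the 2-3 level inheritance these bullets ask for, here is a hypothetical Python sketch of a depth-limited default layout. The max_inherit counter stands in for whatever mechanism LU-13440 provides and is an assumption, as is the choice of 3 levels for the default that mdd_prepare() would install on ROOT.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DirDefault:
    stripe_count: int  # 1: single-stripe (possibly remote) directories
    stripe_index: int  # -1: MDT chosen at mkdir time
    max_inherit: int   # hypothetical: directory levels this default still covers


def root_default_for_new_fs() -> DirDefault:
    """Default the proposed mdd_prepare() change would put on ROOT (modelled)."""
    return DirDefault(stripe_count=1, stripe_index=-1, max_inherit=3)


def inherit(parent: Optional[DirDefault]) -> Optional[DirDefault]:
    """Default layout a new subdirectory receives from its parent."""
    if parent is None or parent.max_inherit <= 1:
        return None  # inheritance stops here
    return replace(parent, max_inherit=parent.max_inherit - 1)


def distributed_levels(root: Optional[DirDefault]) -> int:
    """Count directory levels below ROOT whose mkdir() may pick any MDT."""
    levels, d = 0, root
    while d is not None and d.stripe_count == 1 and d.stripe_index == -1:
        levels += 1
        d = inherit(d)
    return levels


print(distributed_levels(root_default_for_new_fs()))  # 3
print(distributed_levels(replace(root_default_for_new_fs(), max_inherit=1)))  # 1
```

            Setting max_inherit=1 reproduces the current one-level behaviour, while a small counter on ROOT would cover the top few levels without spreading every deep subdirectory.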

            While this will not be perfect, it will be a lot better than the current default of not using any of the other MDTs unless the user knows to explicitly use "lfs mkdir" and/or "lfs setdirstripe -D" on the filesystem to start using them.

            Andreas Dilger added a comment:

            Peter, this still needs some work to make the remote MDT selection heuristics better.

            For top-level directories, it makes sense to round-robin them while the MDTs are relatively empty, and only pick a specific MDT when they are imbalanced.

            Also, I think the free space threshold for MDT imbalance should be smaller than the one for OST imbalance.
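            The heuristic in this comment (round-robin while the MDTs are balanced, space-weighted selection once they diverge, with a smaller threshold for MDTs than for OSTs) might be sketched as follows. MdtSelector and its imbalance formula are illustrative assumptions, not Lustre's LMV implementation; a separate MDT policy is simply an instance with a lower threshold_pct than the one used for OSTs.

```python
import random
from itertools import count
from typing import Sequence


class MdtSelector:
    """Round-robin while MDTs are balanced, free-space weighted once they are not.

    Illustrative model only; the imbalance formula is an assumption.
    """

    def __init__(self, threshold_pct: float, rng: random.Random):
        self.threshold_pct = threshold_pct
        self.rng = rng
        self._rr = count()

    def imbalanced(self, free: Sequence[int]) -> bool:
        top = max(free)
        return top > 0 and (top - min(free)) * 100.0 / top > self.threshold_pct

    def pick(self, free: Sequence[int]) -> int:
        if not self.imbalanced(free):
            return next(self._rr) % len(free)
        # An MDT with twice the free space is twice as likely to be chosen.
        return self.rng.choices(range(len(free)), weights=free, k=1)[0]


mdt_sel = MdtSelector(threshold_pct=2, rng=random.Random(0))
print([mdt_sel.pick([1000, 1000, 1000]) for _ in range(3)])  # [0, 1, 2]

hits = [0, 0]
for _ in range(1000):
    hits[mdt_sel.pick([100, 900])] += 1  # MDT0001 has 9x the free space
print(hits)  # roughly a 1:9 split
```

            Weighting by free space, rather than always choosing the emptiest MDT, avoids sending every new top-level directory to the same MDT until the gap closes.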

            People

              Assignee: Lai Siyao
              Reporter: Andreas Dilger