[LU-13417] DNE3: mkdir() automatically create remote directory on MDS which has more space Created: 06/Apr/20  Updated: 04/Dec/23  Resolved: 31/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Major
Reporter: Andreas Dilger Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: dne3

Issue Links:
Duplicate
is duplicated by LU-10784 DNE3: mkdir() automatically create re... Resolved
Related
is related to LU-15216 improve MDT QOS space balance Resolved
is related to LU-11025 DNE3: directory restripe Resolved
is related to LU-12624 DNE3: striped directory allocate stri... Resolved
is related to LU-11213 DNE3: remote mkdir() in ROOT/ by default Resolved
is related to LU-15850 MDT QOS should always be used for rou... Resolved
is related to LU-17300 Avoid creating new dir/file/object on... Open
is related to LU-17334 Client should handle dir/file/object ... In Progress
is related to LU-13439 DNE3: MDT QOS tuning to avoid full MD... Resolved
is related to LU-14898 sanity test_413a: (max - min) * 100 /... Resolved
is related to LU-14909 LU-13417 patch breaks few recovery tests Resolved
is related to LU-14792 DNE3: enable filesystem-wide default LMV Resolved
is related to LU-15856 "lfs setdirstripe -D ... <dir>" shoul... Open
is related to LU-13440 DNE3: limit directory default layout ... Resolved
is related to LU-13560 'lfs mkdir -i N' should be 'sticky' o... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Since the patches from LU-11213 landed for 2.13.0, I thought "lfs setdirstripe -i -1 /mnt/lustre" on the root or other existing directory would allow creation of remote directories on other MDTs using plain "mkdir" commands. This is different from the case of "lfs setdirstripe -i -1 -c N /mnt/lustre" selecting stripes on less-full MDTs that was landed via patch https://review.whamcloud.com/35825 "LU-12624 lod: alloc dir stripes by QoS", but this patch also removed the "space" hash, so I thought that a regular mkdir of a directory could be allowed to balance across MDTs.

However, I can't seem to get this to work. On current master (2.13.52-259, just before 2.13.53) I'm not able to use "lfs setdirstripe -i -1 /path/to/dir" on an existing directory. It seems to select the less-full MDT if I explicitly run "lfs mkdir -i -1" for a new directory, but that was also true in 2.12 using patch https://review.whamcloud.com/30598 "LU-10277 utils: 'lfs mkdir -i -1' pick the less full MDTs", so it isn't clear how to enable the LU-11213 functionality to balance directories across MDTs.

There should be a way for "mkdir(2)" from a normal application (not "lfs mkdir -i -1") to be able to create remote (1-stripe) directories in the filesystem, and it should be possible to set this by default on the root directory (per LU-11213). This is critical for being able to use multiple MDTs effectively without users knowing the details of how to configure striped/remote directories manually, or be forced to set all directories as striped (unwelcome due to performance overhead).

The default mdt_qos_threshold_rr value should be reduced significantly (e.g. 1-2%) and/or modified so that some amount of MDT balancing is active even when the filesystem is balanced, at least in the root directory by default. Otherwise, unless users understand the details of DNE, MDT0000 will hold all of the inodes, when it would be better for the top 1 or 2 levels of directories to be distributed across MDTs.

Maybe this is mostly a documentation issue, and the "lfs-setdirstripe.1" man page needs to be updated to be more clear so I can understand what needs to be done to enable this? (also the usage message for setdirstripe/mkdir should remove the "This can only be done on MDT0 with the right of administrator" message.)



 Comments   
Comment by Lai Siyao [ 07/Apr/20 ]

This is because 'lfs setdirstripe -D -i -1 <dir>' deletes the default stripe: when both "mdt_index" and "mdt_count" are unset, the request is treated as removal. You need to use 'lfs setdirstripe -D -i -1 -c 1' to enable balanced subdirectory creation for plain directories.

Comment by Andreas Dilger [ 08/Apr/20 ]

It's because 'lfs setdirstripe -D -i -1 <dir>' is used to delete default stripe... You need to use 'lfs setdirstripe -D -i -1 -c 1' to enable balanced subdirectories creation for plain directories.

I think this is a user interface bug then, or a bug in how LOD is interpreting the layout. In my testing, "lfs getdirstripe" showed "lmv_stripe_count:0 lmv_stripe_offset:-1 lmv_hash_type:none" was set on the directory, but it didn't affect the creation of directories with "mkdir()".

I would expect "lfs setdirstripe -d -D $dir" to delete the default layout for a directory, which seems to work, with "-d" already implying "-D" internally, but it is non-obvious because "lmv_stripe_offset:-1" is actually the default value, so "deleting" this layout didn't help.

I would also expect "lfs setdirstripe -D -i -1" to set the default layout to create remote directories, matching how "lfs setstripe" works. There were other users confused by this as well. The missing part is that specifying only "-i -1" is internally treated the same as "-c 0", which actually results in the existing layout being reset to the default (local directory creation). I'll push a small patch that makes "-D -i -1" set "-c 1" internally if the stripe count is not specified, so that it doesn't result in unexpected behavior for the user.
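The option handling described above can be modelled in a few lines. This is only an illustrative Python sketch, not the lfs source: the names DefaultDirLayout, resolve_default_layout, balances_subdirs and the "patched" flag are hypothetical, and the model assumes a stripe count of 0 means "no default layout" and a stripe index of -1 means "let the MDS choose".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DefaultDirLayout:
    """Hypothetical model of a directory's default layout."""
    stripe_count: int  # 0 = no default layout (local directory creation)
    stripe_index: int  # -1 = MDT is chosen automatically


def resolve_default_layout(index: Optional[int], count: Optional[int],
                           patched: bool) -> DefaultDirLayout:
    """Model how '-D -i <index> [-c <count>]' becomes a default layout.

    Unpatched: a missing '-c' behaves like '-c 0', which resets the layout.
    Patched:   a missing '-c' together with '-i -1' behaves like '-c 1'.
    """
    if count is None:
        count = 1 if (patched and index == -1) else 0
    if index is None:
        index = -1
    return DefaultDirLayout(stripe_count=count, stripe_index=index)


def balances_subdirs(layout: DefaultDirLayout) -> bool:
    """True if plain mkdir() under this directory may land on any MDT."""
    return layout.stripe_count >= 1 and layout.stripe_index == -1


if __name__ == "__main__":
    for patched in (False, True):
        layout = resolve_default_layout(-1, None, patched)
        print(patched, layout, balances_subdirs(layout))
```

In this model, passing "-c 1" explicitly gives the same result with or without the patch, which matches the workaround suggested earlier in the thread.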

Another issue is that the default "qos_threshold_rr=17%" is too high to start balancing directory creation across MDTs. This might mean that MDT0000 is used for many millions of files and top-level directories before any balancing is even started. At that point it will be very difficult to restore the balance across the MDTs because so many top-level directories and subdirectories have been created on MDT0000. I think it would be better to start space balancing and/or round-robin MDT selection for root directory entries right away if "lmv_stripe_count:1 lmv_stripe_offset:-1" is set on "ROOT/" (which I think we should make the default for 2.14). If there is only a single MDT then this is no change to behavior, but for multiple MDTs it will start using all MDTs right away at the root level and prevent the MDTs from becoming unbalanced in the first place. If we special-case MDT0000 to be RR/balanced immediately, then a smaller qos_threshold_rr=5% may still be useful to avoid the MDTs becoming too imbalanced, but will be less likely to be needed.
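To make the effect of the threshold concrete, here is a small Python sketch of the comparison. The imbalance metric used (spread between the most and least free MDT, as a percentage of the most free one) is an assumption chosen for illustration and may differ from what the Lustre QOS code actually computes; imbalance_pct and qos_active are hypothetical names.

```python
from typing import Sequence


def imbalance_pct(free: Sequence[int]) -> float:
    """Assumed metric: (max - min) free space as a percentage of max."""
    hi, lo = max(free), min(free)
    if hi == 0:
        return 0.0
    return (hi - lo) * 100.0 / hi


def qos_active(free: Sequence[int], threshold_rr_pct: float) -> bool:
    """QOS space balancing only starts once imbalance exceeds the threshold."""
    return imbalance_pct(free) > threshold_rr_pct


if __name__ == "__main__":
    # Hypothetical free-inode counts; MDT0000 has consumed 10% more than the rest.
    free = [900, 1000, 1000, 1000]
    print(imbalance_pct(free))
    print(qos_active(free, 17.0))  # the 17% default
    print(qos_active(free, 5.0))   # the smaller 5% value suggested above
```

With these made-up numbers the imbalance sits between the two thresholds, so space balancing would start under the 5% setting but not under the 17% default.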

Comment by Gerrit Updater [ 08/Apr/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38160
Subject: LU-13417 utils: lfs setstripe -D -i -1 should work
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8da6905fb197ef226ca091f53335189f243bbcbe

Comment by Gerrit Updater [ 01/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38160/
Subject: LU-13417 utils: lfs setdirstripe -D -i -1 should work
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: eeab6942a8dc65dab789c7ca85cc31ba5cee74f3

Comment by Peter Jones [ 01/May/20 ]

Landed for 2.14

Comment by Andreas Dilger [ 01/May/20 ]

Peter, this still needs some work to make the remote MDT selection heuristics a bit better.

For top-level directories, it makes sense to round-robin them when the MDTs are relatively empty, and only pick a specific MDT when they are imbalanced.

Also, I think the free space threshold needs to be smaller for MDT imbalance than for OST imbalance.

Comment by Andreas Dilger [ 09/May/20 ]

I was looking at the LMV code and tested to verify that the LU-11213 implementation of "lmv_create()" will already do round-robin allocation across MDTs in a directory with "-D -c 1 -i -1" set, and will only use QOS weight-balanced MDT selection if the MDT space imbalance is over the qos_threshold_rr limit. This is enabled only when a default directory layout is set on the root directory. It will also only be inherited one level down, because lum_stripe_count=1 default layouts are changed to lum_stripe_count=0 in lod_ah_init(), which is no longer considered to be inherited.
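The one-level inheritance described above can be illustrated with a toy model. This Python sketch is not the lmv_create() or lod_ah_init() code: the Dir and Namespace classes are hypothetical, and it assumes the MDT imbalance stays under qos_threshold_rr so that only round-robin selection is exercised.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dir:
    name: str
    mdt: int                # index of the MDT holding this directory
    default_count: int = 0  # stripe count of the default layout, 0 = none
    children: List["Dir"] = field(default_factory=list)


class Namespace:
    """Toy namespace with a single round-robin cursor over the MDTs."""

    def __init__(self, num_mdts: int) -> None:
        self.num_mdts = num_mdts
        self._next = 0

    def mkdir(self, parent: Dir, name: str) -> Dir:
        if parent.default_count == 1:
            # Parent has "-D -c 1 -i -1": take the next MDT round-robin.
            mdt = self._next % self.num_mdts
            self._next += 1
        else:
            # No usable default layout: create on the parent's MDT.
            mdt = parent.mdt
        # A count=1 default becomes count=0 on the child, so the
        # balancing is not inherited below the first level.
        child = Dir(name, mdt, default_count=0)
        parent.children.append(child)
        return child


if __name__ == "__main__":
    ns = Namespace(num_mdts=4)
    root = Dir("ROOT", mdt=0, default_count=1)
    tops = [ns.mkdir(root, f"d{i}") for i in range(4)]
    sub = ns.mkdir(tops[2], "sub")
    print([d.mdt for d in tops], sub.mdt)
```

In this model the top-level directories are spread across all MDTs, while anything created below them stays on its parent's MDT.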

I think that means there is a (hopefully simple) change that can be done to make this functionality more useful for filesystems:

  • explicitly set "trusted.dmv = .lum_magic=LMV_USER_MAGIC, .lum_stripe_count=1, .lum_stripe_index=-1" xattr in mdd_prepare() for newly formatted filesystems
  • one disadvantage is that this will only be inherited for the top-level directory, and would need LU-13440 to be inherited for 2-3 levels
  • another disadvantage is that this needs to be enabled manually by the user on "ROOT/" for existing filesystems

While I think this will not be perfect, it will be a lot better than defaulting to not using all of the other MDTs unless the user knows to explicitly use "lfs mkdir" and/or "lfs setdirstripe -D" on the filesystem to start using other MDTs.

Comment by Gerrit Updater [ 09/May/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38553
Subject: LU-13417 mdd: default DNE MDT balance on new filesystems
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 46feb12b9d64b366ba8cc0b5b842824add5a23c2

Comment by Andreas Dilger [ 09/May/20 ]

I pushed the above patch as a starting point for getting this working, but it needs some additional work to finish it off. Hongchao or Lai, can you please finish off that patch so that we can get it included into 2.14?

Comment by Gerrit Updater [ 10/Dec/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40925
Subject: LU-13417 test: dump replay-dual 22d debug log
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fedcbc7741ad5928b3c2f40a910c2126f37f060c

Comment by Lai Siyao [ 11/Dec/20 ]

It's strange that replay-dual 22d always fails, but can pass on autotest for other patches.

When I tried to test replay-dual alone on master code, it also failed on autotest.

Comment by Gerrit Updater [ 29/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43489
Subject: LU-13417 test: add mkdir_on_mdt0()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b84ff3e11e3f3c064d5e257b13d21ce80053e4df

Comment by Gerrit Updater [ 29/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43491
Subject: LU-13417 test: use mkdir_on_mdt0() in misc tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8ea0f5eb5fe0e9e08ed44a2e805c704e4f40d581

Comment by Gerrit Updater [ 29/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43492
Subject: LU-13417 test: use mkdir_on_mdt0() in replay-dual
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4196abb88570a008839c9f249b80bb1c357c2723

Comment by Gerrit Updater [ 30/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43489/
Subject: LU-13417 test: add mkdir_on_mdt0()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 54fb8458db0bff4fdfe42ba7476de3129d7606cd

Comment by Gerrit Updater [ 30/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43491/
Subject: LU-13417 test: use mkdir_on_mdt0() in misc tests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: de62c8c7ef5d627da872260686d9279cbb60736e

Comment by Gerrit Updater [ 30/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43492/
Subject: LU-13417 test: use mkdir_on_mdt0() in replay-dual
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce179e97767936ff76282fd06df063b386851fe7

Comment by Gerrit Updater [ 15/Jul/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44315
Subject: LU-13417 test: mkdir_on_mdt0() in more tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2bd9a10b17db3df7f87b4068a3746bc114069b46

Comment by Gerrit Updater [ 22/Jul/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44384
Subject: LU-13417 test: generate uneven MDTs early for sanity 413
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: efa31b02f470cf90281504ca49185a37006523d8

Comment by Gerrit Updater [ 25/Jul/21 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/44384/
Subject: LU-13417 test: generate uneven MDTs early for sanity 413
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 233344d451e567c71726bcb071f45cf8f1c6ef3e

Comment by Gerrit Updater [ 27/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/44315/
Subject: LU-13417 test: mkdir_on_mdt0() in more tests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 618625af42b9ff0427b096996ddf07a327689ec8

Comment by Gerrit Updater [ 31/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38553/
Subject: LU-13417 mdd: set default LMV on ROOT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3e04b0fd6c3dd36372f33c54ea5f401c27485d60

Comment by Peter Jones [ 31/Jul/21 ]

Landed for 2.15
