[LU-13440] DNE3: limit directory default layout inheritance Created: 08/Apr/20  Updated: 19/May/22  Resolved: 05/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Major
Reporter: Andreas Dilger Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: dne3

Issue Links:
Related
is related to LU-13439 DNE3: MDT QOS tuning to avoid full MD... Resolved
is related to LU-10329 DNE3: REMOTE_PARENT_DIR scalability Open
is related to LU-13417 DNE3: mkdir() automatically create re... Resolved
is related to LU-14762 qos subdirectory creation stay on par... Resolved
is related to LU-14868 sanity: all subtests pass but test su... Resolved
is related to LU-15314 set default max-inherit to 3 for defa... Resolved
is related to LU-15850 MDT QOS should always be used for rou... Resolved

 Description   

One problem that exists today is that default directory layouts are inherited by all new subdirectories created in the filesystem. That makes it difficult to set e.g. "lfs setdirstripe -D -c 1 -i -1" on the root directory and maybe a second level of directories without having it inherited by all of the subdirectories for the whole filesystem.

It would be useful to add an option like "lfs setdirstripe --max-inherit" that stores "lmv_max_inherit" on the default directory layout so that it is only copied down that many levels of subdirectories before it stops being copied. The lmv_max_inherit value would be decremented each time it is copied down to a new subdirectory, so there is no need to track the parent layout.

For compatibility, "lmv_max_inherit=0" would mean "copy forever", so "lmv_max_inherit=1" would mean "do not copy default layout". We don't need huge values here (e.g. "lmv_max_inherit=255" would be totally fine).
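
A minimal sketch of the decrement rule proposed above, written in Python purely for illustration. It follows the encoding from this description (0 = copy forever, 1 = do not copy), which is not necessarily what the landed patch uses. The function and constant names are hypothetical.

```python
# Encoding as proposed in this description (an assumption, not the merged code).
COPY_FOREVER = 0   # "lmv_max_inherit=0" means "copy forever"
NO_COPY = 1        # "lmv_max_inherit=1" means "do not copy default layout"


def child_max_inherit(parent_value):
    """Return the value stored on a new subdirectory's default layout,
    or None if the default layout is not copied to the subdirectory."""
    if parent_value == COPY_FOREVER:
        return COPY_FOREVER          # unlimited: copied unchanged
    if parent_value <= NO_COPY:
        return None                  # limit reached: stop copying
    return parent_value - 1          # one level consumed


def levels_with_layout(root_value, depth):
    """Walk `depth` levels below the directory holding `root_value` and
    return the value stored at each level that still gets a copy."""
    values = []
    current = root_value
    for _ in range(depth):
        current = child_max_inherit(current)
        if current is None:
            break
        values.append(current)
    return values
```

Under this encoding a value of 3 on the top directory would be copied to two levels of subdirectories (storing 2, then 1) and then stop, because only the stored value is consulted and the parent layout never needs to be tracked.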

I don't think we need to do anything incompatible for older MDS nodes (e.g. we don't need to use a different LMV magic), since at worst the old MDS will copy this layout forever (ignoring lmv_max_inherit) and have the same behaviour as before this feature existed. Probably the easiest would be to split a __u8 field out of lum_padding1 and leave an unused __u8 and __u16 for future use.
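
To illustrate why splitting the padding should stay compatible, here is a rough Python sketch. It assumes lum_padding1 is a 32-bit field (which the __u8 + __u8 + __u16 split suggests) and assumes a field order and little-endian encoding; neither is confirmed by this ticket.

```python
import struct

# Assumed layouts: the order of the new fields is hypothetical.
OLD_PADDING = "<I"    # one 32-bit padding word
NEW_FIELDS = "<BBH"   # max_inherit (u8), unused u8, unused u16


def pack_new(max_inherit):
    """Pack the proposed fields into the bytes formerly used as padding."""
    if not 0 <= max_inherit <= 255:
        raise ValueError("max_inherit must fit in a __u8")
    return struct.pack(NEW_FIELDS, max_inherit, 0, 0)


def same_size():
    """The split must not change the size of the on-disk structure."""
    return struct.calcsize(OLD_PADDING) == struct.calcsize(NEW_FIELDS)
```

Because the total size is unchanged, an older MDS that treats these bytes as padding would simply ignore the value and keep copying the layout forever, as described above.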



 Comments   
Comment by Andreas Dilger [ 21/Mar/21 ]

Lai, could you please look into this next, and whether it is possible to implement this in a relatively simple manner? We still need something that will "more automatically" distribute the load across MDTs, even if directory split is not active. It doesn't have to be perfect, but it should at least work with relatively little input from the admins if the MDTs become really imbalanced. I can think of two relatively straightforward options, and we might consider implementing both if they are not too complex:

  • the "limited inheritance" change described in this ticket would make it possible to e.g. set "default remote directory" ("-D -c 1 -i -1") on the root (or any) directory and then have it inherited for 2-3 directory levels before it reverts to "local" directories again. This would allow "-D -c1 -i -1 -X 3" or even "-D -c4 -i -1 -X 3" to be set on the root and spread the top of the tree widely, so that all MDTs are used, and then the lower levels stay local to their MDTs.
  • we could allow setting "-D -c 1 -i -1" on the root directory and have an MDS tunable parameter to inherit the root directory layout for the whole filesystem. That would need a bit of a change to the "always round-robin remote directories" so that it would only create remote directories if the MDTs are imbalanced, and prefer to create local directories if the MDT balance is good. Maybe limit the "round-robin" to the root directory or the top-level directory?
Comment by Gerrit Updater [ 26/Mar/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43131
Subject: LU-13440 lmv: add default LMV inherit depth
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4578bdf0091c7061328264b66f05f54b048da94d

Comment by Gerrit Updater [ 21/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43385
Subject: LU-13440 obdclass: server qos penalty miscaculated
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 809bd318183f9b14cccf04f10e34b7b367f19e53

Comment by Andreas Dilger [ 21/Apr/21 ]

I think the main goal here is to allow users to get reasonable MDT balancing without significant effort. For new filesystems, I think the current patch is relatively good, but we also need a way to handle this for existing filesystems without the need to explicitly set a layout on every subdirectory (which would also be complex because the "inherit depth" would need to be changed each time, if not unlimited).

For default layout inheritance from the root directory, one problem that we've seen with file layout inheritance is that if the default layout xattr is copied to each subdirectory, it is difficult to change the default afterward without changing it in every directory in the filesystem (except directories that had a different layout explicitly set on them). If the root default layout has (lum_max_inherit = LMV_INHERIT_UNLIMITED) then there is no need to copy the layout to the subdirectories at all, since it could just be cached on the root directory. Also, we could assume for a root default layout even with (lum_max_inherit_rr != LMV_INHERIT_UNLIMITED) that if the parent directory does not have a layout, then we have exceeded lum_max_inherit_rr and no copy of the default layout is needed. Only the top lum_max_inherit_rr directories would get an explicit xattr copy.

For existing filesystems (which will almost certainly already have MDT imbalance), it probably makes sense to skip the RR phase entirely and set a "-c 1 -i -1 --max-inherit=-1" default layout on the root directory, and make the automatic balancing of new directories in the whole filesystem "smart enough" (i.e. stick with the parent MDT unless MDTs are imbalanced, with the probability of a remote directory depending on the imbalance between MDTs).

One option (in a follow-on patch) would be to track the "depth" of every directory in memory and then use this to determine whether RR mode applies or not. That avoids the need to copy the layout explicitly for subdirectories, since it can ignore RR mode if depth > max_inherit_rr. The probability of creating a subdirectory on a remote MDT would depend on the imbalance between MDTs and also the depth (higher-level directories are more likely to be remote).
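
As a purely illustrative sketch of that last idea (not the QOS code, and not a formula proposed anywhere in this ticket), the probability could be made to grow with MDT imbalance and shrink with depth. The function name, the free-space inputs and the formula itself are all hypothetical.

```python
def remote_dir_probability(mdt_free, depth):
    """Hypothetical probability of placing a new subdirectory on a remote MDT.

    mdt_free: free space (or free inodes) per MDT, in any consistent unit.
    depth:    depth of the parent directory below the root (0 = root).
    """
    if not mdt_free:
        raise ValueError("need at least one MDT")
    most, least = max(mdt_free), min(mdt_free)
    if most <= 0:
        return 0.0                       # nothing free anywhere: stay local
    imbalance = (most - least) / most    # 0.0 = balanced, 1.0 = worst case
    return imbalance / (1 + depth)       # deeper directories stay local more often
```

With balanced MDTs this returns 0, so new directories stay on the parent MDT; with one MDT half as free as another it returns 0.5 at the root and 0.25 one level down.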

Comment by Gerrit Updater [ 04/May/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43530
Subject: LU-13440 utils: fix handling of lsa_stripe_off = -1
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 792fa045a1975a1a18af0d72470134e5bf997d6a

Comment by Gerrit Updater [ 05/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43385/
Subject: LU-13440 obdclass: server qos penalty miscaculated
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0ccce7ecb72f847f4235a513424d90119edad7ca

Comment by Gerrit Updater [ 05/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43131/
Subject: LU-13440 lmv: add default LMV inherit depth
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 01d34a6b3b2e34f7414f627e4f87993322dafa78

Comment by Gerrit Updater [ 22/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43530/
Subject: LU-13440 utils: fix handling of lsa_stripe_off = -1
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1dbe63301b8c5cb7f7d0fe9960cafd3cd0e45534
