[LU-15850] MDT QOS should always be used for round-robin directories. Created: 12/May/22  Updated: 28/Oct/22  Resolved: 05/Aug/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: MON

Issue Links:
Related
is related to LU-15910 ROOT default LMV is not working for s... Resolved
is related to LU-13440 DNE3: limit directory default layout ... Resolved
is related to LU-13439 DNE3: MDT QOS tuning to avoid full MD... Resolved
is related to LU-14762 qos subdirectory creation stay on par... Resolved
is related to LU-13417 DNE3: mkdir() automatically create re... Resolved

 Description   

MDT QOS should always be used for subdirectories created in a parent that has round-robin (r-r) enabled whenever the MDT space imbalance exceeds qos_threshold_rr. Otherwise, subdirectories in that directory tree suddenly change from r-r to being "sticky" on a single MDT, which significantly changes the behavior and load distribution across MDTs. The "threshold by depth" should only be used for directories that would otherwise always have been created on the parent's MDT.
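
The requested behavior can be sketched as a small decision function. This is only an illustration of the description above, not the Lustre implementation: the enum, the struct, and every field name are hypothetical, `threshold_rr_pct` merely stands in for qos_threshold_rr, and the fallback for non-r-r parents when the depth check does not apply is an assumption, since the ticket does not describe it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not the Lustre LMV API. */
enum mkdir_policy {
	POLICY_STAY_ON_PARENT,	/* create on the same MDT as the parent */
	POLICY_ROUND_ROBIN,	/* take the next MDT in round-robin order */
	POLICY_SPACE_QOS,	/* choose an MDT weighted by free space */
};

struct mkdir_ctx {
	bool parent_is_rr;		/* parent has round-robin enabled */
	unsigned int imbalance_pct;	/* MDT space imbalance, in percent */
	unsigned int threshold_rr_pct;	/* stand-in for qos_threshold_rr */
	bool depth_says_stay;		/* outcome of the "threshold by depth" check */
};

static enum mkdir_policy choose_policy(const struct mkdir_ctx *c)
{
	bool unbalanced;

	assert(c != NULL);
	unbalanced = c->imbalance_pct > c->threshold_rr_pct;

	/* An r-r parent never falls back to "sticky" placement. */
	if (c->parent_is_rr)
		return unbalanced ? POLICY_SPACE_QOS : POLICY_ROUND_ROBIN;

	/* The depth threshold is consulted only for non-r-r parents. */
	if (c->depth_says_stay)
		return POLICY_STAY_ON_PARENT;

	/* Assumption: otherwise reuse the same balance test. */
	return unbalanced ? POLICY_SPACE_QOS : POLICY_ROUND_ROBIN;
}
```

The point of the sketch is the ordering: the r-r test comes first, so the depth check cannot override it.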

Related to this, it should be possible to tune the weighting of subdirectories by depth so that this can be adjusted without recompiling the code.



 Comments   
Comment by Andreas Dilger [ 19/May/22 ]

Lai, I was trying to test my patch to fix the "use space balance for RR directories" issue, but found something very wrong with the max-inherit and max-inherit-rr code, when used with explicitly inherited default layouts (i.e. layouts set on a non-root directory and copied down the tree while decrementing max-inherit-rr).

When used with the implicitly inherited layout from the root directory, the max-inherit-rr value is copied from the ROOT directory, and is compared against lli_dir_depth (which increases with directory depth):

                if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE &&
                    (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
                     lsm->lsm_md_max_inherit_rr >= lli->lli_dir_depth))
                        op_data->op_flags |= MF_RR_MKDIR;

This works as expected because lsm_md_max_inherit_rr is constant when implicitly inherited from the root directory, and lli_dir_depth increases with directory depth. However, if lsm_md_max_inherit_rr is on an explicitly copied default layout on a directory, then lsm_md_max_inherit_rr is decremented by one for each level below the directory where the layout was set, while lli_dir_depth is incremented by one for each level FROM THE FILESYSTEM ROOT. So in this second case, these values can be totally unrelated and the comparison is meaningless. For example, if the directory is 10 levels deep from the filesystem root, then lli_dir_depth is already >= 10 on the parent directory, regardless of how much max-inherit-rr remains.
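
A small worked example of the mismatch, under stated assumptions: `rr_allowed()` has the same shape as the check quoted above, `inherited_rr()` is a hypothetical model of the per-level decrement, and the `EX_` constants are placeholders rather than the real LMV_INHERIT_RR_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values for this sketch; not taken from the Lustre headers. */
#define EX_INHERIT_RR_NONE	0u
#define EX_INHERIT_RR_UNLIMITED	255u

/* Same shape as the quoted check: compare the value against the depth. */
static bool rr_allowed(unsigned int max_inherit_rr, unsigned int dir_depth)
{
	return max_inherit_rr != EX_INHERIT_RR_NONE &&
	       (max_inherit_rr == EX_INHERIT_RR_UNLIMITED ||
		max_inherit_rr >= dir_depth);
}

/*
 * Hypothetical model of explicit inheritance: the value set on a
 * directory loses one for every level it is copied down the tree.
 */
static unsigned int inherited_rr(unsigned int set_value,
				 unsigned int levels_below)
{
	assert(set_value <= EX_INHERIT_RR_UNLIMITED);

	if (set_value == EX_INHERIT_RR_UNLIMITED)
		return set_value;
	if (levels_below >= set_value)
		return EX_INHERIT_RR_NONE;
	return set_value - levels_below;
}
```

With a value of 3 held by the root, `rr_allowed(3, 3)` is true and `rr_allowed(3, 4)` is false, which is the intended cutoff. With the same value set explicitly on a directory at depth 10, `rr_allowed(inherited_rr(3, 0), 10)` is already false, so round-robin is refused even in the directory where it was configured.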

I'm thinking of something like: store (lsm_md_max_inherit_rr + parent->lli_dir_depth) in memory on the parent directory, so that for the child directory (with child->lli_dir_depth = parent->lli_dir_depth + 1) the parent->lli_dir_depth value cancels out and the above check works properly. However, it is not yet obvious how that should be implemented.
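
The cancellation can be illustrated with a minimal sketch. Both helper names are hypothetical, the handling of the NONE and UNLIMITED special values is deliberately left out, and this is not the code that eventually landed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Turn a relative max-inherit-rr value into an absolute depth limit by
 * adding the depth of the directory that holds the layout. Special
 * values (NONE, UNLIMITED) would have to be filtered out before this.
 */
static unsigned int rr_abs_limit(unsigned int max_inherit_rr,
				 unsigned int holder_depth)
{
	assert(max_inherit_rr <= UINT_MAX - holder_depth);
	return max_inherit_rr + holder_depth;
}

/* With an absolute limit, comparing against a depth from ROOT is valid. */
static bool rr_allowed_abs(unsigned int abs_limit, unsigned int dir_depth)
{
	return abs_limit >= dir_depth;
}
```

For a layout set at depth 10 with a value of 3, the explicitly copied values are 3, 2, 1 at depths 10, 11, 12, and `rr_abs_limit()` returns 13 in each case: the decrement and the depth increment cancel. A root-held value of 3 gives a limit of 3, so the implicitly inherited case reduces to the original comparison.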

Comment by Lai Siyao [ 23/May/22 ]

Indeed, lli_dir_depth only considers the depth from ROOT. We may convert lsm_md_max_inherit and lsm_md_max_inherit_rr to an absolute directory depth from ROOT in lsm unpack; then the comparison with lli_dir_depth will be meaningful.

Comment by Lai Siyao [ 27/May/22 ]

Andreas, the code below is for the filesystem-wide default LMV only:

                if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE &&
                    (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
                     lsm->lsm_md_max_inherit_rr >= lli->lli_dir_depth))
                        op_data->op_flags |= MF_RR_MKDIR;

So this does not look like an issue.

Comment by Andreas Dilger [ 27/May/22 ]

I was trying to check MF_RR_MKDIR to see if the directory has round-robin allocation enabled, so that the "stay on parent" check in lmv_locate_tgt_qos() would not be used if the parent is r-r.

Comment by Gerrit Updater [ 09/Jun/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47576
Subject: LU-15850 mdt: pack default LMV in open reply
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c31475758946fd5f32382b5415ab7c0c1b46913c

Comment by Gerrit Updater [ 09/Jun/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47577
Subject: LU-15850 llite: pass dmv inherit depth instead of dir depth
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d33bd101c7b2a9184615d6ff4751fe8d6222283b

Comment by Gerrit Updater [ 09/Jun/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47578
Subject: LU-15850 lmv: always space-balance r-r directories
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9017448833a249ce29ea0c5d26c60e3d5dc03201

Comment by Gerrit Updater [ 20/Jun/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47679
Subject: LU-15850 llite: interop test with 2.14
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 03fba460bc0f1b6a747b792715ac7f8f059eacfe

Comment by Gerrit Updater [ 27/Jun/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47789
Subject: LU-15850 llite: implicit default LMV inherit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cc65c9deaab82dfc7ab81235ba8c0339fe8fc73b

Comment by Gerrit Updater [ 26/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47576/
Subject: LU-15850 mdt: pack default LMV in open reply
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f6e4272fb0be5b798b7685bb40067e3f6877c8a5

Comment by Gerrit Updater [ 03/Aug/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47577/
Subject: LU-15850 llite: pass dmv inherit depth instead of dir depth
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c23c68a52a04369101db2bd3b1d3da23025fcf48

Comment by Gerrit Updater [ 05/Aug/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47578/
Subject: LU-15850 lmv: always space-balance r-r directories
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 37c1ddc34d3a1e61c5533f48cb29fe2258ca2907

Comment by Peter Jones [ 05/Aug/22 ]

Landed for 2.16
