[LU-13417] DNE3: mkdir() automatically create remote directory on MDS which has more space Created: 06/Apr/20 Updated: 04/Dec/23 Resolved: 31/Jul/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne3 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Since the patches from However, I can't seem to get this to work. On current master (2.13.52-259, just before 2.13.53) I'm not able to use "lfs setdirstripe -i -1 /path/to/dir" on an existing directory. It seems to select the less-full MDT if I explicitly run "lfs mkdir -i -1" for a new directory, but that was also true in 2.12 using patch https://review.whamcloud.com/30598 "
There should be a way for "mkdir(2)" from a normal application (not "lfs mkdir -i -1") to create remote (1-stripe) directories in the filesystem, and it should be possible to set this by default on the root directory (per
The default mdt_qos_threshold_rr value should be reduced significantly (e.g. 1-2%) and/or modified so that some amount of MDT balancing is active when the filesystem is balanced, at least in the root directory by default. Otherwise, without users understanding the details of DNE, MDT0000 will hold all of the inodes, when it would be better if the top 1 or 2 levels of directories were distributed across MDTs.
Maybe this is mostly a documentation issue, and the "lfs-setdirstripe.1" man page needs to be updated to be clearer so I can understand what needs to be done to enable this? (Also, the usage message for setdirstripe/mkdir should remove the "This can only be done on MDT0 with the right of administrator" message.) |
| Comments |
| Comment by Lai Siyao [ 07/Apr/20 ] |
|
That's because 'lfs setdirstripe -D -i -1 <dir>' deletes the default stripe: when both "mdt_index" and "mdt_count" are unset, it is treated as removal. You need to use 'lfs setdirstripe -D -i -1 -c 1' to enable balanced subdirectory creation for plain directories. |
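To make the rule above concrete, here is a minimal Python model of it. This is only an illustration, not Lustre code: the function name, the sentinel constants, and the returned strings are hypothetical, and it assumes that "-i -1" is indistinguishable from an unset index and that an unset count is stored as 0.

```python
# Illustrative model only; none of these names exist in Lustre.
UNSET_COUNT = 0    # assumption: no "-c" given is stored as stripe count 0
UNSET_INDEX = -1   # "-i -1" is also the value used when no index is given


def default_layout_action(stripe_count=UNSET_COUNT, mdt_index=UNSET_INDEX):
    """Classify a 'setdirstripe -D' request as the comment describes it."""
    if stripe_count == UNSET_COUNT and mdt_index == UNSET_INDEX:
        # Both values look unset, so the request is treated as a removal.
        return "remove-default"
    return "set-default"


# "-D -i -1" alone looks like a removal; adding "-c 1" sets a default layout.
print(default_layout_action(mdt_index=-1))
print(default_layout_action(stripe_count=1, mdt_index=-1))
```

Under this model, "-D -i -1" and a bare removal are the same request, which is why the explicit "-c 1" is needed.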
| Comment by Andreas Dilger [ 08/Apr/20 ] |
I think this is a user interface bug then, or a bug in how LOD is interpreting the layout. In my testing, "lfs getdirstripe" showed that "lmv_stripe_count:0 lmv_stripe_offset:-1 lmv_hash_type:none" was set on the directory, but it didn't affect the creation of directories with "mkdir()".
I would expect "lfs setdirstripe -d -D $dir" to delete the default layout for a directory, which seems to work, with "-d" already implying "-D" internally, but it is non-obvious because "lmv_stripe_offset:-1" is actually the default value, so "deleting" this layout didn't help. I would also expect "lfs setdirstripe -D -i -1" to set the default layout to create remote directories, matching how "lfs setstripe" works. There were other users confused by this as well.
The missing part is that specifying only "-i -1" is internally the same as "-c 0", which actually results in the existing layout being reset to the default (local directory creation). I'll push a small patch that makes "-D -i -1" set "-c 1" internally if the stripe count is not specified, so that it doesn't result in unexpected behavior for the user.
Another issue is that the default "qos_threshold_rr=17%" is too high to start balancing directory creation across MDTs. This might mean that MDT0000 is used for many millions of files and top-level directories before any balancing is even started. At that point it will be very difficult to restore the balance of the MDTs because so many top-level directories and subdirectories have been created on MDT0000. I think it would be better to start space balancing and/or round-robin MDT selection for root directory entries right away if "lmv_stripe_count:1 lmv_stripe_offset:-1" is set on "ROOT/" (which I think we should make the default for 2.14). If there is only a single MDT then this is no change to behavior, but for multiple MDTs it will start using all MDTs right away at the root level and prevent the MDTs from becoming unbalanced in the first place.
If we special-case MDT0000 to be RR/balanced immediately, then a smaller qos_threshold_rr=5% may still be useful to avoid the MDTs becoming too imbalanced, but will be less likely to be needed. |
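The proposed change to "-D -i -1" in this comment can be sketched as a small argument-resolution step. This is a hypothetical Python model, not the actual patch: the function name is invented, `None` stands for "option not given on the command line", and the handling of any case other than "-i -1" without "-c" is an assumption made only to keep the example complete.

```python
from typing import Optional, Tuple


def resolve_default_dir_layout(mdt_index: Optional[int],
                               stripe_count: Optional[int]) -> Tuple[int, int]:
    """Return the (stripe_count, stripe_offset) pair that would be stored.

    Hypothetical model: None means the option was not given.
    """
    if stripe_count is None:
        # Proposed behavior: an explicit "-i -1" implies "-c 1" so the
        # request is no longer mistaken for a reset to the default.
        # Assumption: in every other case the count stays at 0 (unset).
        stripe_count = 1 if mdt_index == -1 else 0
    if mdt_index is None:
        mdt_index = -1  # -1 is the default offset value
    return stripe_count, mdt_index


# "-D -i -1" now yields lmv_stripe_count:1 lmv_stripe_offset:-1
print(resolve_default_dir_layout(-1, None))
```

Keeping an explicitly given count untouched means that only the ambiguous "index given, count omitted" case changes.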
| Comment by Gerrit Updater [ 08/Apr/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38160 |
| Comment by Gerrit Updater [ 01/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38160/ |
| Comment by Peter Jones [ 01/May/20 ] |
|
Landed for 2.14 |
| Comment by Andreas Dilger [ 01/May/20 ] |
|
Peter, this still needs some work to make the remote MDT selection heuristics a bit better. For top-level directories, it makes sense to round-robin them when the MDTs are relatively empty, and only pick a specific MDT when they are imbalanced. Also, I think the free space threshold needs to be smaller for MDT imbalance than for OST imbalance. |
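The heuristic suggested above (round-robin while the MDTs are close to balanced, a specific MDT once they are not) could look roughly like the following Python sketch. It is a simplified model under stated assumptions, not the Lustre QoS allocator: `pick_mdt` and its parameters are hypothetical, imbalance is taken to be the gap between the highest and lowest free-space percentage, and the "specific MDT" is simply the one with the most free space.

```python
def pick_mdt(free_pct, threshold_rr, rr_counter):
    """Pick an MDT index for a new top-level directory.

    free_pct:     free-space percentage per MDT (hypothetical input)
    threshold_rr: imbalance, in percentage points, tolerated before
                  space-based selection takes over
    rr_counter:   running count of directories created so far
    """
    imbalance = max(free_pct) - min(free_pct)
    if imbalance <= threshold_rr:
        # MDTs are close enough: spread directories round-robin.
        return rr_counter % len(free_pct)
    # MDTs are imbalanced: choose the one with the most free space.
    return max(range(len(free_pct)), key=free_pct.__getitem__)


free = [80, 95]              # MDT0000 is fuller than MDT0001
print(pick_mdt(free, 17, 0)) # 15-point gap is within a 17% threshold
print(pick_mdt(free, 5, 0))  # the same gap exceeds a 5% threshold
```

With the same free-space figures, the lower threshold switches to space-based selection sooner, which is the effect argued for here.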
| Comment by Andreas Dilger [ 09/May/20 ] |
|
I was looking at the LMV code and tested to verify that the I think that means there is a (hopefully simple) change that can be done to make this functionality more useful for filesystems:
While I think this will not be perfect, it will be a lot better than defaulting to not using all of the other MDTs unless the user knows to explicitly use "lfs mkdir" and/or "lfs setdirstripe -D" on the filesystem to start using other MDTs. |
| Comment by Gerrit Updater [ 09/May/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38553 |
| Comment by Andreas Dilger [ 09/May/20 ] |
|
I pushed the above patch as a starting point for getting this working, but it needs some additional work to finish it off. Hongchao or Lai, can you please finish off that patch so that we can get it included in 2.14? |
| Comment by Gerrit Updater [ 10/Dec/20 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40925 |
| Comment by Lai Siyao [ 11/Dec/20 ] |
|
It's strange that replay-dual 22d always fails, even though it passes on autotest for other patches. When I tried to test replay-dual alone on master code, it also failed on autotest. |
| Comment by Gerrit Updater [ 29/Apr/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43489 |
| Comment by Gerrit Updater [ 29/Apr/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43491 |
| Comment by Gerrit Updater [ 29/Apr/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43492 |
| Comment by Gerrit Updater [ 30/Jun/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43489/ |
| Comment by Gerrit Updater [ 30/Jun/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43491/ |
| Comment by Gerrit Updater [ 30/Jun/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43492/ |
| Comment by Gerrit Updater [ 15/Jul/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44315 |
| Comment by Gerrit Updater [ 22/Jul/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44384 |
| Comment by Gerrit Updater [ 25/Jul/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/44384/ |
| Comment by Gerrit Updater [ 27/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/44315/ |
| Comment by Gerrit Updater [ 31/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38553/ |
| Comment by Peter Jones [ 31/Jul/21 ] |
|
Landed for 2.15 |