[LU-11213] DNE3: remote mkdir() in ROOT/ by default Created: 04/Aug/18 Updated: 07/Feb/22 Resolved: 20/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne3 | ||
| Attachments: |
|
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In order to start using multiple MDTs on a filesystem automatically, I wonder if it makes sense to create directories under the filesystem ROOT/ directory remotely by default. This could be a client parameter that can be disabled if desired, but as can be seen in LU-11211 it is not very clear to users how to get multiple MDTs active by default. With OST object allocation we already do this by default for all objects. While there is some overhead to creating remote directories, these top-level directories are best placed to allow maximal distribution across all of the MDTs. This would also work as new MDTs are added to the filesystem. Using remote directories at the top level is a lot easier than trying to create a striped directory for ROOT/, since that layout would also need to change as new MDTs are added to the filesystem. It also allows more dynamic load balancing across MDTs. As for implementation, the client tunables might be something like "llite.*.mdt_distribution_enabled" to turn it on/off, and "llite.*.mdt_distribution_seconds", which controls how often the client calls obd_statfs() on the MDTs to get space/inode info, so that it doesn't need to do this for every create. Once per 5s would be the minimum, maybe as long as 60s? The ROOT/ directory would get a flag like LUSTRE_TOPDIR_FL/LMAI_TOPDIR set when it is created, indicating that it is a top-level directory and that new directories created in it should automatically be remote. Users could also set this flag manually on other directories with the "chattr +T" command, but it would not be inherited by subdirectories. Using the same algorithms and tunables as LOV QOS for MDT selection would let us leverage the same code and improve both together in the future. |
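The statfs caching idea in the description can be sketched as follows. This is a minimal Python model, not Lustre code: `MdtStatfsCache`, `MdtStatfs`, and the `fetch` callable (standing in for obd_statfs() to one MDT) are hypothetical names, and the 5s/60s clamp simply mirrors the bounds suggested above. The point is that a create only triggers a statfs round to the MDTs when the cached data is older than the configured interval.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MIN_INTERVAL = 5.0   # seconds; lower bound suggested in the description
MAX_INTERVAL = 60.0  # seconds; upper bound suggested in the description


@dataclass
class MdtStatfs:
    free_kb: int
    free_inodes: int


class MdtStatfsCache:
    """Hypothetical client-side cache of per-MDT statfs results, refreshed
    at most once per interval (mdt_distribution_seconds in the proposal)."""

    def __init__(self, fetch: Callable[[int], MdtStatfs], mdt_count: int,
                 interval: float = MIN_INTERVAL,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch          # stands in for obd_statfs() to one MDT
        self.mdt_count = mdt_count
        # Clamp the tunable to the suggested 5..60 second range.
        self.interval = min(max(interval, MIN_INTERVAL), MAX_INTERVAL)
        self.clock = clock
        self._stamp: Optional[float] = None
        self._data: Dict[int, MdtStatfs] = {}

    def get(self) -> Dict[int, MdtStatfs]:
        """Return cached statfs data, querying every MDT only if stale."""
        now = self.clock()
        if self._stamp is None or now - self._stamp >= self.interval:
            self._data = {i: self.fetch(i) for i in range(self.mdt_count)}
            self._stamp = now
        return self._data
```

Injecting the clock makes it possible to check the refresh behaviour without sleeping; in a real client the interval would come from the tunable rather than a constructor argument.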
| Comments |
| Comment by Andreas Dilger [ 05/Aug/18 ] |
|
Another way to do this that might be easier for Lustre users to handle is to allow setting "lfs setdirstripe -D -c1 -i -1 /mnt/lustre/remotedir", which would cause all new subdirectories to be created on the "best" MDT index, as we do with regular OST selection. Typically this would be round-robin if the available space on the MDTs is relatively even, but would select a less full MDT if the space gets out of balance. |
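The "best MDT" selection described above (round-robin while space is balanced, otherwise favouring emptier MDTs) could look roughly like this. It is a hedged sketch modelled loosely on the LOV QOS idea, not the actual LOD/LMV allocator: `MdtSelector`, the 17% `threshold_pct` default, and weighting purely by free space are illustrative assumptions.

```python
import random
from typing import List, Optional


class MdtSelector:
    """Sketch of QOS-style MDT selection: round-robin while free space is
    balanced, otherwise a weighted-random pick favouring emptier MDTs."""

    def __init__(self, threshold_pct: float = 17.0,
                 rng: Optional[random.Random] = None):
        self.threshold_pct = threshold_pct  # assumed imbalance threshold
        self.rng = rng if rng is not None else random.Random()
        self._next = 0                      # round-robin cursor

    def is_balanced(self, free: List[int]) -> bool:
        """True if the spread in free space is within threshold_pct of the max."""
        most = max(free)
        if most == 0:
            return True
        return (most - min(free)) * 100 <= self.threshold_pct * most

    def select(self, free: List[int]) -> int:
        """Return the MDT index for the next new directory."""
        if self.is_balanced(free):
            idx = self._next % len(free)
            self._next = idx + 1
            return idx
        # Probability of choosing an MDT is proportional to its free space.
        return self.rng.choices(range(len(free)), weights=free, k=1)[0]
```

Weighted-random rather than "always the emptiest" avoids having every client pile onto the same MDT between statfs refreshes.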
| Comment by Lai Siyao [ 06/Aug/18 ] |
|
This can be done more simply: in ll_mkdir(), if the parent is ROOT, fake an lmv_user_md that specifies a less full MDT on which to create this directory. This could even be applied to a general mkdir outside ROOT, but users may find mkdir slow, so it's better to rely on directory restriping to balance MDT usage for directories not under ROOT. |
| Comment by Andreas Dilger [ 08/Aug/18 ] |
|
The one issue with checking for ROOT/ is that this may cause problems with subdirectory mounts, with or without nodemaps, where a client mounts only a subdirectory and doesn't see ROOT/ at all. Having an explicit flag or a layout on the ROOT/ or other directory (that can be set/removed by the user) is much more flexible and doesn't really add complexity. In theory, we could always set the equivalent of lfs setstripe -i 1 -c -1 on the ROOT/ directory at format time, so that top-level directories are distributed across MDTs by default. Even if there were only a single MDT to start with, this would allow load distribution to other MDTs when they are added. We might want to consider adding a "no inherit" mode for this kind of layout, so that it doesn't propagate to new subdirectories. |
| Comment by Lai Siyao [ 29/Dec/18 ] |
|
Hi Andreas, the issue with putting this in the default directory stripe is that we can't have both a global default dir stripe and this feature at the same time. And IMHO this is not needed for non-ROOT directories, because a non-ROOT directory can be a striped directory, so its subdirectories are distributed by default. So fundamentally, this feature is needed because ROOT in a Lustre DNE system is not striped. |
| Comment by Andreas Dilger [ 29/Dec/18 ] |
|
I think there are two separate issues here:
I don't think these two behaviors are incompatible. The "-1" index behavior is a property of the directory itself, essentially like a hashed layout that can be added to an existing plain directory and that affects new subdirectory entries in the directory itself. That is not the same as the default directory layout, which affects how the subdirectory will be created (eg. stripe count/hash function of the subdirectory). In a regular striped directory, the location of the subdirectory is based on the name and the hash function stored in the LMV of the directory itself. In the proposed change, we would add a new hash function that means "ignore the filename and use the 'best' (usually least full) MDT for the new subdirectory". We would want to be able to set such a layout on an existing directory (in particular the ROOT directory), but it would still be possible for those directories to inherit the default directory layout from the parent as well (eg. stripe_count). |
| Comment by Lai Siyao [ 01/Jan/19 ] |
|
The problem is that the global default stripe is currently stored as the default stripe of ROOT, and we can't store two default stripes there. One option is to move the global default stripe somewhere else, but that requires handling backward compatibility; or we use the original proposal to store this as LUSTRE_TOPDIR_FL/LMAI_TOPDIR, and alter it via chattr. |
| Comment by Andreas Dilger [ 02/Jan/19 ] |
|
Is that true for directory default stripe or only file default stripe? Also, the directory default stripe should be inherited from the LMV dirstripe default and not the actual LMV layout of the directory. |
| Comment by Lai Siyao [ 02/Jan/19 ] |
|
It's true for both file and directory default stripes. It's also true for the directory layout: by default, if the parent doesn't have a default LMV dirstripe, subdirectories are created as plain directories and don't have a default LMV either. |
| Comment by Andreas Dilger [ 02/Jan/19 ] |
|
I'm still confused. There is the LMV default layout stored on a directory, which is inherited by newly created directories; it can be set on the root directory and is then the default for the whole filesystem. I'm not necessarily suggesting always changing the default layout of the directory, though this seems natural if lmv_stripe_index = -1 in the default layout. Separately, there is the LMV layout of the directory itself, which describes how new filenames created in that directory are mapped to MDTs (normally via a filename hash) and should not be inherited. One option is to create a new hash type/flag (eg. LMV_HASH_FLAG_REMOTE or LMV_HASH_FLAG_BALANCED) that can be set on an existing single-stripe directory, similar to LOV_PATTERN_F_HOLE and LOV_PATTERN_F_RELEASED on regular files. This would not affect where the names of new directories are created, but the directories themselves may be located on remote MDTs if the space is imbalanced. |
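To make the distinction concrete, here is a sketch of how placement might differ with and without such a flag: the name entry is always located by hashing the filename over the directory's stripes, while a flagged directory places only the new subdirectory's inode on the "best" MDT. The flag value, `place_new_entry()`, and the `pick_best` callback are hypothetical; `fnv1a_64()` is the standard 64-bit FNV-1a hash, and Lustre's real name-to-stripe mapping may differ in detail.

```python
from typing import Callable, List, Tuple

LMV_HASH_FLAG_BALANCED = 0x1  # hypothetical flag value, name from the proposal

FNV1A_64_OFFSET = 0xcbf29ce484222325
FNV1A_64_PRIME = 0x100000001b3


def fnv1a_64(name: bytes) -> int:
    """Standard 64-bit FNV-1a hash of a filename."""
    h = FNV1A_64_OFFSET
    for byte in name:
        h ^= byte
        h = (h * FNV1A_64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def place_new_entry(name: bytes, stripe_mdts: List[int], flags: int,
                    is_dir: bool, pick_best: Callable[[], int]) -> Tuple[int, int]:
    """Return (MDT holding the name entry, MDT holding the new inode)."""
    # The name is always located by the filename hash over the parent's stripes.
    name_mdt = stripe_mdts[fnv1a_64(name) % len(stripe_mdts)]
    if is_dir and flags & LMV_HASH_FLAG_BALANCED:
        # Only the new subdirectory's inode moves to the "best" MDT.
        return name_mdt, pick_best()
    return name_mdt, name_mdt
```

For a single-stripe directory the name always stays on the parent's MDT, which matches the point that the flag does not change where names are created; regular files are unaffected.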
| Comment by Lai Siyao [ 03/Jan/19 ] |
|
Okay, I see what you mean. I thought it wasn't allowed to set LMV on an existing directory, because currently setxattr(LMV) is used to create remote/striped directories, but by checking these special flags we can store LMV on plain directories. |
| Comment by Gerrit Updater [ 01/Mar/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34357 |
| Comment by Gerrit Updater [ 01/Mar/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34358 |
| Comment by Gerrit Updater [ 01/Mar/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34359 |
| Comment by Gerrit Updater [ 01/Mar/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34360 |
| Comment by Gerrit Updater [ 14/Apr/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34656 |
| Comment by Gerrit Updater [ 14/Apr/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34657 |
| Comment by Gerrit Updater [ 18/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34656/ |
| Comment by Gerrit Updater [ 04/May/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34801 |
| Comment by Gerrit Updater [ 04/May/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34802 |
| Comment by Gerrit Updater [ 04/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34358/ |
| Comment by Gerrit Updater [ 07/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34802/ |
| Comment by Gerrit Updater [ 07/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34359/ |
| Comment by Gerrit Updater [ 07/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34360/ |
| Comment by Gerrit Updater [ 12/Jun/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35207 |
| Comment by Gerrit Updater [ 12/Jun/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35208 |
| Comment by Andreas Dilger [ 12/Jun/19 ] |
|
I was thinking about how we might best use this feature in the field. It definitely makes sense to set it on the root directory (at some point by default, but not yet). However, one concern I have is that (AFAIK) remote directories created by this feature will have the fnv-1a hash type set on them (ie. they will be striped directories)? I don't think that is what we want; rather, the new directories should be unstriped. It may also be useful to allow specifying an "inherit_depth=N", so that if set on the root directory it will create the next N levels of subdirectories with space-hash (eg. inherit_depth=1 on root sets space-hash for the top-level user directories, but lower-level directories revert to plain dirs). This avoids the situation where one user has a lot of files on their initial MDT and their subdirectories are not distributed further across MDTs, without incurring the overhead of remote directories at every level. If we want all directories to be remote, then set "inherit_depth=-1" (probably an 8-bit field would be enough, given PATH_MAX limits), and maybe special-case the "-1" value so it is never decremented. I think this would also be useful for striped directories. |
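The proposed inherit_depth semantics could be modelled as below. This is a sketch under assumptions: `DefaultDirLayout`, `encode_depth()`, `child_default()` and `space_hashed_levels()` are hypothetical names, and "-1" is stored as 0xFF in an 8-bit field that is never decremented. A depth of N on ROOT/ applies space-hash placement to N levels of new subdirectories, after which new directories revert to plain.

```python
from dataclasses import dataclass
from typing import List, Optional

INHERIT_UNLIMITED = 0xFF  # "-1" as stored in a hypothetical 8-bit field


def encode_depth(depth: int) -> int:
    """Map a user-supplied inherit_depth (-1 or 0..254) to the 8-bit value."""
    if depth == -1:
        return INHERIT_UNLIMITED
    if not 0 <= depth < INHERIT_UNLIMITED:
        raise ValueError(f"inherit_depth out of range: {depth}")
    return depth


@dataclass(frozen=True)
class DefaultDirLayout:
    space_hash: bool    # place new subdirectories on the "best" MDT
    inherit_depth: int  # encoded 8-bit depth; INHERIT_UNLIMITED never decrements


def child_default(parent: DefaultDirLayout) -> Optional[DefaultDirLayout]:
    """Default layout inherited by a new subdirectory, or None once the
    depth is exhausted and the subdirectory reverts to a plain directory."""
    if parent.inherit_depth == INHERIT_UNLIMITED:
        return parent
    if parent.inherit_depth <= 1:
        return None
    return DefaultDirLayout(parent.space_hash, parent.inherit_depth - 1)


def space_hashed_levels(root: DefaultDirLayout, max_levels: int) -> List[int]:
    """Directory levels below ROOT/ (1 = top-level dirs) that are placed
    with space-hash, following one new subdirectory per level."""
    levels: List[int] = []
    layout: Optional[DefaultDirLayout] = root
    for level in range(1, max_levels + 1):
        if layout is None:
            break
        if layout.space_hash:
            levels.append(level)
        layout = child_default(layout)
    return levels
```

With inherit_depth=1 on ROOT/ only the top-level user directories are space-hashed; storing "-1" as 0xFF and skipping the decrement keeps the unlimited case to a single comparison.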
| Comment by Gerrit Updater [ 13/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34657/ |
| Comment by Gerrit Updater [ 13/Jun/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35218 |
| Comment by Gerrit Updater [ 13/Jun/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35219 |
| Comment by Gerrit Updater [ 20/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35208/ |
| Comment by Gerrit Updater [ 20/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35207/ |
| Comment by Gerrit Updater [ 25/Jun/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35318 |
| Comment by Gerrit Updater [ 12/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35318/ |
| Comment by Gerrit Updater [ 30/Aug/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36008 |
| Comment by Gerrit Updater [ 07/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36008/ |
| Comment by Gerrit Updater [ 20/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35218/ |
| Comment by Gerrit Updater [ 20/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35219/ |
| Comment by Peter Jones [ 20/Sep/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 07/Feb/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46476 |