[LU-11546] enable large_dir support for MDTs Created: 18/Oct/18 Updated: 25/May/22 Resolved: 12/Nov/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.8 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Dongyang Li |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LTS12 |
| Severity: | 3 |
| Description |
|
Now that e2fsprogs-1.44.3 has support for large_dir, testing and enabling the large_dir support on MDTs would allow single directories to exceed the ~10M limit currently imposed by the 2-level htree. This should be done automatically for new MDTs, and the existing error message in ext4_dx_add_entry() should be updated to directly reference the large_dir feature by name instead of just "Large directory", and explain that it should be enabled by tune2fs. |
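For readers wondering where a limit on the order of 10M entries comes from, the following is a rough back-of-the-envelope sketch, not taken from the ticket. It assumes 4 KiB blocks, no metadata_csum tail, leaf blocks about 70% full on average, and a hypothetical average directory entry size of 64 bytes; the real figure depends on filename length and on-disk layout details.

```python
# Rough capacity estimate for an ext4/ldiskfs htree directory.
# Every constant here is an assumption for illustration only.

BLOCK_SIZE = 4096        # assumed 4 KiB directory blocks
DX_ENTRY_SIZE = 8        # 4-byte hash + 4-byte block number per index entry
DX_ROOT_OVERHEAD = 32    # "." and ".." entries plus the index root header
DX_NODE_OVERHEAD = 8     # empty dirent header at the start of an interior node


def htree_capacity(levels, dirent_size, fill=0.7):
    """Estimate how many entries fit in an htree with `levels` index levels.

    dirent_size is the assumed average on-disk size of one directory entry;
    fill is the assumed average leaf block utilisation.
    """
    root_fanout = (BLOCK_SIZE - DX_ROOT_OVERHEAD) // DX_ENTRY_SIZE
    node_fanout = (BLOCK_SIZE - DX_NODE_OVERHEAD) // DX_ENTRY_SIZE
    leaf_blocks = root_fanout * node_fanout ** (levels - 1)
    entries_per_leaf = int((BLOCK_SIZE // dirent_size) * fill)
    return leaf_blocks * entries_per_leaf


if __name__ == "__main__":
    for levels in (2, 3):
        print(levels, "levels:", htree_capacity(levels, dirent_size=64))
```

Under these assumptions the 2-level tree lands in the low tens of millions of entries, while a third index level (which large_dir permits) multiplies the capacity by the interior-node fanout.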
| Comments |
| Comment by Andreas Dilger [ 18/Oct/18 ] |
|
Note that I don't think that large_dir should be used for OSTs. For very large OSTs that exceed the 10M-entry limit for the O/0/d* directories, I think it makes more sense to have the MDTs create fewer than LUSTRE_DATA_SEQ_MAX_WIDTH (= 4B) objects per OST sequence, and have the OSTs create new object directories O/<seq>/d* for each sequence (which they already do for DNE when multiple MDTs are creating objects on the OST). This will allow the older object directory blocks to drop out of RAM as they become less used, and eventually those directories could be removed when they become empty. Having a single huge directory for objects means that the directory leaf blocks are updated totally randomly, and must always fit into RAM, or cause high read/write IOPS to the OST storage when there are lots of objects on a single OST. That is very undesirable, since it will typically be HDD-based OSTs that are so large they need more than 320M objects in a single filesystem (10M entries/directory * 32 directories). |
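The arithmetic behind the 320M figure, and the effect of creating fewer objects per sequence, can be sketched as follows. The 10M and 32 values come from the comment above; the reduced sequence width and the object count in the usage lines are purely hypothetical numbers chosen for illustration.

```python
# Illustrative arithmetic only; not Lustre code.

ENTRIES_PER_DIR = 10_000_000   # ~10M-entry limit of a 2-level htree directory
DIRS_PER_SEQ = 32              # object subdirectories per O/<seq>/ tree


def max_objects_per_sequence():
    """Objects one O/<seq>/d* tree can hold before directories hit the limit."""
    return ENTRIES_PER_DIR * DIRS_PER_SEQ


def sequences_needed(total_objects, seq_width):
    """Number of sequences (and so O/<seq>/ trees) used if each sequence is
    capped at seq_width objects (ceiling division)."""
    return -(-total_objects // seq_width)


if __name__ == "__main__":
    print(max_objects_per_sequence())
    # Hypothetical: 1 billion objects with sequences capped at 32M objects each.
    print(sequences_needed(1_000_000_000, 32_000_000))
```

With a capped sequence width, each directory stays well under the htree limit, and older sequences stop receiving new entries, which is the property the comment relies on for letting their blocks age out of RAM.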
| Comment by Andreas Dilger [ 27/Jun/19 ] |
|
The "optimizing for more than 320M objects per OST per MDT" issue is being tracked in |
| Comment by Gerrit Updater [ 28/Jun/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35358 |
| Comment by Gerrit Updater [ 07/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35358/ |
| Comment by Andreas Dilger [ 22/Oct/19 ] |
|
Dongyang, can you please make a patch to enable large_dir on MDTs when they are formatted by mkfs.lustre? This can go into master once 2.14 opens (in the next few weeks), and can then likely be backported to 2.12.4. |
| Comment by Gerrit Updater [ 23/Oct/19 ] |
|
Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/36555 |
| Comment by Gerrit Updater [ 12/Nov/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36555/ |
| Comment by Peter Jones [ 12/Nov/19 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 18/Nov/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36780 |
| Comment by Gerrit Updater [ 18/Nov/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36781 |
| Comment by Stephane Thiell [ 04/Jun/20 ] |
|
It would be nice to have this patch landed in 2.12 at some point. We just used 2.12.5 RC1 to format an MDT and large_dir was not set. With DNE, and especially if we use lfs migrate -m, large_dir quickly becomes mandatory on MDTs. |
| Comment by Andreas Dilger [ 04/Jun/20 ] |
|
Stephane, it is easy enough to set after formatting - "tune2fs -O large_dir <dev>". The holdup with landing the patch is that the tests written for this feature don't pass. That is mostly a problem with the tests themselves (they don't pass on master either), so either the change should be rebased to not depend on the tests, or the tests should be fixed. |
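A minimal sketch of how one might check whether large_dir is already set before running the tune2fs command mentioned above. It only parses the "Filesystem features:" line from `dumpe2fs -h` output and builds the command as an argument list; it does not execute anything. The helper names, the sample header, and the device path are hypothetical.

```python
# Sketch: decide whether tune2fs needs to be run, from dumpe2fs -h output.
# Helper names and the sample data below are illustrative assumptions.


def has_feature(dumpe2fs_header, feature):
    """Return True if `feature` appears on the 'Filesystem features:' line."""
    for line in dumpe2fs_header.splitlines():
        if line.startswith("Filesystem features:"):
            return feature in line.split(":", 1)[1].split()
    return False


def enable_command(device, feature="large_dir"):
    """Build the tune2fs invocation quoted in the comment, as an argv list."""
    return ["tune2fs", "-O", feature, device]


if __name__ == "__main__":
    # Hypothetical excerpt of `dumpe2fs -h <dev>` output for an MDT.
    sample = (
        "Filesystem volume name:   testfs-MDT0000\n"
        "Filesystem features:      has_journal ext_attr dir_index filetype\n"
    )
    if not has_feature(sample, "large_dir"):
        print(" ".join(enable_command("/dev/mdt0")))  # hypothetical device path
```

Checking first keeps the step idempotent; whether the target must be unmounted when the feature is changed is not covered here.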
| Comment by Stephane Thiell [ 04/Jun/20 ] |
|
OK! We actually used mkfs.lustre -O large_dir,project and it worked fine. It's also easy to set using tune2fs like you said so not super critical to have it by default indeed. Thanks for the heads-up regarding the tests. Note that we have been using large_dir for months now on Fir's MDTs (2.12.x). |
| Comment by Gerrit Updater [ 18/Oct/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/36781/ |