[LU-14345] e2fsck of very large directories is broken Created: 20/Jan/21 Updated: 19/Feb/21 Resolved: 19/Feb/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Andreas Dilger | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | e2fsck |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While working on the e2fsck code for an unrelated issue (see patch http://review.whamcloud.com/22008), I saw that there is still code in e2fsck that limits a directory to less than 2^32 bytes in size. Currently, e2fsck will consider a directory file with a larger size to be broken, and will clear the high bytes of the size if there is any problem with the directory size:
	if (pb.is_dir) {
		:
		if (err || sz != inode->i_size) {
			bad_size = 7;
			pctx->num = sz;
		} else if (inode->i_size & (fs->blocksize - 1))
			bad_size = 5;
		else if (nblock > (pb.last_block + 1))
			bad_size = 1;
		:
	:
	if (fix_problem(ctx, PR_1_BAD_I_SIZE, pctx)) {
		if (LINUX_S_ISDIR(inode->i_mode))
			pctx->num &= 0xFFFFFFFFULL;
		ext2fs_inode_size_set(fs, inode, pctx->num);
and in libext2fs, ext2fs_inode_size_set() will also return an error if a size of 4GB or more is set on a directory:
	errcode_t ext2fs_inode_size_set(ext2_filsys fs, struct ext2_inode *inode,
					ext2_off64_t size)
	{
		/* Only regular files get to be larger than 4GB */
		if (!LINUX_S_ISREG(inode->i_mode) && (size >> 32))
			return EXT2_ET_FILE_TOO_BIG;
A 4GB directory can have about 80M 32-byte filename entries (11M 256-byte entries) in it before the 32-bit size is exceeded. While we have previously cited a limit of approximately 10M entries in a single directory, it is quite possible that large-OST systems will have directories over 4GB in size in the near future. It is not totally clear, but in addition to changing the above code to allow large directories, e2fsck may also need to track large directories (a new ctx->large_dirs counter) and set the LARGEDIR feature flag in the superblock, as it does for LARGE_FILE (with the existing ctx->large_files counter). On the one hand, unlike the LARGE_FILE feature (which is set by the kernel at runtime), the LARGEDIR feature should always be set before a large directory is allowed, so it should also be set in all of the superblock backups. This would prevent garbage/corrupt directory inodes from getting a "large size" when in fact they are just sparse files. On the other hand, we don't want to truncate a genuinely large directory just because the flag is missing from the superblock for some reason. There may be enough logic in the directory leaf block handling that a large size is never accepted unless the directory legitimately has enough blocks for that size, but I haven't looked into the details of this yet. |
| Comments |
| Comment by Andreas Dilger [ 20/Jan/21 ] |
|
Artem, would you have time to look into this? It would be good to get a fix into e2fsprogs before the 2.14 release is made that enables large_dir by default for all MDT/OST filesystems. I think currently e2fsck will mostly "ignore" a large directory until it has some error in pass1 that sets bad_size, at which point it would truncate the size to a 32-bit value. If ext2fs_inode_size_set() is called somewhere else, like e2fsck_rehash_dir()->write_directory() called with "e2fsck -fD", then it looks like ext2fs_inode_size_set() will return an error when trying to set the size, but that may cause other problems. I think the minimum fix is to check for ext2fs_has_feature_largedir() in ext2fs_inode_size_set() and allow a 64-bit size in that case, and similarly clean up the above code in pass1 that is truncating the size for LINUX_S_ISDIR() inodes. |
| Comment by Andreas Dilger [ 20/Jan/21 ] |
|
It seems more likely that this would be hit with a single 80M-entry directory (11-20M entries with very long filenames) on the MDT. The existing conf_sanity test_111 did not trigger the problem reported here, likely because it limits itself to a 2GB directory size instead of exceeding 4GB, and because e2fsck doesn't find any errors with the large directory inode that would trigger this problem. It may be possible to trigger it with a > 4GB directory by adding "-D" to the run_e2fsck options at the end. |
| Comment by Gerrit Updater [ 01/Feb/21 ] |
|
|
| Comment by Gerrit Updater [ 02/Feb/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41385 |
| Comment by Gerrit Updater [ 03/Feb/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41385/ |
| Comment by Gerrit Updater [ 08/Feb/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41433 |
| Comment by Gerrit Updater [ 19/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41433/ |