LU-14345 e2fsck of very large directories is broken

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.14.0
    • Lustre 2.13.0, Lustre 2.14.0
    • 3

    Description

      Patch http://review.whamcloud.com/22008 "LU-1365 e2fsprogs: enable large directory support in tools" added support to e2fsprogs for directories that use a 3-level htree and are larger than 2GB in size. The large_dir feature was enabled by default for all new ldiskfs filesystems in patch https://review.whamcloud.com/36555 "LU-11546 utils: enable large_dir for ldiskfs" (commit v2_13_50-13-gcd1faa0124).

      While working on the e2fsck code for an unrelated issue, I saw that e2fsck still contains code that limits the size of a directory to less than 2^32 bytes. Currently, if there is any problem with the size of a directory, e2fsck treats a large size as corruption and clears the high 32 bits of the size:

              if (pb.is_dir) {
                      :
                      if (err || sz != inode->i_size) {
                              bad_size = 7;
                              pctx->num = sz;
                      } else if (inode->i_size & (fs->blocksize - 1))
                              bad_size = 5;
                      else if (nblock > (pb.last_block + 1))
                              bad_size = 1;
                      :
                      :
                      if (fix_problem(ctx, PR_1_BAD_I_SIZE, pctx)) {
                              if (LINUX_S_ISDIR(inode->i_mode))
                                      pctx->num &= 0xFFFFFFFFULL;
                              ext2fs_inode_size_set(fs, inode, pctx->num);
      

      and in libext2fs, ext2fs_inode_size_set() also returns an error instead of storing a size larger than 4GB for anything other than a regular file, which includes directories:

      errcode_t ext2fs_inode_size_set(ext2_filsys fs, struct ext2_inode *inode,
                                      ext2_off64_t size)
      {
              /* Only regular files get to be larger than 4GB */
              if (!LINUX_S_ISREG(inode->i_mode) && (size >> 32))
                      return EXT2_ET_FILE_TOO_BIG;
      

      A 4GB directory can hold about 80M entries with 32-byte filenames (or about 11M entries with 256-byte filenames) before the 32-bit size is exceeded. Although we have previously reported a limit of approximately 10M entries in a single directory, it is not out of the question that we will see large-OST systems with directories over 4GB in size in the near future.
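
      The entry counts above can be roughly reproduced with a back-of-the-envelope calculation. The sketch below assumes the standard ext4 directory entry layout (an 8-byte header followed by the name, padded to a 4-byte boundary) and takes the average leaf block fullness as a parameter. The function names and the fill percentages are assumptions for illustration, and htree index blocks, block tails and any extra per-entry data stored by ldiskfs are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* inode (4) + rec_len (2) + name_len (1) + file_type (1) */
#define DIRENT_HEADER_BYTES 8u

/* On-disk record length for a name, rounded up to a multiple of 4 bytes. */
uint32_t dirent_rec_len(uint32_t name_len)
{
	return (DIRENT_HEADER_BYTES + name_len + 3u) & ~3u;
}

/*
 * Approximate number of entries that fit in dir_bytes when leaf blocks
 * are on average fill_percent full (an assumed parameter).
 */
uint64_t dir_capacity(uint64_t dir_bytes, uint32_t name_len,
		      unsigned int fill_percent)
{
	assert(fill_percent <= 100);
	return dir_bytes / dirent_rec_len(name_len) * fill_percent / 100;
}
```

      With a 4GB directory, dir_capacity(1ULL << 32, 32, 75) gives a little over 80M entries and dir_capacity(1ULL << 32, 256, 70) gives a little over 11M, so the figures quoted above are consistent with leaf blocks being roughly 70-75% full.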

      It is not yet clear, but in addition to changing the above code to allow large directories, e2fsck may also need to track large directories (with a new ctx->large_dirs counter) and set the LARGEDIR feature flag in the superblock, as it does for LARGE_FILE (with the existing ctx->large_files counter).

      On the one hand, unlike the LARGE_FILE feature (which is set by the kernel at runtime), the LARGEDIR feature should always be set before a large directory is allowed, so it should also be set in all of the superblock backups. This would prevent garbage/corrupt directory inodes from getting a "large size" when in fact they are just sparse files. On the other hand, we don't want to truncate real large directories because a flag is missing in the superblock for some reason. It may be that there is enough logic in the directory leaf block handling that a large size should never be set unless the directory legitimately has enough blocks for that size, but I haven't looked into the details of this yet.

            People

              adilger Andreas Dilger