Lustre / LU-13197

e2fsck fails with 'Internal error: couldn't find dir_info for 1222630503'


Details

    • Bug
    • Resolution: Fixed
    • Minor

    Description

      Running e2fsck on a large MDT filesystem fails with an internal error.

      e2fsck 1.45.2.wc1 (27-May-2019)
      MMP interval is 9 seconds and total wait time is 38 seconds. Please wait...
      lfs-MDT0000 contains a file system with errors, check forced.
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      lfs-MDT0000: Deleted inode 813447929 has zero dtime. FIXED.
      lfs-MDT0000: Internal error: couldn't find dir_info for 1222630503.
      lfs-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
      

      Restarting e2fsck reports the same problem with the same inode number. The inode itself appears to be a valid directory:

      debugfs: stat <1222630503>
      Inode: 1222630503 Type: directory Mode: 02700 Flags: 0x0
      Generation: 3227604132 Version: 0x00000172:b0d4e94e
      User: 123375 Group: 130793 Project: 0 Size: 4096
      File ACL: 0
      Links: 2 Blockcount: 8
      Fragment: Address: 0 Number: 0 Size: 0
      ctime: 0x5dfcc8e7:00000000 - Fri Dec 20 13:13:11 2019
      atime: 0x5e13b95f:00000000 - Mon Jan 6 22:49:03 2020
      mtime: 0x5dfcc8e7:00000000 - Fri Dec 20 13:13:11 2019
      crtime: 0x5dfcc88a:5e4ca20c - Fri Dec 20 13:11:38 2019
      Size of extra inode fields: 32
      Extended attributes:
      trusted.lma (24) = 00 00 00 00 00 00 00 00 bb 87 14 00 02 00 00 00 fb 18 00 00 00 00 00 00
      lma: fid=[0x2001487bb:0x18fb:0x0] compat=0 incompat=0
      trusted.link (44)
      BLOCKS:
      (0):763955436
      TOTAL: 1 
      
      debugfs: ls <1222630503>
      1222630503 (12) . 1162861665 (28) .. 1222214101 (36) 20.1.crd
      1222214102 (36) 20.2.crd 1222214103 (36) 20.3.crd
      1222214104 (36) 20.4.crd 1222214105 (36) 20.5.crd
      :
      

      Clearing this inode with "debugfs -w -R 'clri <1222630503>'" just moves the error to a different directory inode, and it keeps moving after clearing several such inodes.

      The filesystem is 6TB with 2.3B inodes, about 1.8B of which are in use, and 380M directories, for an average of only 4 entries per directory. The inode numbers reporting problems are in the range of 1.2B. e2fsck is using about 45GB of RAM (with some swap on an NVMe SSD, so much faster than the HDD the filesystem image is on). It takes about 3-4h to hit this problem.

      I was able to run e2fsck under gdb, though conditional breakpoints slow it down dramatically (10-100x), since they have to be evaluated for every inode, which makes them impractical here.

      What I've discovered is quite interesting. The ctx->dir_info->array is filled in properly for the bad inode, and the array is OK at the start of pass2 ($66 holds the bad inode number):

          (gdb) p e2fsck_get_dir_info(ctx, 2)
          $171 = (struct dir_info *) 0x7ffabf2bd010
          (gdb) p *$171
          $172 = {ino = 2, dotdot = 0, parent = 2}
          (gdb) p $66
          $173 = 1222630490
          (gdb) p e2fsck_get_dir_info(ctx, $66)
          $174 = (struct dir_info *) 0x7ffb3d327f54
          (gdb) p *$174
          $175 = {ino = 1222630490, dotdot = 0, parent = 0}
      

      The dir_info array itself is over 4GB in size: 380M directories, each with a 12-byte entry. The bad inode is only about half way through the array ($140 is the index of the problematic entry):

          (gdb) p $140
          $176 = 176197275
          (gdb) p ctx->dir_info->array[$140]
          $177 = {ino = 1222630490, dotdot = 0, parent = 0}
          (gdb) p *ctx->dir_info
          $178 = {count = 381539950, size = 381539965,
            array = 0x7ffabf2bd010, last_lookup = 0x0,
            tdb_fn = 0x0, tdb = 0x0}
          (gdb) p $178.count * sizeof(ctx->dir_info->array[0])
          $179 = 4578479400
      

      I put a hardware watchpoint on the ".parent" and ".dotdot" fields of this entry so that I could watch as e2fsck_dir_info_set_dotdot() updated it, but I didn't think to put one on the ".ino" field, since I thought it would be static.

          (gdb) info break
          Num     Type           Disp Enb Address    What
          15      hw watchpoint  keep y              -location $177->parent
          16      hw watchpoint  keep y              -location $177->dotdot
      

      The watchpoint triggered and showed the entry being changed by qsort_r(), which seemed plausible at first, since the dir_info array is binary-searched during lookups and therefore must be sorted. However, the array is created in order during the inode table scan and never actually needs to be sorted. As the stack trace shows, it is a different array being sorted that overwrites the dir_info entry:

          Hardware watchpoint 16: -location $177->dotdot
      
          Old value = 0
          New value = 1581603653
          0x0000003b2ea381f7 in _quicksort ()
             from /lib64/libc.so.6
          (gdb) p $141
          $179 = {ino = 1222630490, dotdot = 0, parent = 0}
          (gdb) bt
          #0  0x0000003b2ea381e1 in _quicksort ()
            from /lib64/libc.so.6
          #1  0x0000003b2ea38f82 in qsort_r ()
            from /lib64/libc.so.6
          #2  0x000000000045447b in ext2fs_dblist_sort2 (
              dblist=0x6c9a90,
              sortfunc=0x428c3d <special_dir_block_cmp>)
              at dblist.c:189
          #3  0x00000000004285bc in e2fsck_pass2 (
              ctx=0x6c4000) at pass2.c:180
          #4  0x000000000041646b in e2fsck_run (ctx=0x6c4000)
              at e2fsck.c:253
          #5  0x0000000000415210 in main (argc=4,
              argv=0x7fffffffe088) at unix.c:2047
          (gdb) p ctx->dir_info->array[$140]
          $207 = {ino = 0, dotdot = 1581603653, parent = 0}
          (gdb) f 3
          #3  0x00000000004285bc in e2fsck_pass2 (
              ctx=0x6c4000) at pass2.c:180
          180       ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);
      

      ext2fs_dblist_sort2() sorts the directory block list and should not be touching the dir_info array at all. The dumps above show that the fs->dblist.list address lies right in the middle of the dir_info array's address range:

          (gdb) p *fs->dblist
          $210 = {magic = 2133571340, fs = 0x6c4460,
            size = 763079922, count = 388821313, sorted = 0,
            list = 0x7ffad011e010}
          (gdb) p &ctx->dir_info->array[0]
          $211 = (struct dir_info *) 0x7ffabf2bd010
          (gdb) p &ctx->dir_info->array[$140]
          $212 = (struct dir_info *) 0x7ffb3d327f54
      

      The problem turns out to be that e2fsck_allocate_memory() takes its size as an unsigned int instead of an unsigned long, so the large allocation sizes for dir_info and dblist overflow 32 bits and only small arrays are actually allocated. What is surprising is that e2fsck didn't fail much earlier.

People

  Assignee: Andreas Dilger (adilger)
  Reporter: Andreas Dilger (adilger)
  Votes: 0
  Watchers: 3