[LU-15330] ext2fs_get_pathname() very slow for large directory Created: 07/Dec/21 Updated: 04/Apr/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Running e2fsck on an MDT with a very large REMOTE_PARENT_DIR is extremely slow if there are entries in that directory that need to be repaired. On a system with a 60M-entry REMOTE_PARENT_DIR, each directory entry was taking about 1.4s to repair due to PR_3_UNCONNECTED_DIR:

Unconnected directory inode 2102494 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Unconnected directory inode 2102510 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Unconnected directory inode 2102514 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes

Depending on how many unattached entries there are, this might take days, weeks, or even months to complete (1M files might take 2 weeks to repair).

Attaching ltrace to e2fsck showed that all of the time is spent in ext2fs_get_pathname(), which opens and iterates through all of the entries in the huge directory (ltrace slowed the per-file repair time from 1s to 14s, but the fraction of time spent there is the same):

1638486316.336885 ext2fs_read_inode(0x18ab2f0, 0x261db5f2, 0x7ffdc97a2d00, 10) = 0 <0.000069>
1638486316.336977 ext2fs_link(0x18ab2f0, 11, 0x7ffdc97a2d80, 0x261db5f2) = 0 <0.001130>
1638486316.338130 ext2fs_read_inode(0x18ab2f0, 0x261db5f2, 0x7ffdc97a2bd0, 0xa626870) = 0 <0.000071>
1638486316.338223 ext2fs_icount_increment(0x383027b0, 0x261db5f2, 0, 0x18ab2b0) = 0 <0.000084>
1638486316.338329 ext2fs_icount_increment(0x1efa400, 0x261db5f2, 0, 0) = 0 <0.000073>
1638486316.338425 ext2fs_write_inode(0x18ab2f0, 0x261db5f2, 0x7ffdc97a2bd0, 0) = 0 <0.000094>
1638486316.338542 ext2fs_u32_list_test(0x1efa310, 0x261db5f2, 11, 0) = 0 <0.000069>
1638486316.338633 ext2fs_dir_iterate(0x18ab2f0, 0x261db5f2, 1, 0 <unfinished ...>
1638486316.338727 ext2fs_read_inode(0x18ab2f0, 0x83f7c001, 0x7ffdc97a28f0, 0) = 0 <0.000071>
1638486316.338819 ext2fs_icount_decrement(0x383027b0, 0x83f7c001, 0, 0x18ab2c0) = 0 <0.000080>
1638486316.338921 ext2fs_read_inode(0x18ab2f0, 11, 0x7ffdc97a28f0, 0) = 0 <0.000070>
1638486316.339014 ext2fs_icount_increment(0x383027b0, 11, 0, 0x18ab2a0) = 0 <0.000075>
1638486316.339111 ext2fs_icount_increment(0x1efa400, 11, 0, 0) = 0 <0.000071>
1638486316.339205 ext2fs_write_inode(0x18ab2f0, 11, 0x7ffdc97a28f0, 0) = 0 <0.000087>
1638486316.339313 <... ext2fs_dir_iterate resumed> ) = 0 <0.000679>
1638486316.339337 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f3, 0x7f527a6cce48, 0x7f52765db010) = 4 <0.000070>
1638486316.339428 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f3, 4, 2) = 0 <0.000070>
1638486316.339521 ext2fs_mark_generic_bmap(0x296ee90, 0x83f7c001, 0x261db5f3, 0) = 1 <0.000070>
1638486316.339614 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f4, 0x7f527a6cce54, 0x7f52765db010) = 8 <0.000069>
1638486316.339705 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f4, 8, 3) = 0 <0.000070>
1638486316.339798 ext2fs_mark_generic_bmap(0x296ee90, 0x1a3e72d1, 0x261db5f4, 0) = 1 <0.000073>
1638486316.339894 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f5, 0x7f527a6cce60, 0x7f52765db010) = 16 <0.000069>
1638486316.339985 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f5, 16, 4) = 0 <0.000069>
1638486316.340077 ext2fs_mark_generic_bmap(0x296ee90, 0x83f7c001, 0x261db5f5, 0) = 1 <0.000069>
1638486316.340168 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f6, 0x7f527a6cce6c, 0x7f52765db010) = 32 <0.000069>
1638486316.340260 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f6, 32, 5) = 0 <0.000068>
1638486316.340350 ext2fs_mark_generic_bmap(0x296ee90, 0x545ad40d, 0x261db5f6, 0) = 0 <0.000069>
1638486316.340443 ext2fs_mark_generic_bmap(0x296ee90, 0x545ad40c, 0x2434746, 0) = 0 <0.000069>
1638486316.340534 ext2fs_mark_generic_bmap(0x296ee90, 0x1ec6326d, 0x2434743, 0) = 16 <0.000069>
1638486316.340625 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f7, 0x7f527a6cce78, 0x7f52765db010) = 64 <0.000070>
1638486316.340717 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f7, 64, 6) = 0 <0.000069>
1638486316.340811 dcgettext(0, 0x448684, 5, 335) = 0x448684 <0.000079>
1638486316.340916 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 12) = 12 <0.000095>
1638486316.341033 dcgettext(0, 0x44cc4d, 5, 12) = 0x44cc4d <0.000072>
1638486316.341128 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 9) = 9 <0.000088>
1638486316.341239 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 1) = 1 <0.000089>
1638486316.341350 dcgettext(0, 0x44ccb0, 5, 1) = 0x44ccb0 <0.000071>
1638486316.341445 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 5) = 5 <0.000089>
1638486316.341557 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 1) = 1 <0.000092>
1638486316.341673 __ctype_b_loc() = 0x7f5364a2d6f0 <0.000062>
1638486316.341758 __fprintf_chk(0x7f5363936400, 1, 0x44cafd, 0) = 9 <0.000094>
1638486316.341874 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 2) = 2 <0.000089>
1638486316.341985 __ctype_b_loc() = 0x7f5364a2d6f0 <0.000062>
1638486316.342069 ext2fs_get_pathname(0x18ab2f0, 0x261db5f7, 0, 0x7ffdc97a2ce0) = 0 <13.722823>
1638486330.064926 strlen("/REMOTE_PARENT_DIR/???") = 22 <0.000133>
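To check where the time goes in a longer trace than the excerpt above, the per-call durations in the trailing `<...>` field can be summed by function name. The following is a minimal sketch, not part of e2fsprogs or ltrace. The helper names `time_per_function` and `estimate_days` are made up for illustration, and the regular expression assumes the same timestamp and duration layout as the lines pasted in this ticket.

```python
import re
from collections import defaultdict

# Matches completed calls, including "<... name resumed>" continuations.
# "<unfinished ...>" lines carry no duration and are skipped.
CALL_RE = re.compile(
    r"^\d+\.\d+\s+(?:<\.\.\.\s+)?(\w+)(?:\(| resumed>).*<(\d+\.\d+)>\s*$"
)


def time_per_function(lines):
    """Sum the trailing <seconds> field of each completed call by name."""
    totals = defaultdict(float)
    for line in lines:
        m = CALL_RE.match(line.strip())
        if m:
            totals[m.group(1)] += float(m.group(2))
    return dict(totals)


def estimate_days(entries, seconds_per_entry):
    """Extrapolate total repair time, assuming a constant per-entry cost."""
    return entries * seconds_per_entry / 86400.0


# A few lines copied from the trace in this ticket.
SAMPLE = """\
1638486316.338633 ext2fs_dir_iterate(0x18ab2f0, 0x261db5f2, 1, 0 <unfinished ...>
1638486316.339313 <... ext2fs_dir_iterate resumed> ) = 0 <0.000679>
1638486316.341985 __ctype_b_loc() = 0x7f5364a2d6f0 <0.000062>
1638486316.342069 ext2fs_get_pathname(0x18ab2f0, 0x261db5f7, 0, 0x7ffdc97a2ce0) = 0 <13.722823>
1638486330.064926 strlen("/REMOTE_PARENT_DIR/???") = 22 <0.000133>
"""

if __name__ == "__main__":
    totals = time_per_function(SAMPLE.splitlines())
    grand = sum(totals.values())
    for name, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name:24s} {secs:10.6f}s {100 * secs / grand:6.2f}%")
    print(f"{estimate_days(1_000_000, 1.4):.1f} days for 1M entries at 1.4s each")
```

The duration reported for a resumed call such as ext2fs_dir_iterate includes the calls nested inside it, so the totals overlap slightly. That does not change the picture here, since a single ext2fs_get_pathname() call dominates everything else. The extrapolation at 1.4s per entry comes to roughly 16 days for 1M entries, in line with the "2 weeks" estimate in the description.
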
It isn't currently possible to reduce the number of unattached inodes (LU-14168 might avoid attaching them to lost+found, see options there), and it isn't possible to reduce the size of REMOTE_PARENT_DIR retroactively (LU-10329 and |
| Comments |
| Comment by Andreas Dilger [ 07/Dec/21 ] |
|
Instead of linearly traversing the whole 60M-entry REMOTE_PARENT_DIR directory each time to resolve the non-existent pathname, there are a few optimizations that could be done:
|
| Comment by Gerrit Updater [ 08/Dec/21 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45785 |
| Comment by Gerrit Updater [ 17/Dec/21 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/45785/ |
| Comment by Gerrit Updater [ 17/Dec/21 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45875 |
| Comment by Andreas Dilger [ 17/Dec/21 ] |
|
Will be included in e2fsprogs-1.46.2.wc4. |
| Comment by Gerrit Updater [ 17/Dec/21 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/45875/ |
| Comment by Andreas Dilger [ 17/Dec/21 ] |
|
The current patch avoids the parent lookup for unconnected directories, since that will never succeed. That speeds up e2fsck, but still results in thousands or millions of entries in lost+found. I've filed LU-15383 to understand/fix the root cause, but for filesystems that have this problem already, a better outcome would be to use the trusted.lma xattr to generate the filename (from lma_self_fid) and link the entry back into REMOTE_PARENT_DIR instead of lost+found, as long as e2fsck is properly handling the htree insertion. |
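As a rough illustration of the "generate the filename from lma_self_fid" idea, the sketch below decodes a raw trusted.lma value and formats the FID as a string. It is not code from e2fsprogs or the merged patches. It assumes the xattr starts with two 32-bit flag words (lma_compat, lma_incompat) followed by a FID made of a 64-bit sequence, 32-bit object ID and 32-bit version, all little-endian. It also assumes that REMOTE_PARENT_DIR entries are named with the FID in unbracketed hex form, which should be verified against the Lustre osd-ldiskfs code. The function names are hypothetical.

```python
import struct

# Assumed layout of the start of trusted.lma (little-endian):
#   u32 lma_compat, u32 lma_incompat,
#   u64 f_seq, u32 f_oid, u32 f_ver   (lma_self_fid)
LMA_HEADER = struct.Struct("<IIQII")


def parse_lma_self_fid(xattr_value: bytes):
    """Return (seq, oid, ver) from a raw trusted.lma xattr value."""
    if len(xattr_value) < LMA_HEADER.size:
        raise ValueError("trusted.lma value too short to contain a FID")
    _compat, _incompat, seq, oid, ver = LMA_HEADER.unpack_from(xattr_value)
    return seq, oid, ver


def fid_to_name(seq: int, oid: int, ver: int) -> str:
    """Format a FID as hex fields without brackets (assumed name format)."""
    return f"{seq:#x}:{oid:#x}:{ver:#x}"


if __name__ == "__main__":
    # Synthetic xattr value for illustration only, not taken from a real MDT.
    raw = struct.pack("<IIQII", 0, 0, 0x200000400, 1, 0)
    print(fid_to_name(*parse_lma_self_fid(raw)))
```

Inside e2fsck the equivalent logic would have to be written in C against the inode's extended attributes, and, as noted above, linking the entry back into REMOTE_PARENT_DIR would still depend on correct htree insertion.
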