Details
Type: Bug
Resolution: Unresolved
Priority: Major
Description
Running e2fsck on an MDT with a very large REMOTE_PARENT_DIR is extremely slow if there are entries in that directory that need to be repaired. On one system with a 60M-entry REMOTE_PARENT_DIR, each directory entry took about 1.4s to repair due to PR_3_UNCONNECTED_DIR:
Unconnected directory inode 2102494 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes

Unconnected directory inode 2102510 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes

Unconnected directory inode 2102514 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Depending on how many unattached entries there are, this might take days, weeks, or even months to complete (1M files might take 2 weeks to repair).
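As a rough check of that estimate, the arithmetic can be sketched as below. This assumes repairs run strictly one after another at the observed 1.4s per entry; the function name `repair_days` is only illustrative.

```python
SECONDS_PER_DAY = 86_400


def repair_days(unattached_entries: int, seconds_per_entry: float) -> float:
    """Estimated wall-clock days to reconnect all entries serially."""
    return unattached_entries * seconds_per_entry / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1.4s per entry is the rate reported above; the entry counts are examples.
    for count in (100_000, 1_000_000, 10_000_000):
        print(f"{count:>10,} entries: {repair_days(count, 1.4):7.1f} days")
```

At 1.4s per entry, 1M unattached entries come to a little over 16 days, which matches the "2 weeks" figure above.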
Attaching ltrace to e2fsck showed that all of the time is spent in ext2fs_get_pathname(), which opens and iterates through all of the entries in the huge directory (ltrace slowed the per-entry repair time from about 1s to about 14s, but the fraction of time spent in this function is the same):
1638486316.336885 ext2fs_read_inode(0x18ab2f0, 0x261db5f2, 0x7ffdc97a2d00, 10) = 0 <0.000069>
1638486316.336977 ext2fs_link(0x18ab2f0, 11, 0x7ffdc97a2d80, 0x261db5f2) = 0 <0.001130>
1638486316.338130 ext2fs_read_inode(0x18ab2f0, 0x261db5f2, 0x7ffdc97a2bd0, 0xa626870) = 0 <0.000071>
1638486316.338223 ext2fs_icount_increment(0x383027b0, 0x261db5f2, 0, 0x18ab2b0) = 0 <0.000084>
1638486316.338329 ext2fs_icount_increment(0x1efa400, 0x261db5f2, 0, 0) = 0 <0.000073>
1638486316.338425 ext2fs_write_inode(0x18ab2f0, 0x261db5f2, 0x7ffdc97a2bd0, 0) = 0 <0.000094>
1638486316.338542 ext2fs_u32_list_test(0x1efa310, 0x261db5f2, 11, 0) = 0 <0.000069>
1638486316.338633 ext2fs_dir_iterate(0x18ab2f0, 0x261db5f2, 1, 0 <unfinished ...>
1638486316.338727 ext2fs_read_inode(0x18ab2f0, 0x83f7c001, 0x7ffdc97a28f0, 0) = 0 <0.000071>
1638486316.338819 ext2fs_icount_decrement(0x383027b0, 0x83f7c001, 0, 0x18ab2c0) = 0 <0.000080>
1638486316.338921 ext2fs_read_inode(0x18ab2f0, 11, 0x7ffdc97a28f0, 0) = 0 <0.000070>
1638486316.339014 ext2fs_icount_increment(0x383027b0, 11, 0, 0x18ab2a0) = 0 <0.000075>
1638486316.339111 ext2fs_icount_increment(0x1efa400, 11, 0, 0) = 0 <0.000071>
1638486316.339205 ext2fs_write_inode(0x18ab2f0, 11, 0x7ffdc97a28f0, 0) = 0 <0.000087>
1638486316.339313 <... ext2fs_dir_iterate resumed> ) = 0 <0.000679>
1638486316.339337 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f3, 0x7f527a6cce48, 0x7f52765db010) = 4 <0.000070>
1638486316.339428 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f3, 4, 2) = 0 <0.000070>
1638486316.339521 ext2fs_mark_generic_bmap(0x296ee90, 0x83f7c001, 0x261db5f3, 0) = 1 <0.000070>
1638486316.339614 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f4, 0x7f527a6cce54, 0x7f52765db010) = 8 <0.000069>
1638486316.339705 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f4, 8, 3) = 0 <0.000070>
1638486316.339798 ext2fs_mark_generic_bmap(0x296ee90, 0x1a3e72d1, 0x261db5f4, 0) = 1 <0.000073>
1638486316.339894 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f5, 0x7f527a6cce60, 0x7f52765db010) = 16 <0.000069>
1638486316.339985 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f5, 16, 4) = 0 <0.000069>
1638486316.340077 ext2fs_mark_generic_bmap(0x296ee90, 0x83f7c001, 0x261db5f5, 0) = 1 <0.000069>
1638486316.340168 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f6, 0x7f527a6cce6c, 0x7f52765db010) = 32 <0.000069>
1638486316.340260 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f6, 32, 5) = 0 <0.000068>
1638486316.340350 ext2fs_mark_generic_bmap(0x296ee90, 0x545ad40d, 0x261db5f6, 0) = 0 <0.000069>
1638486316.340443 ext2fs_mark_generic_bmap(0x296ee90, 0x545ad40c, 0x2434746, 0) = 0 <0.000069>
1638486316.340534 ext2fs_mark_generic_bmap(0x296ee90, 0x1ec6326d, 0x2434743, 0) = 16 <0.000069>
1638486316.340625 ext2fs_test_generic_bmap(0x1efa870, 0x261db5f7, 0x7f527a6cce78, 0x7f52765db010) = 64 <0.000070>
1638486316.340717 ext2fs_mark_generic_bmap(0x296ee90, 0x261db5f7, 64, 6) = 0 <0.000069>
1638486316.340811 dcgettext(0, 0x448684, 5, 335) = 0x448684 <0.000079>
1638486316.340916 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 12) = 12 <0.000095>
1638486316.341033 dcgettext(0, 0x44cc4d, 5, 12) = 0x44cc4d <0.000072>
1638486316.341128 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 9) = 9 <0.000088>
1638486316.341239 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 1) = 1 <0.000089>
1638486316.341350 dcgettext(0, 0x44ccb0, 5, 1) = 0x44ccb0 <0.000071>
1638486316.341445 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 5) = 5 <0.000089>
1638486316.341557 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 1) = 1 <0.000092>
1638486316.341673 __ctype_b_loc() = 0x7f5364a2d6f0 <0.000062>
1638486316.341758 __fprintf_chk(0x7f5363936400, 1, 0x44cafd, 0) = 9 <0.000094>
1638486316.341874 __fprintf_chk(0x7f5363936400, 1, 0x44cb40, 2) = 2 <0.000089>
1638486316.341985 __ctype_b_loc() = 0x7f5364a2d6f0 <0.000062>
1638486316.342069 ext2fs_get_pathname(0x18ab2f0, 0x261db5f7, 0, 0x7ffdc97a2ce0) = 0 <13.722823>
1638486330.064926 strlen("/REMOTE_PARENT_DIR/???") = 22 <0.000133>
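To see at a glance where the time goes in a trace like this, the per-call durations in angle brackets can be summed per function. The following is a minimal sketch, assuming one call per line in the format shown above; `time_by_function` and the regular expression are illustrative helpers, not part of ltrace or e2fsprogs.

```python
import re
from collections import defaultdict

# Matches completed calls such as
#   "1638486316.342069 ext2fs_get_pathname(...) = 0 <13.722823>"
# and resumed calls such as
#   "1638486316.339313 <... ext2fs_dir_iterate resumed> ) = 0 <0.000679>".
# "<unfinished ...>" lines carry no duration and are skipped.
CALL_RE = re.compile(
    r"^\d+\.\d+\s+(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"
    r".*<(?P<secs>\d+\.\d+)>\s*$"
)


def time_by_function(lines):
    """Sum the reported duration of each library call, keyed by function name.

    Note: the duration of an outer call includes any calls nested inside it,
    so totals for different functions can overlap.
    """
    totals = defaultdict(float)
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match:
            func = match.group("resumed") or match.group("name")
            totals[func] += float(match.group("secs"))
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "1638486316.338633 ext2fs_dir_iterate(0x18ab2f0, 0x261db5f2, 1, 0 <unfinished ...>",
        "1638486316.339313 <... ext2fs_dir_iterate resumed> ) = 0 <0.000679>",
        "1638486316.342069 ext2fs_get_pathname(0x18ab2f0, 0x261db5f7, 0, 0x7ffdc97a2ce0) = 0 <13.722823>",
        '1638486330.064926 strlen("/REMOTE_PARENT_DIR/???") = 22 <0.000133>',
    ]
    for func, secs in sorted(time_by_function(sample).items(), key=lambda kv: -kv[1]):
        print(f"{secs:12.6f}s  {func}")
```

Run over the full trace, ext2fs_get_pathname() at 13.72s dwarfs every other call, each of which completes in about a millisecond or less.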
It isn't currently possible to reduce the number of unattached inodes (LU-14168 might avoid attaching them to lost+found, see options there), and it isn't possible to reduce the size of REMOTE_PARENT_DIR retroactively (LU-10329 and LU-15314 can avoid it in the future), so the ext2fs_get_pathname() function itself must be sped up by a few orders of magnitude.
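The cost pattern can be illustrated with a toy model. This is not libext2fs code: it is a hypothetical Python stand-in that only assumes, as the trace suggests, that resolving a name means scanning the parent directory for an entry pointing at the inode. An unconnected inode has no such entry, so every lookup scans the whole directory and still ends in "???". The `known_unconnected` flag is an assumed shortcut for callers that already know the inode is not linked.

```python
def get_pathname(parent_entries, parent_path, ino, known_unconnected=False):
    """Toy model of a name lookup. Returns (path, entries_examined).

    parent_entries is an iterable of (name, inode_number) pairs standing in
    for the on-disk directory. If known_unconnected is True the scan is
    skipped, since no entry for the inode can exist in the parent.
    """
    if known_unconnected:
        return f"{parent_path}/???", 0
    examined = 0
    for name, entry_ino in parent_entries:
        examined += 1
        if entry_ino == ino:
            return f"{parent_path}/{name}", examined
    # No entry references the inode: the whole directory was read for nothing.
    return f"{parent_path}/???", examined


if __name__ == "__main__":
    # A small stand-in for the 60M-entry directory.
    entries = [(f"entry{i}", 1000 + i) for i in range(100_000)]
    print(get_pathname(entries, "/REMOTE_PARENT_DIR", 2102494))
    print(get_pathname(entries, "/REMOTE_PARENT_DIR", 2102494, known_unconnected=True))
```

In this model the work per unattached inode grows linearly with the directory size, so the total cost is roughly (unattached inodes) x (directory entries). With the shortcut, the per-inode cost no longer depends on the directory size.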
The current patch avoids the parent lookup for unconnected directories, since that will never succeed. That speeds up e2fsck, but still results in thousands or millions of entries in lost+found. I've filed LU-15383 to understand/fix the root cause, but for filesystems that have this problem already, a better outcome would be to use the trusted.lma xattr to generate the filename (from lma_self_fid) and link the entry back into REMOTE_PARENT_DIR instead of lost+found, as long as e2fsck is properly handling the htree insertion.
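As a sketch of the name-generation step only, the snippet below decodes lma_self_fid from a raw trusted.lma value and formats it as a candidate entry name. It rests on several assumptions to verify against the Lustre source: that the xattr begins with two 32-bit fields (lma_compat, lma_incompat) followed by the FID as a 64-bit sequence, 32-bit object id, and 32-bit version; that all are little-endian on disk; and that REMOTE_PARENT_DIR entries are named with the unbracketed `seq:oid:ver` hex form. The function names are hypothetical.

```python
import struct

# Assumed on-disk layout of the start of trusted.lma (little-endian):
#   u32 lma_compat, u32 lma_incompat, u64 f_seq, u32 f_oid, u32 f_ver
LMA_HEADER = struct.Struct("<IIQII")


def fid_from_lma(lma_bytes: bytes):
    """Return (seq, oid, ver) decoded from a raw trusted.lma xattr value."""
    if len(lma_bytes) < LMA_HEADER.size:
        raise ValueError("trusted.lma value too short to contain a FID")
    _compat, _incompat, seq, oid, ver = LMA_HEADER.unpack_from(lma_bytes)
    return seq, oid, ver


def remote_parent_name(lma_bytes: bytes) -> str:
    """Format the FID as a directory entry name (assumed naming scheme)."""
    seq, oid, ver = fid_from_lma(lma_bytes)
    return f"0x{seq:x}:0x{oid:x}:0x{ver:x}"


if __name__ == "__main__":
    # Synthetic xattr value built for illustration, not read from a real MDT.
    raw = LMA_HEADER.pack(0, 0, 0x200000401, 0x1, 0x0)
    print(remote_parent_name(raw))
```

A real implementation would live in e2fsck in C and would also need to insert the entry into the htree-indexed directory, which is the part flagged as a concern above.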