Lustre / LU-16171

e2fsck should handle multiply-claimed blocks better

Details

    • Improvement
    • Resolution: Fixed
    • Major

    Description

      Running e2fsck on a filesystem with a large number of multiply-claimed blocks can take many hours or possibly days. In many such cases, the multiply-claimed blocks are caused by a corrupted inode or indirect block that makes one bad inode overlap with many good inodes. The problem is worse on a large filesystem (16TB or more with 4KB blocks) because every random 32-bit number in the inode->i_block[] array is then a "valid" block number (on smaller filesystems, out-of-range random block numbers would be detected as errors). Garbage triple/double/indirect blocks will point to random "valid" blocks that will themselves be interpreted as containing other 32-bit block numbers, multiplying the number of duplicate blocks exponentially.
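
      The arithmetic behind the 16TB threshold and the exponential growth can be checked with a short sketch. It assumes a 4KB block size (so one indirect block holds 1024 32-bit entries) and ignores reserved blocks; the function names are illustrative and are not part of e2fsprogs.

```python
BLOCK_SIZE = 4096   # assumed block size in bytes
PTR_SIZE = 4        # 32-bit block numbers stored in indirect blocks
TIB = 2**40

def fraction_valid(fs_bytes, block_size=BLOCK_SIZE):
    """Fraction of random 32-bit values that fall inside the filesystem."""
    fs_blocks = fs_bytes // block_size
    return min(fs_blocks, 2**32) / 2**32

def max_blocks_claimed(levels, block_size=BLOCK_SIZE):
    """Upper bound on blocks reachable through `levels` of indirection."""
    return (block_size // PTR_SIZE) ** levels

print(fraction_valid(1 * TIB))    # 0.0625: most garbage is out of range
print(fraction_valid(16 * TIB))   # 1.0: every garbage value looks valid
print(max_blocks_claimed(3))      # 1073741824 blocks via a triple-indirect block
```

      On a 1TB filesystem only one random value in sixteen passes the range check, so a garbage indirect block is caught quickly. At 16TB nothing is rejected by range alone.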

      Rather than cloning all of those blocks, or possibly deleting/zeroing all such inodes (as suggested in LU-13446), it would be better to find the "bad" inode(s) causing the most problems and clear only them, leaving the other inodes with shared blocks intact. However, care should be taken to avoid spuriously clearing inodes that share blocks with only a small number of peers, since in that case it is difficult to know for sure which inode is the bad one.

      An added difficulty in implementing this is that the full list of inodes sharing a given block is only available in pass1d, at which point e2fsck is already starting to clone the shared blocks. Some work might be possible in pass1b, by tracking which inodes have the most shared blocks, but it is not yet clear whether just counting the shared blocks is sufficient (divided by a factor like 4096 to avoid penalizing inodes that just have a bad indirect/index block), or whether it is better to count only the shared inodes.
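
      The two candidate metrics can be compared on a toy model. This is only a sketch: it assumes the block-to-owners mapping is already available as a Python dict, which is not how e2fsck stores it, and the `min_peers` threshold of 12 is a made-up value, not one proposed in this ticket.

```python
from collections import defaultdict

def score_inodes(block_owners, blocks_per_unit=4096):
    """block_owners maps block number -> set of inode numbers claiming it.

    Returns {inode: (scaled_shared_block_count, peer_inode_count)}.
    """
    shared_blocks = defaultdict(int)
    peers = defaultdict(set)
    for owners in block_owners.values():
        if len(owners) < 2:
            continue  # block is not multiply-claimed
        for ino in owners:
            shared_blocks[ino] += 1
            peers[ino] |= owners - {ino}
    return {ino: (shared_blocks[ino] // blocks_per_unit, len(peers[ino]))
            for ino in shared_blocks}

def suspects(block_owners, min_peers=12):
    """Inodes sharing blocks with at least min_peers other inodes."""
    scores = score_inodes(block_owners)
    return sorted(ino for ino, (_, n) in scores.items() if n >= min_peers)

# Hypothetical inode 99 overlaps one block of each of inodes 1..20,
# while inodes 30 and 31 share a single block only with each other.
owners = {1000 + i: {99, i} for i in range(1, 21)}
owners[5000] = {30, 31}
print(suspects(owners))  # [99]
```

      In this toy case the scaled block count is 0 for every inode, so only the peer count singles out inode 99. That illustrates why counting blocks alone may not be sufficient.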

      Further complicating the solution is that the "dict" code in e2fsck only adds duplicate inodes with shared clusters to the list, and never removes anything from the dict (the removal code is even #ifdef'd out in the library), so additional development will be needed to get dict removal working correctly. Once that works, the goal is that if a particularly bad inode is found (sharing blocks with dozens of other inodes), it is removed from the inode and cluster dictionaries, and hopefully the processing of all later inodes then becomes trivial since they no longer share any blocks.
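
      The intended effect of removing a bad inode from the dictionaries can be modelled as follows. Plain Python dicts and sets stand in for the e2fsck dict structures, and the helper names are hypothetical; this shows the idea only, not the C implementation.

```python
def remove_inode(block_owners, bad_ino):
    """Drop bad_ino from every owner set; return blocks that stop being shared."""
    resolved = []
    for block, owners in block_owners.items():
        was_shared = len(owners) >= 2
        owners.discard(bad_ino)
        if was_shared and len(owners) < 2:
            resolved.append(block)
    return sorted(resolved)

def still_shared(block_owners):
    """Inodes that still share at least one block with another inode."""
    shared = set()
    for owners in block_owners.values():
        if len(owners) >= 2:
            shared |= owners
    return shared

# Hypothetical state: inode 99 is the bad one; inodes 8 and 9 share a
# block with each other independently of inode 99.
owners = {100: {7, 99}, 101: {8, 99}, 102: {8, 9}}
print(remove_inode(owners, 99))      # [100, 101]
print(sorted(still_shared(owners)))  # [8, 9]
```

      After the removal, inode 7 has no shared blocks left and needs no cloning, while the unrelated overlap between inodes 8 and 9 is still handled as before.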

      Failing the "delete bad inode from dict" approach, it would be possible to clear the bad inode and restart e2fsck, but this might need a few restarts (full pass1 repeat) if there are multiple bad inodes (which is likely). However, that may still be preferable and faster (a couple of hours) than running pass1d for a very long time.

            People

              adilger Andreas Dilger