Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5949

Verify e2fsck inode badness is working for duplicate blocks

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • None
    • 3
    • 16606

    Description

      A number of cases of corrupt inodes have lead to large numbers of duplicate blocks during e2fsck runs. Corrupt inodes should be detected as such during e2fsck scanning and be cleared instead of invoking duplicate block resolution in pass1b.

      This may be complicated by running "e2fsck -fn" that does not clear these corrupt inodes during pass1. Need to verify that inode badness is working properly, and possibly figure a way that "e2fsck -fn"

      Attachments

        Issue Links

          Activity

            [LU-5949] Verify e2fsck inode badness is working for duplicate blocks

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41450/
            Subject: LU-5949 e2fsck: call delete_inode() properly
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set:
            Commit: 4aea203f2d490f53596eb21233bf0c186a5b3679

            gerrit Gerrit Updater added a comment - Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41450/ Subject: LU-5949 e2fsck: call delete_inode() properly Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: Commit: 4aea203f2d490f53596eb21233bf0c186a5b3679

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41450
            Subject: LU-5949 e2fsck: call delete_inode() properly
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 7257c1a81bcf13ad7a88da7bb54625a61aba8163

            gerrit Gerrit Updater added a comment - Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41450 Subject: LU-5949 e2fsck: call delete_inode() properly Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: 1 Commit: 7257c1a81bcf13ad7a88da7bb54625a61aba8163

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41328/
            Subject: LU-5949 e2fsck: simplify inode badness handling
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set:
            Commit: 8725134d27574605c3da74b46df909e419d02b63

            gerrit Gerrit Updater added a comment - Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41328/ Subject: LU-5949 e2fsck: simplify inode badness handling Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: Commit: 8725134d27574605c3da74b46df909e419d02b63

            One difficulty with checking the badness before doing the duplicate scanning is that the outcome depends on the order the inodes were processed in. The e2fsck block bitmap only stores if a block is used for pass1, assuming that no duplicate blocks exist in the filesystem. If e2fsck finds any duplicate blocks it has to re-scan the whole filesystem to find the inodes that are sharing blocks.

            It is possible that we could avoid pass1 in some cases by clearing the bad inode rather than adding the blocks into the bitmap, assuming that the good inode is processed first and adds its blocks to the bitmap, and then the bad inode is found later and verified and does not add which would avoid pass1b sometimes.

            The other proposal that Ted Ts'o had was to check all inodes in a block at the same time. That makes it easier to detect if a whole block is corrupted.

            adilger Andreas Dilger added a comment - One difficulty with checking the badness before doing the duplicate scanning is that the outcome depends on the order the inodes were processed in. The e2fsck block bitmap only stores if a block is used for pass1, assuming that no duplicate blocks exist in the filesystem. If e2fsck finds any duplicate blocks it has to re-scan the whole filesystem to find the inodes that are sharing blocks. It is possible that we could avoid pass1 in some cases by clearing the bad inode rather than adding the blocks into the bitmap, assuming that the good inode is processed first and adds its blocks to the bitmap, and then the bad inode is found later and verified and does not add which would avoid pass1b sometimes. The other proposal that Ted Ts'o had was to check all inodes in a block at the same time. That makes it easier to detect if a whole block is corrupted.
            dongyang Dongyang Li added a comment -

            I believe inode_badness patch just checks the inode fields without looking at the dup blocks at the moment, and offers to delete the inode in pass2 and pass4 if the badness is over the threshold.

            dup blocks are handled independently by e2fsck: allow deleting or zeroing shared blocks.

            I think we could check if the badness is over before pass1b, and skip/delete the inode before scanning for dup blocks.

            dongyang Dongyang Li added a comment - I believe inode_badness patch just checks the inode fields without looking at the dup blocks at the moment, and offers to delete the inode in pass2 and pass4 if the badness is over the threshold. dup blocks are handled independently by e2fsck: allow deleting or zeroing shared blocks. I think we could check if the badness is over before pass1b, and skip/delete the inode before scanning for dup blocks.

            It is clear from several reports that the e2fsck inode badness is not working correctly for duplicate blocks. It is intended functionality is that if some inode is reporting duplicate blocks, but also had other errors such as bad flags, size, blocks, mode, etc. then it is likely that this inode is actually corrupt/garbage and should be cleared instead of spending a long time to clone the file blocks just to have a garbage inode with some other file's data.

            Avoiding the e2fsck pass1b/1c/1d process can save a very significant amount of time when running on a large/corrupt filesystem, and would be preferable to trying to repair such inodes.

            It has also been suggested that instead of the "inode badness" feature (which is not yet landed upstream because it touches a lot of different code) that instead e2fsck check all of the inodes in an itable block together, and if there is corruption across multiple inodes in that block block that this be used as the "badness" for the inode, and consider those inodes as targets for deletion if they are also found to be sharing blocks. I think it is potentially dangerous to always clear a block of inodes that are all showing corruption, in case this corruption is systematic (e.g. clearing some feature flag, updating the checksum after filesystem UUID was changed, etc).

            adilger Andreas Dilger added a comment - It is clear from several reports that the e2fsck inode badness is not working correctly for duplicate blocks. It is intended functionality is that if some inode is reporting duplicate blocks, but also had other errors such as bad flags, size, blocks, mode, etc. then it is likely that this inode is actually corrupt/garbage and should be cleared instead of spending a long time to clone the file blocks just to have a garbage inode with some other file's data. Avoiding the e2fsck pass1b/1c/1d process can save a very significant amount of time when running on a large/corrupt filesystem, and would be preferable to trying to repair such inodes. It has also been suggested that instead of the "inode badness" feature (which is not yet landed upstream because it touches a lot of different code) that instead e2fsck check all of the inodes in an itable block together, and if there is corruption across multiple inodes in that block block that this be used as the "badness" for the inode, and consider those inodes as targets for deletion if they are also found to be sharing blocks. I think it is potentially dangerous to always clear a block of inodes that are all showing corruption, in case this corruption is systematic (e.g. clearing some feature flag, updating the checksum after filesystem UUID was changed, etc).

            People

              adilger Andreas Dilger
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: