[LU-5949] Verify e2fsck inode badness is working for duplicate blocks - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
- e2fsck

Severity:
3
Rank (Obsolete):
16606

Description

A number of cases of corrupt inodes have lead to large numbers of duplicate blocks during e2fsck runs. Corrupt inodes should be detected as such during e2fsck scanning and be cleared instead of invoking duplicate block resolution in pass1b.

This may be complicated by running "e2fsck -fn" that does not clear these corrupt inodes during pass1. Need to verify that inode badness is working properly, and possibly figure a way that "e2fsck -fn"

Attachments

Issue Links

is related to

LU-13650 ldiskfs: enable metadata_csum on MDT/OST

Open

is related to

LU-12913 fsck found > 1M multilply-claimed blocks

Resolved

LU-8465 parallel e2fsck performance at scale

Resolved

Activity

[LU-5949] Verify e2fsck inode badness is working for duplicate blocks

Gerrit Updater added a comment - 09/Feb/21 6:04 PM

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41450/
Subject: ~~LU-5949~~ e2fsck: call delete_inode() properly
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 4aea203f2d490f53596eb21233bf0c186a5b3679

Gerrit Updater added a comment - 09/Feb/21 6:04 PM Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41450/ Subject: LU-5949 e2fsck: call delete_inode() properly Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: Commit: 4aea203f2d490f53596eb21233bf0c186a5b3679

Gerrit Updater added a comment - 09/Feb/21 7:49 AM

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41450
Subject: ~~LU-5949~~ e2fsck: call delete_inode() properly
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 7257c1a81bcf13ad7a88da7bb54625a61aba8163

Gerrit Updater added a comment - 09/Feb/21 7:49 AM Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41450 Subject: LU-5949 e2fsck: call delete_inode() properly Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: 1 Commit: 7257c1a81bcf13ad7a88da7bb54625a61aba8163

Gerrit Updater added a comment - 08/Feb/21 11:25 AM

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41328/
Subject: ~~LU-5949~~ e2fsck: simplify inode badness handling
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 8725134d27574605c3da74b46df909e419d02b63

Gerrit Updater added a comment - 08/Feb/21 11:25 AM Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41328/ Subject: LU-5949 e2fsck: simplify inode badness handling Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: Commit: 8725134d27574605c3da74b46df909e419d02b63

Andreas Dilger added a comment - 11/Dec/19 8:09 AM

One difficulty with checking the badness before doing the duplicate scanning is that the outcome depends on the order the inodes were processed in. The e2fsck block bitmap only stores if a block is used for pass1, assuming that no duplicate blocks exist in the filesystem. If e2fsck finds any duplicate blocks it has to re-scan the whole filesystem to find the inodes that are sharing blocks.

It is possible that we could avoid pass1 in some cases by clearing the bad inode rather than adding the blocks into the bitmap, assuming that the good inode is processed first and adds its blocks to the bitmap, and then the bad inode is found later and verified and does not add which would avoid pass1b sometimes.

The other proposal that Ted Ts'o had was to check all inodes in a block at the same time. That makes it easier to detect if a whole block is corrupted.

Andreas Dilger added a comment - 11/Dec/19 8:09 AM One difficulty with checking the badness before doing the duplicate scanning is that the outcome depends on the order the inodes were processed in. The e2fsck block bitmap only stores if a block is used for pass1, assuming that no duplicate blocks exist in the filesystem. If e2fsck finds any duplicate blocks it has to re-scan the whole filesystem to find the inodes that are sharing blocks. It is possible that we could avoid pass1 in some cases by clearing the bad inode rather than adding the blocks into the bitmap, assuming that the good inode is processed first and adds its blocks to the bitmap, and then the bad inode is found later and verified and does not add which would avoid pass1b sometimes. The other proposal that Ted Ts'o had was to check all inodes in a block at the same time. That makes it easier to detect if a whole block is corrupted.

Dongyang Li added a comment - 11/Dec/19 5:16 AM

I believe inode_badness patch just checks the inode fields without looking at the dup blocks at the moment, and offers to delete the inode in pass2 and pass4 if the badness is over the threshold.

dup blocks are handled independently by e2fsck: allow deleting or zeroing shared blocks.

I think we could check if the badness is over before pass1b, and skip/delete the inode before scanning for dup blocks.

Dongyang Li added a comment - 11/Dec/19 5:16 AM I believe inode_badness patch just checks the inode fields without looking at the dup blocks at the moment, and offers to delete the inode in pass2 and pass4 if the badness is over the threshold. dup blocks are handled independently by e2fsck: allow deleting or zeroing shared blocks. I think we could check if the badness is over before pass1b, and skip/delete the inode before scanning for dup blocks.

Andreas Dilger added a comment - 05/Dec/19 9:39 PM

It is clear from several reports that the e2fsck inode badness is not working correctly for duplicate blocks. It is intended functionality is that if some inode is reporting duplicate blocks, but also had other errors such as bad flags, size, blocks, mode, etc. then it is likely that this inode is actually corrupt/garbage and should be cleared instead of spending a long time to clone the file blocks just to have a garbage inode with some other file's data.

Avoiding the e2fsck pass1b/1c/1d process can save a very significant amount of time when running on a large/corrupt filesystem, and would be preferable to trying to repair such inodes.

It has also been suggested that instead of the "inode badness" feature (which is not yet landed upstream because it touches a lot of different code) that instead e2fsck check all of the inodes in an itable block together, and if there is corruption across multiple inodes in that block block that this be used as the "badness" for the inode, and consider those inodes as targets for deletion if they are also found to be sharing blocks. I think it is potentially dangerous to always clear a block of inodes that are all showing corruption, in case this corruption is systematic (e.g. clearing some feature flag, updating the checksum after filesystem UUID was changed, etc).

Andreas Dilger added a comment - 05/Dec/19 9:39 PM It is clear from several reports that the e2fsck inode badness is not working correctly for duplicate blocks. It is intended functionality is that if some inode is reporting duplicate blocks, but also had other errors such as bad flags, size, blocks, mode, etc. then it is likely that this inode is actually corrupt/garbage and should be cleared instead of spending a long time to clone the file blocks just to have a garbage inode with some other file's data. Avoiding the e2fsck pass1b/1c/1d process can save a very significant amount of time when running on a large/corrupt filesystem, and would be preferable to trying to repair such inodes. It has also been suggested that instead of the "inode badness" feature (which is not yet landed upstream because it touches a lot of different code) that instead e2fsck check all of the inodes in an itable block together, and if there is corruption across multiple inodes in that block block that this be used as the "badness" for the inode, and consider those inodes as targets for deletion if they are also found to be sharing blocks. I think it is potentially dangerous to always clear a block of inodes that are all showing corruption, in case this corruption is systematic (e.g. clearing some feature flag, updating the checksum after filesystem UUID was changed, etc).

People

Assignee:: Andreas Dilger

Reporter:: Andreas Dilger

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 23/Nov/14 5:59 PM

Updated:: 09/Feb/21 6:04 PM

Resolved:: 08/Feb/21 11:26 AM