[LU-13770] improve e2fsck handling of meta_bg backup blocks - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
- e2fsck

Severity: 3

Description

If a filesystem using meta_bg has errors in its group descriptor blocks (e.g. one of the descriptor blocks is corrupted), e2fsck currently has difficulty handling this case robustly.

Currently, if "e2fsck -b <backup_superblock>" is run, e2fsck uses the first backup copy of every meta_bg group descriptor block (in "group 1" of each meta group) instead of the primary copy (in "group 0" of each meta group). It appears that ext2fs_descriptor_block_loc2() will only ever try the first backup descriptor block, and never the second backup descriptor block (in the last group of the meta group, if available):

        /*
         * If group_block is not the normal value, we're trying to use
         * the backup group descriptors and superblock --- so use the
         * alternate location of the second block group in the
         * metablock group.  Ideally we should be testing each bg
         * descriptor block individually for correctness, but we don't
         * have the infrastructure in place to do that.
         */
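
To make the three candidate locations concrete, here is a minimal sketch of which block groups hold a copy of the descriptor block for a given meta group. It is not libext2fs code: meta_bg_copy_groups() and struct meta_bg_copies are hypothetical names, and the sketch assumes the usual meta_bg layout (copies in the first, second and last group of the meta group, with the last copy present only if that group exists).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of libext2fs: the block groups that
 * hold a copy of the descriptor block for one meta group. */
struct meta_bg_copies {
    uint32_t group[3];  /* primary, first backup, second backup */
    int      count;     /* number of valid entries in group[] */
};

/*
 * meta_group:     index of the meta group
 * desc_per_block: group descriptors that fit in one block
 * group_count:    total number of block groups in the filesystem
 */
static struct meta_bg_copies
meta_bg_copy_groups(uint32_t meta_group, uint32_t desc_per_block,
                    uint32_t group_count)
{
    struct meta_bg_copies c = { {0, 0, 0}, 0 };
    uint32_t first, last;

    assert(desc_per_block > 0);
    first = meta_group * desc_per_block;
    if (first >= group_count)
        return c;                         /* meta group is past the end */

    c.group[c.count++] = first;           /* primary: "group 0" */
    if (first + 1 < group_count)
        c.group[c.count++] = first + 1;   /* first backup: "group 1" */

    last = first + desc_per_block - 1;    /* last group of the meta group */
    if (last < group_count && last > first + 1)
        c.group[c.count++] = last;        /* second backup, if available */
    return c;
}
```

With 64 descriptors per block and 200 groups, meta group 1 would have copies in groups 64, 65 and 127, while meta group 3 (groups 192-199) would only have copies in groups 192 and 193.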

Given that OSTs larger than 192TiB always require the use of meta_bg, it would be useful to have e2fsck automatically try the backup group descriptor block(s) for that meta group if the checksum on the primary block is bad. This should be decided on a block-by-block basis, rather than switching to all backup descriptor blocks for a few reasons:

- with random corruption, using only the primary or only the backup descriptor blocks is unlikely to yield a full set of valid group descriptors for the filesystem, whereas it is very unlikely that all three copies of the same descriptor block are corrupted.
- using backup blocks slows down e2fsck because bg_itable_unused is unset, so all inodes in the group are scanned. This isn't a big issue if done for a few descriptors, but is significant if done for the whole filesystem.
- switching to all backup descriptor blocks produces many more errors for users.
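
The block-by-block selection proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: struct mg_status and choose_desc_copies() are hypothetical names, and the "ok" flags stand in for whatever validity check e2fsck would apply to each copy (for example, the descriptor checksums in that block). Nothing here is an existing libext2fs or e2fsck interface.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_COPIES   3     /* primary, first backup, second backup */
#define DESC_COPY_NONE (-1) /* no copy of this block validated */

/* Hypothetical per-meta-group status: ok[i] is nonzero if copy i exists
 * and passed the caller's validity check. */
struct mg_status {
    int ok[DESC_COPIES];
};

/*
 * For each meta group, prefer the primary copy and fall back to a backup
 * only when the earlier copies are bad. choice[i] receives the index of
 * the copy to use (0, 1 or 2) or DESC_COPY_NONE.
 * Returns the number of meta groups for which no copy was usable.
 */
static size_t choose_desc_copies(const struct mg_status *mg, size_t n,
                                 int choice[])
{
    size_t i, unusable = 0;
    int c;

    assert(mg != NULL && choice != NULL);
    for (i = 0; i < n; i++) {
        choice[i] = DESC_COPY_NONE;
        for (c = 0; c < DESC_COPIES; c++) {
            if (mg[i].ok[c]) {
                choice[i] = c;
                break;
            }
        }
        if (choice[i] == DESC_COPY_NONE)
            unusable++;
    }
    return unusable;
}
```

Because the primary copy is kept wherever it is valid, only the meta groups that actually fall back to a backup would pay the cost of a full inode scan.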

People

Assignee: WC Triage

Reporter: Andreas Dilger

Votes: 0

Watchers: 3

Dates

Created: 09/Jul/20 8:38 AM

Updated: 09/Jul/20 8:46 AM