[LU-7381] "e2fsck -fD" on directory may cause extent tree corruption - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.8.0
Affects Version/s: Lustre 2.7.0, Lustre 2.5.5
Labels:
- e2fsck
- e2fsprogs

Severity: 3

Description

Running e2fsck -fD on an OST upgraded from Lustre 1.8 with large O/0/d* directories (more than 300k objects, 1600+ filesystem blocks) may corrupt the directory. The reason and mechanism have not yet been determined. They may relate to the filesystem upgrade history (Lustre 1.8->2.1->2.5 and/or e2fsck versions), and possibly to the original directories having been created as block-mapped directories and later upgraded to extent-mapped directories. The corruption itself is that the logical block number in the extent index block (always for block 4 / 5) was too large, and an extent block was missing. In all observed cases, the extent tree was 5 blocks long (possibly a result of 4 extent blocks being moved out of the in-inode i_block[] array and into an external second-level index block).
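For readers less familiar with the extent format, the sketch below models the on-disk extent structures and computes how many entries fit in the 60-byte in-inode i_block[] array versus an external tree block. Field names follow e2fsprogs' lib/ext2fs/ext3_extents.h, but the struct and function names here are our own simplifications. The sketch ignores the optional checksum tail. Only four entries fit in the inode, so a fifth extent forces the tree to grow a level. That is consistent with the hypothesis above that the problem appears once the tree outgrows the in-inode array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copies of the on-disk extent structures (little-endian on
 * disk). Field names follow e2fsprogs' lib/ext2fs/ext3_extents.h; the
 * struct names are illustrative only. */
struct extent_header {
	uint16_t eh_magic;	/* 0xF30A identifies an extent node */
	uint16_t eh_entries;	/* number of valid entries in this node */
	uint16_t eh_max;	/* capacity of this node */
	uint16_t eh_depth;	/* 0 = leaf (entries are extents), >0 = index */
	uint32_t eh_generation;
};

struct extent_leaf {		/* entry in a depth-0 node */
	uint32_t ee_block;	/* first logical block covered */
	uint16_t ee_len;	/* number of blocks covered */
	uint16_t ee_start_hi;	/* high 16 bits of the physical block */
	uint32_t ee_start;	/* low 32 bits of the physical block */
};

struct extent_index {		/* entry in a depth>0 node */
	uint32_t ei_block;	/* first logical block covered by the child */
	uint32_t ei_leaf;	/* low 32 bits of the child's physical block */
	uint16_t ei_leaf_hi;	/* high 16 bits of the child's physical block */
	uint16_t ei_unused;
};

_Static_assert(sizeof(struct extent_header) == 12, "12-byte header");
_Static_assert(sizeof(struct extent_leaf) == 12, "12-byte leaf entry");
_Static_assert(sizeof(struct extent_index) == 12, "12-byte index entry");

/* i_block[] is EXT2_N_BLOCKS (15) 32-bit words, i.e. 60 bytes. */
#define I_BLOCK_BYTES (15 * sizeof(uint32_t))

/* Entries (leaf or index) that fit in the inode itself. */
unsigned entries_in_inode(void)
{
	return (unsigned)((I_BLOCK_BYTES - sizeof(struct extent_header)) /
			  sizeof(struct extent_leaf));
}

/* Entries that fit in one external tree block, ignoring any checksum tail. */
unsigned entries_per_block(unsigned blocksize)
{
	assert(blocksize >= 1024);	/* smallest ext4 block size */
	return (unsigned)((blocksize - sizeof(struct extent_header)) /
			  sizeof(struct extent_leaf));
}
```

With 4096-byte blocks, an external node holds 340 entries, so a single index level is ample for directories of the size described here.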

e2fsck 1.42.12.wc1 (15-Sep-2014)
MMP interval is 7 seconds and total wait time is 30 seconds. Please wait...
Pass 1: Checking inodes, blocks, and sizes
Inode 17825800, end of extent exceeds allowed value
        (logical block 710, physical block 570459684, len 1019)
Clear? no

Inode 17825800, end of extent exceeds allowed value
        (logical block 1729, physical block 570493888, len 4294966836)
Clear? no

Inode 17825800, i_size is 5197824, should be 2908160.  Fix? no

Inode 17825800, i_blocks is 10192, should be 5704.  Fix? no

Inode 17825801, end of extent exceeds allowed value
        (logical block 711, physical block 570459691, len 966)
Clear? no
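The numbers in this log are internally consistent, which helps narrow down what went wrong. The helpers below are hypothetical names that decode them. They assume a 4096-byte block size, which is not stated in the log, although both i_size values divide by 4096 exactly. The first bad extent (logical block 710, len 1019) ends at block 1729, exactly where the second one starts. The second length, 4294966836, is -460 reinterpreted as a signed 32-bit value, so that extent "ends" at 1729 - 460 = 1269. That equals the current i_size (5197824 bytes) in blocks. The size e2fsck proposes, 2908160 bytes, is exactly 710 blocks, the logical start of the first bad extent.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit length printed as unsigned as a signed value,
 * without relying on an implementation-defined narrowing cast. */
int64_t signed_len(uint32_t len)
{
	return len > INT32_MAX ? (int64_t)len - ((int64_t)1 << 32)
			       : (int64_t)len;
}

/* One past the last logical block covered by an extent. */
int64_t extent_end(int64_t lblk, int64_t len)
{
	return lblk + len;
}

/* i_size (bytes) expressed in filesystem blocks. */
uint64_t size_to_blocks(uint64_t i_size, uint32_t blocksize)
{
	assert(blocksize != 0);
	return i_size / blocksize;
}

/* i_blocks, assumed to be in the usual 512-byte units, in filesystem blocks. */
uint64_t sectors_to_blocks(uint64_t i_blocks, uint32_t blocksize)
{
	assert(blocksize != 0);
	return i_blocks * 512 / blocksize;
}
```

Under the same assumption, i_blocks drops from 1274 to 713 filesystem blocks.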

There doesn't appear to have been any other data corruption on the OST besides the directory extent blocks. However, several hundred directory leaf blocks were lost. Either the extent index block was already corrupt and did not reference the required blocks, or e2fsck considered the last extent index blocks corrupt and discarded their contents.

In some cases, it appears that 100% of files were readable from the corrupted directory using debugfs:

debugfs -c -R "ls -l O/0/$DIR" $DEV

even though e2fsck was unhappy with the extent structure, cleared part of the extent tree, and dumped the files into lost+found. This was consistent across a large number of OST object (O/0/d*) directories and was not a sign of external corruption or hardware problems. It implies three things: the directory entries had all been moved into the first blocks of the directory, the blocks in the corrupt part of the directory were somehow "extra", and the bug lies in the extent handling when shrinking the directory.
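The invariant that a correct shrink must preserve can be sketched on a flat list of extents. After rehashing packs the entries into the first N blocks, every mapping at or beyond block N must go, and an extent straddling N must be shortened. This is not the e2fsprogs code; according to the later comment on this issue, the fix switched e2fsck_rehash() to ext2fs_punch(). The `simple_extent` type and `trim_extents()` are hypothetical. The sketch also leaves out freeing the released blocks and collapsing emptied index blocks, which is where a real extent tree gets complicated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened extent: maps [lblk, lblk + len) to pblk... */
struct simple_extent {
	uint32_t lblk;	/* first logical block */
	uint32_t len;	/* number of blocks */
	uint64_t pblk;	/* first physical block */
};

/*
 * Trim an extent list (sorted by lblk) so that it maps only logical
 * blocks [0, new_nblocks). Extents wholly beyond the new end are dropped
 * and an extent straddling it is shortened. Returns the new count.
 */
size_t trim_extents(struct simple_extent *ext, size_t n, uint32_t new_nblocks)
{
	size_t kept = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ext[i].lblk >= new_nblocks)
			continue;	/* entirely past the new end */
		ext[kept] = ext[i];
		if ((uint64_t)ext[kept].lblk + ext[kept].len > new_nblocks)
			ext[kept].len = new_nblocks - ext[kept].lblk;
		kept++;
	}
	return kept;
}
```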

During recovery, e2fsck -fyv deleted all the zero-length files that did not have the "lma" FID set on them (i.e. files that had never been accessed). To avoid losing these objects, the list_ost_objs.sh script was run on all affected OSTs before e2fsck. Then, after ll_recover_lost_found_objs and before the filesystem was mounted, ll_recover_zero_length.sh was run to recreate the zero-length objects.
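The attached scripts are not reproduced here, so the core idea of this recovery step can only be sketched under assumptions. Before running e2fsck, record every object and its size. Afterwards, recreate each zero-length object that is no longer present. The listing format, `struct obj_rec` and `missing_empty_objects()` are hypothetical. The sketch only computes which objects to recreate; it does not create files or restore any extended attributes the real script may set. Presumably the comparison should be made only after ll_recover_lost_found_objs has moved objects back from lost+found, so that misplaced objects are not mistaken for missing ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical pre-fsck object listing. */
struct obj_rec {
	const char *path;		/* e.g. "O/0/d1/33" */
	unsigned long long size;	/* size in bytes when listed */
};

/* Linear membership test; a real tool would use a hash set for 300k+ objects. */
static int in_list(const char *path, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(path, list[i]) == 0)
			return 1;
	return 0;
}

/*
 * Store in out[] the indices of zero-length objects listed before e2fsck
 * that are absent afterwards, and return how many there are.
 * out[] must have room for n_before entries.
 */
size_t missing_empty_objects(const struct obj_rec *before, size_t n_before,
			     const char *const *after, size_t n_after,
			     size_t *out)
{
	size_t n_out = 0;

	assert(out != NULL || n_before == 0);
	for (size_t i = 0; i < n_before; i++)
		if (before[i].size == 0 &&
		    !in_list(before[i].path, after, n_after))
			out[n_out++] = i;
	return n_out;
}
```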

Attachments


- list_ost_objs.sh (0.3 kB, 04/Nov/15 2:48 AM)
- ll_recover_zero_length.sh (3 kB, 04/Nov/15 2:48 AM)
- LU7381-ost_scratch_61-d0.tar.gz (0.3 kB, 04/Nov/15 3:17 PM)
- LU7381-ost_scratch_73-dump_htree.tar.gz (2.80 MB, 04/Nov/15 3:17 PM)

Issue Links

is related to:
- LU-7368 e2fsck unsafe to interrupt with quota enabled (Resolved)
- LU-8706 e2fsck -fDy running forever (Resolved)

Activity

Andreas Dilger added a comment - 14/Dec/15 8:53 PM

Patch updating lustre/ChangeLog to reference new release has been landed to master for 2.8.0.

Gerrit Updater added a comment - 14/Dec/15 8:17 PM

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17572/
Subject: LU-7381 e2fsck: update recommended e2fsprogs version
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b3caa5019b8c781499c32a79b2d33a8929f2c045

Andreas Dilger added a comment - 11/Dec/15 9:12 PM

The e2fsprogs-1.42.13.wc4 release should also be recommended for other maintenance releases.
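When auditing servers against this recommendation, the `.wcN` build suffix has to be compared numerically, not as a string. Compared as strings, "wc10" sorts before "wc4". The sketch below assumes versions of exactly the form MAJOR.MINOR.PATCH.wcN, as in the "e2fsck 1.42.12.wc1" banner in the log above. The type and function names are hypothetical. It deliberately rejects version strings without a `.wc` suffix rather than guessing whether an upstream release contains the fix.

```c
#include <assert.h>
#include <stdio.h>

/* Version of the form MAJOR.MINOR.PATCH.wcN, e.g. "1.42.13.wc4". */
struct wc_version {
	unsigned major, minor, patch, wc;
};

/* Returns 1 if s is exactly in that form, 0 otherwise. */
int parse_wc_version(const char *s, struct wc_version *v)
{
	char extra;	/* catches trailing garbage after the wc number */

	assert(s != NULL && v != NULL);
	return sscanf(s, "%u.%u.%u.wc%u%c", &v->major, &v->minor,
		      &v->patch, &v->wc, &extra) == 4;
}

static int cmp_u(unsigned a, unsigned b)
{
	return (a > b) - (a < b);
}

/* <0, 0 or >0 as a is older than, equal to or newer than b. */
int cmp_wc_version(const struct wc_version *a, const struct wc_version *b)
{
	int c;

	if ((c = cmp_u(a->major, b->major)) != 0)
		return c;
	if ((c = cmp_u(a->minor, b->minor)) != 0)
		return c;
	if ((c = cmp_u(a->patch, b->patch)) != 0)
		return c;
	return cmp_u(a->wc, b->wc);
}
```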

Gerrit Updater added a comment - 11/Dec/15 9:06 PM

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/17572
Subject: LU-7381 e2fsck: update recommended e2fsprogs version
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e334fe27c9b04cd6052f988613bd55e2b679d3ae

Gerrit Updater added a comment - 11/Dec/15 8:35 AM

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/17431/
Subject: LU-7381 e2fsprogs: update build version to 1.42.13.wc4
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: bc29f4330fc74836ea7b76e9f0adcd2f59fd9660

Gerrit Updater added a comment - 11/Dec/15 8:34 AM

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/17153/
Subject: LU-7381 e2fsck: fix e2fsck -fD directory truncation
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 7cb8130c79fa80b87c1406056221fc3151184862

Gerrit Updater added a comment - 09/Dec/15 7:52 PM

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/17152/
Subject: LU-7381 libext2fs: fix block-mapped file punch
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 229a4739bd8d68192c669e13c411d57575cdc632

Andreas Dilger added a comment - 03/Dec/15 1:16 PM

Patches have all been accepted into upstream e2fsprogs. Working on a -wc4 release for this as well.

Gerrit Updater added a comment - 02/Dec/15 8:10 PM

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/17431
Subject: LU-7381 e2fsprogs: update build version to 1.42.13.wc4
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 7af2dd90d352a1c07abb159b6752b9d66ed9257c

Andreas Dilger added a comment - 14/Nov/15 1:16 AM

After moving the e2fsck_rehash() code over to using ext2fs_punch() to truncate the now-smaller directory, my new f_extent_htree test case passed, but other regression tests started failing. It turns out that there were existing bugs in the ext2fs_punch_ind() handling of indirect-block mapped files. There was also a known bug in ext2fs_punch_ext(), which I didn't hit, but I found a patch for it on e2fsprogs master that seems prudent to port to maint.

I've pushed 4 patches into our local regression testing, which runs the e2fsprogs regression tests on all the server platforms (RHEL/SLES) and then tests the new e2fsprogs with Lustre as well. I've also pushed the patches to the linux-ext4 mailing list for external review and inclusion into the upstream repository.
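Punching a block-mapped (indirect) file has to handle each mapping level of i_block[] separately, which is presumably part of what makes ext2fs_punch_ind() error-prone. The hypothetical helper below classifies a logical block by the level that maps it in the classic ext2 layout: 12 direct pointers, then single-, double- and triple-indirect blocks. It is a sketch of that layout, not the libext2fs implementation. With 4096-byte blocks, single-indirect mapping ends at logical block 1035. A 1600+ block directory like those in this issue would therefore reach the double-indirect range if it were still block-mapped.

```c
#include <assert.h>
#include <stdint.h>

#define NDIR_BLOCKS 12	/* EXT2_NDIR_BLOCKS: direct pointers in i_block[] */

enum map_level { MAP_DIRECT, MAP_IND, MAP_DIND, MAP_TIND, MAP_OUT_OF_RANGE };

/* Which part of a block-mapped inode's i_block[] maps logical block lblk. */
enum map_level block_map_level(uint64_t lblk, uint32_t blocksize)
{
	uint64_t per = blocksize / 4;	/* 32-bit block pointers per block */

	assert(blocksize >= 1024);
	if (lblk < NDIR_BLOCKS)
		return MAP_DIRECT;
	lblk -= NDIR_BLOCKS;
	if (lblk < per)
		return MAP_IND;
	lblk -= per;
	if (lblk < per * per)
		return MAP_DIND;
	lblk -= per * per;
	if (lblk < per * per * per)
		return MAP_TIND;
	return MAP_OUT_OF_RANGE;
}
```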

People

Assignee: Andreas Dilger

Reporter: Andreas Dilger

Votes: 0

Watchers: 8

Dates

Created: 04/Nov/15 2:48 AM

Updated: 13/Oct/16 6:27 PM

Resolved: 14/Dec/15 8:53 PM