Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version/s: Lustre 2.7.0, Lustre 2.5.5
Description
Running e2fsck -fD on an OST upgraded from Lustre 1.8 with large O/0/d* directories (> 300k objects, 1600+ filesystem blocks) may result in those directories becoming corrupted. The reason and mechanism have not yet been determined, but it may relate to the filesystem upgrade history (Lustre 1.8 -> 2.1 -> 2.5 and/or e2fsck versions), and possibly to whether the original directories were created as block-mapped directories and later converted to extent-mapped directories. The corruption itself is that the extent index block logical number (always for block 4 / 5) was too large, and an extent block was missing. In all observed cases, the extent tree was 5 blocks long (possibly a result of 4 extent blocks being moved out of the in-inode i_block[] array and into an external second-level index block).
e2fsck 1.42.12.wc1 (15-Sep-2014)
MMP interval is 7 seconds and total wait time is 30 seconds. Please wait...
Pass 1: Checking inodes, blocks, and sizes
Inode 17825800, end of extent exceeds allowed value
        (logical block 710, physical block 570459684, len 1019)
Clear? no
Inode 17825800, end of extent exceeds allowed value
        (logical block 1729, physical block 570493888, len 4294966836)
Clear? no
Inode 17825800, i_size is 5197824, should be 2908160.  Fix? no
Inode 17825800, i_blocks is 10192, should be 5704.  Fix? no
Inode 17825801, end of extent exceeds allowed value
        (logical block 711, physical block 570459691, len 966)
Clear? no
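For reference, the extent tree and inode fields that e2fsck complains about can be examined read-only with debugfs before any repair is attempted. A minimal sketch, assuming the inode number reported above and a placeholder device path:

# Dump the extent tree of the flagged inode without modifying the filesystem
# (<17825800> is the inode from the e2fsck output above, /dev/sdX is a placeholder)
debugfs -c -R "dump_extents <17825800>" /dev/sdX
# Show i_size/i_blocks for comparison with the values e2fsck wants to set
debugfs -c -R "stat <17825800>" /dev/sdX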
There doesn't appear to have been any other data corruption on the OST besides the directory extent blocks, but this resulted in several hundred directory leaf blocks being lost, either because the extent index block was already corrupt and did not reference the required blocks, or because e2fsck considered the last extent index blocks corrupt and discarded their contents.
In some cases, it appears that 100% of files were readable from the corrupted directory using debugfs:
debugfs -c -R "ls -l O/0/$DIR" $DEV
even though e2fsck was unhappy with the extent structure, cleared part of the extent tree, and dumped the files into lost+found. This was consistent across a large number of OST object (O/0/d*) directories and was not a sign of external corruption or hardware problems. This implies that the directory entries were all moved into the first blocks of the directory, that the blocks in the corrupt part of the directory were somehow "extra", and that the bug lies in the extent handling when shrinking the directory.
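To check whether the same pattern holds on other directories, each O/0/d* directory can be listed read-only before running e2fsck and the listings kept for comparison afterwards. A minimal sketch, assuming the standard d0..d31 object subdirectories and a placeholder $DEV:

# List every O/0/d* object directory read-only and save the output for later comparison
for i in $(seq 0 31); do
    debugfs -c -R "ls -l O/0/d$i" $DEV > /tmp/ost_objs.d$i 2>&1
done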
During recovery, e2fsck -fyv deleted all the zero-length files that had not had the "lma" FID set on them (i.e. they had never been accessed). To avoid losing these objects, the list_ost_objs.sh script was run on all affected OSTs before e2fsck, and ll_recover_zero_length.sh was then run to recreate the zero-length objects after ll_recover_lost_found_objs had been run and before the filesystem was mounted.
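The recovery ordering described above, written out as a rough sketch. list_ost_objs.sh and ll_recover_zero_length.sh are the scripts referenced in this ticket, and their exact arguments, as well as the mount points shown, are assumptions:

# 1. Record the existing objects (including zero-length ones) before e2fsck
#    (argument convention for this script is an assumption)
sh list_ost_objs.sh $DEV > /tmp/ost_objs.list
# 2. Repair the filesystem
e2fsck -fyv $DEV
# 3. Mount as ldiskfs and restore objects that e2fsck moved into lost+found
mount -t ldiskfs $DEV /mnt/ost
ll_recover_lost_found_objs -d /mnt/ost/lost+found
# 4. Recreate the zero-length objects deleted by e2fsck
#    (argument convention for this script is an assumption)
sh ll_recover_zero_length.sh /mnt/ost /tmp/ost_objs.list
# 5. Unmount ldiskfs, then mount the OST as Lustre again
umount /mnt/ost
mount -t lustre $DEV /mnt/ost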