Details
-
Bug
-
Resolution: Done
-
Critical
-
None
-
Lustre 2.1.6
-
Toss 2.13 - Lustre 2.1.4
-
4
Description
We recently ran into LBUG errors when running the 2.5.x Lustre client against Lustre 2.1.2; the resolution was to update the version to 2.1.4. In all cases we encountered data loss, in that files that previously existed now show zero file length. The assumption at the time was that this file loss was due to the numerous file system crashes that we encountered prior to the software update.
This past Friday our last file system running 2.1.2 went down unexpectedly. Since we do not routinely take our file systems down due to demand, and out of a desire to prevent the issues that we encountered on the other file systems, I updated the file system during the outage. Because the OSTs went read-only, I ran fsck on all the targets as well as the MDT, as I routinely do, and they came back clean with the exception of a number of "free inode count wrong" and "free block count wrong" messages - which in my experience is normal.
When the file system was returned to service everything appeared fine, but users started reporting that even though they could stat files, trying to open them failed with “no such file or directory”. The file system was immediately taken down, and a subsequent fsck of the OSTs - which took several hours - put millions of files into lost+found. The MDT came back clean as before. This was the same scenario as was experienced on the file systems that encountered the crashes. As was the case on the other file systems, I needed to use ll_recover_lost_found_objs to restore the objects, and then ran another fsck as a sanity check.
After remounting the file system, a 2.1.4 client shows file sizes, but the files cannot be opened. On a 2.5.4 client the files show zero file length.
An attempt was made to go back to 2.1.2, but that was impossible because mounting the MDT under Lustre produced a “Stale NFS file handle” message.
Running lfs getstripe on a sampling of the inaccessible files shows the objects, and using debugfs to examine those objects shows that they contain data; in the case of text/ascii files the contents can be easily read.
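The object check described above can be scripted roughly as follows. This is a sketch under assumptions: it assumes the conventional ldiskfs OST layout in which object N is stored at O/0/d(N mod 32)/N, and both the device path /dev/sdX and the object ID 12345 are placeholders, not values from this system. The helper names are hypothetical. It only prints the debugfs command so that it can be reviewed before being run.

```shell
#!/bin/sh
# ost_object_path OBJID [SEQ]
# Print the path of an OST object inside the ldiskfs file system, assuming
# the conventional O/<seq>/d<objid mod 32>/<objid> layout (seq defaults to 0).
ost_object_path() {
    objid=$1
    seq=${2:-0}
    echo "O/$seq/d$((objid % 32))/$objid"
}

# print_debugfs_stat DEVICE OBJID
# Print (do not run) a read-only debugfs command that shows the object's inode.
print_debugfs_stat() {
    echo "debugfs -c -R \"stat $(ost_object_path "$2")\" $1"
}

# Example with placeholder values; the object ID would be taken from the
# objid column of "lfs getstripe <file>".
print_debugfs_stat /dev/sdX 12345
```

The -c option opens the device read-only in catastrophic mode, so the printed command should be safe to run against an unmounted target once the device and object ID have been filled in.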
Right now we are in a down and critical state.
The inodes being reported by debugfs in lost+found can be ignored. They all show a single entry covering the whole block (4096 bytes in size) with inode number 0, which means the entry is unused and should not show up via ls. The lost+found directory is increased in size during e2fsck to hold unreferenced inodes as needed (using the ldiskfs inode number as the filename) but is never shrunk as the files are moved out of the directory, in case it needs to be used again. That is a safety measure on behalf of e2fsck, which tries to avoid allocating new blocks for lost+found during recovery to avoid the potential for further corruption.
The discrepancy between 2.1 and 2.5 clients on accessing files with missing objects may be due to changes in the client code. For "small files" (i.e. those with size below the stripe of the missing object) it may be that 2.1 will return the size via stat() as computed from the available objects and ignore the fact that one of the objects is missing until it is read. However, if the object is actually missing then the 2.5 behaviour is "more correct" in that it would be possible to have a sparse file that had part of the data on the missing object.
It may be possible to recover some of the data from small files with missing objects if they are actually small files that just happen to be striped over 4 OSTs (== default striping?). On a 2.1 client, which reports the file size via stat instead of returning an error, it would be possible to run something like (untested, for example only):
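One way such a loop might look, as a sketch only and not verified against a real Lustre mount: it assumes a 1 MiB stripe size and a stripe count of 4 (both should be checked with lfs getstripe), GNU stat, dd and truncate, and a hypothetical ".recov" suffix for the temporary copy. The function name is also hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumed layout; override to match "lfs getstripe" output.
STRIPE_SIZE=${STRIPE_SIZE:-1048576}   # assumed 1 MiB stripe size
STRIPE_COUNT=${STRIPE_COUNT:-4}       # 4 OSTs, as mentioned above

# salvage_small_file FILE
# Copy FILE to FILE.recov (hypothetical suffix) when its reported size is
# non-zero and below the stripe width. Returns 0 if a copy was made, 1 if not.
salvage_small_file() {
    src=$1
    dst="$src.recov"
    width=$((STRIPE_SIZE * STRIPE_COUNT))
    [ -f "$src" ] || return 1
    size=$(stat -c %s "$src") || return 1
    if [ "$size" -eq 0 ] || [ "$size" -ge "$width" ]; then
        echo "skip: $src (size $size)" >&2
        return 1
    fi
    # Read one stripe at a time. With GNU dd, conv=noerror,sync continues
    # after a read error and zero-fills the failed block, so data on a
    # missing object becomes a hole instead of aborting the copy.
    dd if="$src" of="$dst" bs="$STRIPE_SIZE" conv=noerror,sync || return 1
    # sync also pads the final short block, so trim to the reported size.
    truncate -s "$size" "$dst"
}

# Process every file named on the command line.
for f in "$@"; do
    salvage_small_file "$f" && echo "copied: $f -> $f.recov"
done
```

A successful exit from dd does not prove the copy is complete, so the read errors dd prints to stderr are worth keeping and reviewing per file.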
This would try to repair specified files that have a size below the stripe width and copy them to a new temporary file. It isn't 100% foolproof since it isn't easy to figure out which object is missing, so there may be some class of files in the 1-4MB size range that have a hole where the missing object is.
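To make the caveat about holes concrete, the following sketch computes which byte ranges of a file would fall on a missing object. It assumes plain round-robin (RAID-0 style) striping, where chunk k of stripe_size bytes is stored on stripe index k mod stripe_count; the function name and the example values are illustrative only.

```shell
#!/bin/sh
# hole_ranges SIZE STRIPE_SIZE STRIPE_COUNT MISSING_INDEX
# Print "start end" (end exclusive) for every byte range of a SIZE-byte file
# that maps to stripe MISSING_INDEX under round-robin striping.
hole_ranges() {
    size=$1; ssize=$2; count=$3; missing=$4
    start=$((missing * ssize))
    while [ "$start" -lt "$size" ]; do
        end=$((start + ssize))
        [ "$end" -gt "$size" ] && end=$size
        echo "$start $end"
        # The next chunk on the same stripe is one full stripe width later.
        start=$((start + ssize * count))
    done
}

# A 2.5 MiB file striped 1 MiB x 4 with stripe index 1 missing:
hole_ranges 2621440 1048576 4 1
# prints: 1048576 2097152
```

A file smaller than 1 MiB prints nothing for any missing index other than 0, which is why only files whose data sits entirely on surviving objects can be copied out intact.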
The other issue that hasn't been discussed here is why the OST was corrupted after the upgrade in the first place. Oleg mentioned that this has happened before with a 2.1->2.5 upgrade, and I'm wondering if there is some ldiskfs patch in the TOSS release that needs to be updated, or some bug in e2fsprogs? What version of e2fsprogs is being used with 2.5?