[LU-6945] Clients reporting missing files - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.1.6
Labels:
- mdt
Environment:
Toss 2.13 - Lustre 2.1.4

Severity:
4

Description

We recently ran into LBUG errors when running the 2.5.x Lustre client against Lustre 2.1.2; the resolution was to update to 2.1.4. In all cases we encountered data loss, in that files that previously existed now show zero length. The assumption at the time was that this loss was due to the numerous file system crashes that we encountered prior to the software update.

This past Friday our last file system running 2.1.2 went down unexpectedly. Because demand keeps us from routinely taking our file systems down, and because I wanted to preempt the issues we encountered on the other file systems, I updated the file system during the outage. Because the OSTs went read-only, I ran fsck on all the targets as well as the MDT, as I routinely do. They came back clean except for a number of "free inode count wrong" and "free block count wrong" messages, which in my experience is normal.

When the file system was returned to service everything appeared fine, but users started reporting that although they could stat files, trying to open them returned “no such file or directory”. The file system was immediately taken down, and a subsequent fsck of the OSTs, which took several hours, put millions of files into lost+found. The MDT came back clean as before. This was the same scenario we experienced on the file systems that encountered the crashes. As on the other file systems, I needed to use ll_recover_lost_found_objs to restore the objects, and then ran another fsck as a sanity check.

After remounting the file system, files on a 2.1.4 client show file sizes but cannot be opened. On a 2.5.4 client the files show zero length.

An attempt was made to go back to 2.1.2, but that was impossible because mounting the MDT under Lustre produced a “Stale NFS file handle” message.

lfs getstripe on a sampling of inaccessible files shows the objects, and using debugfs to examine those objects shows that they contain data; in the case of text/ASCII files the contents can be easily read.

Right now we are in a down and critical state.

Attachments

cat-lustre-log.txt (0.2 kB, 03/Aug/15 9:11 PM)
debug.txt (4 kB, 03/Aug/15 8:25 PM)
lustre-log.txt (0.2 kB, 03/Aug/15 8:25 PM)

Activity

Andreas Dilger added a comment - 04/Aug/15 6:48 PM

The inodes being reported by debugfs in lost+found can be ignored. They all show a single entry covering the whole block (4096 bytes in size) with inode number 0, which means the entry is unused and should not show up via ls. The lost+found directory is increased in size during e2fsck to hold unreferenced inodes as needed (using the ldiskfs inode number as the filename) but is never shrunk as the files are moved out of the directory, in case it needs to be used again. That is a safety measure on behalf of e2fsck, which tries to avoid allocating new blocks for lost+found during recovery to avoid the potential for further corruption.
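
A minimal sketch of how the listing could be checked mechanically, rather than by eye. It assumes the short `debugfs ls` format seen in this ticket (repeated "inode (rec_len) name" triplets) and names without embedded whitespace; the function name `count_dir_entries` is made up for illustration.

```shell
#!/bin/bash
# Count live vs. unused directory entries in `debugfs -R "ls <dir>"` output.
# An entry whose inode number is 0 is unused and will not appear in ls.
# Assumption: whitespace-separated "inode (rec_len) name" triplets.
count_dir_entries() {
    awk '
        {
            for (i = 1; i + 2 <= NF; i += 3) {
                name = $(i + 2)
                if (name == "." || name == "..") continue
                if ($i == 0) unused++; else live++
            }
        }
        END { printf "live=%d unused=%d\n", live, unused }
    '
}

# Two lines copied from the listing posted in this ticket.
count_dir_entries <<'EOF'
11 (12) . 2 (4084) .. 0 (4096) #524990 0 (4096) #525635
0 (4096) #526139 0 (4096) #526706 0 (4096) #527737
EOF
# prints: live=0 unused=5
```

On a real target the function would read from something like `debugfs -c -R "ls /lost+found" <device>`. A non-zero "live" count would mean entries are still sitting in lost+found.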

The discrepancy between 2.1 and 2.5 clients on accessing files with missing objects may be due to changes in the client code. For "small files" (i.e. those with size below the stripe of the missing object) it may be that 2.1 will return the size via stat() as computed from the available objects and ignore the fact that one of the objects is missing until it is read. However, if the object is actually missing then the 2.5 behaviour is "more correct" in that it would be possible to have a sparse file that had part of the data on the missing object.

It may be possible to recover some of the data from small files with missing objects if they are actually small files that just happen to be striped over 4 OSTs (== default striping?). On a 2.1 client, which reports the file size via stat instead of returning an error, it would be possible to run something like (untested, for example only):

#!/bin/bash
# Copy small striped files whose data should fit within the available stripes.
for F in "$@"; do
        [ -f "$F.recov" ] && echo "$F.recov: already exists" && continue
        SIZE=$(stat -c%s "$F")
        # dd rejects bs=0, and an empty file has nothing to recover
        (( SIZE > 0 )) || { echo "$F: zero size, skipping"; continue; }
        STRIPE_SZ=$(lfs getstripe -S "$F")
        # to be more safe we could assume only the first stripe is valid:
        # STRIPE_CT=1
        # allowing the full stripe count will still eliminate large files that are definitely missing data
        STRIPE_CT=$(lfs getstripe -c "$F")
        (( SIZE >= STRIPE_CT * STRIPE_SZ )) && echo "$F: may be missing data" && continue
        # such small files do not need multiple stripes
        lfs setstripe -c 1 "$F.recov"
        dd if="$F" of="$F.recov" bs="$SIZE" count=1 conv=noerror
done

This would try to repair specified files that have a size below the stripe width and copy them to a new temporary file. It isn't 100% foolproof since it isn't easy to figure out which object is missing, so there may be some class of files in the 1-4MB size range that have a hole where the missing object is.
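
To make the size reasoning concrete, here is a small sketch of the arithmetic. It assumes plain round-robin striping and a non-sparse file; the 1 MiB stripe size and stripe count of 4 in the examples are assumptions suggested by the "1-4MB" range above, not confirmed values for this file system. The helper name `objects_with_data` is hypothetical.

```shell
#!/bin/bash
# How many stripe objects should hold data for a file of a given size.
# Arguments: file size in bytes, stripe size in bytes, stripe count.
# Assumes round-robin striping and no holes in the file.
objects_with_data() {
    local size=$1 stripe_sz=$2 stripe_ct=$3
    # ceil(size / stripe_sz), capped at the stripe count
    local needed=$(( (size + stripe_sz - 1) / stripe_sz ))
    if (( needed > stripe_ct )); then
        needed=$stripe_ct
    fi
    echo "$needed"
}

objects_with_data 2321 1048576 4      # 2321-byte file: prints 1
objects_with_data 3000000 1048576 4   # about 2.9 MB: prints 3
objects_with_data 8388608 1048576 4   # 8 MiB: prints 4
```

Under these assumptions, a file that needs only one object has all of its data in the first stripe, so a missing second, third, or fourth object costs nothing. A file in the 1-4 MB range may have a hole wherever the missing object sits.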

The other issue that hasn't been discussed here is why the OST was corrupted after the upgrade in the first place. Oleg mentioned that this has happened before with a 2.1->2.5 upgrade, and I'm wondering if there is some ldiskfs patch in the TOSS release that needs to be updated, or some bug in e2fsprogs? What version of e2fsprogs is being used with 2.5?

Oleg Drokin added a comment - 04/Aug/15 6:45 PM

It's ok to see the filenames in debugfs output in lost+found, but because the inode field is zero, that just means they are deleted there. It's just because lost+found is special and we never want it to be truncated (so that we don't need to allocate any data blocks when we want to add stuff there) that the names remain in this deleted state there.

Ruth Klundt (Inactive) added a comment - 04/Aug/15 2:48 PM

we've checked all the OSTs and they all report 4-5k items in lost+found via debugfs, but nothing shows up via ls -l.

Ruth Klundt (Inactive) added a comment - 04/Aug/15 2:40 PM

The objects in the screen were all verified to be restored, for that OST.

However, we find that although /bin/ls of lost+found (mounted ldiskfs) shows no files, debugfs shows items in that directory, e.g.:

11 (12) . 2 (4084) .. 0 (4096) #524990 0 (4096) #525635
0 (4096) #526139 0 (4096) #526706 0 (4096) #527737
0 (4096) #590365 0 (4096) #590761 0 (4096) #591191
0 (4096) #591637 0 (4096) #592078 0 (4096) #592408
0 (4096) #592693 0 (4096) #592998 0 (4096) #593285
0 (4096) #593578 0 (4096) #593877 0 (4096) #594179
0 (4096) #594470 0 (4096) #594770 0 (4096) #595075
0 (4096) #595377 0 (4096) #595673 0 (4096) #595971
0 (4096) #596269 0 (4096) #596570 0 (4096) #596871
0 (4096) #597173 0 (4096) #597465 0 (4096) #597767
0 (4096) #598071 0 (4096) #598362 0 (4096) #598647
0 (4096) #598932 0 (4096) #599224 0 (4096) #599521
...

we haven't cross-ref'd the objects against missing ones yet, but wondering if that is expected?
command was:
debugfs -c -R "ls /lost+found" /dev/mapper/360001ff090439000000000248b9b0024

Oleg Drokin added a comment - 04/Aug/15 12:27 AM

It's a pity you did not save the output.
I wanted to cross-reference whether any of the missing objects you see now were listed as restored.

I guess you can still perform the check: all the objects in your buffer, are they still present where they were moved?
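
One way such a check could be scripted is sketched below. It assumes the saved text contains lines of the form "Object <path> restored." (the form quoted elsewhere in this ticket), that paths contain no " restored." substring of their own, and that the OST is mounted as ldiskfs at the same mount point used during recovery. The function name and the log file are hypothetical.

```shell
#!/bin/bash
# Read "Object <path> restored." lines from a saved log and report
# every path that no longer exists on the mounted ldiskfs target.
check_restored() {
    local log=$1 line path missing=0
    while IFS= read -r line; do
        case $line in
            "Object "*" restored.")
                path=${line#Object }        # strip the leading keyword
                path=${path% restored.}     # strip the trailing status
                if [ ! -e "$path" ]; then
                    echo "MISSING: $path"
                    missing=$((missing + 1))
                fi
                ;;
        esac
    done < "$log"
    echo "missing=$missing"
}

# Usage (hypothetical log file holding the saved scrollback):
#   check_restored /tmp/recover-OST0013.log
```

Lines that do not match the expected form are ignored, so pasted terminal scrollback with other noise in it should be tolerated.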

Joe Mervini added a comment - 03/Aug/15 11:50 PM

I did not log the output from the recovery process. But every file that was in lost+found was restored, leaving nothing behind, and I was watching the recovery as it was happening. I was able to scroll back through one of my screens to get to some of the output before it ran out of buffer. Here is a sample:

Object /mnt/lustre/local/scratch1-OST0013/O/0/d18/2326194 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d19/2326195 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d20/2326196 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d21/2326197 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d22/2326198 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d23/2326199 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d24/2326200 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d25/2326201 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d26/2326202 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d27/2326203 restored.
Object /mnt/lustre/local/scratch1-OST0013/O/0/d28/2326204 restored.

And that's pretty much what I observed throughout the process. Didn't see any messages other than restored.
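
For cross-referencing object IDs against this output, the sample paths are consistent with a layout of O/<seq>/d<objid mod 32>/<objid> under the ldiskfs mount point. The sketch below encodes that pattern; it is inferred from the sample lines only, and the default sequence of 0 and the helper name `ost_object_path` are assumptions.

```shell
#!/bin/bash
# Build the expected on-disk path of an OST object from its object ID.
# Pattern inferred from the sample above: O/<seq>/d<objid % 32>/<objid>.
# Arguments: ldiskfs mount point, object ID, optional sequence (default 0).
ost_object_path() {
    local mnt=$1 objid=$2 seq=${3:-0}
    echo "$mnt/O/$seq/d$(( objid % 32 ))/$objid"
}

ost_object_path /mnt/lustre/local/scratch1-OST0013 2326194
# prints: /mnt/lustre/local/scratch1-OST0013/O/0/d18/2326194
```

Feeding the object IDs reported by lfs getstripe through a helper like this gives paths that can be tested directly on the mounted target.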

Oleg Drokin added a comment - 03/Aug/15 11:37 PM

So with ll_recover_lost_found_objs - you did not happen to run it in -v mode and save output, did you? I just want to see if any of the now missing objects were in fact recovered and then deleted by something again.

Additionally, did you see that lost+found is now empty after ll_recover_lost_found_objs was run?

Ruth Klundt (Inactive) added a comment - 03/Aug/15 11:35 PM

yes that seems to be the question. the next file I checked, size 2321, was there in the first stripe but all other stripes did not exist on the OSTs.

There were several fsck's run, I'll defer to Joe for questions about them since I wasn't around for that. I believe he has stored output from them. Is there a chance that he hit a version of fsck with a bug in it?

Oleg Drokin added a comment - 03/Aug/15 11:23 PM

I imagine the size might be received from MDS because we started to store size on the mds some time ago, but 2.5 might be disregarding this info and 2.1 did not in the face of missing objects (I can probably check this).

The more important issue is: if all the files are missing at least one object, where did all of those objects disappear to, and how do we bring them back?
I know you already ran e2fsck and relinked all the files back into place, so supposedly nothing should be lost anymore?

Joe Mervini added a comment - 03/Aug/15 11:17 PM

John - It is not an appliance. The file system consists of 1 SFA12K 5 stack front-ended by 6 Dell R720's (2 MDSs/4 OSSs). There are 28 22TB OSTs. The storage system was purchased through DDN.

Ruth Klundt (Inactive) added a comment - 03/Aug/15 11:15 PM

It is confusing, for sure. The files reported by the user as 'could not open' behave this way, from what we have seen so far. There are many - we've only looked at a couple of them:

In 2.1 client: size appears correct (non-zero), stat shows it as a regular file, /bin/cat at the cmd line gets 'no such file or directory'.

In 2.5 client: size is 0, stat shows it as empty regular file, /bin/cat at the cmd line reports 'no such file or directory'.

lfs getstripe looks ok both places, and the first couple of objects exist. The text can be dumped with debugfs from those objects. I can double check, perhaps they are all missing one of their objects.
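
Since there are many affected files, a sketch like the following could list the ones that show this symptom (stat succeeds, open fails) without checking each by hand. The function name is hypothetical, and it only tests whether the file can be opened for reading; it does not read any data or say which object is missing.

```shell
#!/bin/bash
# Succeeds (returns 0) when a path can be stat'ed but cannot be opened
# for reading, which is the symptom described in this ticket.
stat_ok_open_fails() {
    local f=$1
    stat -- "$f" >/dev/null 2>&1 || return 1   # no metadata at all
    if ( : < "$f" ) 2>/dev/null; then
        return 1                               # opens fine, not affected
    fi
    return 0
}

# Print every argument that shows the symptom.
for f in "$@"; do
    if stat_ok_open_fails "$f"; then
        echo "$f"
    fi
done
```

Saved as a script, it could be driven from a file list, for example with find and xargs, on either client version.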

People

Assignee: Oleg Drokin

Reporter: Joe Mervini

Votes: 0

Watchers: 9

Dates

Created: 03/Aug/15 8:25 PM

Updated: 13/Aug/15 1:03 AM

Resolved: 13/Aug/15 1:03 AM