Details

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • None
    • Lustre 2.1.6
    • Toss 2.13 - Lustre 2.1.4
    • 4
    • 9223372036854775807

    Description

      We recently ran into LBUG errors when running the 2.5.x Lustre client against Lustre 2.1.2; the resolution was to update the servers to 2.1.4. In all cases we encountered data loss, in that files that previously existed now show zero file length. The assumption at the time was that this file loss was due to the numerous file system crashes we encountered prior to the software update.

      This past Friday our last file system running 2.1.2 went down unexpectedly. Since we do not routinely take our file systems down due to demand, and out of a desire to preemptively prevent the issues we encountered on the other file systems, I updated the file system during the outage. Because the OSTs had gone read-only I performed fscks on all the targets as well as the MDT, as I routinely do, and they came back clean with the exception of a number of "free inode count wrong" and "free block count wrong" messages, which in my experience is normal.

      When the file system was returned to service everything appeared fine, but users started reporting that even though they could stat files, trying to open them came back with "no such file or directory". The file system was immediately taken down and a subsequent fsck of the OSTs - which took several hours - put millions of files into lost+found. The MDT came back clean as before. This was the same scenario as was experienced on the file systems that encountered the crashes. As on the other file systems, I needed to use ll_recover_lost_found_objs to restore the objects and then ran another fsck as a sanity check.

      Remounting the file system on a 2.1.4 client shows file sizes, but the files cannot be opened. On a 2.5.4 client the files show zero file length.

      An attempt was made to go back to 2.1.2, but that was impossible because mounting the MDT under lustre produced a "Stale NFS file handle" message.

      lfs getstripe on a sampling of the inaccessible files shows the objects, and examining those objects with debugfs shows data in them; in the case of text/ascii files the data can easily be read.
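      (For reference, a minimal sketch of that kind of check; the client file path is hypothetical, while the object ID, OST device path, and the O/0/d(objid % 32)/objid layout are taken from the samples shown elsewhere in this ticket.)

        # On a client: list the OST objects backing a file
        lfs getstripe /lustre/scratch1/some/file        # hypothetical path

        # On the OSS: dump one object read-only from the raw OST device
        OBJID=2326194                                   # example object id from the restore log below
        debugfs -c -R "dump O/0/d$((OBJID % 32))/$OBJID /tmp/obj.$OBJID" \
                /dev/mapper/360001ff090439000000000248b9b0024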

      Right now we are in a down and critical state.

      Attachments

        1. cat-lustre-log.txt
          0.2 kB
        2. debug.txt
          4 kB
        3. lustre-log.txt
          0.2 kB

        Activity

          [LU-6945] Clients reporting missing files

          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          The first 75 files I ran through with the debugfs dump script were all either 1 or 2 stripes, total size < 2 MiB. I'll need to get the user to verify the sanity of the files.

          P.S. dd gets "no such file or directory" on these files.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          The upgrade on the server was from Lustre 2.1.2 -> 2.1.4. The clients are generally running the 2.5.4 llnl version; we have a 2.1.4 client off to the side.

          The version of e2fsprogs on the servers right now is:
          e2fsprogs-1.42.7.wc2-7.el6.x86_64

          I also have a script that is using debugfs to dump objects and note the missing ones, on the back end. Joe mentioned that the fscks appeared to succeed, so we're also puzzled about where the objects went. They don't show up in lost+found as having been in there before and deleted.

          Is it possible that the last_id values were out of order at some point, and the empty objects were deleted as orphans? But in that case it should affect only newish files?
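          (For illustration only, a back-end check along those lines might look something like the sketch below. The device path is the one shown later in this ticket, the O/0/d(objid % 32)/objid mapping matches the restore log below, and objid_list.txt is a hypothetical input file of object IDs to verify.)

          #!/bin/bash
          # Sketch: for each object ID, try to dump it from the ldiskfs backend
          # and record the ones debugfs cannot find.
          DEV=/dev/mapper/360001ff090439000000000248b9b0024   # example OST device
          OUT=/tmp/ost_objs
          mkdir -p "$OUT"
          while read -r OBJID; do
                  debugfs -c -R "dump O/0/d$((OBJID % 32))/$OBJID $OUT/$OBJID" \
                          "$DEV" 2>>"$OUT/errors.log"
                  # debugfs does not create the output file when the object is absent
                  [ -e "$OUT/$OBJID" ] || echo "$OBJID" >> "$OUT/missing.txt"
          done < objid_list.txt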


          adilger Andreas Dilger added a comment -

          The inodes being reported by debugfs in lost+found can be ignored. They all show a single entry covering the whole block (4096 bytes in size) with inode number 0, which means the entry is unused and should not show up via ls. The lost+found directory is increased in size during e2fsck to hold unreferenced inodes as needed (using the ldiskfs inode number as the filename) but is never shrunk as the files are moved out of the directory, in case it needs to be used again. That is a safety measure on behalf of e2fsck, which tries to avoid allocating new blocks for lost+found during recovery to avoid the potential for further corruption.

          The discrepancy between 2.1 and 2.5 clients when accessing files with missing objects may be due to changes in the client code. For "small files" (i.e. those with size below the stripe containing the missing object) it may be that 2.1 will return the size via stat() as computed from the available objects and ignore the fact that one of the objects is missing until it is read. However, if the object is actually missing then the 2.5 behaviour is "more correct", in that it would be possible to have a sparse file that had part of its data on the missing object.

          It may be possible to recover some of the data from files with missing objects if they are actually small files that just happen to be striped over 4 OSTs (== default striping?). On a 2.1 client, which reports the file size via stat instead of returning an error, it would be possible to run something like (untested, for example only):

          #!/bin/bash
          for F in "$@"; do
                  [ -f "$F.recov" ] && echo "$F.recov: already exists" && continue
                  SIZE=$(stat -c%s "$F")
                  STRIPE_SZ=$(lfs getstripe -S "$F")
                  # to be more safe we could assume only the first stripe is valid:
                  # STRIPE_CT=1
                  # allowing the full stripe count will still eliminate large files that are definitely missing data
                  STRIPE_CT=$(lfs getstripe -c "$F")
                  (( $SIZE >= $STRIPE_CT * $STRIPE_SZ)) && echo "$F: may be missing data" && continue
                  # such small files do not need multiple stripes
                  lfs setstripe -c 1 "$F.recov"
                  dd if="$F" of="$F.recov" bs=$SIZE count=1 conv=noerror
          done
          

          This would try to repair specified files that have a size below the stripe width and copy them to a new temporary file. It isn't 100% foolproof since it isn't easy to figure out which object is missing, so there may be some class of files in the 1-4MB size range that have a hole where the missing object is.
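          (As a usage note, assuming the script above is saved as recover_small.sh and run from a 2.1 client, and assuming this lfs find supports the --type/--size/--print0 options, one way to feed it candidate files would be something like the following; the directory path and the 4M bound, matching a hypothetical 4-stripe x 1 MiB layout, are illustrative.)

          lfs find /lustre/scratch1/somedir --type f --size=-4M --print0 |
                  xargs -0 bash recover_small.sh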

          The other issue that hasn't been discussed here is why the OST was corrupted after the upgrade in the first place. Oleg mentioned that this has happened before with a 2.1->2.5 upgrade, and I'm wondering if there is some ldiskfs patch in the TOSS release that needs to be updated, or some bug in e2fsprogs? What version of e2fsprogs is being used with 2.5?

          green Oleg Drokin added a comment -

          It's ok to see the filenames in debugfs output in lost+found, but because the inode field is zero, that just means they are deleted there. It's just because lost+found is special and we never want it to be truncated (so we never need to allocate any data blocks when we want to add stuff there) that the names remain in this deleted state.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          We've checked all the OSTs and they all report 4-5k items in lost+found via debugfs, but nothing shows up via ls -l.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          The objects in the screen output were all verified to be restored, for that OST.

          However, we find that although /bin/ls of lost+found (mounted ldiskfs) shows no files, debugfs shows items in that directory, e.g.:

          11 (12) . 2 (4084) .. 0 (4096) #524990 0 (4096) #525635
          0 (4096) #526139 0 (4096) #526706 0 (4096) #527737
          0 (4096) #590365 0 (4096) #590761 0 (4096) #591191
          0 (4096) #591637 0 (4096) #592078 0 (4096) #592408
          0 (4096) #592693 0 (4096) #592998 0 (4096) #593285
          0 (4096) #593578 0 (4096) #593877 0 (4096) #594179
          0 (4096) #594470 0 (4096) #594770 0 (4096) #595075
          0 (4096) #595377 0 (4096) #595673 0 (4096) #595971
          0 (4096) #596269 0 (4096) #596570 0 (4096) #596871
          0 (4096) #597173 0 (4096) #597465 0 (4096) #597767
          0 (4096) #598071 0 (4096) #598362 0 (4096) #598647
          0 (4096) #598932 0 (4096) #599224 0 (4096) #599521
          ...

          We haven't cross-referenced the objects against the missing ones yet, but we're wondering if that is expected?
          command was:
          debugfs -c -R "ls /lost+found" /dev/mapper/360001ff090439000000000248b9b0024
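          (If it helps later, one way to pull just the entry names out of that listing for cross-referencing might be something along these lines; per the comments above, the names are the ldiskfs inode numbers e2fsck used, and the output file name is arbitrary.)

          DEV=/dev/mapper/360001ff090439000000000248b9b0024
          debugfs -c -R "ls /lost+found" "$DEV" 2>/dev/null |
                  grep -o '#[0-9]*' | tr -d '#' | sort -n > lf_entry_names.txt
          wc -l lf_entry_names.txt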

          green Oleg Drokin added a comment -

          It's a pity you did not save the output.
          I wanted to cross-reference whether any of the missing objects you see now were listed as restored.

          I guess you can still perform the check - are all the objects in your buffer still present where they were moved to?

          jamervi Joe Mervini added a comment -

          I did not log the output from the recovery process. But every file that was in lost+found was restored, leaving nothing behind. And I was watching the recovery as it was happening. I was able to scroll back through one of my screens to get some of the output before it ran out of buffer. Here is a sample:

          Object /mnt/lustre/local/scratch1-OST0013/O/0/d18/2326194 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d19/2326195 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d20/2326196 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d21/2326197 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d22/2326198 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d23/2326199 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d24/2326200 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d25/2326201 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d26/2326202 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d27/2326203 restored.
          Object /mnt/lustre/local/scratch1-OST0013/O/0/d28/2326204 restored.

          And that's pretty much what I observed throughout the process. Didn't see any messages other than restored.

          green Oleg Drokin added a comment -

          So with ll_recover_lost_found_objs - you did not happen to run it in -v mode and save output, did you? I just want to see if any of the now missing objects were in fact recovered and then deleted by something again.

          Additionally, did you see that lost+found is now empty after ll_recover_lost_found_objs was run?
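          (For any future run, a sketch of capturing that verbose output for later cross-referencing, assuming the usual -v and -d <lost+found directory> options of ll_recover_lost_found_objs; the OST mount point is the one from the restore log above and the log path is arbitrary.)

          ll_recover_lost_found_objs -v -d /mnt/lustre/local/scratch1-OST0013/lost+found \
                  2>&1 | tee /root/scratch1-OST0013-recover.log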


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          Yes, that seems to be the question. The next file I checked, size 2321, was there in the first stripe, but the objects for all the other stripes did not exist on the OSTs.

          There were several fscks run; I'll defer to Joe for questions about them since I wasn't around for that. I believe he has stored output from them. Is there a chance that he hit a version of fsck with a bug in it?

          green Oleg Drokin added a comment -

          I imagine the size might be received from the MDS, because we started to store size on the MDS some time ago; 2.5 might be disregarding this info in the face of missing objects while 2.1 did not (I can probably check this).

          The more important issue is: if all the files are missing at least one object, where did all of those objects disappear to, and how do we get them back?
          I know you already did e2fsck and relinked all the files back into place, so supposedly nothing should be lost anymore?


          People

            Assignee: green Oleg Drokin
            Reporter: jamervi Joe Mervini
            Votes: 0
            Watchers: 9
