Details

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • None
    • Lustre 2.1.6
    • Toss 2.13 - Lustre 2.1.4
    • 4
    • 9223372036854775807

    Description

      We recently ran into LBUG errors when running the 2.5.x Lustre client against Lustre 2.1.2 servers; the resolution was to update the servers to 2.1.4. In all cases we encountered data loss, in that files that previously existed now show zero file length. The assumption at the time was that this file loss was due to the numerous file system crashes that we encountered prior to the software update.

      This past Friday our last file system running 2.1.2 went down unexpectedly. Since we do not routinely take our file systems down due to demand, and out of a desire to preemptively prevent the issues we encountered on the other file systems, I updated this file system during the outage. Because the OSTs went read-only, I performed fscks on all the targets as well as the MDT, as I routinely do, and they came back clean with the exception of a number of "free inode count wrong" and "free block count wrong" messages, which in my experience is normal.

      When the file system was returned to service everything appeared fine, but users started reporting that even though they could stat files, trying to open them returned "no such file or directory". The file system was immediately taken down, and a subsequent fsck of the OSTs - which took several hours - put millions of files into lost+found. The MDT came back clean as before. This was the same scenario as was experienced on the file systems that encountered the crashes. As was the case on the other file systems, I needed to use ll_recover_lost_found_objs to restore the objects and then ran another fsck as a sanity check.

      Remounting the file system on a 2.1.4 client shows file sizes, but the files cannot be opened. On a 2.5.4 client the files show zero file length.

      An attempt was made to go back to 2.1.2, but that was impossible because mounting the MDT as type lustre produced a "Stale NFS file handle" message.

      Running lfs getstripe on a sampling of the inaccessible files shows the objects, and using debugfs to examine those objects shows data in them; in the case of text/ASCII files, the data can be easily read.
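      For reference, a minimal sketch of how an object id reported by lfs getstripe maps to a path inside the ldiskfs OST filesystem for inspection with debugfs. The O/0/d&lt;objid % 32&gt; layout is the standard ldiskfs object directory scheme; the device name and object id below are hypothetical examples only:

```shell
#!/bin/bash
# Map an OST object id (as shown by 'lfs getstripe') to its path inside
# the ldiskfs OST filesystem: objects for the default sequence live under
# O/0/d<objid % 32>/<objid>.
obj_path() {
    local objid=$1
    printf 'O/0/d%d/%d\n' $((objid % 32)) "$objid"
}

# The object can then be dumped with debugfs, e.g. (device hypothetical):
#   debugfs -c -R "dump $(obj_path 123456) /tmp/obj.123456" /dev/sdX
obj_path 123456
```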

      Right now we are in a down and critical state.

      Attachments

        1. cat-lustre-log.txt
          0.2 kB
          Ruth Klundt
        2. debug.txt
          4 kB
          Joe Mervini
        3. lustre-log.txt
          0.2 kB
          Joe Mervini

        Activity

          [LU-6945] Clients reporting missing files
          jfc John Fuchs-Chesney (Inactive) made changes -
          Resolution New: Done [ 10000 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          jfc John Fuchs-Chesney (Inactive) added a comment -

          Joe,
          We are going to close this out as you suggest.

          There are a number of fixes for large file systems that have been applied in more recent Lustre versions, and it would be quite time-consuming to try to identify exactly what the cause was here.

          Thanks,
          ~ jfc.
          jamervi Joe Mervini added a comment -

          Yes - we might as well close it. I was hoping that Intel might have an idea as to the root cause. My theory is that something changed fundamentally in the way the MDS treats files that don't fill an entire stripe, since the problem only presented itself after bringing the file system back online under 2.1.4. That isn't something I would have expected in a minor version update.

          In any event, since this was the last of the file systems running the old code we should not encounter the same problem in the future.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          Thanks for the advice, Andreas. We have found only small files in this condition so far, and we are slowly restoring the items users request. The file system is up and running, so we're probably not critical anymore.

          I'll leave it to Joe to decide whether there is more he would like to investigate with regard to root cause before closing the ticket.


          adilger Andreas Dilger added a comment -

          I see from one of the earlier comments that these are "28 22TB OSTs". In this case, I'd recommend updating to the latest e2fsprogs-1.42.12.wc1, since it includes a large number of fixes made since 1.42.3.wc3 was released three years ago. There definitely were bugs fixed relating to filesystem sizes over 16TB in that time.

          jamervi Joe Mervini added a comment -

          The version of e2fsprogs in the image that was running 2.1.2 was e2fsprogs-1.42.3.wc3-7.el6.x86_64.

          I don't know if that would explain why the OSTs got corrupted.

          green Oleg Drokin added a comment -

          If it's the first stripe, I imagine you can just copy the object file out of the OST filesystem directly, and that would be the content.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          The first 75 files I ran through the debugfs dump script were all either 1 or 2 stripes, total size < 2MiB. I'll need to get the user to verify the sanity of the files.

          PS: dd returns "no such file or directory" on these files.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          The upgrade on the server was from Lustre 2.1.2 -> 2.1.4. The clients are generally running the 2.5.4 llnl version; we have a 2.1.4 client off to the side.

          The version of e2fsprogs on the servers right now is:
          e2fsprogs-1.42.7.wc2-7.el6.x86_64

          I also have a script on the back end that uses debugfs to dump objects and note the missing ones. Joe mentioned that the fscks appeared to succeed, so we're also puzzled about where the objects went. They don't show up in lost+found as having been in there before and deleted.

          Is it possible that LAST_IDs were out of order at some point, and the empty objects were deleted as orphans? But in that case it should affect only newish files?

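          One way to check that theory after the fact would be to compare the on-disk LAST_ID values against the highest object ids actually present. A minimal sketch, assuming the LAST_ID file has already been dumped out of the OST with debugfs (the device and paths here are hypothetical; LAST_ID holds a single little-endian 64-bit value, and this decode relies on the host being little-endian, as x86_64 is):

```shell
#!/bin/bash
# Decode a LAST_ID file dumped from an ldiskfs OST, e.g. via:
#   debugfs -c -R "dump O/0/LAST_ID /tmp/LAST_ID" /dev/sdX   # device hypothetical
# LAST_ID holds one little-endian 64-bit object id; 'od -td8' decodes it in
# host byte order, so this assumes a little-endian host.
decode_last_id() {
    od -An -td8 -N8 "$1" | tr -d ' '
}
```

          The decoded value can then be compared against the largest object id found under O/0/d*/ on that OST.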

          The inodes being reported by debugfs in lost+found can be ignored. They all show a single entry covering the whole block (4096 bytes in size) with inode number 0, which means the entry is unused and should not show up via ls. The lost+found directory is increased in size during e2fsck to hold unreferenced inodes as needed (using the ldiskfs inode number as the filename) but is never shrunk as the files are moved out of the directory, in case it needs to be used again. That is a safety measure on behalf of e2fsck, which tries to avoid allocating new blocks for lost+found during recovery to avoid the potential for further corruption.

          The discrepancy between 2.1 and 2.5 clients on accessing files with missing objects may be due to changes in the client code. For "small files" (i.e. those with size below the stripe of the missing object) it may be that 2.1 will return the size via stat() as computed from the available objects and ignore the fact that one of the objects is missing until it is read. However, if the object is actually missing then the 2.5 behaviour is "more correct" in that it would be possible to have a sparse file that had part of the data on the missing object.

          It may be possible to recover some of the data from files with missing objects if they are actually small files that just happen to be striped over 4 OSTs (== default striping?). On a 2.1 client, which reports the file size via stat instead of returning an error, it would be possible to run something like (untested, for example only):

          #!/bin/bash
          for F in "$@"; do
                  [ -f "$F.recov" ] && echo "$F.recov: already exists" && continue
                  SIZE=$(stat -c%s "$F")
                  STRIPE_SZ=$(lfs getstripe -S "$F")
                  # to be safer we could assume only the first stripe is valid:
                  # STRIPE_CT=1
                  # allowing the full stripe count will still eliminate large
                  # files that are definitely missing data
                  STRIPE_CT=$(lfs getstripe -c "$F")
                  (( SIZE >= STRIPE_CT * STRIPE_SZ )) && echo "$F: may be missing data" && continue
                  # such small files do not need multiple stripes
                  lfs setstripe -c 1 "$F.recov"
                  dd if="$F" of="$F.recov" bs="$SIZE" count=1 conv=noerror
          done


          This would try to repair specified files that have a size below the stripe width and copy them to a new temporary file. It isn't 100% foolproof since it isn't easy to figure out which object is missing, so there may be some class of files in the 1-4MB size range that have a hole where the missing object is.
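          As an illustrative follow-up sketch (mine, not from the ticket): one cheap way to flag recovered copies in that 1-4MB range that may contain such a hole is to count NUL bytes, since a hole left by a missing object reads back as zeros. The 64KiB threshold and the function name are arbitrary choices for this example:

```shell
#!/bin/bash
# Flag files containing many NUL bytes, which in this scenario may indicate
# a hole left where a missing object's data should have been. This counts
# total NULs rather than the longest run -- a cheap heuristic, not a proof.
nul_heavy() {
    local file=$1 threshold=${2:-65536}
    local nuls
    nuls=$(tr -cd '\0' < "$file" | wc -c)
    (( nuls >= threshold ))
}

# Usage (hypothetical path):
#   nul_heavy /scratch/user/file.recov && echo "file.recov: possible hole"
```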

          The other issue that hasn't been discussed here is why the OST was corrupted after the upgrade in the first place. Oleg mentioned that this has happened before with a 2.1->2.5 upgrade, and I'm wondering if there is some ldiskfs patch in the TOSS release that needs to be updated, or some bug in e2fsprogs? What version of e2fsprogs is being used with 2.5?


          People

            green Oleg Drokin
            jamervi Joe Mervini
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved: