
Issues with 2.10 upgrade and files missing LMAC_FID_ON_OST flag

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Environment: 3.10.0-514.21.1.el7_lustre.x86_64
    • Severity: 3

    Description

      Last weekend we upgraded our Lustre filesystem from 2.7 to 2.10. After the upgrade, we were missing about 36M objects. After a bunch of troubleshooting, we ended up running e2fsck (which recovered the objects to lost+found) and ll_recover_lost_found_objs (which moved them back to the proper place in the ldiskfs filesystem). It's worth noting that lfsck couldn't recover the objects from lost+found (because of some kind of incompatibility between the objects' EA and lfsck; details follow).

      Couple of remarks:
      1. It looks like the functionality of ll_recover_lost_found_objs has been moved into the initial OI scrub, but it would not work for us. We noticed 2 things:
      a. in the lab, we recreated the issue (by manually moving objects to lost+found), and osd_initial_OI_scrub() would recover only the first 255 objects. We couldn't figure out why it stopped at 255, and restarting the OST would not recover any more than the initial 255.
      b. in prod, osd_initial_OI_scrub() would run but not fix anything. The trace would come back with osd_ios_lf_fill() returning -EINVAL. After more troubleshooting, it turns out that all the objects in lost+found have no compat flags at all (in particular no LMAC_FID_ON_OST) in the LMA extended attribute, and eventually we end up with osd_get_idif() returning -EINVAL (because __ost_xattr_get() returned 24). We believe all those files were created with Lustre 2.7. (A small sketch for inspecting this flag follows the remarks below.)
      -> this is how far we got troubleshooting those 2 issues. They sound like bugs; we are happy to give more details and/or file a bug report if that helps.
      2. our lustre has 96 OST (id 0 to 97). All of the bad objects were located on 24 of them (id 48 to 71), about 1.5M bad inodes out of 3M per OST. What's special about id 48 to 71 is that those OSTs were reformatted about 6 months ago (with the same ids, but at creation we forgot to add --replace to mkfs or do a writeconf). At the time, we saw messages like "precreate FID 0x0:3164581 is over 100000 larger than the LAST_ID 0x0:0, only precreating the last 10000 objects." in the logs. This sounds like a potential root cause of our issue last week, but we really can't figure out how it would have caused half of the inodes to never get LMAC_FID_ON_OST and get lost in ldiskfs.
      3. after fixing everything, when we run lfsck -t scrub, all the bad objects are checked and reported as failed in oi_scrub (example below). After digging, this comes down to the same osd_get_idif() function returning -EINVAL. We can work around it by copying the affected files.
      checked: 3383278
      updated: 0
      failed: 1469776
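
      For reference, a minimal userspace sketch related to 1.b (not a Lustre tool; the struct layout and the LMAC_FID_ON_OST value are hand-copied and should be checked against lustre_idl.h in your tree): it reads the trusted.lma xattr of an object on an ldiskfs-mounted OST and reports whether the compat flag is set. A 24-byte LMA with lma_compat == 0 matches the failure mode described above.

      /* check_lma.c: hypothetical diagnostic, build with: cc -o check_lma check_lma.c */
      #include <stdio.h>
      #include <stdint.h>
      #include <sys/xattr.h>

      struct lu_fid {                   /* 16 bytes */
              uint64_t f_seq;
              uint32_t f_oid;
              uint32_t f_ver;
      };

      struct lustre_mdt_attrs {         /* 24 bytes, matches __ost_xattr_get() == 24 */
              uint32_t      lma_compat;
              uint32_t      lma_incompat;
              struct lu_fid lma_self_fid;
      };

      #define LMAC_FID_ON_OST 0x00000008  /* assumed value, verify against lustre_idl.h */

      int main(int argc, char **argv)
      {
              char buf[256];
              struct lustre_mdt_attrs *lma = (struct lustre_mdt_attrs *)buf;
              ssize_t rc;

              if (argc != 2) {
                      fprintf(stderr, "usage: %s <object path on ldiskfs mount>\n", argv[0]);
                      return 1;
              }

              /* The LMA is stored little-endian on disk; a raw read is fine on x86_64. */
              rc = getxattr(argv[1], "trusted.lma", buf, sizeof(buf));
              if (rc < (ssize_t)sizeof(*lma)) {
                      perror("getxattr(trusted.lma)");
                      return 1;
              }

              printf("%s: size=%zd compat=%#x incompat=%#x fid=[0x%llx:0x%x:0x%x]\n",
                     argv[1], rc, lma->lma_compat, lma->lma_incompat,
                     (unsigned long long)lma->lma_self_fid.f_seq,
                     lma->lma_self_fid.f_oid, lma->lma_self_fid.f_ver);
              printf("LMAC_FID_ON_OST is %s\n",
                     (lma->lma_compat & LMAC_FID_ON_OST) ? "set" : "MISSING");
              return 0;
      }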

      Overall, we just wanted to report this on the mailing list in case someone else runs into this issue, and to ask whether we should open bugs about 1.a and 1.b. We were also curious whether anybody has an explanation for how we got here, and whether 2. could explain it.

      This is pretty dense, but overall it reports 3 issues:

      • osd_initial_OI_scrub() seems to recover up to 255 files from lost+found, never more
      • somehow we got some files without compat flags; these are not processed by osd_initial_OI_scrub() and are reported as failed by lfsck -t scrub
      • 24 of our 96 OSTs somehow ended up with 1.5M unattached inodes each, and we're curious how this could have happened.

      Attachments

        1. 800a_mount_patched.log.0.gz
          262 kB
        2. 800a_mount.log.gz
          251 kB
        3. debugfs_mount_logs.tar.gz
          232 kB
        4. l210_loop_4g.tar.xz
          2.30 MB
        5. ls_output.gz
          2 kB

        Issue Links

          Activity


            gerrit Gerrit Updater added a comment -
            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31019/
            Subject: LU-9836 osd-ldiskfs: read directory completely
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: fc862c92d8cad04bceea68823d136d9d8678cc98

            gerrit Gerrit Updater added a comment -
            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31019
            Subject: LU-9836 osd-ldiskfs: read directory completely
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 53540def8b7ad887b49921028d8378accae22698

            gerrit Gerrit Updater added a comment -
            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30770/
            Subject: LU-9836 osd-ldiskfs: read directory completely
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1e2cd1c7025080879a27f9ad9a3896fd3e0e8753

            mcmult Tim McMullan (Inactive) added a comment -
            Thank you for the patch! We were able to test it and it resolved the issue for us.

            Thanks again!

            yong.fan nasf (Inactive) added a comment -
            mcmult, thanks for your help.

            We found the reason why some orphans cannot be recovered. The patch https://review.whamcloud.com/30770 is based on master, but it is also applicable to b2_10. You can verify it when you have time.

            Thanks!
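
            The gist of that fix can be illustrated with a userspace analogue, under the assumption that the scan was stopping after a single buffer of directory entries: one getdents64() call returns only as many entries as fit in the buffer, so the caller has to loop until the kernel reports end-of-directory to see everything in lost+found. The real patch changes the kernel-side iteration in osd-ldiskfs ("read directory completely"); the sketch below only illustrates the idea.

            /* walk_dir.c: read a directory completely with getdents64(), looping
             * until the kernel returns 0 (end of directory) instead of stopping
             * after the first buffer-full of entries. */
            #define _GNU_SOURCE
            #include <fcntl.h>
            #include <stdio.h>
            #include <stdint.h>
            #include <unistd.h>
            #include <sys/syscall.h>

            struct linux_dirent64 {
                    uint64_t       d_ino;
                    int64_t        d_off;
                    unsigned short d_reclen;
                    unsigned char  d_type;
                    char           d_name[];
            };

            int main(int argc, char **argv)
            {
                    char buf[8192];
                    long nread, total = 0;
                    int fd;

                    fd = open(argc > 1 ? argv[1] : "lost+found", O_RDONLY | O_DIRECTORY);
                    if (fd < 0) {
                            perror("open");
                            return 1;
                    }

                    /* Keep calling getdents64() until it reports end-of-directory. */
                    while ((nread = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
                            for (long pos = 0; pos < nread; ) {
                                    struct linux_dirent64 *d =
                                            (struct linux_dirent64 *)(buf + pos);
                                    total++;
                                    pos += d->d_reclen;
                            }
                    }
                    if (nread < 0)
                            perror("getdents64");

                    printf("saw %ld entries\n", total);
                    close(fd);
                    return 0;
            }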

            gerrit Gerrit Updater added a comment -
            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/30770
            Subject: LU-9836 osd-ldiskfs: read directory completely
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9605f06ee38fc8d8b8d7d263e9dc3bb88f2d828d

            mcmult Tim McMullan (Inactive) added a comment - edited
            Please check again; I had thought the upload finished, but it hadn't. The file should be here now. Sorry about that!

            In testing I skipped straight from 512MB and 300 files to 4GB and 1024 files, since I knew that would show the issue. Some of our initial tests were done with 512 files, and that also showed the issue.

            yong.fan nasf (Inactive) added a comment -
            mcmult, where have you uploaded the image l210_loop_4g.tar.xz to? What is the smallest image size (and the smallest number of files) with which you can reproduce the issue?

            mcmult Tim McMullan (Inactive) added a comment -
            I got it to reproduce on a loopback device. I've attached l210_loop_4g.tar.xz, containing 3 copies of the 4GB OST file in various states (freshly written to, after I moved objects to lost+found, and after mounting it for the first time), plus the log from when it mounted.

            As an interesting note, I had tried to do this with a much smaller disk and fewer objects, and the recovery process worked correctly. We have been seeing it stop around 250-260. I tried just moving 300 objects into lost+found, and all of them were recovered successfully.

            mcmult Tim McMullan (Inactive) added a comment -
            This is on a real block device and is too large to upload reasonably. I will try to recreate it on a loopback device that is small enough to upload here.

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: jwallior Julien Wallior
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: