
fsck found > 1M multiply-claimed blocks

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.5 / TOSS 3.5 (RHEL 7.5) running on Dell R730 servers and DDN SFA12K hardware.

    Description

      On another OST that went read-only, on the same file system that I reported in LU-12912, fsck is reporting over 1.3 million multiply-claimed blocks after going through Pass 1D of fsck -fy <device>, which has been running since ~6PM last night. As of now it looks like it has cloned only a small number of the blocks.

      I have never encountered this before, so I'm not sure how to proceed. Any advice will be welcome.

          Activity

            [LU-12913] fsck found > 1M multiply-claimed blocks
            pjones Peter Jones added a comment -

            Excellent news! Joe, I hope to see you at SC19 too

            jamervi Joe Mervini added a comment -

            Just wanted to post that this case has been resolved. 

            Andreas, if the opportunity comes up at SC I'd like to buy you a few beers! Maybe pick your brain about the mysteries of fsck...

            jamervi Joe Mervini added a comment -

            Oh shoot! I thought I sent this out last night...

            The fsck completed and I ran another for good measure.

            There were 461 instances of inode extent trees being optimized. It then went through Pass 1E (Optimizing extent trees).

            Pass 2 exposed the inodes that were cleared. We're going to see if we can determine the affected directories tomorrow.

            Pass 5 had an enormous number of block bitmap differences.

            The file system is back online and everything looks OK.

            pjones Peter Jones added a comment -

            Hey jamervi any news?

            jamervi Joe Mervini added a comment -

            Oh - just saw your message and started the -fy fsck. 

            It shows those inodes being optimized. It's progressing, so I'll post an update when it finishes.

            Thanks a lot for your help! 

            jamervi Joe Mervini added a comment -

            I just tried the fsck with the -p option. I got:

            mscratch-OST0023: Inode <496> extent tree (at level 2) could be narrower, IGNORED.

            This was repeated for 23 additional inodes.

            Then I got:

            mscratch-OST0023: Inode 919 has an invalid extent node (blk 119809, lblk 0)

            mscratch-OST0023: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

            I'm guessing I am safe to proceed but am curious what these messages mean.

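When the same notice repeats for many inodes, it can help to summarize the preen output before deciding how to proceed. The following is a minimal sketch, assuming the e2fsck output was saved to a file and that its lines look like the messages quoted above; the function name `summarize_fsck_log` and the output keys are hypothetical, not part of e2fsprogs.

```shell
#!/bin/sh
# Summarize saved e2fsck -p output (hypothetical helper, not part of e2fsprogs).
# Assumes the log lines match the message formats quoted in this ticket.
summarize_fsck_log() {
    awk '
        # Count the informational "could be narrower" notices.
        /extent tree .* could be narrower/ { narrow++ }
        # Collect inode numbers reported with an invalid extent node.
        /has an invalid extent node/ {
            # The inode number is the field following the word "Inode".
            for (i = 1; i < NF; i++)
                if ($i == "Inode") { bad = bad " " $(i + 1); break }
        }
        # Note whether the preen run gave up and asked for a manual fsck.
        /UNEXPECTED INCONSISTENCY/ { aborted = 1 }
        END {
            printf "narrower_notices=%d\n", narrow
            printf "invalid_extent_inodes=%s\n", substr(bad, 2)
            printf "preen_aborted=%d\n", aborted
        }
    ' "$@"
}

# Usage (fsck.log is a placeholder file name):
#   summarize_fsck_log fsck.log
```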

            adilger Andreas Dilger added a comment -

            Probably the first one was loading the metadata from disk, and the later ones had it cached already.

            You should probably use "-y" since that will fix any problems found, while "-p" may abort if there are unexpected problems.
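Whichever option is used, the e2fsck exit status tells you whether problems were left behind. The sketch below decodes that status using the bit values listed in the e2fsck(8) man page; the function name `decode_e2fsck_status` is a hypothetical helper, and /dev/ostdev in the usage comment is the same placeholder device used elsewhere in this ticket.

```shell
#!/bin/sh
# Decode an e2fsck exit status (hypothetical helper).
# The status is a bitwise OR of the conditions listed in e2fsck(8).
decode_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "errors corrected, system should be rebooted"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((status & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((status & 32)) -ne 0 ]; then echo "canceled by user request"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "shared library error"; fi
    return 0
}

# Usage (placeholder device):
#   e2fsck -fy /dev/ostdev; decode_e2fsck_status $?
```

A status with bit 4 set means the file system still has uncorrected errors and needs another pass.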
            jamervi Joe Mervini added a comment -

            Andreas - I cleared the MMP flag and am clearing the inodes one at a time. (I was surprised that it took quite a while for the first one to clear.)

            If you see this before I get to the point where I run fsck again, should I use the "-p" or "-y" option?  


            adilger Andreas Dilger added a comment -

            Joe, sorry that I did not see your message sooner. You can clear the e2fsck MMP state with "tune2fs -E clear_mmp -f /dev/ostdev" if you are sure nothing else is using the OST.

            You should run another e2fsck after the bad inodes are cleared, but it should be much faster without the badblocks pass.
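The sequence described in this comment can be sketched as a dry-run helper that only prints the commands for review. The tune2fs command is quoted from the comment. Clearing inodes with `debugfs clri` is an assumption, since the ticket does not say which tool was used. The function name `print_ost_repair_plan` is hypothetical, and the device and inode numbers are placeholders.

```shell
#!/bin/sh
# Print (do not run) the repair sequence discussed in this ticket.
# Hypothetical helper; the device and inode numbers come from the caller.
print_ost_repair_plan() {
    dev=$1
    shift
    # 1. Clear stale MMP state (command quoted from the comment above).
    #    Only safe if nothing else is using the OST.
    echo "tune2fs -E clear_mmp -f $dev"
    # 2. Clear each bad inode. Using debugfs clri is an assumption;
    #    the ticket does not say how the inodes were cleared.
    for ino in "$@"; do
        echo "debugfs -w -R 'clri <$ino>' $dev"
    done
    # 3. Re-check with -f (force) and -y (answer yes to all questions).
    echo "e2fsck -fy $dev"
}

# Usage (placeholder device and inode number):
#   print_ost_repair_plan /dev/ostdev 919
```

Printing the plan rather than executing it leaves room to confirm that the OST is unmounted and that the inode list is correct before anything is written to the device.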

            People

              adilger Andreas Dilger
              jamervi Joe Mervini