LU-4102

lots of multiply-claimed blocks in e2fsck

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 1.8.8
    • Environment: e2fsprogs 1.41.90.wc2
    • Severity: 3
    • 11017

    Description

      After a power loss, an older e2fsck (e2fsprogs 1.41.90.wc2) was run on the OSTs. It found tons of multiply-claimed blocks, including for the /O directory. Here's an example of one of the inodes:

      File ... (inode #17825793, mod time Wed Aug 15 19:02:25 2012)
        has 1 multiply-claimed block(s), shared with 1 file(s):
              /O (inode #84934657, mod time Wed Aug 15 19:02:25 2012)
      Clone multiply-claimed blocks? yes
      
      Inode 17825793 doesn't have an associated directory entry; it eventually gets put into lost+found.
      

      So the questions are:

      • how could this have happened? My slightly-informed-probably-wrong theory is that the journal got corrupted and it replayed some old inodes back into existence. I noticed there were a lot of patches dealing with journal checksums committed after 1.41.90.
      • what's the best way to deal with these? Cloning takes forever when you are talking about TB-sized files. I tested the delete extended option, and it looks like it deletes both sides of the file; it would be nice if it just deleted the unlinked side. Right now my plan is to create a debugfs script from the read-only e2fsck output (rough sketch below), but if there is a better way, that would be good.
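
      Something along these lines is what I have in mind for generating the script (just a sketch; the log and output file names are made up, and the generated command list would be reviewed by hand before anything touches the real device):

        # Pull the inode number out of each "File ... (inode #N, ...)" line of a
        # saved read-only e2fsck log and emit a debugfs "clri" command for it.
        # Assumes the multiply-claimed report format shown above.
        awk '/^File .*inode #/ {
                if (match($0, /inode #[0-9]+/))
                        print "clri <" substr($0, RSTART + 7, RLENGTH - 7) ">"
        }' e2fsck-ost0003.log > fix-ost0003.debugfs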

      Thanks.

          Activity

            pjones Peter Jones added a comment -

            Any work still outstanding should be tracked under a new ticket


            kitwestneat Kit Westneat (Inactive) added a comment -

            Hi Andreas,

            I wanted to tie up the loose ends with the e2fsck patches in this thread. Is the shared=ignore patch something that could be landed? Should I create a gerrit changeset for it?

            As for the skip-invalid-bitmap issue, what's the best path to resolution on that? Should I create a new Jira ticket for it or what do you suggest?

            Thanks,
            Kit


            adilger Andreas Dilger added a comment -

            http://review.whamcloud.com/8188 updates lustre/ChangeLog to recommend using a newer e2fsprogs for b2_1.


            kitwestneat Kit Westneat (Inactive) added a comment -

            Hi Andreas,

            One of the targets just went RO:
            Oct 28 17:09:45 lfs-oss-2-6 kernel: LDISKFS-fs error (device dm-50): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 577 corrupted: 24074 blocks free in bitmap, 24075 - in gd

            It looks like I missed it when disabling targets. I had forgotten that I used the shared=ignore flag when cleaning it up, so the clean bill of health from e2fsck was an illusion.

            I've marked it deactivated on the MDT. Hopefully it can hold until Wednesday.


            adilger Andreas Dilger added a comment -

            I think this is reasonable, so long as the new ll_recover_lost_found_objs fixes the shared block problem to a large extent.

            It should be possible to do a full test run against a raw e2image file for each of the OSTs. This would reduce the risk of problems during the actual repair, give some confidence that the remaining problems will be repaired, and also minimize the system downtime, because the debugfs scripts can be generated while the system is still running.

            Loopback-mount the raw image file, run "ll_recover_lost_found_objs -n" against it, and unmount. Generate and run the debugfs script against the raw image file, then run "e2fsck -fy" on the image to see what is left. If all goes well, the debugfs script can be used on the real device.
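
            In outline, something like this (device names, image paths, and mount points are placeholders; it assumes ldiskfs is available on the node doing the dry run):

                # take a raw metadata image of one OST
                e2image -r /dev/dm-50 /scratch/ost0019.img

                # loopback-mount the image and dry-run the object recovery
                mount -t ldiskfs -o loop /scratch/ost0019.img /mnt/ost0019
                ll_recover_lost_found_objs -n -d /mnt/ost0019/lost+found
                umount /mnt/ost0019

                # apply the generated debugfs script to the image, then recheck it
                debugfs -w -f fix-ost0019.debugfs /scratch/ost0019.img
                e2fsck -fy /scratch/ost0019.img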


            kitwestneat Kit Westneat (Inactive) added a comment -

            Here's the list of corruption and a general plan of attack:
            ost_15 unattached inodes
            ost_19 multiply-claimed blocks (2780 inodes), unattached inodes
            ost_28 multiply-claimed blocks (52 inodes), unattached inodes
            ost_32 unattached inodes - will take multiple passes
            ost_34 multiply-claimed blocks (48 inodes), unattached inodes
            ost_36 multiply-claimed blocks (1792 inodes), unattached inodes
            ost_3 multiply-claimed blocks (376 inodes), unattached inodes
            ost_45 multiply-claimed blocks (362 inodes), unattached inodes

            Plan:
            1) using a bind mount, run the new ll_recover -n on OSTs with multiply-claimed blocks
            2) use list of duplicate files to create a debugfs script to clri and unlink each "evil twin" file
            3) take downtime to execute debugfs script
            4) [1 hour] run e2fsck -n on OSTs to make sure all multiply-claimed blocks are gone
            a) prepare script in advance to move files to lost+found if any are left
            5) [1 hour] run e2fsck -p on all OSTs, as well as ll_recover
            a) it's possible that there could be unknown issues at this stage
            6) [30 min] clri any multiply-claimed block files in l+f, delete all other files
            a) prepare script to nuke l+f
            7) [30 min] rerun e2fsck -p to verify that all OSTs are clean
            a) again, it's possible that there could be unknown issues at this stage

            So I am thinking 3 hours + 4 hours for unknown issues + 1 hour for startup/shutdown. What do you think of this plan/schedule?
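
            For step 2, each generated script would boil down to an unlink/clri pair per "evil twin" (the object path, inode number, and device name below are placeholders, not values taken from these OSTs):

                # fix-ost_19.debugfs -- one pair of lines per duplicate object:
                #   unlink /O/0/d6/987654
                #   clri <1234567>
                # applied during the downtime window in step 3 with:
                debugfs -w -f fix-ost_19.debugfs /dev/mapper/ost_19
                # the e2fsck -p run in step 5 then fixes up block and link counts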


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: orentas Oz Rentas (Inactive)
              Votes: 0
              Watchers: 8
