Details

    • Bug
    • Resolution: Fixed
    • Critical

    Description

      Using the current e2fsprogs-1.44.3.wc1, we are getting lots of:

      Multiply-claimed block(s) in inode 486666: 986392108 986392259
      Multiply-claimed block(s) in inode 486672: 986533901
      Multiply-claimed block(s) in inode 486674: 986534585

      Should we let fsck run and clean these up?
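
To gauge how many inodes and blocks are involved, messages like the ones above could be tallied per inode. The following is a minimal sketch, assuming the e2fsck output has been saved as text and every message has the form shown in the description; the function name `tally_multiply_claimed` is hypothetical, and tokens that are not plain decimal block numbers are simply ignored.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Multiply-claimed block(s) in inode 486666: 986392108 986392259
LINE_RE = re.compile(r"Multiply-claimed block\(s\) in inode (\d+):\s*(.*)")


def tally_multiply_claimed(text):
    """Return {inode: [block, ...]} for every matching message in text."""
    claimed = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        inode = int(match.group(1))
        # Keep only plain decimal block numbers; other tokens are skipped.
        claimed[inode].extend(
            int(tok) for tok in match.group(2).split() if tok.isdigit()
        )
    return dict(claimed)


sample = """\
Multiply-claimed block(s) in inode 486666: 986392108 986392259
Multiply-claimed block(s) in inode 486672: 986533901
Multiply-claimed block(s) in inode 486674: 986534585
"""

result = tally_multiply_claimed(sample)
# Print inodes with the most multiply-claimed blocks first.
for inode, blocks in sorted(result.items(), key=lambda kv: -len(kv[1])):
    print(inode, len(blocks))
```

Sorting by block count makes it easy to spot whether a single inode accounts for most of the messages.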

          Activity

            [LU-11577] fsck Multiply-claimed block(s)

            adilger Andreas Dilger added a comment -

            Yes, you could try to delete the object, but that might cause the mounted filesystem to go read-only when it detects filesystem metadata blocks being freed. Using debugfs to just zero out the inode keeps it from actually trying to free the blocks. Note that "debugfs -w -R 'clri <3821842>' /dev/mapper/nbp10_1-OST30" should do the same thing as running "mi" interactively: zero out the offending inode without trying to free the blocks.

            In any case, you still need to run a full e2fsck after cleaning up the inode. Directly deleting it will mark the "shared" blocks as freed in the allocation bitmaps, and without the post-delete e2fsck this will lead to further corruption as those blocks are reallocated. Zeroing it out with debugfs will likely mean less metadata to repair, but there may still be errors to fix.
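
The two steps in the comment above (zero the inode, then run a full e2fsck) could be written down as a dry-run helper that only prints the commands. This is a sketch under stated assumptions: the helper names are hypothetical, the debugfs invocation is the one quoted in the comment, and the `-f` (force check) flag on e2fsck is an assumption about what "a full e2fsck" means here, not something the comment specifies.

```python
import shlex


def clri_command(inode, device):
    """argv for zeroing one inode with debugfs, as quoted in the comment."""
    if inode <= 0:
        raise ValueError("inode number must be positive")
    return ["debugfs", "-w", "-R", f"clri <{inode}>", device]


def fsck_command(device):
    """argv for the follow-up check; -f is an assumption (force a check)."""
    return ["e2fsck", "-f", device]


device = "/dev/mapper/nbp10_1-OST30"
steps = [clri_command(3821842, device), fsck_command(device)]

# Dry run: print the commands for review instead of executing them.
for argv in steps:
    print(shlex.join(argv))
```

Nothing is executed, so the printed lines can be reviewed before anyone touches the device.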
            mhanafi Mahmoud Hanafi added a comment - - edited

            Thanks, the mi <> worked.

            You may close this case.

            adilger Andreas Dilger added a comment -

            The shared blocks phase can take a very long time and often does not produce useful results, especially if random filesystem corruption caused it (any random 32-bit value will be a valid block number on filesystems over 16TB in size). Inode 3821842 looks like total garbage: random blocks, size, timestamps, etc.

            It doesn't look like you have that big a problem, so if e2fsck has already finished there is no need to do anything else immediately. If it is still running, you might consider restarting e2fsck with the "-E" options you mentioned. Either "shared=lost+found" or "shared=delete" is probably OK (obviously the filesystem metadata won't be deleted).

            Alternately, you could open the filesystem with debugfs -w, run "mi <3821842>", and set all of the fields to zero, which should keep e2fsck from doing anything with this inode except marking it deleted.
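
The remark about 32-bit values and 16TB follows from simple arithmetic, sketched below. The 4096-byte block size is an assumption (the ticket does not state it), and the helper names are hypothetical.

```python
BLOCK_SIZE = 4096  # bytes; assumed, the ticket does not state the block size


def max_fs_bytes_for_32bit_blocks(block_size=BLOCK_SIZE):
    """Largest filesystem size addressable with 32-bit block numbers."""
    return (2 ** 32) * block_size


def is_plausible_block(value, fs_bytes, block_size=BLOCK_SIZE):
    """True if value falls inside the block range of a filesystem this size."""
    return 0 <= value < fs_bytes // block_size


limit = max_fs_bytes_for_32bit_blocks()
print(limit // 2 ** 40, "TiB")  # prints: 16 TiB

# On a filesystem of at least this size, every 32-bit value is in range,
# so random garbage in a block pointer cannot be rejected by range alone.
print(is_plausible_block(0xFFFFFFFF, limit))          # True
print(is_plausible_block(0xFFFFFFFF, 8 * 2 ** 40))    # False on 8 TiB
```

Under that assumption, the block numbers reported in the description (for example 986392108) are all below 2^32, so a range check alone would not flag them.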

            mhanafi Mahmoud Hanafi added a comment -

            Looks like a single inode accounts for the majority of the issues:

            Inode: 3821842 Type: regular Mode: 0700 Flags: 0x1802f
            Generation: 67584 Version: 0x00b70790
            User: 0 Group: 22176 Size: 4299265028096
            File ACL: 0
            Links: 23506 Blockcount: 251949840
            Fragment: Address: 0 Number: 0 Size: 0
            ctime: 0x6e0f0f87 – Thu Jul 6 00:19:35 2028
            atime: 0x2e2281b6 – Tue Jul 12 04:42:46 1994
            mtime: 0x00000000 – Wed Dec 31 16:00:00 1969
            Size of extra inode fields: 0
            BLOCKS:

            Attaching the fsck output (fsck.out).
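
The hex timestamps in the dump above can be decoded to confirm that they are mutually inconsistent. A minimal sketch, assuming the hex values are seconds since the Unix epoch; it reports UTC, whereas the dump shows the reporter's local time, so the hours differ. The function name is hypothetical.

```python
from datetime import datetime, timezone


def decode_inode_time(hex_value):
    """Convert a hex seconds-since-epoch field to an aware UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


# Values copied from the inode dump.
fields = {"ctime": "0x6e0f0f87", "atime": "0x2e2281b6", "mtime": "0x00000000"}
decoded = {name: decode_inode_time(value) for name, value in fields.items()}

for name, when in decoded.items():
    print(name, when.isoformat())
```

A change time in 2028, an access time in 1994, and a modification time of zero are not something a normally written file would carry.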
            mhanafi Mahmoud Hanafi added a comment - - edited

            Yes, it has been a nightmare.

            The fsck is taking a very long time.

            Is it safe to run with

            -E clone=zero shared=lost+found

            or

            shared=delete?

            There should not be this many shared blocks.

            pjones Peter Jones added a comment -

            Three Sev 1s within one day must be a record!  Andreas has been in transit today too and will no doubt give a definitive answer when he is able to, but my guess is that it is ok to let lfsck run and move these files to the lost and found where you can then work out what can be salvaged.


            People

              adilger Andreas Dilger
              mhanafi Mahmoud Hanafi