[LU-11577] fsck Multiply-claimed block(s) - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Labels: None
Severity: 1
Rank (Obsolete): 9223372036854775807

Description

Using the current e2fsprogs-1.44.3.wc1, we are getting lots of:

Multiply-claimed block(s) in inode 486666: 986392108 986392259
Multiply-claimed block(s) in inode 486672: 986533901
Multiply-claimed block(s) in inode 486674: 986534585

Should we let fsck run and clean these up?

Attachments


fsck.out
49 kB
27/Oct/18 12:26 AM

Issue Links

is related to: LU-16171 e2fsck should handle multiply-claimed blocks better (Resolved)

Activity


Andreas Dilger added a comment - 27/Oct/18 2:37 AM

Yes, you could try to delete the object, but that might cause the mounted filesystem to go read-only when it detects filesystem metadata blocks being freed. Using debugfs to just zero out the inode prevents it from actually trying to free the blocks. Note that "debugfs -w -R 'clri <3821842>' /dev/mapper/nbp10_1-OST30" should do the same thing as running "mi" interactively: zero out the offending inode without trying to free the blocks.

In any case, you still need to run a full e2fsck after cleaning up the inode. Directly deleting it will mark the "shared" blocks as free in the allocation bitmaps, and without a post-delete e2fsck this will lead to further corruption when those blocks are reallocated. Zeroing the inode with debugfs will likely leave less metadata to repair, but there may still be errors to fix.
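To make the reasoning above concrete, here is a toy Python model (not the real ext4 bitmap or allocator code) of one block claimed by two inodes. Deleting the garbage inode clears the shared block in the bitmap, so the next allocation hands it out again and it ends up claimed twice. Zeroing the inode, as debugfs clri or mi does, leaves the bitmap alone, and the only leftover is a block marked in use that nobody owns: the kind of bitmap error the follow-up e2fsck still has to fix. The class, the first-fit allocator, and inode numbers 12 and 99 are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class ToyFs:
    """A tiny filesystem model: a block bitmap plus per-inode block lists."""

    def __init__(self, nblocks: int) -> None:
        self.in_use: List[bool] = [False] * nblocks  # allocation bitmap
        self.owners: Dict[int, Set[int]] = {}  # inode number -> claimed blocks

    def claim(self, ino: int, blocks: Set[int]) -> None:
        """Record that `ino` claims `blocks` and mark them used in the bitmap."""
        self.owners.setdefault(ino, set()).update(blocks)
        for b in blocks:
            self.in_use[b] = True

    def delete(self, ino: int) -> None:
        """Ordinary delete: free every block the inode claims, shared or not."""
        for b in self.owners.pop(ino):
            self.in_use[b] = False

    def clri(self, ino: int) -> None:
        """Zero the inode (like debugfs clri/mi): forget its blocks, keep the bitmap."""
        del self.owners[ino]

    def allocate(self, ino: int) -> int:
        """First-fit allocation of one free block to `ino`."""
        b = self.in_use.index(False)
        self.claim(ino, {b})
        return b

    def multiply_claimed(self) -> Set[int]:
        """Blocks that appear in more than one inode's block list."""
        seen: Set[int] = set()
        dups: Set[int] = set()
        for blocks in self.owners.values():
            dups |= seen & blocks
            seen |= blocks
        return dups

    def leaked(self) -> Set[int]:
        """Blocks marked in use in the bitmap but owned by no inode."""
        owned = set().union(*self.owners.values())
        return {b for b, used in enumerate(self.in_use) if used and b not in owned}


def scenario(zero_inode: bool) -> Tuple[Set[int], Set[int]]:
    """Clean up the garbage inode, write one new block, report the damage."""
    fs = ToyFs(8)
    fs.claim(12, {0, 1, 2})  # a legitimate file
    fs.claim(3821842, {1, 3})  # garbage inode that also claims block 1
    if zero_inode:
        fs.clri(3821842)
    else:
        fs.delete(3821842)
    fs.allocate(99)  # a new file is written after the cleanup
    return fs.multiply_claimed(), fs.leaked()
```

With `scenario(zero_inode=False)` block 1 ends up claimed by both inode 12 and the new inode 99; with `scenario(zero_inode=True)` there are no duplicates, but block 3 stays marked in use with no owner until e2fsck corrects the bitmap.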


Mahmoud Hanafi added a comment - 27/Oct/18 2:29 AM - edited

Thanks, the mi <> worked.

You may close this case.


Andreas Dilger added a comment - 27/Oct/18 2:20 AM

The shared blocks phase can take a very long time and often does not produce useful results, especially if it was triggered by random filesystem corruption (on a filesystem larger than 16TB with 4KB blocks, any random 32-bit value is a valid block number). That inode (3821842) looks like total garbage: random blocks, size, timestamps, etc.
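The 16TB figure is simple arithmetic: with 4 KiB blocks, 32-bit block numbers cover 2**32 × 4096 bytes = 16 TiB, so on a filesystem at least that large every 32-bit value lands inside the block range. This sketch (the function name is hypothetical, and it assumes 4 KiB blocks) computes what fraction of random 32-bit values would look like in-range block numbers for a given filesystem size.

```python
TIB = 2**40  # bytes in one TiB

# Largest filesystem that 32-bit block numbers can address with 4 KiB blocks.
LIMIT_4K = 2**32 * 4096


def fraction_in_block_range(fs_bytes: int, block_size: int = 4096) -> float:
    """Fraction of all 2**32 possible 32-bit values that fall inside
    [0, block_count) for a filesystem of `fs_bytes` bytes."""
    block_count = fs_bytes // block_size
    return min(block_count, 2**32) / 2**32


if __name__ == "__main__":
    for size_tib in (1, 8, 16, 100):
        print(f"{size_tib:>4} TiB: {fraction_in_block_range(size_tib * TIB):.4f}")
```

On a 1 TiB filesystem only 1/16 of random values are in range, so garbage block pointers are usually rejected as out of range; at 16 TiB and above every value passes that check and shows up as a (shared) block claim instead.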

It doesn't look like you have that big a problem, so if it has already finished there is no need to do anything else immediately. If it is still running, you might consider restarting e2fsck with the "-E" options you mentioned. Either "shared=lost+found" or "shared=delete" is probably OK (obviously the filesystem metadata won't be deleted).
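e2fsck takes all of its extended options in a single "-E" argument, separated by commas, so "clone=zero" and "shared=lost+found" belong together as "-E clone=zero,shared=lost+found"; written with a space, the second option is left as a stray command-line argument. A minimal Python sketch of building that command line follows. The helper name is hypothetical, and the clone=/shared= values are the ones discussed in this ticket for the e2fsprogs-1.44.3.wc1 build; check "man e2fsck" for the build you are running.

```python
from typing import List, Sequence


def e2fsck_argv(device: str, extended: Sequence[str] = (), force: bool = True) -> List[str]:
    """Build an e2fsck command line.

    `extended` holds "name=value" strings such as "clone=zero"; they are
    joined into one comma-separated -E argument.
    """
    argv = ["e2fsck"]
    if force:
        argv.append("-f")  # check even if the filesystem is marked clean
    if extended:
        for opt in extended:
            # Each entry must be a single option; the join adds the commas.
            if "," in opt or " " in opt:
                raise ValueError(f"one option per entry, got {opt!r}")
        argv += ["-E", ",".join(extended)]
    argv.append(device)
    return argv
```

The resulting list can be passed to `subprocess.run(argv)`. Avoid `check=True` there: e2fsck exits with status 1 when it has corrected errors, which would be raised as a failure even though the repair succeeded.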

Alternatively, you could use "debugfs -w" to open the filesystem read-write and zero out this inode via "mi <3821842>", setting all of the fields to zero, which should prevent e2fsck from doing anything with it except marking it deleted.


Mahmoud Hanafi added a comment - 27/Oct/18 12:26 AM

Looks like a single inode has the majority of the issues:

Inode: 3821842 Type: regular Mode: 0700 Flags: 0x1802f
Generation: 67584 Version: 0x00b70790
User: 0 Group: 22176 Size: 4299265028096
File ACL: 0
Links: 23506 Blockcount: 251949840
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x6e0f0f87 -- Thu Jul 6 00:19:35 2028
atime: 0x2e2281b6 -- Tue Jul 12 04:42:46 1994
mtime: 0x00000000 -- Wed Dec 31 16:00:00 1969
Size of extra inode fields: 0
BLOCKS:

Attaching the fsck output (fsck.out).
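A quick way to confirm that the inode 3821842 dump is garbage is to decode its raw fields. This Python sketch decodes the Flags word using the standard FS_*_FL bit values from the Linux header linux/fs.h (shared by ext2/3/4) and renders the raw timestamps in UTC. Flags 0x1802f decode to secure-delete, undelete, compression, sync, append-only, no-tail-merge, and directory-sync bits, an implausible mix for a regular file, and the times range from the epoch itself to a date ten years after the ticket. debugfs printed local time; the 7- and 8-hour offsets from UTC suggest the reporter was on US Pacific time. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List

# (bit, name) pairs for ext2/3/4 inode flags, lowest bit first (FS_*_FL in linux/fs.h).
INODE_FLAGS = [
    (0x00000001, "SECRM"), (0x00000002, "UNRM"), (0x00000004, "COMPR"),
    (0x00000008, "SYNC"), (0x00000010, "IMMUTABLE"), (0x00000020, "APPEND"),
    (0x00000040, "NODUMP"), (0x00000080, "NOATIME"), (0x00000100, "DIRTY"),
    (0x00000200, "COMPRBLK"), (0x00000400, "NOCOMP"), (0x00000800, "ENCRYPT"),
    (0x00001000, "INDEX"), (0x00002000, "IMAGIC"), (0x00004000, "JOURNAL_DATA"),
    (0x00008000, "NOTAIL"), (0x00010000, "DIRSYNC"), (0x00020000, "TOPDIR"),
    (0x00040000, "HUGE_FILE"), (0x00080000, "EXTENT"),
]
KNOWN_MASK = sum(bit for bit, _ in INODE_FLAGS)


def decode_flags(flags: int) -> List[str]:
    """Names of the flag bits set in `flags`, plus any bits not in the table."""
    names = [name for bit, name in INODE_FLAGS if flags & bit]
    unknown = flags & ~KNOWN_MASK
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


def utc(ts: int) -> str:
    """Render a 32-bit inode timestamp in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print("Flags:", decode_flags(0x1802F))
    for label, ts in (("ctime", 0x6E0F0F87), ("atime", 0x2E2281B6), ("mtime", 0)):
        print(label, utc(ts))
```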


Mahmoud Hanafi added a comment - 27/Oct/18 12:07 AM - edited

Yes, it has been a nightmare.

The fsck is just very slow and is taking a very long time.

Is it safe to run with "-E clone=zero,shared=lost+found", or with "shared=delete"?

There should not be this many shared blocks.


Peter Jones added a comment - 27/Oct/18 12:00 AM

Three Sev 1s within one day must be a record! Andreas has been in transit today too and will no doubt give a definitive answer when he is able to, but my guess is that it is OK to let lfsck run and move these files to lost+found, where you can then work out what can be salvaged.
