[LU-11577] fsck Multiply-claimed block(s) Created: 26/Oct/18  Updated: 21/Sep/22  Resolved: 27/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Attachments: File fsck.out    
Issue Links:
Related
is related to LU-16171 e2fsck should handle multiply-claimed... Resolved
Severity: 1
Rank (Obsolete): 9223372036854775807

 Description   

using the current e2fsprogs-1.44.3.wc1

we are getting lots of

Multiply-claimed block(s) in inode 486666: 986392108 986392259
Multiply-claimed block(s) in inode 486672: 986533901
Multiply-claimed block(s) in inode 486674: 986534585

Should we let fsck run and clean these up?



 Comments   
Comment by Peter Jones [ 27/Oct/18 ]

Three Sev 1s within one day must be a record!  Andreas has been in transit today too and will no doubt give a definitive answer when he is able to, but my guess is that it is ok to let lfsck run and move these files to the lost and found where you can then work out what can be salvaged.

Comment by Mahmoud Hanafi [ 27/Oct/18 ]

Yes, it has been a nightmare.

The fsck is taking a very long time.

Is it safe to run with

-E clone=zero shared=lost+found

or

shared=delete.

There should not be this many shared blocks.

Comment by Mahmoud Hanafi [ 27/Oct/18 ]

Looks like a single inode accounts for the majority of the issues:

Inode: 3821842 Type: regular Mode: 0700 Flags: 0x1802f
Generation: 67584 Version: 0x00b70790
User: 0 Group: 22176 Size: 4299265028096
File ACL: 0
Links: 23506 Blockcount: 251949840
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x6e0f0f87 – Thu Jul 6 00:19:35 2028
atime: 0x2e2281b6 – Tue Jul 12 04:42:46 1994
mtime: 0x00000000 – Wed Dec 31 16:00:00 1969
Size of extra inode fields: 0
BLOCKS:

Attaching the fsck output (fsck.out).
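
A per-inode tally of the kind behind the observation above can be sketched in a few lines of Python. This is only an illustration: it assumes the e2fsck messages have exactly the form quoted in the description ("Multiply-claimed block(s) in inode N: b1 b2 ..."), and the names LINE_RE and tally_shared_blocks are hypothetical helpers, not part of e2fsprogs.

```python
import re
from collections import Counter

# Matches lines in the form quoted in the description, e.g.:
#   Multiply-claimed block(s) in inode 486666: 986392108 986392259
# (assumed format; other e2fsck messages are ignored)
LINE_RE = re.compile(r"Multiply-claimed block\(s\) in inode (\d+):((?:\s+\d+)+)")

def tally_shared_blocks(lines):
    """Map inode number -> number of multiply-claimed blocks reported for it."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            inode = int(match.group(1))
            counts[inode] += len(match.group(2).split())
    return counts

# The three sample lines from the description.
sample = [
    "Multiply-claimed block(s) in inode 486666: 986392108 986392259",
    "Multiply-claimed block(s) in inode 486672: 986533901",
    "Multiply-claimed block(s) in inode 486674: 986534585",
]

counts = tally_shared_blocks(sample)
for inode, n in counts.most_common():
    print(inode, n)
```

Feeding the lines of the attached fsck.out to tally_shared_blocks instead of the sample list would put the inode with the most reported blocks at the top of most_common().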

Comment by Andreas Dilger [ 27/Oct/18 ]

The shared blocks phase can be very long, and often does not produce useful results, especially if there is random filesystem corruption that caused it (some random 32-bit data will always be a valid block number for filesystems over 16TB in size). The above inode looks like total garbage - random blocks and size, timestamps, etc.
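
The two points above can be checked with a little arithmetic. The sketch below assumes a 4 KiB block size (the ticket does not state it) and takes the raw hex timestamps from the inode dump in the previous comment; decode is a hypothetical helper name. It shows that 2^32 blocks of 4 KiB span 16 TiB, so on a larger filesystem any 32-bit value is an in-range block number, and that the inode's timestamps are implausible relative to the date of this ticket.

```python
from datetime import datetime, timezone

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks; not stated in the ticket

# Size spanned by all possible 32-bit block numbers.
bytes_covered = (1 << 32) * BLOCK_SIZE
tib_covered = bytes_covered // (1 << 40)

def decode(ts_hex):
    """Convert a hex seconds-since-epoch value to a UTC datetime."""
    return datetime.fromtimestamp(int(ts_hex, 16), tz=timezone.utc)

# Raw values from the dump of inode 3821842.
ctime = decode("0x6e0f0f87")
atime = decode("0x2e2281b6")
mtime = decode("0x00000000")

# Date of the comments in this ticket, used as "now" for the comparison.
reported = datetime(2018, 10, 27, tzinfo=timezone.utc)

print("32-bit block numbers span", tib_covered, "TiB")
print("ctime", ctime.isoformat(), "(in the future)" if ctime > reported else "")
print("atime", atime.isoformat())
print("mtime", mtime.isoformat(), "(epoch zero)" if mtime.timestamp() == 0 else "")
```

The dump shows local time, so the decoded UTC values differ from it by the timezone offset, but the years match: a ctime in 2028, an atime in 1994, and an mtime of zero.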

It doesn't look like you have that big a problem, so if it has already finished there is no need to do anything else immediately. If it is still running then you might consider restarting e2fsck with the "-E" options above. Either "shared=lost+found" or "shared=delete" is probably OK (obviously the filesystem metadata won't be deleted).

Alternatively, you could use debugfs -w to open the filesystem and zero out this inode via "mi <3821842>", setting all of the fields to zero, which should keep e2fsck from doing anything with it except marking it deleted.

Comment by Mahmoud Hanafi [ 27/Oct/18 ]

Thanks, the mi <> worked.

You may close this case.

Comment by Andreas Dilger [ 27/Oct/18 ]

Yes, you could try to delete the object, but that might trigger the mounted filesystem to go read-only as it detects filesystem metadata blocks being freed. Using debugfs to just zero out the inode prevents it from actually trying to free the blocks. Note that running "debugfs -w -R 'clri <3821842>' /dev/mapper/nbp10_1-OST30" should do the same thing as running "mi" interactively: zero out the offending inode without trying to free the blocks.

In any case, you still need to run a full e2fsck after cleaning up the inode. Directly deleting it will mark the "shared" blocks as freed in the allocation bitmaps, and without the post-delete e2fsck it will lead to further corruption as those blocks are reallocated. Zeroing it out with debugfs will likely mean less metadata to repair, but there may still be errors to fix.
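
The order of operations described above (zero the inode with debugfs, then run a full e2fsck) can be written down as a small dry-run sketch. It only builds and prints the command lines; nothing is executed. The debugfs command is the one quoted in the comment above. The "-f" flag on e2fsck (force a check even if the filesystem is marked clean) is an assumption about how the full check would be requested, and cleanup_plan is a hypothetical helper name.

```python
import shlex

def cleanup_plan(device, inode):
    """Return argv lists for: zero the inode with debugfs, then a full e2fsck."""
    return [
        # Zero the inode without freeing its blocks (command from the comment above).
        ["debugfs", "-w", "-R", f"clri <{inode}>", device],
        # Full check afterwards; "-f" is an assumption, not taken from the ticket.
        ["e2fsck", "-f", device],
    ]

plan = cleanup_plan("/dev/mapper/nbp10_1-OST30", 3821842)
for argv in plan:
    # Dry run: print each command for review instead of executing it.
    print(shlex.join(argv))
```

Keeping the steps as data makes it easy to review the exact commands, and their order, before anything is run against an unmounted device.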

Generated at Sat Feb 10 02:45:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.