[LU-11577] fsck Multiply-claimed block(s) Created: 26/Oct/18 Updated: 21/Sep/22 Resolved: 27/Oct/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 1 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Using the current e2fsprogs-1.44.3.wc1, we are getting lots of errors like: Multiply-claimed block(s) in inode 486666: 986392108 986392259. Should we let fsck run and clean these up? |
| Comments |
| Comment by Peter Jones [ 27/Oct/18 ] |
|
Three Sev 1s within one day must be a record! |
| Comment by Mahmoud Hanafi [ 27/Oct/18 ] |
|
Yes, it has been a nightmare.
The fsck is taking a very long time. Is it safe to run with -E clone=zero and shared=lost+found or shared=delete? There should not be this many shared blocks. |
| Comment by Mahmoud Hanafi [ 27/Oct/18 ] |
|
It looks like a single inode accounts for the majority of the issues: Inode: 3821842 Type: regular Mode: 0700 Flags: 0x1802f. Attaching the fsck output as fsck.out. |
| Comment by Andreas Dilger [ 27/Oct/18 ] |
|
The shared-blocks phase can take a very long time and often does not produce useful results, especially if random filesystem corruption caused the problem (on filesystems over 16TB in size, any random 32-bit value is a valid block number). The above inode looks like total garbage: random blocks, size, timestamps, etc.
It doesn't look like you have that big a problem, so if e2fsck has already finished there is no need to do anything else immediately. If it is still running, you might consider restarting e2fsck with the "-E" options above. Either "shared=lost+found" or "shared=delete" is probably OK (obviously the filesystem metadata won't be deleted).
Alternatively, you could use debugfs -w to open the filesystem and zero out this inode via "mi <3821842>", setting all of the fields to zero, which should keep e2fsck from doing anything with it except marking it deleted. |
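As a rough illustration of the 16TB remark above, the arithmetic can be sketched in shell. The 4 KiB block size is an assumption (the ticket does not state it); with that block size, a filesystem of 16 TiB or more has at least 2^32 blocks, so every possible 32-bit value falls inside the valid block range.

```shell
# Sketch only: block_size=4096 is an assumption, not taken from the ticket.
block_size=4096

# Number of distinct values a 32-bit block pointer can hold.
values_32bit=$((1 << 32))

# Filesystem size (in TiB) at which every 32-bit value is an in-range block number.
threshold_tib=$((values_32bit * block_size / (1 << 40)))

echo "every 32-bit value is an in-range block number at ${threshold_tib} TiB and above"
```

This is why garbage data in a corrupted inode can look like plausible block pointers and end up reported as multiply-claimed blocks.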
| Comment by Mahmoud Hanafi [ 27/Oct/18 ] |
|
Thanks, the mi <> worked. You may close this case.
|
| Comment by Andreas Dilger [ 27/Oct/18 ] |
|
Yes, you could try to delete the object, but that might trigger the mounted filesystem to go read-only when it detects filesystem metadata blocks being freed. Using debugfs to just zero out the inode avoids actually trying to free the blocks. Note that running "debugfs -w -R 'clri <3821842>' /dev/mapper/nbp10_1-OST30" should do the same thing as running "mi" interactively: zero out the offending inode without trying to free the blocks.
In any case, you still need to run a full e2fsck after cleaning up the inode. Directly deleting it will mark the "shared" blocks as freed in the allocation bitmaps, and without the post-delete e2fsck this will lead to further corruption as those blocks are reallocated. Zeroing it out with debugfs will likely mean less metadata to repair, but there may still be errors to fix. |
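The order of operations described above (zero the inode first, then run a full check) might be sketched as the dry run below. It only prints the commands and does not touch any device. The device path and inode number are the ones quoted in this ticket; the "-f" (force check) flag on e2fsck is an assumed choice, since the comment only says that a full e2fsck is needed.

```shell
# Dry-run sketch: builds and prints the commands, executes nothing.
DEVICE=/dev/mapper/nbp10_1-OST30   # device path quoted in the comment above
INODE=3821842                      # inode reported in this ticket

# Step 1: zero the inode without freeing its blocks.
clri_cmd="debugfs -w -R 'clri <${INODE}>' ${DEVICE}"

# Step 2: full check afterwards; -f is an assumption, not from the ticket.
fsck_cmd="e2fsck -f ${DEVICE}"

echo "$clri_cmd"
echo "$fsck_cmd"
```

Printing the commands first gives a chance to confirm the device and inode number before anything is written to the filesystem.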