[LU-4102] lots of multiply-claimed blocks in e2fsck

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 1.8.8
    • Environment: e2fsprogs 1.41.90.wc2

    Description

      After a power loss, an older e2fsck (e2fsprogs 1.41.90.wc2) was run on the OSTs. It found tons of multiply-claimed blocks, including for the /O directory. Here's an example of one of the inodes:

      File ... (inode #17825793, mod time Wed Aug 15 19:02:25 2012)
        has 1 multiply-claimed block(s), shared with 1 file(s):
              /O (inode #84934657, mod time Wed Aug 15 19:02:25 2012)
      Clone multiply-claimed blocks? yes
      
      Inode 17825793 doesn't have an associated directory entry, so it eventually gets put into lost+found.
      

      So the questions are:

      • how could this have happened? My slightly-informed-probably-wrong theory is that the journal got corrupted and it replayed some old inodes back into existence. I noticed there were a lot of patches dealing with journal checksums committed after 1.41.90.
      • what's the best way to deal with these? Cloning takes forever when you are talking about TB-sized files. I tested the delete extended option, and it looks like it deletes both sides of the file. It would be nice if it just deleted the unlinked side. Right now my plan is to create a debugfs script from the read-only e2fsck output (sketched below), but if there is a better way, that would be good.
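
      A minimal sketch of that debugfs-script idea (the log and script file names are assumptions, and the extracted inode list still needs to be vetted by hand so that only the unlinked twin of each pair gets cleared):

        # collect inode numbers flagged in a saved read-only run (e2fsck -fn ... > e2fsck_ro.log)
        grep -o 'inode #[0-9]*' e2fsck_ro.log | sed 's/inode #//' | sort -un > candidates.txt
        # after vetting candidates.txt, turn each number into a debugfs "clri" command
        sed 's/.*/clri <&>/' candidates.txt > clri.debugfs
        # and apply it read-write to the unmounted OST (device path is a placeholder)
        debugfs -w -f clri.debugfs /dev/mapper/ostNN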

      Thanks.

          Activity

            pjones Peter Jones added a comment -

            Any work still outstanding should be tracked under a new ticket.


            kitwestneat Kit Westneat (Inactive) added a comment -

            Hi Andreas,

            I wanted to tie up the loose ends with the e2fsck patches in this thread. Is the shared=ignore patch something that could be landed? Should I create a gerrit changeset for it?

            As for the skip-invalid-bitmap issue, what's the best path to resolution on that? Should I create a new Jira ticket for it or what do you suggest?

            Thanks,
            Kit


            adilger Andreas Dilger added a comment -

            http://review.whamcloud.com/8188 updates lustre/ChangeLog to recommend using a newer e2fsprogs for b2_1.


            kitwestneat Kit Westneat (Inactive) added a comment -

            Hi Andreas,

            One of the targets just went RO:
            Oct 28 17:09:45 lfs-oss-2-6 kernel: LDISKFS-fs error (device dm-50): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 577 corrupted: 24074 blocks free in bitmap, 24075 - in gd

            It looks like I missed it when disabling targets. I had forgotten that I used the shared=ignore flag when cleaning it up, so the clean bill of health from e2fsck was an illusion.

            I've marked it deactivated on the MDT. Hopefully it can hold until Wednesday.
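
            Roughly what that deactivation looks like on the MDS, as a sketch; the OSC device name below is a placeholder and would come from "lctl dl":

                lctl dl | grep osc                           # find the OSC device for the affected target
                lctl --device lfs2-OSTxxxx-osc deactivate    # stop new object allocations to that OST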


            adilger Andreas Dilger added a comment -

            I think it's reasonable, so long as the new ll_recover_lost_found_objs fixes the shared block problem to a large extent.

            It should be possible to do a full test run against a raw e2image file for each of the OSTs. This would reduce the risk of problems during the actual repair, give some confidence that the remaining problems will be repaired, and also minimize the system downtime because the debugfs scripts can be generated while the system is still running.

            Loopback mount the raw image file, run "ll_recover_lost_found_objs -n" against it and unmount. Generate and run the debugfs script against the raw image file, then run "e2fsck -fy" on the image to see what is left. If all goes well, the debugfs script can be used on the real device.
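
            As a rough sketch of that sequence (device, image, and mount-point paths are placeholders, and the generated debugfs script name is assumed):

                e2image -r /dev/mapper/ost_lfs2_NN /scratch/ost_NN.img       # raw, metadata-only image
                mount -t ldiskfs -o loop,ro /scratch/ost_NN.img /mnt/ost_NN  # read-only is enough for a dry run
                ll_recover_lost_found_objs -n -d /mnt/ost_NN/lost+found      # dry run
                umount /mnt/ost_NN
                debugfs -w -f fix_ost.debugfs /scratch/ost_NN.img            # apply the generated script to the image
                e2fsck -fy /scratch/ost_NN.img                               # see what is left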


            kitwestneat Kit Westneat (Inactive) added a comment -

            Here's the list of corruption and a general plan of attack:
            ost_15 unattached inodes
            ost_19 multiply-claimed blocks (2780 inodes), unattached inodes
            ost_28 multiply-claimed blocks (52 inodes), unattached inodes
            ost_32 unattached inodes - will take multiple passes
            ost_34 multiply-claimed blocks (48 inodes), unattached inodes
            ost_36 multiply-claimed blocks (1792 inodes), unattached inodes
            ost_3 multiply-claimed blocks (376 inodes), unattached inodes
            ost_45 multiply-claimed blocks (362 inodes), unattached inodes

            Plan:
            1) using a bind mount, run the new ll_recover -n on OSTs with multiply-claimed blocks
            2) use the list of duplicate files to create a debugfs script to clri and unlink each "evil twin" file (see the sketch after this comment)
            3) take downtime to execute debugfs script
            4) [1 hour] run e2fsck -n on OSTs to make sure all multiply-claimed blocks are gone
            a) prepare script in advance to move files to lost+found if any are left
            5) [1 hour] run e2fsck -p on all OSTs, as well as ll_recover
            a) it's possible that there could be unknown issues at this stage
            6) [30 min] clri any multiply-claimed block files in l+f, delete all other files
            a) prepare script to nuke l+f
            7) [30 min] rerun e2fsck -p to verify that all OSTs are clean
            a) again, it's possible that there could be unknown issues at this stage

            So I am thinking 3 hours + 4 hours for unknown issues + 1 hour for startup/shutdown. What do you think of this plan/schedule?
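
            A minimal sketch of step 2, assuming a hand-built list with one "evil twin" per line in the form "pathname<TAB>inode" (the list format, file names, and device path are placeholders rather than output of any of the tools above):

                # turn the vetted list into a debugfs command file
                while IFS=$'\t' read -r path ino; do
                    printf 'unlink %s\n' "$path"    # drop the evil twin's directory entry
                    printf 'clri <%s>\n' "$ino"     # then clear its inode so each block has a single owner again
                done < evil_twins.tsv > fix_ost.debugfs

                # step 3, during the downtime, against the unmounted OST:
                debugfs -w -f fix_ost.debugfs /dev/mapper/ost_lfs2_NN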


            kitwestneat Kit Westneat (Inactive) added a comment -

            Sorry, I was unclear about which disks: I meant that the system disks, where I was building the sparse file, are formatted as ext3. I got a non-sparse e2image that I am copying over to webspace. Nathan also set up a Lustre filesystem that I will use to dump a sparse image to.


            adilger Andreas Dilger added a comment -

            NB - you can avoid the 2TB limit if you stripe the file more widely so that individual objects are below 2TB. If you are running an ext4-based ldiskfs (presumably yes) but on a filesystem that was formatted a while ago, you can also use "tune2fs -O huge_file" to enable larger-than-2TB files.
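
            Roughly, the two workarounds look like this (the file path and device name are placeholders; tune2fs should only be run with the target unmounted):

                lfs setstripe -c -1 /lfs2/images/ost_NN.img      # create the destination with a wide stripe so each object stays under 2TB
                tune2fs -O huge_file /dev/mapper/ost_lfs2_NN     # or enable >2TB files on an older ext4-based ldiskfs target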


            kitwestneat Kit Westneat (Inactive) added a comment -

            Actually, the non-sparse e2image works fine. It looks like the sparse image is having issues past 2TB. I guess the FSes on this OSS are all ext3, so that would explain it.

            I'll create a dm snapshot to test the tool on.
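
            A sketch of that kind of snapshot, assuming the OST sits on an LVM logical volume (volume group, names, and COW size are placeholders; if the dm device is not LVM-backed, a dmsetup snapshot target would be needed instead):

                lvcreate -s -n ost_lfs2_NN_snap -L 50G /dev/vg_lfs2/ost_lfs2_NN
                mount -t ldiskfs -o ro /dev/vg_lfs2/ost_lfs2_NN_snap /mnt/snap
                ll_recover_lost_found_objs -n -d /mnt/snap/lost+found    # dry run against the snapshot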


            kitwestneat Kit Westneat (Inactive) added a comment -

            Hi Andreas,

            I'm getting an error when I try to run e2image; I've tried it on a couple of OSTs:
            [root@lfs-oss-2-4 ~]# /usr/bin/time e2image -r /dev/mapper/ost_lfs2_19 /scratch/ost_lfs2_19.img
            e2image 1.42.7.wc1 (12-Apr-2013)
            seek: Invalid argument
            Command exited with non-zero status 1

            I got an strace, would that be useful?
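
            (One possible workaround, as a sketch, if the destination filesystem simply cannot seek past 2TB: e2image can write the raw image to stdout, so it can be piped through a compressor instead of seeking around in a sparse file; paths are placeholders.)

                e2image -r /dev/mapper/ost_lfs2_19 - | bzip2 > /scratch/ost_lfs2_19.img.bz2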


            People

              niu Niu Yawei (Inactive)
              orentas Oz Rentas (Inactive)