Lustre / LU-3542

deleted/unused inodes not actually cleared by e2fsck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Environment: CentOS 5, e2fsprogs-1.42.7.wc1-0redhat

    Description

      e2fsck doesn't actually clear deleted/unused inodes, though it claims to. I've attached a log showing what we are seeing. The customer is Caltech.

      Attachments

        1. e2fsck_safe_repair_ost_3.log-1
          20 kB
        2. e2fsck_safe_repair_ost_3.log-2
          17 kB
        3. e2fsck.log
          9 kB
        4. fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1
          186 kB
        5. htree.dump
          122 kB

        Activity

          [LU-3542] deleted/unused inodes not actually cleared by e2fsck

          niu Niu Yawei (Inactive) added a comment -

          > Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          > [root@oxonia-mds1 rcvy]# ls -lh
          > total 3.6G
          > -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw

          Oh, I didn't notice it's a sparse file. Then it should be convertible on ext4 as well; however, I got the following error while trying to convert it on ext4 (actual size 1.6G, reported size 16T), possibly because a 16T file runs into ext4's maximum file size:

          e2image: Invalid argument while trying to convert qcow2 image (ost000b.qcow) into raw image

          If the 3.6G file you mentioned is http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2, could you upload it to the Whamcloud FTP? I have no permission to access the DDN ftp server.


          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw
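
          For reference, the apparent size and the actual allocation of a sparse file can be compared directly; a minimal sketch, assuming the ost000b.raw image from above:

          ls -lh ost000b.raw    # apparent size (the 28T)
          du -h ost000b.raw     # blocks actually allocated (the ~3.6G)
          stat -c 'size=%s bytes, allocated=%b blocks of %B bytes' ost000b.raw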


          niu Niu Yawei (Inactive) added a comment -

          The raw device of ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2 is 16TB? It's hard for me to find a machine with a drive that big to reproduce the problem. Is there any smaller OST that has the same problem?

          dvicker Darby Vicker added a comment -

          I just uploaded my qcow image to ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2


          dvicker Darby Vicker added a comment -

          We ran into this problem as well. I'll attach the fsck output to this JIRA. Email me if you'd like me to send you the qcow image.


          kitwestneat Kit Westneat (Inactive) added a comment -

          I got a qcow image with a file exhibiting the corruption; it's available here:
          http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2 [295M]

          First run:
          e2fsck -fp /dev/mapper/ost_lfs2_36
          lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
          lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks

          Second run, after the entry was supposedly cleared:
          e2fsck -fp /dev/mapper/ost_lfs2_36
          lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
          lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
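
          A quick way to check whether the repair persisted on any other OST is to run e2fsck twice and compare the logs; a minimal sketch, assuming the device path above:

          dev=/dev/mapper/ost_lfs2_36
          e2fsck -fp "$dev" > run1.log 2>&1
          e2fsck -fp "$dev" > run2.log 2>&1
          # If the same entry shows up in both logs, the "CLEARED" fix did not persist:
          grep 'deleted/unused inode' run1.log run2.log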

          adilger Andreas Dilger added a comment -

          Even if there isn't a 100% chance that OST has the problem, it is still worthwhile to make an image of the OST. This will first give us an idea of how long it takes to generate the image and how large it is (uncompressed and compressed), and it can also be used to test the LU-4102 code.


          kitwestneat Kit Westneat (Inactive) added a comment -

          I don't think any of the OSTs described in LU-4102 currently has the deleted/unused inodes issue. All the ones that reported it on the r/o e2fsck had previously been clean, so I think it's just a matter of them being in use. That being said, I could get an image of the OST (ost_45) that had the error before. Do you think that might be useful? I have the e2fsck output as well.


          adilger Andreas Dilger added a comment -

          I looked through the relevant code in pass2.c::check_dir_block():

                          /* 
                           * Offer to clear unused inodes; if we are going to be
                           * restarting the scan due to bg_itable_unused being
                           * wrong, then don't clear any inodes to avoid zapping
                           * inodes that were skipped during pass1 due to an
                           * incorrect bg_itable_unused; we'll get any real
                           * problems after we restart.
                           */
                          if (!(ctx->flags & E2F_FLAG_RESTART_LATER) &&
                              !(ext2fs_test_inode_bitmap2(ctx->inode_used_map,
                                                          dirent->inode)))
                                  problem = PR_2_UNUSED_INODE;
          
                if (problem) {
                        if (fix_problem(ctx, problem, &cd->pctx)) {
                                dirent->inode = 0;
                                dir_modified++;
                                goto next;
                        }
                }


          It is easy to trigger the PR_2_UNUSED_INODE problem by setting nlink = 0 in the inode(s) via debugfs. However, when I run e2fsck against such a filesystem (whether with small directories or large htree directories) e2fsck fixes the problem by clearing the dirent (setting inode = 0 above, and later writing out the directory block) and a second check shows it is fixed.
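
          A minimal reproducer along those lines, assuming a scratch loop image (the file and mount point names are just examples):

          dd if=/dev/zero of=test.img bs=1M count=64
          mke2fs -t ext4 -F test.img
          mkdir -p mnt && mount -o loop test.img mnt
          touch mnt/victim
          umount mnt
          # Zero the link count so pass 2 sees a dirent pointing at an unused inode:
          debugfs -w -R 'set_inode_field /victim links_count 0' test.img
          # The first run should report and clear the entry; the second should come back clean:
          e2fsck -fy test.img
          e2fsck -fy test.img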

          To capture a filesystem that has a persistent case of this problem (after "e2fsck -fy" didn't fix it) so that it can be debugged and fixed, please use e2image to dump the filesystem metadata. The dense image format can be efficiently compressed and transported, unlike the sparse variant of e2image:

          e2image -Q /dev/OSTnnnn OSTnnnn.qcow
          bzip2 -9 OSTnnnn.qcow
          

          Hopefully the OSTnnnn.qcow.bz2 image size is small enough for transport. It is possible to reconstitute the (uncompressed) qcow file into a raw ext4 image file that can be tested with e2fsck, debugfs, or mounted via loopback.

          e2image -r OSTnnnn.qcow OSTnnnn.raw
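
          Once reconstituted, the raw image can be examined without the original hardware; a short sketch, assuming the OSTnnnn.raw name from above (the inode number is the one reported in the e2fsck log earlier in this ticket):

          e2fsck -fn OSTnnnn.raw                      # read-only check of the image
          debugfs -R 'stat <1051496>' OSTnnnn.raw     # inspect the reported inode
          mount -o loop,ro OSTnnnn.raw /mnt/ost       # metadata only: file data reads back as zeroes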
          

          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu,

          The first customer had a problem with the RAID storage which caused the ldiskfs corruption. The second customer had a power outage that we think corrupted the journal and journal replay (LU-4102). Basically, when there is some kind of ldiskfs corruption, there is the possibility of getting these deleted/unused inode messages, and it seems that if the htrees are also corrupt, e2fsck is unable to clear them.

          Thanks,
          Kit
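
          Since e2fsck appears unable to fix the entry when the htree itself is corrupt, the hash tree of the affected directory can be dumped for inspection; a sketch, assuming an image file name and the directory path from the e2fsck log above:

          debugfs -R 'htree_dump /O/0/d18' OSTnnnn.raw > htree.dump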


          niu Niu Yawei (Inactive) added a comment -

          Kit, I didn't know they often run into the "deleted/unused inode" problem. Which Lustre version did they use, and do you know what kind of operation could have caused the problem? If possible, could you collect the logs on the OST before the problem happens? I think it would be helpful for us to figure out how this happened.

          I'll look into the e2fsck problem at the same time. Thank you.


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: kitwestneat Kit Westneat (Inactive)
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved: