[LU-3542] deleted/unused inodes not actually cleared by e2fsck - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Labels: None
Environment: Centos5, e2fsprogs-1.42.7.wc1-0redhat

Severity: 2
Rank (Obsolete): 8914

Description

e2fsck doesn't actually clear deleted/unused inodes, though it claims to. I've attached a log showing what we are seeing. The customer is CalTech.

Attachments

e2fsck_safe_repair_ost_3.log-1 (20 kB, 02/Jul/13 4:12 PM)
e2fsck_safe_repair_ost_3.log-2 (17 kB, 02/Jul/13 4:12 PM)
e2fsck.log (9 kB, 01/Jul/13 2:20 PM)
fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1 (186 kB, 07/Nov/13 10:57 PM)
htree.dump (122 kB, 02/Jul/13 4:11 PM)

Activity

Niu Yawei (Inactive) added a comment - 22/Nov/13 6:41 AM

Is the raw device behind ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2 really 16TB? It's hard for me to find a machine with a drive that large to reproduce the problem. Is there a smaller OST that has the same problem?

Darby Vicker added a comment - 12/Nov/13 5:50 PM

I just uploaded my qcow image to ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2

Darby Vicker added a comment - 07/Nov/13 10:56 PM

We ran into this problem as well. I'll attach the fsck output to this JIRA. Email me if you'd like me to send you the qcow image.

Kit Westneat (Inactive) added a comment - 30/Oct/13 4:21 PM

I got a qcow image with a file exhibiting the corruption. It's available here:
http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2 [295M]

e2fsck -fp /dev/mapper/ost_lfs2_36
lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks

e2fsck -fp /dev/mapper/ost_lfs2_36
lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
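
The two runs above report the same entry as CLEARED both times. A minimal sketch of how that comparison could be automated, assuming each e2fsck pass was redirected to its own log file (the function name and the log file names are hypothetical, not part of e2fsprogs):

```shell
#!/bin/sh
# Hypothetical helper: print every "deleted/unused inode" line that appears in
# BOTH e2fsck logs, i.e. entries reported as CLEARED that came back on the
# next pass. Only grep, sort, comm and mktemp are used.
persistent_unused() {
    first_log=$1
    second_log=$2
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # Keep only the directory-entry complaints, sorted so comm can compare them.
    grep 'has deleted/unused inode' "$first_log" | sort -u > "$tmp_a"
    grep 'has deleted/unused inode' "$second_log" | sort -u > "$tmp_b"
    # comm -12 prints only the lines common to both sorted inputs.
    comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example use (log names are assumptions):
#   e2fsck -fp /dev/mapper/ost_lfs2_36 > pass1.log 2>&1
#   e2fsck -fp /dev/mapper/ost_lfs2_36 > pass2.log 2>&1
#   persistent_unused pass1.log pass2.log
```

Any line printed by the helper is an entry that e2fsck claimed to clear but reported again, which is the symptom described in this ticket.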

Andreas Dilger added a comment - 24/Oct/13 5:07 PM

Even if there isn't a 100% chance that OST has the problem, it is still worthwhile to make an image of the OST. This will first give us an idea of how long it takes to generate the image, how large it is (uncompressed and compressed), and it can also be used to test the LU-4102 code.

Kit Westneat (Inactive) added a comment - 23/Oct/13 6:45 PM

I don't think any of the OSTs described in LU-4102 currently have the deleted/unused inodes issue. All the ones that reported it on the r/o e2fsck had previously been clean, so I think that it's just a matter of them being in use. That being said, I could get an image of the OST (ost_45) that had the error before. Do you think that might be useful? I have the e2fsck output as well.

Andreas Dilger added a comment - 23/Oct/13 5:40 AM

I looked through the relevant code in pass2.c::check_dir_block():

                /* 
                 * Offer to clear unused inodes; if we are going to be
                 * restarting the scan due to bg_itable_unused being
                 * wrong, then don't clear any inodes to avoid zapping
                 * inodes that were skipped during pass1 due to an
                 * incorrect bg_itable_unused; we'll get any real
                 * problems after we restart.
                 */
                if (!(ctx->flags & E2F_FLAG_RESTART_LATER) &&
                    !(ext2fs_test_inode_bitmap2(ctx->inode_used_map,
                                                dirent->inode)))
                        problem = PR_2_UNUSED_INODE;

                if (problem) {
                        if (fix_problem(ctx, problem, &cd->pctx)) {
                                dirent->inode = 0;
                                dir_modified++;
                                goto next;

It is easy to trigger the PR_2_UNUSED_INODE problem by setting nlink = 0 in the inode(s) via debugfs. However, when I run e2fsck against such a filesystem (whether with small directories or large htree directories) e2fsck fixes the problem by clearing the dirent (setting inode = 0 above, and later writing out the directory block) and a second check shows it is fixed.
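
A rough sketch of that reproduction on a scratch image file, not the exact procedure used above. The image path, file name `victim`, and both function names are hypothetical; `sif` is debugfs's `set_inode_field` request, and the exit-status bits are those documented in e2fsck(8). Run it only against a throwaway file, since the image is overwritten.

```shell
#!/bin/sh
# Decode e2fsck's exit status, which is a bitmask (see e2fsck(8)).
fsck_status() {
    code=$1
    [ "$code" -eq 0 ] && { echo "clean"; return 0; }
    desc=""
    [ $((code & 1)) -ne 0 ] && desc="$desc errors-corrected"
    [ $((code & 2)) -ne 0 ] && desc="$desc reboot-needed"
    [ $((code & 4)) -ne 0 ] && desc="$desc errors-left-uncorrected"
    [ $((code & 8)) -ne 0 ] && desc="$desc operational-error"
    [ $((code >> 4)) -ne 0 ] && desc="$desc usage-cancel-or-library-error"
    echo "${desc# }"
}

# Build a small ext4 image, zero one inode's link count, then check it twice.
reproduce_unused_inode() {
    img=$1                          # scratch image path; will be overwritten
    src=$(mktemp) || return 1
    echo "scratch data" > "$src"
    truncate -s 64M "$img" &&
    mke2fs -q -F -t ext4 "$img" &&
    debugfs -w -R "write $src victim" "$img" &&
    debugfs -w -R "sif victim links_count 0" "$img" || { rm -f "$src"; return 1; }
    rm -f "$src"
    e2fsck -fy "$img"; rc=$?        # first pass: repair, answering yes
    echo "first pass:  $(fsck_status "$rc")"
    e2fsck -fn "$img"; rc=$?        # second pass: read-only re-check
    echo "second pass: $(fsck_status "$rc")"
}

# Example use: reproduce_unused_inode /tmp/scratch.img
```

If the second pass still reports the same entry on a real OST image, that image is the kind of persistent case worth capturing for debugging.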

To capture a filesystem that has a persistent case of this problem (after "e2fsck -fy" didn't fix it) so that it can be debugged and fixed, please use e2image to dump the filesystem metadata. The dense image format can be efficiently compressed and transported, unlike the sparse variant of e2image:

e2image -Q /dev/OSTnnnn OSTnnnn.qcow
bzip2 -9 OSTnnnn.qcow

Hopefully the OSTnnnn.qcow.bz2 image size is small enough for transport. It is possible to reconstitute the (uncompressed) qcow file into a raw ext4 image file that can be tested with e2fsck, debugfs, or mounted via loopback.

e2image -r OSTnnnn.qcow OSTnnnn.raw

Kit Westneat (Inactive) added a comment - 22/Oct/13 1:00 PM

Hi Niu,

The first customer had a problem with the RAID storage which caused the ldiskfs corruption. The second customer had a power outage that we think corrupted the journal and journal replay (LU-4102). Basically, when there is some kind of ldiskfs corruption, there is the possibility of getting these deleted/unused inode messages, and it seems that if the htrees are also corrupt, e2fsck is unable to clear them.

Thanks,
Kit

Niu Yawei (Inactive) added a comment - 22/Oct/13 3:06 AM

Kit, I didn't know they often run into the "deleted/unused inode" problem. Which Lustre version are they using, and do you know what kind of operation could have caused the problem? If possible, could you collect the OST logs from before the problem happened? I think they might help us figure out how this happened.

I'll look into the e2fsck problem at the same time. Thank you.

Kit Westneat (Inactive) added a comment - 21/Oct/13 2:18 PM

Hi Niu,

This has become a higher priority for us. The problem is that if deleted inodes are not cleared, the filesystem will go read-only when it encounters the inode. This can lead to a state where the filesystem goes read-only at a random time and only manual intervention with debugfs can bring it back to a healthy state. It has happened to us a couple of times now, so I think we need to explore problem #1 a little more closely.

Thanks.

Niu Yawei (Inactive) added a comment - 21/Oct/13 7:58 AM

Peter, the two questions Kit asked are probably e2fsck bugs. The remaining work is:

Search to find out whether the same problem has been reported in the Linux community before, and whether there is already a patch. (I did an initial search but have had no luck so far.)
Try to reproduce the problem and trace into the e2fsck code to see whether it is really a bug that needs to be fixed. (That requires an e2fsprogs expert and could be time-consuming.)

I agree with Kit that it's not a high-priority job.

People

Assignee: Niu Yawei (Inactive)

Reporter: Kit Westneat (Inactive)

Votes: 0

Watchers: 10

Dates

Created: 01/Jul/13 2:20 PM

Updated: 13/Dec/13 8:53 PM

Resolved: 13/Dec/13 8:53 PM