Lustre / LU-3542

deleted/unused inodes not actually cleared by e2fsck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Environment: CentOS 5, e2fsprogs-1.42.7.wc1-0redhat

    Description

      e2fsck doesn't actually clear deleted/unused inodes, though it claims to. I've attached a log showing what we are seeing. The customer is Caltech.

      Attachments

        1. e2fsck_safe_repair_ost_3.log-1
          20 kB
        2. e2fsck_safe_repair_ost_3.log-2
          17 kB
        3. e2fsck.log
          9 kB
        4. fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1
          186 kB
        5. htree.dump
          122 kB

        Activity

          [LU-3542] deleted/unused inodes not actually cleared by e2fsck

          niu Niu Yawei (Inactive) added a comment -

          > Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          > [root@oxonia-mds1 rcvy]# ls -lh
          > total 3.6G
          > -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw

          Oh, I didn't notice it's a sparse file. Then it should be convertible on ext4 as well; however, I got the following error while trying to convert it on ext4 (actual size 1.6G, reported size 16T), possibly because a 16T file runs into ext4's maximum file size:

          e2image: Invalid argument while trying to convert qcow2 image (ost000b.qcow) into raw image

          If the 3.6G file you mentioned is http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2, could you upload it to the Whamcloud FTP? I have no permission to access the DDN ftp server.


          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw
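
          For reference, the apparent size and the actual allocation of a sparse file can be compared directly; a minimal sketch, assuming the ost000b.raw image from above:

          ls -lh ost000b.raw    # apparent size (the 28T)
          du -h ost000b.raw     # blocks actually allocated (the ~3.6G)
          stat -c 'size=%s bytes, allocated=%b blocks of %B bytes' ost000b.raw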


          niu Niu Yawei (Inactive) added a comment -

          The raw device of ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2 is 16TB? It's hard for me to find a machine with a drive that big to reproduce the problem. Is there any smaller OST that has the same problem?

          dvicker Darby Vicker added a comment -

          I just uploaded my qcow image to ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2


          dvicker Darby Vicker added a comment -

          We ran into this problem as well. I'll attach the fsck output to this JIRA. Email me if you'd like me to send you the qcow image.


          kitwestneat Kit Westneat (Inactive) added a comment -

          I got a qcow image with a file exhibiting the corruption; it's available here:
          http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2 [295M]

          First run:
          e2fsck -fp /dev/mapper/ost_lfs2_36
          lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
          lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks

          Second run, after the entry was supposedly cleared:
          e2fsck -fp /dev/mapper/ost_lfs2_36
          lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
          lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
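
          A quick way to check whether the repair persisted on any other OST is to run e2fsck twice and compare the logs; a minimal sketch, assuming the device path above:

          dev=/dev/mapper/ost_lfs2_36
          e2fsck -fp "$dev" > run1.log 2>&1
          e2fsck -fp "$dev" > run2.log 2>&1
          # If the same entry shows up in both logs, the "CLEARED" fix did not persist:
          grep 'deleted/unused inode' run1.log run2.log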

          adilger Andreas Dilger added a comment -

          Even if there isn't a 100% chance that OST has the problem, it is still worthwhile to make an image of the OST. This will first give us an idea of how long it takes to generate the image and how large it is (uncompressed and compressed), and it can also be used to test the LU-4102 code.


          kitwestneat Kit Westneat (Inactive) added a comment -

          I don't think any of the OSTs described in LU-4102 currently has the deleted/unused inodes issue. All the ones that reported it on the r/o e2fsck had previously been clean, so I think it's just a matter of them being in use. That being said, I could get an image of the OST (ost_45) that had the error before. Do you think that might be useful? I have the e2fsck output as well.


          adilger Andreas Dilger added a comment -

          I looked through the relevant code in pass2.c::check_dir_block():

                          /* 
                           * Offer to clear unused inodes; if we are going to be
                           * restarting the scan due to bg_itable_unused being
                           * wrong, then don't clear any inodes to avoid zapping
                           * inodes that were skipped during pass1 due to an
                           * incorrect bg_itable_unused; we'll get any real
                           * problems after we restart.
                           */
                          if (!(ctx->flags & E2F_FLAG_RESTART_LATER) &&
                              !(ext2fs_test_inode_bitmap2(ctx->inode_used_map,
                                                          dirent->inode)))
                                  problem = PR_2_UNUSED_INODE;
          
                if (problem) {
                        if (fix_problem(ctx, problem, &cd->pctx)) {
                                dirent->inode = 0;
                                dir_modified++;
                                goto next;
                        }
                }


          It is easy to trigger the PR_2_UNUSED_INODE problem by setting nlink = 0 in the inode(s) via debugfs. However, when I run e2fsck against such a filesystem (whether with small directories or large htree directories) e2fsck fixes the problem by clearing the dirent (setting inode = 0 above, and later writing out the directory block) and a second check shows it is fixed.
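
          A minimal reproducer along those lines, assuming a scratch loop image (the file and mount point names are just examples):

          dd if=/dev/zero of=test.img bs=1M count=64
          mke2fs -t ext4 -F test.img
          mkdir -p mnt && mount -o loop test.img mnt
          touch mnt/victim
          umount mnt
          # Zero the link count so pass 2 sees a dirent pointing at an unused inode:
          debugfs -w -R 'set_inode_field /victim links_count 0' test.img
          # The first run should report and clear the entry; the second should come back clean:
          e2fsck -fy test.img
          e2fsck -fy test.img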

          To capture a filesystem that has a persistent case of this problem (after "e2fsck -fy" didn't fix it) so that it can be debugged and fixed, please use e2image to dump the filesystem metadata. The dense image format can be efficiently compressed and transported, unlike the sparse variant of e2image:

          e2image -Q /dev/OSTnnnn OSTnnnn.qcow
          bzip2 -9 OSTnnnn.qcow
          

          Hopefully the OSTnnnn.qcow.bz2 image size is small enough for transport. It is possible to reconstitute the (uncompressed) qcow file into a raw ext4 image file that can be tested with e2fsck, debugfs, or mounted via loopback.

          e2image -r OSTnnnn.qcow OSTnnnn.raw
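
          Once reconstituted, the raw image can be examined without the original hardware; a short sketch, assuming the OSTnnnn.raw name from above (the inode number is the one reported in the e2fsck log earlier in this ticket):

          e2fsck -fn OSTnnnn.raw                      # read-only check of the image
          debugfs -R 'stat <1051496>' OSTnnnn.raw     # inspect the reported inode
          mount -o loop,ro OSTnnnn.raw /mnt/ost       # metadata only: file data reads back as zeroes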
          

          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu,

          The first customer had a problem with the RAID storage which caused the ldiskfs corruption. The second customer had a power outage that we think corrupted the journal and journal replay (LU-4102). Basically, when there is some kind of ldiskfs corruption, there is the possibility of getting these deleted/unused inode messages, and it seems that if the htrees are also corrupt, e2fsck is unable to clear them.

          Thanks,
          Kit
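
          Since e2fsck appears unable to fix the entry when the htree itself is corrupt, the hash tree of the affected directory can be dumped for inspection; a sketch, assuming an image file name and the directory path from the e2fsck log above:

          debugfs -R 'htree_dump /O/0/d18' OSTnnnn.raw > htree.dump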


          niu Niu Yawei (Inactive) added a comment -

          Kit, I didn't know they often run into the "deleted/unused inode" problem. Which Lustre version did they use, and do you know what kind of operation could have caused the problem? If possible, could you collect the logs on the OST before the problem happens? I think it would be helpful for us to figure out how this happened.

          I'll look into the e2fsck problem at the same time. Thank you.


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: kitwestneat Kit Westneat (Inactive)
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved: