deleted/unused inodes not actually cleared by e2fsck

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • None
    • None
    • Centos5, e2fsprogs-1.42.7.wc1-0redhat
    • 2
    • 8914

    Description

      e2fsck doesn't actually clear deleted/unused inodes, though it claims to. I've attached a log showing what we are seeing. The customer is CalTech.

      Attachments

        1. e2fsck_safe_repair_ost_3.log-1
          20 kB
        2. e2fsck_safe_repair_ost_3.log-2
          17 kB
        3. e2fsck.log
          9 kB
        4. fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1
          186 kB
        5. htree.dump
          122 kB

        Activity

          [LU-3542] deleted/unused inodes not actually cleared by e2fsck
          kitwestneat Kit Westneat (Inactive) added a comment - - edited

          It looks like the block number is wrapping around during the io_channel write:

          Entry '3102500' in /O/0/d4 (19398664) has deleted/unused inode 26072855.  Clear? yes
          Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14340, priv_data=0x7fffffffe180) at pass2.c:1219
          1219                    cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
          (gdb) p block_nr
          $30 = 4966058525
          
          Breakpoint 2, raw_write_blk (channel=0x647570, data=0x648670, block=671091229, count=1, bufv=0x64f060) at unix_io.c:233
          
          (gdb) p (unsigned int)4966058525
          $33 = 671091229
          

          I thought maybe it was the cache node, but that appears to use an unsigned long long to store the block.

          I'll keep looking but I thought I'd pass that info along in case it helps.
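
The arithmetic in the gdb session above can be checked independently. The following is a minimal sketch, not e2fsprogs code: it assumes the block number is narrowed to a 32-bit unsigned value somewhere between `check_dir_block` and `raw_write_blk` (the comment suspects this but does not locate where), and the helper name `truncate_to_u32` is hypothetical.

```python
# Illustrative only: reproduces the arithmetic from the gdb session,
# assuming a narrowing to a 32-bit unsigned integer on the write path.
U32_MASK = 0xFFFFFFFF  # largest value a 32-bit unsigned integer can hold


def truncate_to_u32(block_nr):
    """Return block_nr as it would appear after a cast to a 32-bit unsigned int."""
    return block_nr & U32_MASK


block_nr = 4966058525                  # value printed at pass2.c:1219
written = truncate_to_u32(block_nr)    # value a 32-bit parameter would receive

print(written)                         # compare with block= in raw_write_blk
print(block_nr > U32_MASK)             # whether the original fits in 32 bits
```

If the truncated value matches the `block=` argument seen at the second breakpoint, the directory block is being written to a different location than the one that was read, which would explain why the entry is reported as cleared yet reappears on the next run.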


          kitwestneat Kit Westneat (Inactive) added a comment -

          It seems like this doesn't actually produce a valid raw image:
          e2image -r OSTnnnn.qcow OSTnnnn.raw

          I had to do:
          qemu-img convert -p -O raw /scratch/ost000b.qcow ost000b.raw

          to get something that worked.

          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu,

          I don't think ext4 supports files greater than 16TB, so you'd need to use XFS or ZFS.

          Yeah, the files on the DDN server are temporary. I'll upload it to the Intel FTP server.

          Thanks,
          Kit
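
The 16TB figure can be sanity-checked with a short calculation. This sketch assumes 4 KiB filesystem blocks (not stated in the thread) and relies on ext4 indexing a file's blocks with 32-bit logical block numbers. The 28T value is the approximate apparent size of `ost000b.raw` reported elsewhere in this thread.

```python
# Rough arithmetic, assuming 4 KiB blocks; not taken from the ticket itself.
BLOCK_SIZE = 4096            # assumption: 4 KiB filesystem blocks
LOGICAL_BLOCK_BITS = 32      # ext4 indexes blocks within a file with 32-bit numbers
TIB = 2 ** 40

max_file_bytes = (2 ** LOGICAL_BLOCK_BITS) * BLOCK_SIZE
print(max_file_bytes // TIB)             # per-file limit in TiB

raw_image_bytes = 28 * TIB               # approximate size shown by ls -lh
print(raw_image_bytes > max_file_bytes)  # whether the raw image exceeds the limit
```

The limit applies to the apparent file size, so a sparse file that occupies only a few gigabytes on disk is still subject to it.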

          niu Niu Yawei (Inactive) added a comment -

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          rw------ 1 root root 28T Nov 22 10:04 ost000b.raw

          Oh, I didn't notice it's a sparse file. Then I think it could be converted on ext4 as well; however, I got the following error while trying to convert it on ext4 (actual size 1.6G, shown as 16T):

          e2image: Invalid argument while trying to convert qcow2 image (ost000b.qcow) into raw image

          If the 3.6G file you mentioned is http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2, could you upload it to the whamcloud ftp? I don't have permission to access the ddn ftp server.

          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          rw------ 1 root root 28T Nov 22 10:04 ost000b.raw
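
The gap between the 3.6G total and the 28T file size in the `ls -lh` output is the usual signature of a sparse file: the apparent size counts holes, while the allocated size does not. A minimal sketch of how to observe both numbers on a POSIX system follows. The file name and the 1 GiB size are placeholders, and the allocated figure depends on the underlying filesystem.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # On POSIX systems st_blocks counts 512-byte units actually allocated.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.raw")   # hypothetical file name
    with open(path, "wb") as f:
        f.truncate(1 << 30)                  # 1 GiB apparent size, no data written
    apparent, allocated = sizes(path)
    print(apparent, allocated)
```

On filesystems that support holes, `allocated` should come out far smaller than `apparent`, mirroring the 3.6G versus 28T discrepancy.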

          niu Niu Yawei (Inactive) added a comment -

          Is the raw device of ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2 16TB? It's hard for me to find a machine with a drive that big to reproduce the problem. Is there a smaller OST that has the same problem?
          dvicker Darby Vicker added a comment -

          I just uploaded my qcow image to ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2


          dvicker Darby Vicker added a comment -

          We ran into this problem as well. I'll attach the fsck output to this JIRA. Email me if you'd like me to send you the qcow image.

          kitwestneat Kit Westneat (Inactive) added a comment -

          I got a qcow image with a file exhibiting the corruption. It's available here:
          http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2 [295M]

          # e2fsck -fp /dev/mapper/ost_lfs2_36
            lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
            lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
          # e2fsck -fp /dev/mapper/ost_lfs2_36
            lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
            lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
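
The two consecutive runs above report the same entry as CLEARED, which is the symptom this ticket describes. One way to spot this across larger logs is to extract the reported entries from each run and intersect them. The sketch below is an illustration only: the regular expression is derived from the message format quoted in this ticket, and the function names are hypothetical.

```python
import re

# Pattern based on the message format quoted in this ticket.
ENTRY_RE = re.compile(
    r"Entry '(?P<name>[^']+)' in (?P<dir>\S+) \((?P<dir_ino>\d+)\) "
    r"has deleted/unused inode (?P<ino>\d+)"
)


def cleared_entries(log_text):
    """Return the set of (directory, entry name, inode) tuples in one e2fsck log."""
    return {
        (m.group("dir"), m.group("name"), int(m.group("ino")))
        for m in ENTRY_RE.finditer(log_text)
    }


def repeated_entries(first_log, second_log):
    """Entries reported in both runs, i.e. fixes that did not persist."""
    return cleared_entries(first_log) & cleared_entries(second_log)


run = (
    "lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) "
    "has deleted/unused inode 1051496. CLEARED.\n"
)
print(repeated_entries(run, run))
```

A non-empty result means at least one entry was reported in both runs, so the first repair did not reach the disk.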

          adilger Andreas Dilger added a comment -

          Even if there isn't a 100% chance that OST has the problem, it is still worthwhile to make an image of the OST. This will first give us an idea of how long it takes to generate the image, how large it is (uncompressed and compressed), and it can also be used to test the LU-4102 code.

          kitwestneat Kit Westneat (Inactive) added a comment -

          I don't think any of the OSTs described in LU-4102 currently has the deleted/unused inodes issue. All the ones that reported it on the r/o e2fsck had previously been clean, so I think that it's just a matter of them being in use. That being said, I could get an image of the OST (ost_45) that had the error before. Do you think that might be useful? I have the e2fsck output as well.

          People

            niu Niu Yawei (Inactive)
            kitwestneat Kit Westneat (Inactive)