  Lustre / LU-3542

deleted/unused inodes not actually cleared by e2fsck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • None
    • None
    • None
    • Environment: CentOS 5, e2fsprogs-1.42.7.wc1-0redhat
    • 2
    • 8914

    Description

      e2fsck claims to clear directory entries that point to deleted/unused inodes, but it doesn't actually clear them. I've attached a log showing what we are seeing. The customer is CalTech.

      Attachments

        1. e2fsck_safe_repair_ost_3.log-1
          20 kB
        2. e2fsck_safe_repair_ost_3.log-2
          17 kB
        3. e2fsck.log
          9 kB
        4. fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1
          186 kB
        5. htree.dump
          122 kB

        Activity


          Yes, I think that's probably why the entries are not fixed. check_dir_block() should call ext2fs_write_dir_block3() directly.

          niu Niu Yawei (Inactive) added a comment

          Oh, I think it is the definition of ext2fs_write_dir_block:

          errcode_t ext2fs_write_dir_block(ext2_filsys fs, blk_t block,                       
                           void *inbuf)                                                       
          
          typedef __u32       blk_t;                                                          
          typedef __u64       blk64_t;
          

          It seems like that should be blk64_t? ext2fs_write_dir_block3 takes a blk64_t, but the call to ext2fs_write_dir_block has already cast the block number down to a 32-bit blk_t:

          Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14a00, priv_data=0x7fffffffe180) at pass2.c:1219
          1219                    cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
          (gdb) p block_nr
          $35 = 4966058603
          (gdb) cont
          Continuing.
          
          Breakpoint 3, ext2fs_write_dir_block3 (fs=0x647420, block=671091307, inbuf=0x667270, flags=0) at dirblock.c:146
          146             return io_channel_write_blk64(fs->io, block, 1, (char *) inbuf);
          (gdb) p (blk_t)4966058603
          $36 = 671091307
          
          kitwestneat Kit Westneat (Inactive) added a comment

          It looks like the block number is wrapping around during the io_channel write:

          Entry '3102500' in /O/0/d4 (19398664) has deleted/unused inode 26072855.  Clear? yes
          Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14340, priv_data=0x7fffffffe180) at pass2.c:1219
          1219                    cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
          (gdb) p block_nr
          $30 = 4966058525
          
          Breakpoint 2, raw_write_blk (channel=0x647570, data=0x648670, block=671091229, count=1, bufv=0x64f060) at unix_io.c:233
          
          (gdb) p (unsigned int)4966058525
          $33 = 671091229
          

          I thought maybe it was the cache node, but that appears to use an unsigned long long to store the block.

          I'll keep looking but I thought I'd pass that info along in case it helps.

          kitwestneat Kit Westneat (Inactive) added a comment (edited)

          It seems like this doesn't actually produce a valid raw image:
          e2image -r OSTnnnn.qcow OSTnnnn.raw

          I had to do:
          qemu-img convert -p -O raw /scratch/ost000b.qcow ost000b.raw

          to get something that worked.

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Niu,

          I don't think ext4 supports files greater than 16TB, so you'd need to use XFS or ZFS.

          Yeah, the files on the DDN server are temporary. I'll upload it to the Intel FTP server.

          Thanks,
          Kit

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw

          Oh, I didn't notice it's a sparse file. Then I think it can be converted on ext4 too; however, I got the following error while trying to convert it on ext4 (actual size 1.6G, apparent size 16T):

          e2image: Invalid argument while trying to convert qcow2 image (ost000b.qcow) into raw image
          

          If the 3.6G file you mentioned is http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2, could you upload it to the Whamcloud FTP server? I don't have permission to access the DDN FTP server.

          niu Niu Yawei (Inactive) added a comment

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw

          kitwestneat Kit Westneat (Inactive) added a comment

          Is the raw device behind ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2 16TB? It's hard for me to find a machine with a drive that big to reproduce the problem. Is there a smaller OST with the same problem?

          niu Niu Yawei (Inactive) added a comment

          I just uploaded my qcow image to ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2

          dvicker Darby Vicker added a comment

          We ran into this problem as well. I'll attach the fsck output to this JIRA. Email me if you'd like me to send you the qcow image.

          dvicker Darby Vicker added a comment

          I got a qcow image with a file exhibiting the corruption; it's available here:
          http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2 [295M]

          # e2fsck -fp /dev/mapper/ost_lfs2_36
            lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
            lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
          # e2fsck -fp /dev/mapper/ost_lfs2_36
            lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496. CLEARED.
            lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks
          kitwestneat Kit Westneat (Inactive) added a comment

          People

            niu Niu Yawei (Inactive)
            kitwestneat Kit Westneat (Inactive)
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved: