deleted/unused inodes not actually cleared by e2fsck

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • None
    • None
    • Centos5, e2fsprogs-1.42.7.wc1-0redhat
    • 2
    • 8914

    Description

      e2fsck doesn't actually clear deleted/unused inodes, though it claims to. I've attached a log showing what we are seeing. The customer is CalTech.

      Attachments

        1. e2fsck_safe_repair_ost_3.log-1
          20 kB
        2. e2fsck_safe_repair_ost_3.log-2
          17 kB
        3. e2fsck.log
          9 kB
        4. fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1
          186 kB
        5. htree.dump
          122 kB

        Activity

          [LU-3542] deleted/unused inodes not actually cleared by e2fsck
          pjones Peter Jones added a comment -

          This fix has landed for the next e2fsprogs release

          kitwestneat Kit Westneat (Inactive) added a comment - - edited

          ok, I can get a patch for that.

          I ran gcc with -Wconversion on the source code and there are a few other cases where it converts to blk_t from blk64_t. I guess it would be good to go through them all at some point... I am not sure I know enough about ext4 to judge if the conversion is valid or not. For example, pass2.c also has a conversion on line 890:

          struct dx_dirblock_info {                                                           
              int     type;                                                                   
              blk_t       phys;                                                               
              int     flags;                                                                  
              blk_t       parent;                                                             
              ext2_dirhash_t  min_hash;                                                       
              ext2_dirhash_t  max_hash;                                                       
              ext2_dirhash_t  node_min_hash;                                                  
              ext2_dirhash_t  node_max_hash;                                                  
          };                                                                                  
                                                                                              
          ...
          
                  dx_db = &dx_dir->dx_block[db->blockcnt];                                    
                  dx_db->type = DX_DIRBLOCK_LEAF;                                             
          890>>   dx_db->phys = block_nr;                                                     
                  dx_db->min_hash = ~0;                                                       
                  dx_db->max_hash = 0;                                                        
          

          Should those be 64-bit? It seems like it, but I don't know. There are 103 cases of conversion to blk_t from blk64_t. The real number of conversions is probably higher, since there are also some like:

          fileio.c:164: warning: conversion to ‘blk_t’ from ‘__u64’ may alter its value
          res_gdt.c:140: warning: conversion to ‘blk_t’ from ‘long long unsigned int’ may alter its value
          pass2.c:687: warning: conversion to ‘blk_t’ from ‘e2_blkcnt_t’ may alter its value
          
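
          One way to audit the conversions that -Wconversion flags is to route each of them through a helper that refuses to narrow a value that does not fit. The sketch below is only an illustration: blk64_to_blk32 is a hypothetical name, not an e2fsprogs function, and the typedefs are stand-ins for the __u32/__u64 definitions quoted elsewhere in this ticket.

```c
#include <stdint.h>

/* Stand-ins for the e2fsprogs typedefs quoted in this ticket. */
typedef uint32_t blk_t;
typedef uint64_t blk64_t;

/*
 * Hypothetical helper (not part of e2fsprogs): narrow a 64-bit block
 * number to 32 bits only when no information would be lost.
 * Returns 0 on success, -1 if the value does not fit in blk_t.
 */
int blk64_to_blk32(blk64_t in, blk_t *out)
{
    if (in > UINT32_MAX)
        return -1;          /* would wrap modulo 2^32 */
    *out = (blk_t)in;       /* explicit cast: safe after the range check */
    return 0;
}
```

          With a helper like this, a block number such as 4966058603 would be rejected instead of being silently narrowed, while values below 2^32 pass through unchanged.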

          niu Niu Yawei (Inactive) added a comment -

          Yes, I think that's probably the reason the entries are not fixed. check_dir_block() should call ext2fs_write_dir_block3() directly.

          kitwestneat Kit Westneat (Inactive) added a comment -

          oh I think it is the definition of ext2fs_write_dir_block:

          errcode_t ext2fs_write_dir_block(ext2_filsys fs, blk_t block,
                           void *inbuf)

          typedef __u32       blk_t;
          typedef __u64       blk64_t;

          It seems like that should be blk64_t? It looks like ext2fs_write_dir_block3 uses blk64_t, but the call to ext2fs_write_dir_block already casts it down to blk_t:

          Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14a00, priv_data=0x7fffffffe180) at pass2.c:1219
          1219                    cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
          (gdb) p block_nr
          $35 = 4966058603
          (gdb) cont
          Continuing.

          Breakpoint 3, ext2fs_write_dir_block3 (fs=0x647420, block=671091307, inbuf=0x667270, flags=0) at dirblock.c:146
          146             return io_channel_write_blk64(fs->io, block, 1, (char *) inbuf);
          (gdb) p (blk_t)4966058603
          $36 = 671091307
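
          The pattern seen in the gdb session, a 32-bit entry point forwarding to a 64-bit one, can be reproduced in isolation. This is a minimal sketch: demo_write_block32 and demo_write_block64 are hypothetical stand-ins that only return the block number they receive, and they do not reproduce the real libext2fs signatures.

```c
#include <stdint.h>

/* Stand-ins for the e2fsprogs typedefs quoted in this ticket. */
typedef uint32_t blk_t;
typedef uint64_t blk64_t;

/* Hypothetical 64-bit writer: reports the block it was asked to write. */
blk64_t demo_write_block64(blk64_t block)
{
    return block;
}

/*
 * Hypothetical legacy wrapper: because the parameter is blk_t, the
 * caller's 64-bit value is narrowed at the call site, before the
 * 64-bit writer ever sees it.
 */
blk64_t demo_write_block32(blk_t block)
{
    return demo_write_block64(block);
}
```

          Passing a blk64_t variable holding 4966058603 to demo_write_block32 yields 671091307, the same value gdb shows arriving in ext2fs_write_dir_block3, whereas calling demo_write_block64 directly keeps the full block number.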
          kitwestneat Kit Westneat (Inactive) added a comment - - edited

          It looks like the block number is wrapping around during the io_channel write:

          Entry '3102500' in /O/0/d4 (19398664) has deleted/unused inode 26072855.  Clear? yes
          Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14340, priv_data=0x7fffffffe180) at pass2.c:1219
          1219                    cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
          (gdb) p block_nr
          $30 = 4966058525
          
          Breakpoint 2, raw_write_blk (channel=0x647570, data=0x648670, block=671091229, count=1, bufv=0x64f060) at unix_io.c:233
          
          (gdb) p (unsigned int)4966058525
          $33 = 671091229
          

          I thought maybe it was the cache node, but that appears to use an unsigned long long to store the block.

          I'll keep looking but I thought I'd pass that info along in case it helps.

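
          The two numbers in the trace differ by exactly 2^32, which is what a conversion to a 32-bit unsigned type produces. A small sketch of that arithmetic follows; low32 and high32 are illustrative helper names, not functions from e2fsprogs.

```c
#include <stdint.h>

/* Low 32 bits: what survives a conversion to a 32-bit unsigned type. */
uint32_t low32(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* High 32 bits: what the conversion silently discards. */
uint32_t high32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

          For block 4966058525 the high half is 1 and the low half is 671091229, so the write lands 2^32 blocks before the intended one.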

          kitwestneat Kit Westneat (Inactive) added a comment -

          It seems like this doesn't actually produce a valid raw image:
          e2image -r OSTnnnn.qcow OSTnnnn.raw

          I had to do:
          qemu-img convert -p -O raw /scratch/ost000b.qcow ost000b.raw

          to get something that worked.

          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu,

          I don't think ext4 supports files greater than 16TB, so you'd need to use XFS or ZFS.

          Yeah, the files on the DDN server are temporary. I'll upload it to the Intel FTP server.

          Thanks,
          Kit
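
          The 16TB figure is consistent with 32-bit block numbers at a 4 KiB block size. The block size is an assumption here, since the ticket does not state it, and max_bytes_32bit_blocks is a hypothetical helper used only to show the arithmetic.

```c
#include <stdint.h>

/*
 * Largest byte count addressable with 32-bit block numbers at the
 * given block size: 2^32 blocks times block_size bytes per block.
 */
uint64_t max_bytes_32bit_blocks(uint32_t block_size)
{
    return ((uint64_t)1 << 32) * block_size;
}
```

          With block_size = 4096 this gives 2^44 bytes, that is 16 TiB. The same boundary is why block numbers on a 28T OST can exceed what blk_t holds.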

          niu Niu Yawei (Inactive) added a comment -

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw

          Oh, I didn't notice it's a sparse file. Then I think it can be converted on ext4 as well. However, I got the following error while trying to convert it on ext4 (actual size 1.6G, shown as 16T):

          e2image: Invalid argument while trying to convert qcow2 image (ost000b.qcow) into raw image

          If the 3.6G file you mentioned is http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2, could you upload it to the whamcloud ftp? I don't have permission to access the ddn ftp server.

          kitwestneat Kit Westneat (Inactive) added a comment -

          Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
          [root@oxonia-mds1 rcvy]# ls -lh
          total 3.6G
          -rw------- 1 root root 28T Nov 22 10:04 ost000b.raw
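
          The gap between 3.6G and 28T is the difference between a sparse file's allocated and apparent sizes. A minimal sketch of reading both with stat() follows; file_sizes is a hypothetical helper, and the 512-byte unit for st_blocks is assumed as on Linux.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Report the apparent size (the per-file size in `ls -l`) and the
 * allocated size (the `total` line, or `du`) of a file.
 * Returns 0 on success, -1 if stat() fails.
 */
int file_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    /* Assumption: st_blocks counts 512-byte units, as on Linux. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

          For a sparse image such as ost000b.raw, allocated would be expected to come out far smaller than apparent.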

          People

            niu Niu Yawei (Inactive)
            kitwestneat Kit Westneat (Inactive)