Details
Type: Bug
Resolution: Fixed
Priority: Critical
Environment: Centos5, e2fsprogs-1.42.7.wc1-0redhat
Description
e2fsck doesn't actually clear deleted/unused inodes, though it claims to. I've attached a log showing what we are seeing. The customer is CalTech.
Attachments
Activity
ok, I can get a patch for that.
I ran gcc with -Wconversion on the source code and there are a few other cases where it converts to blk_t from blk64_t. I guess it would be good to go through them all at some point... I am not sure I know enough about ext4 to judge if the conversion is valid or not. For example, pass2.c also has a conversion on line 890:
struct dx_dirblock_info {
	int		type;
	blk_t		phys;
	int		flags;
	blk_t		parent;
	ext2_dirhash_t	min_hash;
	ext2_dirhash_t	max_hash;
	ext2_dirhash_t	node_min_hash;
	ext2_dirhash_t	node_max_hash;
};
...
	dx_db = &dx_dir->dx_block[db->blockcnt];
	dx_db->type = DX_DIRBLOCK_LEAF;
890>>	dx_db->phys = block_nr;
	dx_db->min_hash = ~0;
	dx_db->max_hash = 0;
Should those be 64-bit? It seems like it, but I don't know. There are 103 cases of conversion to blk_t from blk64_t. The real number of conversions is probably higher, since there are also some like:
fileio.c:164: warning: conversion to ‘blk_t’ from ‘__u64’ may alter its value
res_gdt.c:140: warning: conversion to ‘blk_t’ from ‘long long unsigned int’ may alter its value
pass2.c:687: warning: conversion to ‘blk_t’ from ‘e2_blkcnt_t’ may alter its value
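To make the failure mode concrete, here is a minimal standalone C sketch (not from the bug report) of the narrowing that -Wconversion flags. The typedefs are local stand-ins for the __u32/__u64 ones quoted further down, and the block number is the one seen in the gdb session:

#include <stdio.h>

typedef unsigned int       blk_t;    /* stand-in for the 32-bit blk_t */
typedef unsigned long long blk64_t;  /* stand-in for the 64-bit blk64_t */

int main(void)
{
	blk64_t block_nr = 4966058603ULL; /* a block number beyond 2^32 */
	blk_t narrowed = block_nr;        /* implicit conversion; -Wconversion warns here */

	printf("blk64_t value: %llu\n", (unsigned long long) block_nr);
	printf("blk_t value:   %u\n", narrowed); /* prints 671091307: the high bits are silently dropped */
	return 0;
}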
Yes, I think that's probably the reason the entries are not fixed. check_dir_block() should call ext2fs_write_dir_block3() directly.
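A minimal sketch of what that change to check_dir_block() in pass2.c might look like (illustrative only, based on the ext2fs_write_dir_block3() signature seen in the gdb session below; the patch that actually landed may differ):

	/* before: block_nr (blk64_t) is implicitly narrowed to blk_t */
	cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);

	/* after: pass the full 64-bit block number straight through */
	cd->pctx.errcode = ext2fs_write_dir_block3(fs, block_nr, buf, 0);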
Oh, I think it is the definition of ext2fs_write_dir_block:
errcode_t ext2fs_write_dir_block(ext2_filsys fs, blk_t block, void *inbuf)

typedef __u32 blk_t;
typedef __u64 blk64_t;
It seems like that should be blk64_t? It looks like ext2fs_write_dir_block3 uses blk64_t, but the call to ext2fs_write_dir_block already casts it down to blk_t.
Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14a00, priv_data=0x7fffffffe180) at pass2.c:1219
1219            cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
(gdb) p block_nr
$35 = 4966058603
(gdb) cont
Continuing.
Breakpoint 3, ext2fs_write_dir_block3 (fs=0x647420, block=671091307, inbuf=0x667270, flags=0) at dirblock.c:146
146             return io_channel_write_blk64(fs->io, block, 1, (char *) inbuf);
(gdb) p (blk_t)4966058603
$36 = 671091307
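(For reference: 671091307 = 4966058603 - 2^32 = 4966058603 - 4294967296, i.e. only the low 32 bits of the 64-bit block number survive the cast.)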
It looks like the block number is wrapping around during the io_channel write:
Entry '3102500' in /O/0/d4 (19398664) has deleted/unused inode 26072855.  Clear? yes

Breakpoint 1, check_dir_block (fs=<value optimized out>, db=0x7ffff7f14340, priv_data=0x7fffffffe180) at pass2.c:1219
1219            cd->pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
(gdb) p block_nr
$30 = 4966058525
Breakpoint 2, raw_write_blk (channel=0x647570, data=0x648670, block=671091229, count=1, bufv=0x64f060) at unix_io.c:233
(gdb) p (unsigned int)4966058525
$33 = 671091229
I thought maybe it was the cache node, but that appears to use an unsigned long long to store the block.
I'll keep looking but I thought I'd pass that info along in case it helps.
It seems like this doesn't actually produce a valid raw image:
e2image -r OSTnnnn.qcow OSTnnnn.raw
I had to do:
qemu-img convert -p -O raw /scratch/ost000b.qcow ost000b.raw
to get something that worked.
Hi Niu,
I don't think ext4 supports files greater than 16TB, so you'd need to use XFS or ZFS.
Yeah, the files on the DDN server are temporary. I'll upload it to the Intel FTP server.
Thanks,
Kit
Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:
[root@oxonia-mds1 rcvy]# ls -lh
total 3.6G
-rw------- 1 root root 28T Nov 22 10:04 ost000b.raw
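(Side note, not from the original thread: with GNU coreutils the allocated vs. apparent size of a sparse file can be compared directly, e.g.

du -h ost000b.raw                    # allocated blocks, ~3.6G here
du -h --apparent-size ost000b.raw    # apparent size, 28T

which is how a file like this can report 28T while only occupying 3.6GB.)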
Oh, I didn't notice it's a sparse file. Then I think it can be converted on ext4 as well; however, I got the following error while trying to convert it on ext4 (actual size 1.6G, shown as 16T):
e2image: Invalid argument while trying to convert qcow2 image (ost000b.qcow) into raw image
If the 3.6G file you mentioned is http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2, could you upload it to the whamcloud ftp? I have no permission to access the ddn ftp server.
The raw device of ftp.whamcloud.com/uploads/LU-3542/ost000b.qcow.bz2 is 16TB? It's hard for me to find a machine with such a big drive to reproduce the problem. Is there any smaller OST which has the same problem?
This fix has landed for the next e2fsprogs release.