[LU-3138] E2fsck sees some "Deleted inode 14 has zero dtime. Fix? no" after upgrading from 1.8 to 2.4 Created: 09/Apr/13  Updated: 22/Apr/13  Resolved: 22/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: LB

Severity: 3
Rank (Obsolete): 7618

 Description   

I found this problem while trying to fix a DNE problem, but it turns out the problem exists on a single MDT as well. It is easy to reproduce with this script:

LOAD=y sh llmount.sh
tar -jxvf disk1_8-ldiskfs.tar.bz2
cp ./mdt /tmp/lustre-mdt1
cp ./ost /tmp/lustre-ost1
../utils/tunefs.lustre --writeconf --mgsnode=testnode /tmp/lustre-mdt1
e2fsck -fnvd /tmp/lustre-mdt1
../utils/tunefs.lustre --writeconf --mgsnode=testnode /tmp/lustre-ost1
mount -t lustre -o loop /tmp/lustre-mdt1 /mnt/mds1
mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
mount -t lustre testnode:/t32fs  /mnt/lustre
echo sleep 5 seconds
sleep 5
umount /mnt/lustre
umount /mnt/ost1
umount /mnt/mds1

e2fsck -fnvd /tmp/lustre-mdt1
sh llmountcleanup.sh

Here is the e2fsck result:

e2fsck -fnvd /tmp/lustre-mdt1
+ e2fsck -fnvd /tmp/lustre-mdt1
e2fsck 1.42.3.wc3 (15-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 14 has zero dtime.  Fix? no

Deleted inode 15 has zero dtime.  Fix? no

Deleted inode 16 has zero dtime.  Fix? no

Deleted inode 20 has zero dtime.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(16104--16112) -(16115--16119)
Fix? no

Inode bitmap differences:  -(14--16) -20
Fix? no


t32fs-MDT0000: ********** WARNING: Filesystem still has errors **********


     756 inodes used (0.76%)
       6 non-contiguous files (0.8%)
       0 non-contiguous directories (0.0%)
         # of inodes with ind/dind/tind blocks: 0/0/0
   16955 blocks used (33.91%)
       0 bad blocks
       1 large file

     181 regular files
      56 directories
       0 character device files
       0 block device files
       0 fifos
       6 links
     506 symbolic links (506 fast symbolic links)
       0 sockets
--------
     749 files
sh llmountcleanup.sh
+ sh llmountcleanup.sh
Stopping clients: testnode /mnt/lustre (opts:-f)
Stopping clients: testnode /mnt/lustre2 (opts:-f)
modules unloaded.

These inodes turn out to be the old config logs:

/mnt/mds1/CONFIGS:
13 mountdata  14 t32fs-client  15 t32fs-MDT0000  20 t32fs-OST0000  16 t32fs-params
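To cross-check which files e2fsck is flagging, the inode numbers can be pulled out of a saved e2fsck log and compared with an inode listing of CONFIGS like the one above. This is a minimal sketch under stated assumptions: the helper name `zero_dtime_inodes` and the log file name `e2fsck.log` are hypothetical, and the `debugfs` call in the usage comment is just one way to list CONFIGS without mounting the image.

```shell
# Hypothetical helper: read e2fsck output on stdin and print, one per line,
# the inode numbers reported as "Deleted inode N has zero dtime".
zero_dtime_inodes() {
    sed -n 's/^Deleted inode \([0-9][0-9]*\) has zero dtime.*/\1/p'
}

# Example usage (assumes the e2fsck output was saved to e2fsck.log):
#   e2fsck -fnvd /tmp/lustre-mdt1 > e2fsck.log 2>&1
#   zero_dtime_inodes < e2fsck.log
#   debugfs -R "ls -l /CONFIGS" /tmp/lustre-mdt1   # inode numbers of CONFIGS entries
```

Extracting only the numbers keeps the comparison independent of the surrounding "Fix? no" prompts and blank lines that e2fsck -n prints.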


 Comments   
Comment by Andreas Dilger [ 10/Apr/13 ]

Is this a problem with nlinks on these files (i.e. they are accidentally being unlinked), or are they intentionally being unlinked and the only problem is that osd-ldiskfs is not setting dtime on the unlinked inodes for some reason? I recall we had some problems with link counts for local objects; is there possibly already a patch for this?

Comment by Peter Jones [ 10/Apr/13 ]

Emoly

Could you please look into this one?

Thanks

Peter

Comment by Emoly Liu [ 11/Apr/13 ]

I will take a look.

Comment by Emoly Liu [ 11/Apr/13 ]

I can't reproduce this problem with the test script. The test output is:

[root@centos6-3 tests]# sh 3138_tests.sh 
++ hostname
+ testnode=centos6-3
+ LOAD=y
+ sh llmount.sh
Loading modules from /root/master/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
debug=vfstrace rpctrace dlmtrace neterror ha config ioctl super
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
+ tar -jxvf disk1_8-ldiskfs.tar.bz2
arch
bspace
commit
ispace
kernel
list
mdt
ost
sha1sums
+ cp -f ./mdt /tmp/lustre-mdt1
+ cp -f ./ost /tmp/lustre-ost1
+ ../utils/tunefs.lustre --writeconf --mgsnode=centos6-3 /tmp/lustre-mdt1
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     t32fs-MDT0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0


   Permanent disk data:
Target:     t32fs=MDT0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x105
              (MDT MGS writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mgsnode=10.211.55.7@tcp

Writing CONFIGS/mountdata
+ e2fsck -fnvd /tmp/lustre-mdt1
e2fsck 1.42.6.wc2 (10-Dec-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

         625 inodes used (0.62%, out of 100000)
           6 non-contiguous files (1.0%)
           0 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
       16741 blocks used (33.48%, out of 50000)
           0 bad blocks
           1 large file

          95 regular files
          15 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
         506 symbolic links (506 fast symbolic links)
           0 sockets
------------
         616 files
+ ../utils/tunefs.lustre --writeconf --mgsnode=centos6-3 /tmp/lustre-ost1
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     t32fs-OST0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: sys.timeout=20 mgsnode=192.168.203.129@tcp


   Permanent disk data:
Target:     t32fs=OST0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x102
              (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: sys.timeout=20 mgsnode=192.168.203.129@tcp mgsnode=10.211.55.7@tcp

Writing CONFIGS/mountdata
+ mount -t lustre -o loop /tmp/lustre-mdt1 /mnt/mds1
+ mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
+ mount -t lustre centos6-3:/t32fs /mnt/lustre
+ echo sleep 5 seconds
sleep 5 seconds
+ sleep 5
+ umount /mnt/lustre
+ umount /mnt/ost1
+ umount /mnt/mds1
+ e2fsck -fnvd /tmp/lustre-mdt1
e2fsck 1.42.6.wc2 (10-Dec-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

         751 inodes used (0.75%, out of 100000)
           6 non-contiguous files (0.8%)
           0 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
       16938 blocks used (33.88%, out of 50000)
           0 bad blocks
           1 large file

         180 regular files
          56 directories
           0 character device files
           0 block device files
           0 fifos
           6 links
         506 symbolic links (506 fast symbolic links)
           0 sockets
------------
         748 files
+ sh llmountcleanup.sh
Stopping clients: centos6-3 /mnt/lustre (opts:-f)
Stopping clients: centos6-3 /mnt/lustre2 (opts:-f)
modules unloaded.

The top commit of my working branch is "2fede8c LU-3026 llite: setattr to override permission check for owner". Perhaps, as Andreas said, a patch for this has already landed?

Comment by Emoly Liu [ 11/Apr/13 ]

I fetched the latest commit "9a01e2b LU-3000" and still can't reproduce this problem.

Wangdi, could you please update your master branch and see if this problem still exists? Thanks.

Comment by Di Wang [ 15/Apr/13 ]

Hmm, the problem is still there in my local tests with current master, though it cannot be reproduced every time. Maybe you can try waiting 10 seconds after mounting the client?
Andreas, yes, these logs are supposed to be removed during the mount process if --writeconf is given via tunefs.lustre or mount -o writeconf. So Emoly, IMHO you probably need to check whether anything is wrong in mgs_erase_log. Thanks.
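Because the problem does not show up on every run, a small driver that reruns the reproduction and stops at the first hit may help. This is a hedged sketch, assuming the reproduction steps are saved as a script such as Emoly's 3138_tests.sh and that its output includes the final e2fsck run; the function name `repeat_until_zero_dtime` is hypothetical.

```shell
# Hypothetical driver: run a reproduction command up to MAX times and stop
# as soon as its output contains the e2fsck "has zero dtime" complaint.
# Returns 0 on the first hit, 1 if no run reproduced the problem.
# Note: plain POSIX sh has no "local", so max and i are shell globals.
repeat_until_zero_dtime() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@" 2>&1 | grep -q 'has zero dtime'; then
            echo "reproduced on run $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "not reproduced in $max runs"
    return 1
}

# Example usage (assumes the reproduction steps live in 3138_tests.sh):
#   repeat_until_zero_dtime 10 sh 3138_tests.sh
```

Matching on the e2fsck message rather than on its exit status keeps the check meaningful with -n, where e2fsck answers "no" to every fix.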

Comment by Emoly Liu [ 16/Apr/13 ]

As suggested, I used 2 MDTs, slept 10 seconds before umount, switched e2fsprogs to 1.42.3.wc3 (15-Aug-2012), and ran the test 10 times, but still can't reproduce it.

[root@centos6-3 tests]# sh 3138_tests.sh  
++ hostname
+ testnode=centos6-3
+ wait_sec=10
+ LOAD=y
+ sh llmount.sh
Loading modules from /root/master/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
debug=vfstrace rpctrace dlmtrace neterror ha config ioctl super
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
+ tar -jxvf disk1_8-ldiskfs.tar.bz2
arch
bspace
commit
ispace
kernel
list
mdt
ost
sha1sums
+ cp -f ./mdt /tmp/lustre-mdt1
+ cp -f ./ost /tmp/lustre-ost1
+ ../utils/tunefs.lustre --writeconf --mgsnode=centos6-3 /tmp/lustre-mdt1
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     t32fs-MDT0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0


   Permanent disk data:
Target:     t32fs=MDT0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x105
              (MDT MGS writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mgsnode=10.211.55.7@tcp

Writing CONFIGS/mountdata
+ e2fsck -fnvd /tmp/lustre-mdt1
e2fsck 1.42.3.wc3 (15-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

     625 inodes used (0.62%)
       6 non-contiguous files (1.0%)
       0 non-contiguous directories (0.0%)
         # of inodes with ind/dind/tind blocks: 0/0/0
   16741 blocks used (33.48%)
       0 bad blocks
       1 large file

      95 regular files
      15 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
     506 symbolic links (506 fast symbolic links)
       0 sockets
--------
     616 files
+ ../utils/tunefs.lustre --writeconf --mgsnode=centos6-3 /tmp/lustre-ost1
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     t32fs-OST0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: sys.timeout=20 mgsnode=192.168.203.129@tcp


   Permanent disk data:
Target:     t32fs=OST0000
Index:      0
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x102
              (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: sys.timeout=20 mgsnode=192.168.203.129@tcp mgsnode=10.211.55.7@tcp

Writing CONFIGS/mountdata
+ ../utils/mkfs.lustre --reformat --mgsnode=centos6-3 --mdt --index 1 --fsname=t32fs --device-size=104800 /tmp/lustre-mdt2

   Permanent disk data:
Target:     t32fs:MDT0001
Index:      1
Lustre FS:  t32fs
Mount type: ldiskfs
Flags:      0x61
              (MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.211.55.7@tcp

formatting backing filesystem ldiskfs on /dev/loop0
	target name  t32fs:MDT0001
	4k blocks     26200
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L t32fs:MDT0001  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/loop0 26200
Writing CONFIGS/mountdata
+ mount -t lustre -o loop /tmp/lustre-mdt1 /mnt/mds1
+ mount -t lustre -o loop /tmp/lustre-mdt2 /mnt/mds2
+ mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
+ mount -t lustre centos6-3:/t32fs /mnt/lustre
+ echo sleep 10 seconds
sleep 10 seconds
+ sleep 10
+ umount /mnt/lustre
+ umount /mnt/ost1
+ umount /mnt/mds1
+ umount /mnt/mds2
+ e2fsck -fnvd /tmp/lustre-mdt1
e2fsck 1.42.3.wc3 (15-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

     752 inodes used (0.75%)
       6 non-contiguous files (0.8%)
       0 non-contiguous directories (0.0%)
         # of inodes with ind/dind/tind blocks: 0/0/0
   16942 blocks used (33.88%)
       0 bad blocks
       1 large file

     180 regular files
      57 directories
       0 character device files
       0 block device files
       0 fifos
       7 links
     506 symbolic links (506 fast symbolic links)
       0 sockets
--------
     750 files
+ sh llmountcleanup.sh
Stopping clients: centos6-3 /mnt/lustre (opts:-f)
Stopping clients: centos6-3 /mnt/lustre2 (opts:-f)
modules unloaded.

Comment by Di Wang [ 17/Apr/13 ]

http://review.whamcloud.com/6072

Comment by Peter Jones [ 22/Apr/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:31:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.