[LU-1774] fsck -fD corrupts filesystem Created: 20/Aug/12  Updated: 04/Dec/12  Resolved: 04/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.2
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Wojciech Turek (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

lustre-source-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
lustre-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-2.6.32-220.17.1.el6_lustre.x86_64
kernel-ib-1.5.3-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-firmware-2.6.32-220.17.1.el6_lustre.x86_64
kernel-ib-devel-1.5.3-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
lustre-tests-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.17.1.el6_lustre.x86_64
lustre-ldiskfs-3.3.0-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-mft-2.7.1-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-devel-2.6.32-220.17.1.el6_lustre.x86_64
lustre-modules-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64

e2fsprogs-libs-1.42.3.wc1-7.el6.x86_64
e2fsprogs-devel-1.42.3.wc1-7.el6.x86_64
e2fsprogs-1.42.3.wc1-7.el6.x86_64

2.6.32-220.17.1.el6_lustre.x86_64


Attachments: File fsck_scratch_mdt_17AUG12     File screenlog_scratch_mdt_fsck_fvn.log     File screenlog_scratch_mdt_fsck_fvy.log     File syslog_messages_oss31_oss36_mds07.tar.bz2     File syslog_messages_oss31_oss36_mds07_from_29Jul_to_12Aug.tar.bz2    
Issue Links:
Related
is related to LU-1540 e2fsck remove too many symlinks Resolved
is related to LU-1366 getting "dirdata length set incorrect... Closed
Severity: 2
Rank (Obsolete): 4008

 Description   

I have been seeing a large number of messages like the one below on the production /scratch FS.

Aug 17 17:54:52 mds07 mds07 kernel: LDISKFS-fs warning (device dm-2): ldiskfs_dx_add_entry: Directory index full!

The /scratch FS temporarily holds user /home directories until I install new hardware for a separate Lustre /home FS. The area of /scratch that holds user /home directories is backed up on a daily basis.

Device dm-2 is the MDT for our production scratch FS. The file system has around 160M files at the moment, and from what I found by reading various posts, the LDISKFS message above suggests that we may have some very large directories in our /scratch FS. I decided to run fsck -fD, which supposedly should optimize directory structures and get rid of the above problem (at least temporarily).

Unfortunately this turned out to be a bad idea. The first pass of fsck found over 3200 invalid symlinks and decided to clear them, for example:
Symlink /ROOT/new_home/dws29/sandy/InstallArea/XML/CamMapCut64.pie (inode #66608225) is invalid.
Clear<y>? yes
I have checked those supposedly invalid symlinks against our /home backup and the symlinks are actually correct, so fsck just removed over 3k valid symlinks.

/mnt/backup/home/dws29/sandy/InstallArea/XML/CamMapCut64.pie -> /home/dws29/sandy/Task_pkg/HL2_PowellSnakes/v00-00-020000_CVSHEAD/cmt/../XMLMODULESCHECKED//CamMapCut64.pie
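A comparison like the one described above could be scripted roughly as follows. This is only a sketch: it assumes the e2fsck log uses exactly the message format shown in the example, and that MDT paths under /ROOT/new_home correspond to /mnt/backup/home on the backup mount (inferred from the two example paths above). The function names are made up for illustration.

```python
import os
import re

# Matches e2fsck lines such as:
#   Symlink /ROOT/new_home/.../CamMapCut64.pie (inode #66608225) is invalid.
SYMLINK_RE = re.compile(
    r"^Symlink (?P<path>.+) \(inode #(?P<inode>\d+)\) is invalid\.$"
)

# Assumed mapping between the MDT namespace and the backup mount point.
MDT_PREFIX = "/ROOT/new_home/"
BACKUP_PREFIX = "/mnt/backup/home/"


def parse_cleared_symlinks(log_lines):
    """Return (mdt_path, inode) pairs for every symlink e2fsck flagged as invalid."""
    found = []
    for line in log_lines:
        match = SYMLINK_RE.match(line.strip())
        if match:
            found.append((match.group("path"), int(match.group("inode"))))
    return found


def backup_path(mdt_path):
    """Translate an MDT path to its backup location, or None if it is not backed up."""
    if mdt_path.startswith(MDT_PREFIX):
        return BACKUP_PREFIX + mdt_path[len(MDT_PREFIX):]
    return None


def backup_target(mdt_path):
    """Return the symlink target recorded in the backup, or None if unavailable."""
    path = backup_path(mdt_path)
    if path is None or not os.path.islink(path):
        return None
    return os.readlink(path)
```

Feeding the fsck log through parse_cleared_symlinks() and then calling backup_target() on each path would give the list of cleared symlinks that can be recreated from the backup; paths for which it returns None are the ones with no backup copy.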

Obviously the whole /scratch FS holds many more than 3K symlinks, so I am puzzled as to what criteria fsck used when it decided to clear these particular symlinks.
I was able to recover some of them from our /home backup, but the ones that were in the area of /scratch that is not backed up were cleared forever (not cool).

I ran a second pass of fsck and then mounted the MDT back. Everything seemed OK until the overnight rsync backup process started to copy files and hit many I/O errors when trying to enter some directories, for example:
ls /home/ad491/progs/gromacs-3.3.3/src/gmxlib/gmx_blas
ls: reading directory /home/ad491/progs/gromacs-3.3.3/src/gmxlib/gmx_blas: Input/output error

I can see inside these directories from the MDS by using debugfs, so I am hoping that the data are not completely gone.
debugfs -c -R 'ls -l ROOT/new_home/ad491/progs/gromacs-3.3.3/src/gmxlib/gmx_blas/' /dev/mapper/mds08_scratch_mdt
debugfs 1.42.3.wc1 (28-May-2012)
/dev/mapper/mds08_scratch_mdt: catastrophic mode - not reading inode or group bitmaps
51920683 40775 (18) 9040 9043 8192 19-Dec-2009 16:23 .
51920609 40775 (18) 9040 9043 16384 19-Dec-2009 16:26 ..
51921092 40775 (18) 9040 9043 4096 19-Dec-2009 16:23 .deps
51921094 40775 (18) 9040 9043 4096 19-Dec-2009 16:23 .libs
51923088 100664 (17) 9040 9043 0 19-Dec-2009 16:19 Makefile
51923090 100644 (17) 9040 9043 0 2-Feb-2005 13:05 Makefile.am
51923092 100644 (17) 9040 9043 0 28-Feb-2008 15:41 Makefile.in
51923093 100644 (17) 9040 9043 0 24-Aug-2005 01:41 dasum.c
51923094 100664 (17) 9040 9043 0 19-Dec-2009 16:23 dasum.lo
51923095 100664 (17) 9040 9043 0 19-Dec-2009 16:23 dasum.o
51923096 100644 (17) 9040 9043 0 2-Feb-2005 13:05 daxpy.c
51923097 100664 (17) 9040 9043 0 19-Dec-2009 16:23 daxpy.lo
51923098 100664 (17) 9040 9043 0 19-Dec-2009 16:23 daxpy.o

Again, I am able to recover directories that are in the backed-up area of scratch, but this is not a lot, and many of the corrupted directories are not backed up. Is there any way to reverse/fix what the -D optimisation did and reconstruct the data?

I am attaching a log from fsck and also a few days' worth of syslog messages from the MDS and OSS servers. Please note that on 17 Aug around 6pm we had an IB network outage, so there will be some noise related to these problems in the logs.

It may also be worth mentioning that the FS is less than a couple of months old and was created using e2fsprogs-1.42.3.wc1-7.el6.x86_64, which already had some fixes for fsck -D issues.



 Comments   
Comment by Wojciech Turek (Inactive) [ 20/Aug/12 ]

Some of the problems I am seeing seem to be related to LU-1366 and LU-1540, so e2fsprogs-1.42.3.wc3 should at least sort out the symlink problem, but I cannot find anything related to directory corruption.

Comment by Cliff White (Inactive) [ 21/Aug/12 ]

Your MDS logs start on August 12th and at that time the error message is already happening. Is it possible to get logs for the MDS from prior to August 12th? Can you determine when the error first appeared?

Comment by Wojciech Turek (Inactive) [ 21/Aug/12 ]

Lustre syslog messages from 29Jul till 12 Aug

Comment by Wojciech Turek (Inactive) [ 21/Aug/12 ]

Hi Cliff,

I attached earlier syslogs. Please note, though, that the corruption occurred after running e2fsck with the -D option on 17 August.

The situation got much worse today and it stops us from running the /scratch filesystem; see details below.

I decided to run e2fsck on the scratch MDT today to fix symlinks that were missing NUL terminators. I updated e2fsprogs to the latest build, see below:
e2fsprogs-1.42.3.wc3-7.el6.x86_64

I first ran fsck -fvn to see what would be done, and only symlink problems were reported, so I ran fsck -fvy, which fixed the bad symlinks; nothing else was reported as fixed. Then I mounted the filesystem as normal. Unfortunately the "old" directory corruption (which occurred on 17 Aug) was still there, and new directories were corrupted as well. For example, I have found that a large number of user directories on the /lscratch FS, including my own, are corrupted and I cannot access them any more. The MDS log is also full of scary messages about corruption, see below.

Logs from the client on which I ran ls on the corrupted directories:
Aug 21 20:18:52 west-1-1 kernel: Lustre: Mounted lscratch-client
Aug 21 20:19:45 west-1-1 kernel: Lustre: Mounted lhome-client
Aug 21 21:19:11 west-1-1 kernel: LustreError: 9836:0:(dir.c:478:ll_get_dir_page()) read cache page: [0x20000045f:0xe16e:0x0] at 0: rc -5
Aug 21 21:19:11 west-1-1 kernel: LustreError: 9836:0:(dir.c:649:ll_readdir()) error reading dir [0x20000045f:0xe16e:0x0] at 0: rc -5
Aug 21 21:29:54 west-1-1 kernel: LustreError: 9882:0:(dir.c:478:ll_get_dir_page()) read cache page: [0x200000404:0x2a4:0x0] at 0: rc -5
Aug 21 21:29:54 west-1-1 kernel: LustreError: 9882:0:(dir.c:649:ll_readdir()) error reading dir [0x200000404:0x2a4:0x0] at 0: rc -5
Aug 21 21:32:03 west-1-1 kernel: LustreError: 9890:0:(dir.c:439:ll_get_dir_page()) dir page locate: [0x200000404:0x2a4:0x0] at 0: rc -5
Aug 21 21:32:03 west-1-1 kernel: LustreError: 9890:0:(dir.c:649:ll_readdir()) error reading dir [0x200000404:0x2a4:0x0] at 0: rc -5

MDS log

Aug 21 21:19:11 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:27:27 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:27:27 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551510, running e2fsck is recommended.
Aug 21 21:27:27 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:29:01 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:29:01 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551603, running e2fsck is recommended.
Aug 21 21:29:01 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:29:54 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:29:59 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:29:59 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 23073075, running e2fsck is recommended.
Aug 21 21:29:59 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:32:15 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:34:45 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:34:45 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22550971, running e2fsck is recommended.
Aug 21 21:34:45 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:34:56 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:34:56 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551058, running e2fsck is recommended.
Aug 21 21:35:08 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:35:08 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551153, running e2fsck is recommended.
Aug 21 21:35:08 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:35:08 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) Skipped 1 previous similar message
Aug 21 21:35:08 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:35:08 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551158, running e2fsck is recommended.
Aug 21 21:35:09 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Unrecognised inode hash code 182 for directory #23073097
Aug 21 21:35:09 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 23073097, running e2fsck is recommended.
Aug 21 21:50:15 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:50:15 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) Skipped 1 previous similar message
Aug 21 21:50:29 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:50:29 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 23073110, running e2fsck is recommended.
Aug 21 21:50:29 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:50:31 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:50:31 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551292, running e2fsck is recommended.
Aug 21 21:50:33 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:50:33 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 23073121, running e2fsck is recommended.
Aug 21 21:50:33 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:50:33 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551319, running e2fsck is recommended.
Aug 21 21:50:36 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:50:36 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 23073123, running e2fsck is recommended.
Aug 21 21:50:37 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: dx entry: limit != root limit
Aug 21 21:50:37 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551353, running e2fsck is recommended.
Aug 21 21:50:37 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
Aug 21 21:50:37 10.143.245.207 mds07 kernel: Lustre: 12085:0:(mdd_object.c:2412:__mdd_readpage()) Skipped 4 previous similar messages
Aug 21 21:50:38 10.143.245.207 mds07 kernel: LDISKFS-fs warning (device dm-2): dx_probe: Unrecognised inode hash code 5 for directory #22551357

There are more entries like that. I am still running ls on the top-level directories of /lscratch to detect corrupted ones.
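Pulling the affected inode numbers out of MDS log lines like the ones quoted above could be done with a small script. The following is a minimal sketch, assuming the dx_probe warning format is exactly as it appears in this log; the function name is hypothetical.

```python
import re

# Matches the dx_probe warning that names the damaged directory inode, e.g.
#   LDISKFS-fs warning (device dm-2): dx_probe: Corrupt dir inode 22551510, ...
CORRUPT_RE = re.compile(
    r"LDISKFS-fs warning \(device (?P<dev>[\w-]+)\): dx_probe: "
    r"Corrupt dir inode (?P<inode>\d+)"
)


def corrupt_dir_inodes(log_lines):
    """Map each device to the sorted, de-duplicated directory inodes dx_probe flagged."""
    by_device = {}
    for line in log_lines:
        match = CORRUPT_RE.search(line)
        if match:
            inode = int(match.group("inode"))
            by_device.setdefault(match.group("dev"), set()).add(inode)
    return {dev: sorted(inodes) for dev, inodes in by_device.items()}
```

The resulting inode numbers could then be passed to debugfs (for example its ncheck command) to obtain path names; that step is not shown here.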

This is very bad and I hope we can recover them.

I am attaching logs from both fsck runs

Comment by Wojciech Turek (Inactive) [ 22/Aug/12 ]

I was wondering if you could update me on any developments on this apparently critical issue.
After yesterday's e2fsck run, which was meant only to fix NUL termination of symlinks, we have identified 26 top-level user directories that cannot be accessed due to I/O errors. It seems that all the affected directories are those whose size is bigger than 4K.
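The manual scan described above (listing each top-level directory and noting which ones fail with an I/O error) could be automated along these lines. This is a sketch under assumptions: the function name and the injectable lister parameter are illustrative, and it only looks one level below the given root, as in the manual check.

```python
import errno
import os


def find_unreadable_dirs(root, lister=os.listdir):
    """Return (path, size_in_bytes) for each subdirectory of root whose listing fails with EIO."""
    broken = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Only real directories are of interest; skip files and symlinks.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        try:
            lister(path)
        except OSError as exc:
            # Only "Input/output error" indicates the corruption seen here;
            # anything else (e.g. permission denied) is re-raised, not hidden.
            if exc.errno != errno.EIO:
                raise
            broken.append((path, os.lstat(path).st_size))
    return broken
```

Recording the directory size next to each failing path makes it easy to check the observation that only directories larger than 4K are affected.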

Comment by Wojciech Turek (Inactive) [ 23/Aug/12 ]

I am surprised that there is not much progress on this serious issue, which currently affects everybody using Lustre.

I managed to reproduce the problem on my test filesystem. These are the steps:
1) Create a test filesystem with the latest e2fsprogs wc3 release.
2) Mount testfs on the client and create a directory; fill it with files so that the size of the directory is bigger than 4K
cd /ltestfs
ls -al /ltestfs/new_scratch1/
total 40
drwxr-xr-x 4 root root 4096 Aug 23 03:35 .
drwxr-xr-x 4 root root 4096 Aug 23 03:35 ..
drwxr-x--- 229 sjr20 sjr20 20480 Jun 22 10:25 sjr20
drwxr-x--- 131 wjt27 wjt27 12288 Mar 15 20:19 wjt27
3) Unmount the MDT and run fsck -fvD on it. Every time you run it, e2fsck will modify the filesystem.
4) Mount the MDT back and, on the client, move the directories; for example, I moved them up one level
mv new_scratch1/* .
ls -al
total 48
drwxr-xr-x 6 root root 4096 Aug 23 10:51 .
drwxr-xr-x 32 root root 4096 Aug 23 00:58 ..
drwxr-xr-x 2 root root 4096 Aug 22 21:06 .lustre
drwxr-xr-x 2 root root 4096 Aug 23 10:51 new_scratch1
drwxr-x--- 229 sjr20 sjr20 20480 Jun 22 10:25 sjr20
drwxr-x--- 131 wjt27 wjt27 12288 Mar 15 20:19 wjt27
5) Try to list the directory
ls -al wjt27/
ls: reading directory wjt27/: Input/output error
total 0

I hope that helps in debugging the problem.

Comment by Hellen (Inactive) [ 23/Aug/12 ]

A potential customer is testing 2.1.2 release and ran into this issue?

Comment by Zhenyu Xu [ 27/Aug/12 ]

patch tracking at http://review.whamcloud.com/3799

patch description
    LU-1774 e2fsck: e2fsck -D does not change dirdata content

    * Fix dir optimization to preserver dirdata content for dot and dotdot
      entries.

    * Add test case.
Comment by Wojciech Turek (Inactive) [ 12/Sep/12 ]

We have found a method to recover the data and copy them to a new filesystem. However, I think it would still be useful for others to be able to repair the corruption rather than having to copy the data.
I tested the patch and it restores access to the corrupted directories, but it does not fix them completely: an application accessing the dot or dotdot entry still receives an I/O error.
For example, if the /scratch/yyy directory has been corrupted and then fixed, rsync of /scratch/yyy will fail with an I/O error, but rsync of /scratch/yyy/* will work fine.

Comment by Zhenyu Xu [ 04/Dec/12 ]

landed for e2fsprogs 1.42.6
