Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.1.2
    • None
    • 2
    • 4008

    Description

      I have been seeing a large number of messages like the one below on the production /scratch FS.

      Aug 17 17:54:52 mds07 mds07 kernel: LDISKFS-fs warning (device dm-2): ldiskfs_dx_add_entry: Directory index full!

      The /scratch FS temporarily holds user /home directories until I install new hardware for a separate Lustre /home FS. The area of /scratch that holds user /home directories is backed up on a daily basis.

      Device dm-2 is the MDT for our production scratch FS. The file system holds around 160M files at the moment and, from what I found by reading various posts, the LDISKFS message above suggests that we may have some very large directories in our /scratch FS. I decided to run fsck -fD, which supposedly optimizes directory structures and gets rid of the above problem (at least temporarily).
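
      To gauge how often the warning fires and on which device, the syslog can be tallied. This is a minimal sketch that assumes the message format matches the line quoted above; the function name count_index_full is hypothetical.

```python
import re
from collections import Counter

# Matches the warning exactly as it appears in the syslog line quoted above.
PATTERN = re.compile(
    r"LDISKFS-fs warning \(device (?P<device>[^)]+)\): "
    r"ldiskfs_dx_add_entry: Directory index full!"
)


def count_index_full(lines):
    """Count 'Directory index full!' warnings per block device."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts
```

      It accepts any iterable of lines, for example an open /var/log/messages file (the path is an assumption about the local syslog setup).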

      Unfortunately this turned out to be a bad idea. The first pass of fsck found over 3200 supposedly invalid symlinks and decided to clear them, for example:
      Symlink /ROOT/new_home/dws29/sandy/InstallArea/XML/CamMapCut64.pie (inode #66608225) is invalid.
      Clear<y>? yes
      I have checked those supposedly invalid symlinks against our /home backup and the symlinks are actually correct, so fsck just removed over 3K valid symlinks.

      /mnt/backup/home/dws29/sandy/InstallArea/XML/CamMapCut64.pie -> /home/dws29/sandy/Task_pkg/HL2_PowellSnakes/v00-00-020000_CVSHEAD/cmt/../XMLMODULESCHECKED//CamMapCut64.pie

      Obviously the whole /scratch FS contains many more than 3K symlinks, so I am puzzled by what criteria fsck used to clear these particular ones.
      I was able to recover some of them from our /home backup, but the ones in the area of /scratch that is not backed up were cleared forever (not cool).
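
      The comparison against the backup can be sketched as below. This is only an illustration: it assumes the backup tree mirrors the live tree path for path (as in the /mnt/backup/home example above), and the function name missing_symlinks is hypothetical.

```python
import os


def missing_symlinks(backup_root, live_root):
    """Yield (relative_path, target) for symlinks present in the backup
    tree but absent from the live tree."""
    for dirpath, dirnames, filenames in os.walk(backup_root):
        # os.walk lists symlinks to directories in dirnames and all other
        # symlinks in filenames, so both lists have to be inspected.
        for name in dirnames + filenames:
            backup_path = os.path.join(dirpath, name)
            if not os.path.islink(backup_path):
                continue
            rel = os.path.relpath(backup_path, backup_root)
            live_path = os.path.join(live_root, rel)
            # lexists() is true even for dangling symlinks, so only links
            # that are really gone from the live tree are reported.
            if not os.path.lexists(live_path):
                yield rel, os.readlink(backup_path)
```

      Each reported pair could then be recreated with os.symlink(target, path) once the list has been reviewed.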

      I ran a second pass of fsck and then mounted the MDT again. Everything seemed OK until the overnight rsync backup process started to copy files and hit many I/O errors when trying to enter some directories, for example:
      ls /home/ad491/progs/gromacs-3.3.3/src/gmxlib/gmx_blas
      ls: reading directory /home/ad491/progs/gromacs-3.3.3/src/gmxlib/gmx_blas: Input/output error

      I can see inside these directories from the MDS by using debugfs, so I am hoping that the data are not completely gone.
      debugfs -c -R 'ls -l ROOT/new_home/ad491/progs/gromacs-3.3.3/src/gmxlib/gmx_blas/' /dev/mapper/mds08_scratch_mdt
      debugfs 1.42.3.wc1 (28-May-2012)
      /dev/mapper/mds08_scratch_mdt: catastrophic mode - not reading inode or group bitmaps
      51920683 40775 (18) 9040 9043 8192 19-Dec-2009 16:23 .
      51920609 40775 (18) 9040 9043 16384 19-Dec-2009 16:26 ..
      51921092 40775 (18) 9040 9043 4096 19-Dec-2009 16:23 .deps
      51921094 40775 (18) 9040 9043 4096 19-Dec-2009 16:23 .libs
      51923088 100664 (17) 9040 9043 0 19-Dec-2009 16:19 Makefile
      51923090 100644 (17) 9040 9043 0 2-Feb-2005 13:05 Makefile.am
      51923092 100644 (17) 9040 9043 0 28-Feb-2008 15:41 Makefile.in
      51923093 100644 (17) 9040 9043 0 24-Aug-2005 01:41 dasum.c
      51923094 100664 (17) 9040 9043 0 19-Dec-2009 16:23 dasum.lo
      51923095 100664 (17) 9040 9043 0 19-Dec-2009 16:23 dasum.o
      51923096 100644 (17) 9040 9043 0 2-Feb-2005 13:05 daxpy.c
      51923097 100664 (17) 9040 9043 0 19-Dec-2009 16:23 daxpy.lo
      51923098 100664 (17) 9040 9043 0 19-Dec-2009 16:23 daxpy.o
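
      For checking many directories this way, the listing can be parsed into records. The sketch below assumes the column layout is exactly as shown in the output above (inode, octal mode, a parenthesised type code, uid, gid, size, date, time, name); parse_debugfs_ls is a hypothetical helper, not part of e2fsprogs.

```python
import re
import stat

# One directory entry per line, in the column order shown in the listing above.
ENTRY = re.compile(
    r"^\s*(?P<inode>\d+)\s+(?P<mode>[0-7]+)\s+\((?P<ftype>\d+)\)\s+"
    r"(?P<uid>\d+)\s+(?P<gid>\d+)\s+(?P<size>\d+)\s+"
    r"(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<name>.+)$"
)


def parse_debugfs_ls(lines):
    """Parse `debugfs -R 'ls -l ...'` output into a list of dicts."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if not match:
            continue  # banner and "catastrophic mode" lines do not match
        mode = int(match.group("mode"), 8)
        entries.append({
            "inode": int(match.group("inode")),
            "name": match.group("name"),
            "is_dir": stat.S_ISDIR(mode),
            "size": int(match.group("size")),
        })
    return entries
```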

      Again, I am able to recover the directories that are in the backed-up area of scratch, but that is not a lot and many of the corrupted directories are not backed up. Is there any way to reverse or fix what the -D optimisation did and reconstruct the data?

      I am attaching a log from fsck and also a few days' worth of syslog messages from the MDS and OSS servers. Please note that on 17 Aug around 6pm we had an IB network outage, so there will be some noise related to that in the logs.

      It may also be worth mentioning that the FS is less than a couple of months old and was created using e2fsprogs-1.42.3.wc1-7.el6.x86_64, which already had some fixes for fsck -D issues.

          Activity

            [LU-1774] fsck -fD corrupts filesystem
            bobijam Zhenyu Xu added a comment -

            landed for e2fsprogs 1.42.6


            wjt27 Wojciech Turek added a comment -

            We have found a method to recover the data and copy it to a new filesystem. However, I think it would still be useful to others to be able to repair the corruption rather than having to copy the data.
            I tested the patch and it restores access to the corrupted directories, but it does not fix them completely: an application accessing the dot or dotdot directory still receives an I/O error.
            For example, if the /scratch/yyy directory has been corrupted and then fixed, rsync of /scratch/yyy will fail with an I/O error but rsync of /scratch/yyy/* will work fine.

            bobijam Zhenyu Xu added a comment -

            patch tracking at http://review.whamcloud.com/3799

            patch description
                LU-1774 e2fsck: e2fsck -D does not change dirdata content
            
                * Fix dir optimization to preserver dirdata content for dot and dotdot
                  entries.
            
                * Add test case.
            

            hellenn Hellen (Inactive) added a comment -

            A potential customer is testing the 2.1.2 release and ran into this issue?

            wjt27 Wojciech Turek added a comment -

            I am surprised that there is not much progress on this serious issue, which currently affects everybody using Lustre.

            I managed to reproduce the problem on my test filesystem. These are the steps:
            1) Create a test filesystem with the latest e2fsprogs wc3 release.
            2) Mount testfs on the client and create a directory, then fill it with files so that the size of the directory is bigger than 4K.
            cd /ltestfs
            ls -al /ltestfs/new_scratch1/
            total 40
            drwxr-xr-x 4 root root 4096 Aug 23 03:35 .
            drwxr-xr-x 4 root root 4096 Aug 23 03:35 ..
            drwxr-x--- 229 sjr20 sjr20 20480 Jun 22 10:25 sjr20
            drwxr-x--- 131 wjt27 wjt27 12288 Mar 15 20:19 wjt27
            3) Unmount the MDT and run fsck -fvD on it. Every time you run it, e2fsck will modify the filesystem.
            4) Mount the MDT again and, on the client, move the directories. For example, I moved them out of new_scratch1 into its parent directory:
            mv new_scratch1/* .
            ls -al
            total 48
            drwxr-xr-x 6 root root 4096 Aug 23 10:51 .
            drwxr-xr-x 32 root root 4096 Aug 23 00:58 ..
            drwxr-xr-x 2 root root 4096 Aug 22 21:06 .lustre
            drwxr-xr-x 2 root root 4096 Aug 23 10:51 new_scratch1
            drwxr-x--- 229 sjr20 sjr20 20480 Jun 22 10:25 sjr20
            drwxr-x--- 131 wjt27 wjt27 12288 Mar 15 20:19 wjt27
            5) Try to list a directory:
            ls -al wjt27/
            ls: reading directory wjt27/: Input/output error
            total 0

            I hope that helps in debugging the problem.
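
            Step 5 can be automated across a whole tree to find every directory that fails to list. The following is a rough sketch run from a client; unreadable_dirs is a hypothetical helper, and it assumes that a failed listing surfaces as an OSError (EIO for the "Input/output error" above).

```python
import errno
import os


def unreadable_dirs(root):
    """Return (path, errno_name) for each directory under root that
    cannot be listed."""
    failures = []

    def record(err):
        # os.walk passes the OSError raised while scanning a directory;
        # err.filename is the directory that failed.
        name = errno.errorcode.get(err.errno, str(err.errno))
        failures.append((err.filename, name))

    # Walking the tree is enough; only the errors are of interest.
    for _ in os.walk(root, onerror=record):
        pass
    return failures
```

            A directory hit by this problem would be expected to show up with the name EIO, while permission problems appear as EACCES and can be filtered out.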

            wjt27 Wojciech Turek added a comment -

            I was wondering if you could update me on any developments on this apparently critical issue.
            After yesterday's e2fsck run, which was meant only to fix NUL termination of symlinks, we have identified 26 top-level user directories that cannot be accessed due to I/O errors. It seems that all the affected directories are those whose size is bigger than 4K.
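
            If the 4K observation holds, the directories at risk can be listed from a client before any further e2fsck run. This is a minimal sketch under that assumption; BLOCK_SIZE and multi_block_dirs are hypothetical names, and the size compared is the directory size that ls -al reports.

```python
import os

BLOCK_SIZE = 4096  # assumption: the "4K" threshold mentioned in the report


def multi_block_dirs(parent, block_size=BLOCK_SIZE):
    """Return sorted (name, size) pairs for the subdirectories of parent
    whose own directory size exceeds block_size."""
    result = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                if size > block_size:
                    result.append((entry.name, size))
    return sorted(result)
```

            In the reproduction listing above, sjr20 (20480) and wjt27 (12288) would be reported, while new_scratch1 (4096) would not.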

            People

              bobijam Zhenyu Xu
              wjt27 Wojciech Turek