Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.2.0, Lustre 2.3.0, Lustre 2.1.1, Lustre 2.1.2, Lustre 2.1.3
    • None
    • 3
    • 4418

    Description

      Each time we run fsck on an MDS, we lose thousands of symlinks (which is a kind of corruption).
      To figure out what was wrong with those inodes, I added some instrumentation to e2fsck_pass1_check_symlink to print various information about the inode (the modifications are attached).

      Here is what I got:

      [root@gaia14 ~]# ./e2fsck -n -f -v -d /dev/mapper/vg_mdt_work1-mdt_work1 | grep -B2 -A1 "at pass1.c:246"
      e2fsck 1.41.90.wc4 (01-Sep-2011)
      blocks != 0 && fs->blocksize = 4096, buf = %/home/cont001/segura/BIN/ELSA/CHAINE_V2/MODULES_PYTHON/GENERIQUES/Block.py_oldaux.py%
      len = 84, inode->i_size = 78
      at pass1.c:246 : offending inode 16819164 found !
      e2fsck_pass1:1416: increase inode 16819164 badness 0 to 1
      

      Note the content of buf (the result of io_channel_read_blk64), and especially the trailing characters 'aux.py' (the % characters come from the printf format).

      Looking at this inode using debugfs returns:

      debugfs: stat <16819164>
      Inode: 16819164 Type: symlink Mode: 0777 Flag
      Generation: 1029089099 Version: 0x0000004e:1a78c70
      User: 11876 Group: 1850 Size: 78
      File ACL: 0 Directory ACL: 0
      Links: 1 Blockcount: 8
      Fragment: Address: 0 Number: 0 Size: 0
       ctime: 0x4fd1a998:00000000 -- Fri Jun 8 09:28:24 20
       atime: 0x4fd1a998:00000000 -- Fri Jun 8 09:28:24 20
       mtime: 0x4fd1a998:00000000 -- Fri Jun 8 09:28:24 20
      crtime: 0x4fd1a998:e6373924 -- Fri Jun 8 09:28:24 20
      Size of extra inode fields: 28
      Extended attributes stored in inode body:
        lma = "00 00 00 00 00 00 00 00 56 fa 1b 0d 02 00 00
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        link = "df f1 ea 11 01 00 00 00 36 00 00 00 00 00 0
       5c 00 00 00 00 42 6c 6f 63 6b 2e 70 79 5f 6f 6c 64 "
      BLOCKS:
      (0):17832933
      TOTAL: 1
      
      debugfs: cat <16819164>
      /home/cont001/segura/BIN/ELSA/CHAINE_V2/MODULES_PYTHON/GENERIQUES/Block.py_old
      

      The string is the same except that it does not include the trailing 'aux.py', and the target file is what was expected.

      Accessing the filesystem through a mount doesn't show any issue. Accessing the symlink works and the content is valid (the expected link target has a length of inode->i_size).
      The buffer retrieved in e2fsck looks wrong: the end of the string (starting at offset inode->i_size) always contains garbage.

      So who is wrong? Is it correct for e2fsprogs to assume that a symlink is always terminated by a '\0', giving strnlen a chance to return the right length? Or is ldiskfs wrong in not enforcing the '\0' at position inode->i_size? Or is it something else?
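
For illustration, the mismatch reported above (len = 84 against inode->i_size = 78) can be reproduced with a few lines of C. This is a minimal sketch, not the code in e2fsck/pass1.c: the names bounded_len and symlink_len_matches are hypothetical, and memchr stands in for strnlen so that the sketch only needs standard C.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strnlen(): length of the string in buf,
 * stopping at the first NUL byte or after max bytes, whichever is first.
 */
static size_t bounded_len(const char *buf, size_t max)
{
    const char *nul = memchr(buf, '\0', max);
    return nul ? (size_t)(nul - buf) : max;
}

/*
 * Hypothetical check in the spirit of the trace above: the symlink is
 * accepted only if the measured length equals the recorded size.
 * Returns 1 on a match, 0 otherwise.
 */
static int symlink_len_matches(const char *buf, size_t blocksize,
                               size_t i_size)
{
    return bounded_len(buf, blocksize) == i_size;
}
```

With a block holding a 78-byte target followed by six stray bytes such as 'aux.py' and then zeros, bounded_len reports 84 and the check fails. Once the byte at offset i_size is '\0', the same check passes.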

      Attachments

        1. gen.sh
          1 kB
        2. log.txt.gz
          6 kB
        3. src.txt
          3 kB

        Issue Links

          Activity

            [LU-1540] e2fsck remove too many symlinks
            pjones Peter Jones added a comment - http://review.whamcloud.com/#change,3560
            pjones Peter Jones added a comment -

            Bobijam

            I spoke with Andreas about this issue, and he thinks that it makes sense for you to talk directly with Alex Z about ways of approaching this problem and that we should aim to get this fixed for 2.3.

            Thanks

            Peter


            Integrated in e2fsprogs-master » i686,el5 #226
            LU-1540 e2fsck: add missing symlink NUL terminator (Revision 311939a654b4085aef9559f21e91d7205150950f)

            Result = SUCCESS
            Andreas Dilger : 311939a654b4085aef9559f21e91d7205150950f
            Files :

            • tests/f_badsymlinks/expect.2
            • e2fsck/e2fsck.h
            • e2fsck/pass2.c
            • e2fsck/pass1.c
            • e2fsck/problem.c
            • tests/f_badsymlinks/expect.1
            • e2fsck/problem.h
            hudson Build Master (Inactive) added a comment

            Integrated in e2fsprogs-master » x86_64,el5 #226
            LU-1540 e2fsck: add missing symlink NUL terminator (Revision 311939a654b4085aef9559f21e91d7205150950f)

            Result = SUCCESS
            Andreas Dilger : 311939a654b4085aef9559f21e91d7205150950f
            Files :

            • e2fsck/pass2.c
            • tests/f_badsymlinks/expect.2
            • e2fsck/pass1.c
            • tests/f_badsymlinks/expect.1
            • e2fsck/e2fsck.h
            • e2fsck/problem.h
            • e2fsck/problem.c
            hudson Build Master (Inactive) added a comment

            Integrated in e2fsprogs-master » i686,el6 #226
            LU-1540 e2fsck: add missing symlink NUL terminator (Revision 311939a654b4085aef9559f21e91d7205150950f)

            Result = SUCCESS
            Andreas Dilger : 311939a654b4085aef9559f21e91d7205150950f
            Files :

            • e2fsck/problem.c
            • tests/f_badsymlinks/expect.1
            • e2fsck/problem.h
            • e2fsck/pass1.c
            • tests/f_badsymlinks/expect.2
            • e2fsck/pass2.c
            • e2fsck/e2fsck.h
            hudson Build Master (Inactive) added a comment

            Integrated in e2fsprogs-master » x86_64,el6 #226
            LU-1540 e2fsck: add missing symlink NUL terminator (Revision 311939a654b4085aef9559f21e91d7205150950f)

            Result = SUCCESS
            Andreas Dilger : 311939a654b4085aef9559f21e91d7205150950f
            Files :

            • tests/f_badsymlinks/expect.2
            • tests/f_badsymlinks/expect.1
            • e2fsck/e2fsck.h
            • e2fsck/problem.c
            • e2fsck/pass1.c
            • e2fsck/problem.h
            • e2fsck/pass2.c
            hudson Build Master (Inactive) added a comment

            How does osd-ldiskfs handle the case of a normal 80-byte write to a new file? The rest of the disk block beyond 80 bytes needs to be zero-filled, but I don't think any tricks are played with the file size. Can we not just hook into the ldiskfs symlink code?

            Cheers, Andreas

            adilger Andreas Dilger added a comment

            I think osd-ldiskfs can check the object's type (= symlink), then either append a zero and decrement i_size, or just use a trivial copy of osd_ldiskfs_write_record()?

            bzzz Alex Zhuravlev added a comment

            Alex, can you please comment on the osd-ldiskfs issues here. What is the best way to write out a NUL-terminated symlink without changing the file size? This shouldn't be any different than writing out a partial-page buffer (which should be zero-filled at the end).

            adilger Andreas Dilger added a comment
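
The zero-fill approach discussed above can be sketched as follows. This is only an illustration under stated assumptions, not osd-ldiskfs code: fill_symlink_block is a hypothetical helper that prepares one block in memory, and the caller is assumed to write the whole block out and record *i_size as the inode size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: prepare a block buffer for a symlink target so
 * that every byte past the target is zero. The recorded size stays
 * equal to the target length; the NUL terminator is not counted.
 * Returns 0 on success, -1 if the target plus its terminator does not
 * fit in one block.
 */
static int fill_symlink_block(char *block, size_t blocksize,
                              const char *target, size_t target_len,
                              size_t *i_size)
{
    if (target_len >= blocksize)
        return -1;                      /* no room for the terminator */

    memset(block, 0, blocksize);        /* zero-fill the whole block */
    memcpy(block, target, target_len);  /* block[target_len] stays '\0' */
    *i_size = target_len;               /* file size is unchanged */
    return 0;
}
```

The point of the sketch is that terminating the string and keeping i_size equal to the target length are not in conflict, provided the tail of the block is zeroed before it is written.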

            Since this is now known to be a bug in osd-ldiskfs and not e2fsck, I'm elevating this to be a 2.1.3 and 2.3 blocker.

            adilger Andreas Dilger added a comment

            The patch to e2fsck in http://review.whamcloud.com/3171 can be used as a temporary workaround for the removal of corrupt symlinks, but it doesn't actually fix the problem on disk; it only prevents e2fsck from deleting the symlinks. This is why I want an updated version of that patch landed in e2fsck.

            There is also as yet no patch to fix the osd-ldiskfs code to prevent further bad symlinks from being created in the future.

            Both of these issues need to be addressed before this bug can be closed.

            adilger Andreas Dilger added a comment
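
As a rough sketch of what an on-disk repair on the e2fsck side could look like: the code below is neither the patch from http://review.whamcloud.com/3171 nor the change that landed. terminate_symlink and its return codes are hypothetical, and the caller is assumed to write the block back whenever the function reports a change.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical repair step: if the only problem with a symlink block is
 * a missing NUL at offset i_size, add it instead of discarding the link.
 * Returns  1 if the buffer was changed and must be written back,
 *          0 if the terminator was already present,
 *         -1 if the block cannot be repaired this way.
 */
static int terminate_symlink(char *buf, size_t blocksize, size_t i_size)
{
    if (i_size >= blocksize)
        return -1;              /* no room for a terminator */
    if (memchr(buf, '\0', i_size) != NULL)
        return -1;              /* target is shorter than i_size */
    if (buf[i_size] == '\0')
        return 0;               /* already well formed */

    buf[i_size] = '\0';         /* cut off the stray bytes past i_size */
    return 1;
}
```

A symlink whose content before i_size is intact is then kept and fixed, while one whose recorded size disagrees with its content is still left to the existing checks.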

            People

              bobijam Zhenyu Xu
              louveta Alexandre Louvet (Inactive)