Duplicate filename on the same ldiskfs directory on MDS

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.3.0, Lustre 2.1.3, Lustre 2.1.6, Lustre 2.4.1
    • None
    • 3
    • 6988

    Description

      On three occasions, on three different Lustre MDS ldiskfs filesystems, we have found
      a duplicate filename within a single directory. For example:

      2012 Dec 4 10:59:22 bigfoot2 :

      cdep4-MDT0000 Starting fsck
      cdep4-MDT0000 Pass 1: Checking inodes, blocks, and sizes
      cdep4-MDT0000 Pass 2: Checking directory structure
      cdep4-MDT0000 Duplicate entry 'BDE_UCD.00000-00001' found.
      cdep4-MDT0000 Marking /ROOT/yack/group/USER/test9 (836767265) to be rebuilt.
      cdep4-MDT0000
      cdep4-MDT0000 Duplicate entry 'BDE_UCD.00000-00001' found.
      cdep4-MDT0000 Marking /ROOT/yack/group/USER/test9_pscc (838872002) to be rebuilt.
      cdep4-MDT0000
      cdep4-MDT0000 Pass 3: Checking directory connectivity
      cdep4-MDT0000 Pass 3A: optimizing directories
      cdep4-MDT0000 Entry 'BDE_UCD.00000-00001' in /ROOT/yack/group/USER/test9 (836767265) has a non-unique filename.
      cdep4-MDT0000 Rename to BDE_UCD.00000-0000~0? yes
      cdep4-MDT0000
      cdep4-MDT0000 Entry 'BDE_UCD.00000-00001' in /ROOT/yack/group/USER/test9_pscc (838872002) has a non-unique filename.
      cdep4-MDT0000 Rename to BDE_UCD.00000-0000~0? yes
      cdep4-MDT0000
      cdep4-MDT0000 Pass 4: Checking reference counts
      cdep4-MDT0000 Pass 5: Checking group summary information
      cdep4-MDT0000
      cdep4-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
      cdep4-MDT0000: 2872950/878051328 files (0.8% non-contiguous), 111956436/878047232 blocks

      In this case, the issue is visible in the output of ls:

      total 0
      Fri Nov 30 16:20:22 + 0.00 ###############################################################################
      Fri Nov 30 16:20:22 + 0.00 ## Contenu du repertoire cache_dep /cea/cache_dep/yack/group/USER/test9_ ##
      Fri Nov 30 16:20:22 + 0.00 ## pscc ##
      Fri Nov 30 16:20:22 + 0.00 ###############################################################################
      total 331844
      -rw-r--r-- 1 USER f7 10240 Nov 30 16:19 BDE_DIVERS
      -rw-r----- 1 USER f7 140615680 Nov 30 16:09 BDE_MAILLAGE
      -rw-r--r-- 1 USER f7 10240 Nov 30 16:20 BDE_POST1D
      -rw-r--r-- 1 USER f7 99563008 Nov 30 16:16 BDE_UCD.00000-00001
      -rw-r--r-- 1 USER f7 99563008 Nov 30 16:16 BDE_UCD.00000-00001

      Running debugfs after the fsck then shows that the two files
      exist on two different inodes:

      [root@bigfoot2 ~]# /usr/lib/lustre/debugfs /dev/mapper/da1vg0_mdt
      debugfs 1.42.3.wc3 (15-Aug-2012)
      debugfs: ls
      2 (12) . 2 (12) .. 11 (20) lost+found 589299713 (16) CONFIGS
      637534209 (16) OBJECTS 12 (20) lov_objid 13 (16) oi.16
      14 (12) fld 15 (16) seq_srv 16 (16) seq_ctl 17 (20) capa_keys
      627048449 (16) PENDING 643825665 (12) ROOT 18 (20) last_rcvd
      700448769 (20) REM_OBJ_DIR 19 (3852) CATALOGS
      debugfs: cd ROOT/yack/group/USER
      debugfs: cd test9_pscc
      debugfs: ls
      838872002 (28) . 643826507 (28) ..
      838873362 (28) BDE_UCD.00000-0000~0 838875198 (48) BDE_UCD.00000-00001
      838875615 (36) BDE_POST1D 838877439 (40) BDE_MAILLAGE
      838877482 (36) BDE_DIVERS 838877492 (3852) BDE_PROT_LAG.00001-00001
      debugfs:
      debugfs: stat BDE_UCD.00000-0000~0
      Inode: 838873362 Type: regular Mode: 0644 Flags: 0x0
      Generation: 2475899703 Version: 0x0000002b:1ecd9699
      User: 3083 Group: 5214 Size: 0
      File ACL: 0 Directory ACL: 0
      Links: 1 Blockcount: 0
      Fragment: Address: 0 Number: 0 Size: 0
      ctime: 0x50b8cde4:00000000 -- Fri Nov 30 16:16:52 2012
      atime: 0x50bf189a:00000000 -- Wed Dec 5 10:49:14 2012
      mtime: 0x50b8cde4:00000000 -- Fri Nov 30 16:16:52 2012
      crtime: 0x50b8cc66:aa4a5734 -- Fri Nov 30 16:10:30 2012
      Size of extra inode fields: 28
      Extended attributes stored in inode body:
      lma = "00 00 00 00 00 00 00 00 71 be 71 17 02 00 00 00 4b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 " (64)
      lma: fid=[0x21771be71:0x4b:0x0]
      lov = "d0 0b d1 0b 01 00 00 00 4b 00 00 00 00 00 00 00 71 be 71 17 02 00 00 00 00 00 40 00 03 00 00 00 99 9f
      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 13 00 00 00 18 ad 00
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0d 02 00 00 c9 af 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 53 02 00 00 " (104)
      link = "df f1 ea 11 01 00 00 00 3d 00 00 00 00 00 00 00 39 41 6c 65 70 68 6f 4d 00 25 00 00 00 02 17 59 a0 d1
      00 00 34 fb 00 00 00 00 42 44 45 5f 55 43 44 2e 30 30 30 30 30 2d 30 30 30
      30 31 " (61)
      BLOCKS:

      debugfs: stat BDE_UCD.00000-00001
      Inode: 838875198 Type: regular Mode: 0644 Flags: 0x0
      Generation: 2475897856 Version: 0x0000002b:1ec71356
      User: 3083 Group: 5214 Size: 0
      File ACL: 0 Directory ACL: 0
      Links: 1 Blockcount: 0
      Fragment: Address: 0 Number: 0 Size: 0
      ctime: 0x50b8c8fd:00000000 -- Fri Nov 30 15:55:57 2012
      atime: 0x50bf18a0:00000000 -- Wed Dec 5 10:49:20 2012
      mtime: 0x50b8c8fd:00000000 -- Fri Nov 30 15:55:57 2012
      crtime: 0x50b8c6c2:977f5a84 -- Fri Nov 30 15:46:26 2012
      Size of extra inode fields: 28
      Extended attributes stored in inode body:
      lma = "00 00 00 00 00 00 00 00 ef 0a 4f 17 02 00 00 00 52 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 " (64)
      lma: fid=[0x2174f0aef:0x52:0x0]
      lov = "d0 0b d1 0b 01 00 00 00 52 00 00 00 00 00 00 00 ef 0a 4f 17 02 00 00 00 00 00 40 00 03 00 00 00 59 27
      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c2 00 00 00 9b a6 00
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b9 01 00 00 08 ac 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 db 02 00 00 " (104)
      link = "df f1 ea 11 01 00 00 00 3d 00 00 00 00 00 00 00 39 41 6c 65 70 68 6f 4d 00 25 00 00 00 02 17 59 a0 d1
      00 00 34 fb 00 00 00 00 42 44 45 5f 55 43 44 2e 30 30 30 30 30 2d 30 30 30
      30 31 " (61)
      BLOCKS:
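
The `lma: fid=[...]` lines above can be cross-checked against the raw `lma` bytes. The following is a minimal sketch, assuming the layout that the dumps themselves suggest: 8 leading flag bytes, then a little-endian 64-bit sequence, a 32-bit object id and a 32-bit version. The type `fid_sketch` and the helpers `le_read` and `lma_decode_fid` are hypothetical names made up for this illustration and are not taken from the Lustre source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for a decoded FID; not Lustre's own definition. */
struct fid_sketch {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

/* Read an n-byte little-endian unsigned integer from buf (n <= 8). */
uint64_t le_read(const uint8_t *buf, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)buf[i] << (8 * i);
    return v;
}

/*
 * Decode the FID from a raw lma value.
 * Assumed layout (inferred from the debugfs dump, not from the source):
 *   bytes  0-7   flags
 *   bytes  8-15  sequence
 *   bytes 16-19  object id
 *   bytes 20-23  version
 * Returns 0 on success, -1 if the buffer is too short.
 */
int lma_decode_fid(const uint8_t *lma, size_t len, struct fid_sketch *fid)
{
    if (len < 24)
        return -1;
    fid->seq = le_read(lma + 8, 8);
    fid->oid = (uint32_t)le_read(lma + 16, 4);
    fid->ver = (uint32_t)le_read(lma + 20, 4);
    return 0;
}
```

Applied to the first 24 bytes of each dump, this yields sequence 0x21771be71 with object id 0x4b for inode 838873362 and sequence 0x2174f0aef with object id 0x52 for inode 838875198, matching the decoded `fid=` lines: the two directory entries carry the same name but refer to two distinct objects.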

      Another trace can be found attached to this ticket.

      Is this a known issue?

      Attachments

        1. fsck_issue_duplicate
          16 kB
        2. duplicate_name.c
          1 kB
        3. debug.gz
          212 kB

          Activity

            [LU-2901] Duplicate filename on the same ldiskfs directory on MDS
            pjones Peter Jones added a comment -

            Seems fixes have landed where needed

            bogl Bob Glossman (Inactive) added a comment -

            Backports to b2_4:
            http://review.whamcloud.com/7315
            http://review.whamcloud.com/7316

            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Thanks for the clarification, Bob!

            bogl Bob Glossman (Inactive) added a comment -

            Sebastien,
            The 2nd patch in master, #6592, only changes log and error messages. It uses new macros that were defined in the master version of the first patch but were left out of the b2_1 backport, which makes the 2nd patch hard to backport.

            Only the first patch, #6591, contains actual functional fixes, so that was all that was backported.

            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            I think we need clarification regarding the various patches proposed here. If I understand correctly, we have:

            • http://review.whamcloud.com/6591 : patch for master, adds a new test in sanity;
            • http://review.whamcloud.com/6592 : patch for master, code cleanup;
            • http://review.whamcloud.com/6678 : patch for b2_1, port of http://review.whamcloud.com/6591 .

            Does this mean that the patch that really fixes the issue is missing in b2_1?

            Thanks in advance,
            Sebastien.

            bogl Bob Glossman (Inactive) added a comment -

            Backport to b2_1:
            http://review.whamcloud.com/#/c/6678

            This backport leaves out the first 2 files from the original patch, as advised by Oleg.

            laisiyao Lai Siyao added a comment -

            I updated the patch at http://review.whamcloud.com/#change,6591 to use the approach I described in my earlier comment.

            laisiyao Lai Siyao added a comment -

            IMHO a one-line check before mdo_link() in mdt_reint_link() could fix this: check that the return value of mdt_lookup_version_check() for the target is -ENOENT, and return -EEXIST otherwise (the same as mdt_md_create() does).

            Currently this check is missing only for the link operation. If this check should be done only in ldiskfs, other operations should follow that rule too.
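
The guard described in this comment can be illustrated outside of Lustre. The following is only a sketch on a toy in-memory directory, not the actual patch: `sketch_dir`, `sketch_lookup`, `sketch_insert` and `sketch_link` are hypothetical names standing in for the real MDT objects and for mdt_lookup_version_check() and mdo_link(), whose real signatures are not shown in this ticket.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 16
#define SKETCH_NAME_LEN    64

/* Toy in-memory directory; a stand-in for the real MDT/ldiskfs objects. */
struct sketch_dir {
    char names[SKETCH_MAX_ENTRIES][SKETCH_NAME_LEN];
    int  count;
};

/* Return 0 if name exists in dir, -ENOENT otherwise. */
int sketch_lookup(const struct sketch_dir *dir, const char *name)
{
    for (int i = 0; i < dir->count; i++)
        if (strcmp(dir->names[i], name) == 0)
            return 0;
    return -ENOENT;
}

/* Low-level insert with no existence check: duplicates are possible. */
int sketch_insert(struct sketch_dir *dir, const char *name)
{
    if (dir->count >= SKETCH_MAX_ENTRIES)
        return -ENOSPC;
    if (strlen(name) >= SKETCH_NAME_LEN)
        return -ENAMETOOLONG;
    strcpy(dir->names[dir->count++], name);
    return 0;
}

/*
 * Link with the proposed guard: only a lookup result of -ENOENT lets the
 * insert proceed; an existing target name yields -EEXIST.
 */
int sketch_link(struct sketch_dir *dir, const char *name)
{
    int rc = sketch_lookup(dir, name);

    if (rc == 0)
        return -EEXIST;
    if (rc != -ENOENT)
        return rc;
    return sketch_insert(dir, name);
}
```

In this model, calling `sketch_link()` twice with the same name succeeds once and then fails with -EEXIST, whereas calling `sketch_insert()` directly reproduces the duplicate entry seen in the fsck output.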

            bzzz Alex Zhuravlev added a comment -

            We would probably have to scan more than one block if entries share the same hash value? That should happen too rarely to affect performance, though.

            adilger Andreas Dilger added a comment -

            Yes, I was wondering about this also. At one point I thought it was just a dcache lookup, but it seems the dentry being used is totally fake.

            The most efficient way to do this would be to just scan the whole ldiskfs leaf block when doing the insert, since add_dirent_to_buf() is already scanning it looking for free space. It is most likely going to scan most of the used part of the block anyway, so this would add relatively little overhead. It would be good to structure this change so it is only done if being called by Lustre (e.g. if ldiskfs_dentry_param is passed). That gives us some hope of having it accepted upstream.
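
The single-pass idea in this comment can be sketched as follows. This is a simplified model and not the ldiskfs code: it assumes fixed-size slots instead of real variable-length directory records, and `leaf_block`, `leaf_slot`, `leaf_add_entry` and the `check_dup` switch are hypothetical names, with `check_dup` standing in for the "only when called by Lustre" condition.

```c
#include <errno.h>
#include <string.h>

#define LEAF_SLOTS    8
#define LEAF_NAME_LEN 32

/* One fixed-size slot; inode 0 marks a free slot (a simplification). */
struct leaf_slot {
    unsigned int inode;
    char         name[LEAF_NAME_LEN];
};

struct leaf_block {
    struct leaf_slot slots[LEAF_SLOTS];
};

/*
 * Insert name into the block in one pass: remember the first free slot
 * while comparing every used slot against name. The scan continues past
 * the first free slot so that a duplicate later in the block is caught.
 */
int leaf_add_entry(struct leaf_block *blk, const char *name,
                   unsigned int inode, int check_dup)
{
    struct leaf_slot *free_slot = NULL;

    if (inode == 0 || strlen(name) >= LEAF_NAME_LEN)
        return -EINVAL;

    for (int i = 0; i < LEAF_SLOTS; i++) {
        struct leaf_slot *s = &blk->slots[i];

        if (s->inode == 0) {
            if (free_slot == NULL)
                free_slot = s;
            continue;
        }
        if (check_dup && strcmp(s->name, name) == 0)
            return -EEXIST;
    }
    if (free_slot == NULL)
        return -ENOSPC;

    free_slot->inode = inode;
    strcpy(free_slot->name, name);
    return 0;
}
```

With `check_dup` set, a second insert of the same name fails with -EEXIST; with it cleared, the block ends up holding two entries with the same name. As the previous comment points out, entries that share a hash value may spill into another block, so a check limited to one block would not cover that case.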

            People

              laisiyao Lai Siyao
              dmoreno Diego Moreno (Inactive)