LU-11446: ldiskfs inodes nlink mismatch with DNE

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Lustre 2.17.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Severity: 3

    Description

      It is easy to break the ldiskfs format in a DNE system with async updates by adding extra hard links from another MDT:

      1. start a DNE-enabled fs

      [root@vm1 tests]# REFORMAT=yes MDSCOUNT=4 OSTCOUNT=4 sh llmount.sh  
      ...
      quota/lquota options: 'hash_lqs_cur_bits=3'
      Formatting mgs, mds, osts
      Format mds1: /tmp/lustre-mdt1
      Format mds2: /tmp/lustre-mdt2
      Format mds3: /tmp/lustre-mdt3
      ...
      

      2. create a file on MDT0

      [root@vm1 tests]# touch /mnt/lustre/foo
      [root@vm1 tests]#
      

      3. create a dir on another MDT

      [root@vm1 tests]# lfs mkdir -i 1 /mnt/lustre/mdt1
      [root@vm1 tests]#
      

      4. create 20 hard links to /mnt/lustre/foo

      [root@vm1 tests]# for x in $(seq 1 20); do ln /mnt/lustre/foo /mnt/lustre/mdt1/foo-link-$x; done
      [root@vm1 tests]# ls -in /mnt/lustre/foo
      144115205322833921 -rw-r--r--. 21 0 0 0 Sep 15 10:06 /mnt/lustre/foo
      [root@vm1 tests]#
      

      5. shutdown the fs

      [root@vm1 tests]# MDSCOUNT=4 OSTCOUNT=4 sh llmountcleanup.sh
      Stopping clients: vm1.localdomain /mnt/lustre (opts:-f)
      Stopping client vm1.localdomain /mnt/lustre opts:-f
      Stopping clients: vm1.localdomain /mnt/lustre2 (opts:-f)
      
      

      6. run e2fsck on MDT0 image.

      [root@vm1 tests]# e2fsck -fnv /tmp/lustre-mdt1
      e2fsck 1.42.13.wc6 (05-Feb-2017)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Inode 168 ref count is 21, should be 2.  Fix? no
      
      Pass 5: Checking group summary information
      
      lustre-MDT0000: ********** WARNING: Filesystem still has errors **********
      
      
               280 inodes used (0.28%, out of 100000)
                 7 non-contiguous files (2.5%)
                 0 non-contiguous directories (0.0%)
                   # of inodes with ind/dind/tind blocks: 1/0/0
             29638 blocks used (47.42%, out of 62500)
                 0 bad blocks
                 1 large file
      
               153 regular files
               118 directories
                 0 character device files
                 0 block device files
                 0 fifos
                 1 link
                 0 symbolic links (0 fast symbolic links)
                 0 sockets
      ------------
               272 files
      [root@vm1 tests]#
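
The Pass 4 message above is the line that exposes the mismatch. As a minimal sketch, assuming the message keeps the wording shown in this transcript, a small filter can pull the inode number, the stored link count and the count e2fsck expects out of the output. The function name nlink_mismatches is made up for illustration and is not part of e2fsprogs.

```shell
# Hypothetical helper: print "inode stored expected" for every
# "ref count" mismatch found in e2fsck output read from stdin.
# Assumes the message format shown in the transcript above.
nlink_mismatches() {
    sed -n 's/^[[:space:]]*Inode \([0-9][0-9]*\) ref count is \([0-9][0-9]*\), should be \([0-9][0-9]*\)\..*$/\1 \2 \3/p'
}

# Example usage (image path as in the transcript):
#   e2fsck -fn /tmp/lustre-mdt1 2>&1 | nlink_mismatches
```

For the run above this would report inode 168 with a stored count of 21 against an expected count of 2.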
      
      

      The inode #168 counts all links in its nlink counter, but only two links are local:

      [root@vm1 tests]# debugfs -R "ncheck 168" /tmp/lustre-mdt1
      debugfs 1.42.13.wc6 (05-Feb-2017)
      Inode   Pathname
      168     /REMOTE_PARENT_DIR/0x200000404:0x1:0x0
      168     /ROOT/foo
      Segmentation fault (core dumped)
      [root@vm1 tests]#
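
The "only two links are local" observation can be checked mechanically by counting the pathnames that ncheck prints for the inode. The following is a rough sketch, assuming the two-column "Inode Pathname" layout shown above; local_link_count is a hypothetical name, not an existing tool.

```shell
# Hypothetical helper: count the pathnames that "debugfs ncheck"
# lists for one inode. $1 is the inode number; the ncheck output
# is read from stdin. The "Inode Pathname" header line is skipped
# because its first column does not equal the inode number.
local_link_count() {
    awk -v ino="$1" '$1 == ino { n++ } END { print n + 0 }'
}

# Example usage (image path and inode number as in the transcript):
#   debugfs -R "ncheck 168" /tmp/lustre-mdt1 2>/dev/null | local_link_count 168
```

Comparing this count with the inode's nlink value gives the same discrepancy that e2fsck reports in Pass 4.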
      

      If we start the fs again

      [root@vm1 tests]# NOFORMAT=yes MDSCOUNT=4 OSTCOUNT=4 sh llmount.sh
      
      

      all 21 links are visible through lfs fid2path output:

      [root@vm1 tests]# lfs fid2path /mnt/lustre 0x200000404:0x1:0x0
      /mnt/lustre/foo
      /mnt/lustre/mdt1/foo-link-1
      /mnt/lustre/mdt1/foo-link-2
      /mnt/lustre/mdt1/foo-link-3
      /mnt/lustre/mdt1/foo-link-4
      /mnt/lustre/mdt1/foo-link-5
      /mnt/lustre/mdt1/foo-link-6
      /mnt/lustre/mdt1/foo-link-7
      /mnt/lustre/mdt1/foo-link-8
      /mnt/lustre/mdt1/foo-link-9
      /mnt/lustre/mdt1/foo-link-10
      /mnt/lustre/mdt1/foo-link-11
      /mnt/lustre/mdt1/foo-link-12
      /mnt/lustre/mdt1/foo-link-13
      /mnt/lustre/mdt1/foo-link-14
      /mnt/lustre/mdt1/foo-link-15
      /mnt/lustre/mdt1/foo-link-16
      /mnt/lustre/mdt1/foo-link-17
      /mnt/lustre/mdt1/foo-link-18
      /mnt/lustre/mdt1/foo-link-19
      /mnt/lustre/mdt1/foo-link-20
      [root@vm1 tests]#
      
      

          Activity


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment -

            > Is there any work remaining on this ticket?

            As Andreas said, there is still work that can be done. The ideas are listed in his earlier posts.

            pjones Peter Jones added a comment -

            Is there any work remaining on this ticket?


            gerrit Gerrit Updater added a comment -

            Li Dongyang (dongyangli@ddn.com) merged in patch https://review.whamcloud.com/43231/
            Subject: LU-11446 e2fsck: check trusted.link when fixing nlink
            Project: tools/e2fsprogs
            Branch: master-lustre-test
            Current Patch Set:
            Commit: b9cbd54b4a9c1bef0362b9b84b3ab61da0025998

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43324
            Subject: LU-11446 e2fsck: check trusted.link after linking inode
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 8b18eecebe743a0fd3894a2818fb030994ab8533

            gerrit Gerrit Updater added a comment -

            Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/43231
            Subject: LU-11446 e2fsck: check trusted.link when fixing nlink
            Project: tools/e2fsprogs
            Branch: master-lustre-test
            Current Patch Set: 1
            Commit: 4407cd4ac0595dc884a11b7b943cdb5ed9695d21

            adilger Andreas Dilger added a comment -

            The e2fsck patch is merged into 1.45.6.wc6, but the improvement to DNE nlink handling still needs to be done, so this ticket should not be closed yet.

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/43169/
            Subject: LU-11446 e2fsck: check trusted.link when fixing nlink
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set:
            Commit: 6528bce00beaa69d1d140452b1a8b84cc7e1f253

            adilger Andreas Dilger added a comment -

            Instead of changing the existing semantics of leh_reccount to hold the total link count, it probably makes more sense to use the reserved2 field as leh_linkcount to store the total number of links. If this field is zero, then we depend on max(inode->i_links_count, leh_reccount) as the best-guess estimate of the distributed link count, but that cannot be totally accurate given the limitations on the trusted.link xattr size.
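
The fallback rule described in this comment can be sketched as a small shell function. This is only an illustration of the proposed logic, not existing Lustre or e2fsprogs code: leh_linkcount does not exist yet (the comment suggests reusing reserved2 for it), and the function name estimate_nlink is made up here.

```shell
# Hypothetical sketch of the proposed rule:
#   estimate_nlink LINKCOUNT I_LINKS_COUNT RECCOUNT
# LINKCOUNT stands for the proposed leh_linkcount field. If it is
# non-zero it is trusted as the total link count; otherwise fall back
# to max(i_links_count, leh_reccount) as a best-guess estimate.
estimate_nlink() {
    local linkcount=$1 i_links_count=$2 reccount=$3
    if [ "$linkcount" -ne 0 ]; then
        echo "$linkcount"
    elif [ "$i_links_count" -ge "$reccount" ]; then
        echo "$i_links_count"
    else
        echo "$reccount"
    fi
}
```

With a zero linkcount, an inode whose nlink had been reset to 2 but whose linkEA still held 21 records would be estimated at 21 links. As the comment notes, this fallback cannot be exact once the linkEA is too small to hold every link.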

            gerrit Gerrit Updater added a comment -

            Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/43169
            Subject: LU-11446 e2fsck: reference trusted.link xattr when fixing inode nlink
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 0b126f9655f25cfc5718fe5a9b34757377ab595f

            adilger Andreas Dilger added a comment - - edited

            Artem, have you given any thought to how we might handle this in a more transparent manner, separating the local disk nlink count from the distributed nlink count? Using the leh_reccount partially solves this problem, but the linkEA is not guaranteed to store all of the hard links to a file.

            While leh_reccount is a 32-bit value, it (currently) needs to match the number of entries in the list. That could possibly be fixed with some changes to the code (maybe a new magic?) and to LFSCK, so that leh_reccount always stores the total number of hard links and the list may be shorter than this. We could base the list iteration on the size of the xattr and not the link count, or add a separate field to the linkEA, or maybe to the LMA? Then the MDS would not drop the last local link to a file until leh_reccount became zero, instead of trusting the inode nlink count.

            I don't think storing all the hard links to a file in the link EA is practical as that will get very slow - 65000 links x 274 bytes/link = 17MB that needs to be rewritten on each update, and would break getxattr due to the size. Even using the full 64KiB xattr would allow at most (65536 - 24) / (2 + 16 + 8) = 2519 8-byte filenames or 1926 16-byte filenames, which is lower than we'd want for the maximum nlink count.
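
The arithmetic in this comment can be reproduced with shell integer math. This is a sketch using only the figures quoted above: 24 bytes subtracted from the xattr size, 2 + 16 bytes of per-entry overhead plus the name, and 274 bytes per link in the worst case. Reading the 24 bytes as the linkEA header is an assumption, and the function names are made up for illustration.

```shell
# linkea_max_entries NAMELEN [XATTR_SIZE]
# How many link entries fit in one xattr: 24 bytes are reserved up
# front (assumed here to be the header), and each entry costs
# 2 + 16 bytes plus the name. XATTR_SIZE defaults to 64KiB.
linkea_max_entries() {
    local namelen=$1 xattr_size=${2:-65536}
    echo $(( (xattr_size - 24) / (2 + 16 + namelen) ))
}

# linkea_worst_case_bytes NLINKS
# Total bytes if every link needed the 274 bytes/link quoted above.
linkea_worst_case_bytes() {
    echo $(( $1 * 274 ))
}
```

Here `linkea_max_entries 8` gives 2519 and `linkea_max_entries 16` gives 1926, matching the comment, while `linkea_worst_case_bytes 65000` gives 17810000 bytes, which is the roughly 17MB figure.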


            People

              Assignee: ablagodarenko Artem Blagodarenko
              Reporter: zam Alexander Zarochentsev