
lustre-rsync-test test_8: @@@@@@ FAIL: Failure in replication; differences found.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3
    • Environment: a patch pushed to autotest
    • 3
    • 9026

    Description

      On ZFS, an error was seen with lustre-rsync-test test 8. The logs are here:
      https://maloo.whamcloud.com/test_sets/e3492890-e901-11e2-ae91-52540035b04c

      I don't really know how to read lrsync_log.

      The test reports a very basic error:

      == lustre-rsync-test test 8: Replicate multiple file/directory moves == 16:00:59 (1373410859)
      CMD: wtm-10vm7 lctl --device lustre-MDT0000 changelog_register -n
      lustre-MDT0000: Registered changelog user cl13
      CMD: wtm-10vm7 lctl get_param -n mdd.lustre-MDT0000.changelog_users
      Lustre filesystem: lustre
      MDT device: lustre-MDT0000
      Source: /mnt/lustre
      Target: /tmp/target
      Statuslog: /tmp/lustre_rsync.log
      Changelog registration: cl13
      Starting changelog record: 0
      Clear changelog after use: no
      Errors: 0
      lustre_rsync took 107 seconds
      Changelog records consumed: 1881
      Only in /tmp/target/d0.lustre-rsync-test/d8/d08/d083: a3
       lustre-rsync-test test_8: @@@@@@ FAIL: Failure in replication; differences found. 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4066:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:4093:error()
        .....
      

      Out of the last 100 runs, it reports 1 error, so this could be related to the base patch or it could be a rare failure.
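The "Only in /tmp/target/...: a3" line in the log above has the format that a recursive `diff` prints when a name exists in one tree but not the other. As a minimal sketch of that kind of check, assuming the test compares source and target with something like `diff -rq` (the helper name `compare_trees` and the temporary directories are illustrative, not taken from the test framework):

```shell
#!/bin/sh
# compare_trees SRC TGT
# Hypothetical helper: recursively compare two directory trees.
# Prints one line per difference and returns diff's exit status
# (0 = identical, 1 = differences found).
compare_trees() {
    diff -rq "$1" "$2"
}

# Demo: build two small trees where the target has one extra entry,
# mimicking the "Only in ...: a3" line from the test log.
src=$(mktemp -d)
tgt=$(mktemp -d)
mkdir -p "$src/d8/d08/d083" "$tgt/d8/d08/d083"
touch "$src/d8/d08/d083/a1" "$tgt/d8/d08/d083/a1"
touch "$tgt/d8/d08/d083/a3"    # present only in the target

if compare_trees "$src" "$tgt"; then
    echo "trees match"
else
    echo "FAIL: differences found"
fi

rm -rf "$src" "$tgt"
```

Note that an entry reported as "Only in" the target does not prove that replication created a stray file. As the comments on this ticket show, the name may exist on the source but be missing from its directory listing.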

          Activity

            [LU-3573] lustre-rsync-test test_8: @@@@@@ FAIL: Failure in replication; differences found.

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. b2_5 patch ready to land and tracked outside of this ticket.

            adilger Andreas Dilger added a comment - Unfortunately, version 2 of the http://review.whamcloud.com/12582 patch failed review-zfs conf-sanity test_32b with a real problem that is now causing serious test failures on master. The fact that it failed on the unlanded patch, when no other tests were failing before that time, makes it very likely that this patch caused the problem. I've opened LU-5924 for that issue, but we might consider reverting this patch if it cannot be addressed quickly.
            bogl Bob Glossman (Inactive) added a comment - in b2_5: http://review.whamcloud.com/12649
            pjones Peter Jones added a comment - Fix landed for 2.7
            utopiabound Nathaniel Clark added a comment - http://review.whamcloud.com/12582
            yujian Jian Yu added a comment - One more instance on Lustre b2_5 branch with FSTYPE=zfs: https://testing.hpdd.intel.com/test_sets/5310f46c-61fd-11e4-bd1f-5254006e85c2

            utopiabound Nathaniel Clark added a comment - The inode is not 0. The inode is tied to the file, and this issue is with the name in that directory. I can delete the file and create a new one, and it will still be hidden. It must be in how ZFS stores the name; I'm trying to debug that now.

            adilger Andreas Dilger added a comment - Does ZFS have whiteout entries, or is there some problem with the hashing that prevents the directory entry from appearing? I know that "ls" will not show entries that have inode == 0, but the inode number should be independent of the filename, so that wouldn't behave in this manner. Maybe there is a ZFS "hidden" flag that is not being initialized correctly and in some cases this flag is set? Does anything appear with "strace" or with "zdb"?

            utopiabound Nathaniel Clark added a comment -

            The file itself isn't hidden; the name is "cloaked".
            In this example d074/a9 has been "lost". Notice that a9 never appears in the ls output, even though mv can still find it by name:

            [root@lubuilder d074]# ls
            a1  a2  a3  a4  a5  a6  a7  a8  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  c0
            [root@lubuilder d074]# mv a9 a9a
            [root@lubuilder d074]# ls
            a1  a2  a3  a4  a5  a6  a7  a8  a9a  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  c0
            [root@lubuilder d074]# mv a9a a9
            [root@lubuilder d074]# ls
            a1  a2  a3  a4  a5  a6  a7  a8  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  c0
            [root@lubuilder d074]# mv a9 a9a
            [root@lubuilder d074]# ls
            a1  a2  a3  a4  a5  a6  a7  a8  a9a  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  c0
            [root@lubuilder d074]# mv a8 a9
            [root@lubuilder d074]# ls
            a1  a2  a3  a4  a5  a6  a7  a9a  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  c0
            [root@lubuilder d074]# mv a9 a8
            [root@lubuilder d074]# ls
            a1  a2  a3  a4  a5  a6  a7  a8  a9a  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  c0

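The symptom shown in this transcript is that a lookup by name succeeds while the same name is absent from the directory listing. A rough sketch of a check for that condition follows, assuming the candidate names are known in advance (the function name `report_cloaked` is hypothetical and is not part of Lustre or its test suite):

```shell
#!/bin/sh
# report_cloaked DIR NAME...
# Hypothetical helper: for each NAME, report it if a lookup by name
# succeeds but the name is missing from the directory listing.
report_cloaked() {
    dir=$1
    shift
    # Directory listing as seen through readdir (via ls), including dotfiles.
    listing=$(ls -A "$dir")
    for name in "$@"; do
        # Lookup by name: true if the entry resolves (or is a dangling symlink).
        if [ -e "$dir/$name" ] || [ -L "$dir/$name" ]; then
            # Exact, fixed-string, whole-line match against the listing.
            if ! printf '%s\n' "$listing" | grep -qxF -- "$name"; then
                echo "cloaked: $dir/$name"
            fi
        fi
    done
}
```

For the directory in the transcript, this might be called as `report_cloaked d074 a1 a2 a3 a4 a5 a6 a7 a8 a9`. On a healthy directory it prints nothing. Names that do not exist at all are skipped rather than reported.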
            yujian Jian Yu added a comment - One more instance on master branch: https://testing.hpdd.intel.com/test_sets/a2161baa-5bea-11e4-a35f-5254006e85c2

            utopiabound Nathaniel Clark added a comment - Doing the restore using the LUDOC-161 information (zfs send/recv) instead of tar works much better. It shows no change after the restore: the file is still missing from the Lustre client.
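The exact LUDOC-161 procedure is not reproduced in this ticket. As a generic sketch of a snapshot-based copy with zfs send/recv, assuming hypothetical dataset names (`srcpool/mdt0`, `dstpool/mdt0`) and a hypothetical snapshot name (`backup`), something like the following could be used. It defaults to a dry run that only prints the commands:

```shell
#!/bin/sh
# All dataset and snapshot names below are placeholders, not values
# taken from the ticket or from LUDOC-161.
SRC=${SRC:-srcpool/mdt0}
DST=${DST:-dstpool/mdt0}
SNAP=${SNAP:-backup}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands

backup_restore() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "zfs snapshot -r $SRC@$SNAP"
        echo "zfs send -R $SRC@$SNAP | zfs receive -F $DST"
    else
        # Take a recursive snapshot, then replicate it (with its
        # properties and descendants) into the destination dataset.
        zfs snapshot -r "$SRC@$SNAP" &&
            zfs send -R "$SRC@$SNAP" | zfs receive -F "$DST"
    fi
}
```

Calling `backup_restore` with the defaults prints the two commands without touching any pool. Because send/recv replicates the dataset at the ZFS level instead of rewriting entries through the file system as tar does, it is plausible that the problematic directory entry is carried over unchanged, which would be consistent with the file still being missing after the restore.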

            People

              utopiabound Nathaniel Clark
              keith Keith Mannthey (Inactive)