
lustre-rsync-test test_2b: Failure in replication; differences found

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.6.0, Lustre 2.8.0, Lustre 2.12.0
    • server: lustre-master build # 1752 RHEL6 ldiskfs
      client: 2.5.0 RHEL6 ldiskfs

      Also seen in review-dne, where both server and client are RHEL6 ldiskfs
    • 3
    • 11613

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/45f323be-49e3-11e3-8efa-52540035b04c.

      The sub-test test_2b failed with the following error:

      test failed to respond and timed out

      test log shows:

      lustre_rsync took 22 seconds
      Changelog records consumed: 926
      Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp/WORDPRO: BENCHS1A.PRN
       lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found. 
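The "Only in" line in the log above has the shape that a recursive diff prints for an entry present in one tree but not the other. Since the directory here is under /mnt/lustre, it presumably means the file exists on the source but was not replicated to the target. As a rough sketch for triaging such logs, the following pulls those entries out of a test log. The function name `missing_entries` and the assumption that the comparison output follows the `diff -r` format are illustrative, not taken from the test script.

```python
import re

# Assumed format: "Only in <directory>: <name>", as printed by a recursive diff.
ONLY_IN = re.compile(r"^\s*Only in (?P<dir>.+?): (?P<name>.+)$")


def missing_entries(log_text):
    """Return (directory, name) pairs for each 'Only in' line in the log."""
    entries = []
    for line in log_text.splitlines():
        m = ONLY_IN.match(line)
        if m:
            entries.append((m.group("dir"), m.group("name").strip()))
    return entries


# Sample taken from the test log quoted in this ticket.
sample = """lustre_rsync took 22 seconds
Changelog records consumed: 926
Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp/WORDPRO: BENCHS1A.PRN
"""

for directory, name in missing_entries(sample):
    print(f"{name} present only under {directory}")
```

Grouping the results by directory across several failed runs might show whether the same dbench files are missed each time.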
      

          Activity

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33006/
            Subject: LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e253179a26f48716702891208772543142579ce1

            adilger Andreas Dilger added a comment - This failed about 11x per week for the past 4 weeks.

            I suspect the test timeout is because the cleanup_src_tgt step is taking a long time to do rm -rf $DIR/$tdir after running dbench, though that doesn't absolve the original error. At least there aren't any Lustre errors in the console logs for any of the nodes.

            The recent failures have a lot of the following errors, though I'm not sure if these are new or not:

            Error replicating  xattr for /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC: 2
            Error replicating  xattr for /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC: 2
            Error replicating  xattr for /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC: 2
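To judge whether the xattr errors quoted above are new or just noisy, it may help to count them per file and decode the trailing number. The sketch below assumes, without confirmation from the ticket, that the trailing `2` is an errno value (which would be ENOENT). The helper name `summarize_xattr_errors` is hypothetical.

```python
import errno
import re
from collections import Counter

# Assumption: the number after the final colon is an errno value.
XATTR_ERR = re.compile(
    r"Error replicating\s+xattr for (?P<path>.+): (?P<code>-?\d+)\s*$"
)


def summarize_xattr_errors(log_text):
    """Count xattr replication errors per (path, errno name)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = XATTR_ERR.search(line)
        if m:
            code = abs(int(m.group("code")))
            name = errno.errorcode.get(code, "UNKNOWN")
            counts[(m.group("path"), name)] += 1
    return counts


# Sample lines copied from the comment in this ticket.
path = "/tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC"
sample = "\n".join([f"Error replicating  xattr for {path}: 2"] * 3)

for (p, err), n in summarize_xattr_errors(sample).items():
    print(f"{n} x {err}: {p}")
```

If the value really is an errno, ENOENT on the target path would be consistent with the file not existing there when the xattr was applied, but that reading is speculative.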

            gerrit Gerrit Updater added a comment - John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33006
            Subject: LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2e8251b3b66c98c539a2fcbaf3c4b7da7fdfd81d

            tappro Mikhail Pershin added a comment - on master: https://testing.whamcloud.com/test_sets/a76a4428-9025-11e8-a9f7-52540065bddc
            hongchao.zhang Hongchao Zhang added a comment - +1 on master https://testing.hpdd.intel.com/test_sets/7c5aead0-6262-11e8-b303-52540065bddc
            emoly.liu Emoly Liu added a comment - +1 on master: https://testing.hpdd.intel.com/test_logs/1161528c-4a2a-11e8-95c0-52540065bddc/show_text
            niu Niu Yawei (Inactive) added a comment - +1 on master review: https://testing.hpdd.intel.com/test_sets/3c38bf1a-7f4f-11e6-8a8c-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment - Another instance found for interop tag 2.7.66 -2.5.5 Server/EL6.7 Client, build# 3316
            https://testing.hpdd.intel.com/test_sets/2f54e8ae-ccf9-11e5-b1fa-5254006e85c2

            jamesanunez James Nunez (Inactive) added a comment - edited
            A few more recent occurrences (non-interop, review-dne-part-1):
            2015-07-21 03:39:41 - https://testing.hpdd.intel.com/test_sets/36ca7078-2f69-11e5-ad00-5254006e85c2
            2015-07-21 14:18:36 - https://testing.hpdd.intel.com/test_sets/1ede63bc-2fc2-11e5-ad00-5254006e85c2
            2015-10-25 19:49:39 - https://testing.hpdd.intel.com/test_sets/b915e5c6-7b5e-11e5-9ee6-5254006e85c2
            2016-02-03 14:36:57 - https://testing.hpdd.intel.com/test_sets/d0a0de2c-ca90-11e5-9609-5254006e85c2

            bogl Bob Glossman (Inactive) added a comment - I think this is another:
            https://testing.hpdd.intel.com/test_sets/3efa00dc-6fdf-11e4-85fc-5254006e85c2

            It reports as a TIMEOUT, not a FAIL, but the test log says:

            Only in /mnt/lustre/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX: __QB4.MB
            lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found.
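Because a run can be reported as TIMEOUT while its log already contains a FAIL marker, searching results by reported status alone may undercount this bug. The following is a minimal sketch of a log check that looks for the `@@@@@@ FAIL:` marker seen in the logs quoted in this ticket. The function names and the idea of overriding the reported status are illustrative assumptions, not part of Maloo or the test framework.

```python
import re

# Marker format as it appears in the logs quoted in this ticket.
FAIL_MARKER = re.compile(
    r"^\s*(?P<suite>\S+) (?P<test>test_\S+): @@@@@@ FAIL: (?P<msg>.*\S)\s*$"
)


def underlying_failures(log_text):
    """Return (suite, test, message) for each FAIL marker found in the log."""
    found = []
    for line in log_text.splitlines():
        m = FAIL_MARKER.match(line)
        if m:
            found.append((m.group("suite"), m.group("test"), m.group("msg")))
    return found


def effective_status(reported, log_text):
    """Treat a run as FAIL if its log holds a FAIL marker, else keep the reported status."""
    return "FAIL" if underlying_failures(log_text) else reported


sample = """Only in /mnt/lustre/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX: __QB4.MB
lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found.
"""

print(effective_status("TIMEOUT", sample))
```

A check like this would classify the run linked above with the other replication failures, with the later timeout treated as a secondary symptom.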

            People

              wc-triage WC Triage
              maloo Maloo