[LU-4256] lustre-rsync-test test_2b: Failure in replication; differences found Created: 15/Nov/13  Updated: 04/Sep/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.8.0, Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: always_except
Environment:

server: lustre-master build # 1752 RHEL6 ldiskfs
client: 2.5.0 RHEL6 ldiskfs

Also seen in review-dne with both server and client on RHEL6 ldiskfs


Issue Links:
Duplicate
is duplicated by LU-4978 Failure on test suite lustre-rsync-te... Resolved
Related
is related to LU-4781 lustre-rsync-test test_2b: Replicatio... Resolved
Severity: 3
Rank (Obsolete): 11613

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/45f323be-49e3-11e3-8efa-52540035b04c.

The sub-test test_2b failed with the following error:

test failed to respond and timed out

test log shows:

lustre_rsync took 22 seconds
Changelog records consumed: 926
Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp/WORDPRO: BENCHS1A.PRN
 lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found. 
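
The "Only in ..." line above has the shape of recursive diff output for a file present on one side only. As a minimal sketch, assuming the test compares the source tree with its replica by a recursive diff, the check could look like the following. The helper name check_replica and the temporary directories are hypothetical and only illustrate the comparison; this is not the test suite's actual code.

```shell
#!/bin/bash
# Hypothetical helper (not taken from lustre-rsync-test.sh): compare a
# source tree with its replica and fail if they differ.
check_replica() {
    local src=$1 tgt=$2
    # diff -r prints "Only in DIR: NAME" for entries found on one side only.
    # LC_ALL=C keeps the message text in English.
    if LC_ALL=C diff -r "$src" "$tgt"; then
        echo "replication OK"
        return 0
    fi
    echo "FAIL: Failure in replication; differences found."
    return 1
}

# Demo on throwaway directories: one file is missing from the replica.
src=$(mktemp -d)
tgt=$(mktemp -d)
echo data > "$src/a.txt"
echo data > "$tgt/a.txt"
echo data > "$src/BENCHS1A.PRN"   # exists only in the source tree

out=$(check_replica "$src" "$tgt") && rc=0 || rc=$?
echo "$out"
rm -rf "$src" "$tgt"
```

A file that exists only in the source tree after replication finishes is exactly the case reported in this ticket, so the sketch exits non-zero and prints the failure message.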


 Comments   
Comment by nasf (Inactive) [ 06/Jan/14 ]

I found the same failure in non-interoperability mode.
https://maloo.whamcloud.com/test_sets/f0019a40-7682-11e3-bf7d-52540035b04c
https://maloo.whamcloud.com/test_sets/ec956efe-76a5-11e3-8c14-52540035b04c

Comment by Sarah Liu [ 11/Feb/14 ]

also seen in lustre-master build #1876
server: RHEL6 ldiskfs
client: SLES11 SP3 ldiskfs

https://maloo.whamcloud.com/test_sets/7edd3618-90d0-11e3-91ee-52540035b04c

Comment by Andreas Dilger [ 18/Mar/14 ]

This is related to, but looks different than, LU-4781. This bug just reports a replication difference, while LU-4781 reports errors during replication.

Comment by nasf (Inactive) [ 25/Apr/14 ]

Another failure instance:

https://maloo.whamcloud.com/test_sets/24f9ee20-cc4e-11e3-bda1-52540035b04c

Comment by Bob Glossman (Inactive) [ 19/Nov/14 ]

I think this is another:
https://testing.hpdd.intel.com/test_sets/3efa00dc-6fdf-11e4-85fc-5254006e85c2

It reports as a TIMEOUT, not a FAIL, but the test log says:

Only in /mnt/lustre/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX: __QB4.MB
lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found.

Comment by James Nunez (Inactive) [ 21/Jul/15 ]

A few more recent occurrences (non-interop, review-dne-part-1):
2015-07-21 03:39:41 - https://testing.hpdd.intel.com/test_sets/36ca7078-2f69-11e5-ad00-5254006e85c2
2015-07-21 14:18:36 - https://testing.hpdd.intel.com/test_sets/1ede63bc-2fc2-11e5-ad00-5254006e85c2
2015-10-25 19:49:39 - https://testing.hpdd.intel.com/test_sets/b915e5c6-7b5e-11e5-9ee6-5254006e85c2
2016-02-03 14:36:57 - https://testing.hpdd.intel.com/test_sets/d0a0de2c-ca90-11e5-9609-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ]

Another instance found for interop tag 2.7.66 -2.5.5 Server/EL6.7 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/2f54e8ae-ccf9-11e5-b1fa-5254006e85c2

Comment by Niu Yawei (Inactive) [ 21/Sep/16 ]

+1 on master review: https://testing.hpdd.intel.com/test_sets/3c38bf1a-7f4f-11e6-8a8c-5254006e85c2

Comment by Emoly Liu [ 28/Apr/18 ]

+1 on master:
https://testing.hpdd.intel.com/test_logs/1161528c-4a2a-11e8-95c0-52540065bddc/show_text

Comment by Hongchao Zhang [ 29/May/18 ]

+1 on master
https://testing.hpdd.intel.com/test_sets/7c5aead0-6262-11e8-b303-52540065bddc

Comment by Mikhail Pershin [ 26/Jul/18 ]

on master:
https://testing.whamcloud.com/test_sets/a76a4428-9025-11e8-a9f7-52540065bddc

Comment by Gerrit Updater [ 15/Aug/18 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33006
Subject: LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2e8251b3b66c98c539a2fcbaf3c4b7da7fdfd81d

Comment by Andreas Dilger [ 15/Aug/18 ]

This failed about 11x per week for the past 4 weeks.

I suspect the test timeout is because the cleanup_src_tgt step is taking a long time to do rm -rf $DIR/$tdir after running dbench, though that doesn't absolve the original error. At least there aren't any Lustre errors in the console logs for any of the nodes.

The recent failures have a lot of the following errors, though I'm not sure if these are new or not:

Error replicating  xattr for /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC: 2
Error replicating  xattr for /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC: 2
Error replicating  xattr for /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/WORD/TIPS.DOC: 2

Comment by Gerrit Updater [ 23/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33006/
Subject: LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e253179a26f48716702891208772543142579ce1
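
For readers unfamiliar with the mechanism named in the patch subject, the following is a rough sketch of how a skip list such as ALWAYS_EXCEPT can gate a sub-test. The functions is_excepted and maybe_run are hypothetical stand-ins written for illustration; they are not the Lustre test framework's functions, and the actual patch contents are not reproduced here.

```shell
#!/bin/bash
# Space-separated list of sub-tests to skip (assumed format).
# bug number for skipped test: LU-4256
ALWAYS_EXCEPT="2b"

# Hypothetical helper: succeed if the given sub-test is on the skip list.
is_excepted() {
    local t
    for t in $ALWAYS_EXCEPT; do
        if [ "$t" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical runner: report whether a sub-test would run or be skipped.
maybe_run() {
    local name=$1
    if is_excepted "$name"; then
        echo "SKIP: test_$name (in ALWAYS_EXCEPT)"
    else
        echo "RUN: test_$name"
    fi
}

maybe_run 2a
maybe_run 2b
maybe_run 2c
```

Skipping the sub-test this way hides the failure from test results but does not fix the underlying replication difference, which is why the ticket remains open.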
