[LU-12869] interop: lustre-rsync-test test 2a fails with 'Failure in replication; differences found.' Created: 16/Oct/19  Updated: 12/Aug/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: interop

Issue Links:
Duplicate
duplicates LU-11426 2/2 Olafs agree: changelog entries ar... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We have seen lustre-rsync-test test_2a fail with 'Failure in replication; differences found.' once in interop testing between 2.10.8 servers and 2.12.3 RC1; see https://testing.whamcloud.com/test_sets/93e5d5ae-eac2-11e9-9874-52540065bddc . Note that this test fails 100% of the time for PPC client testing; this ticket does not cover the PPC failures.

In the suite_log, we don’t see any obvious errors:

Throughput 2.75769 MB/sec  2 clients  2 procs  max_latency=2835.173 ms
Lustre filesystem: lustre
MDT device: lustre-MDT0000
Source: /mnt/lustre
Target: /tmp/target
Target: /tmp/target2
Statuslog: /tmp/lustre_rsync.log
Changelog registration: cl3
Starting changelog record: 0
Clear changelog after use: no
Errors: 0
lustre_rsync took 168 seconds
Changelog records consumed: 7580
Only in /mnt/lustre/d2a.lustre-rsync-test/clients/client0: ~dmtmp
 lustre-rsync-test test_2a: @@@@@@ FAIL: Failure in replication; differences found. 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5864:error()
  = /usr/lib64/lustre/tests/lustre-rsync-test.sh:122:check_diff()
  = /usr/lib64/lustre/tests/lustre-rsync-test.sh:293:test_2a()

Looking at the console logs, we don’t see any obvious errors.



 Comments   
Comment by Andreas Dilger [ 18/Oct/19 ]

It seems likely that this is caused by the LU-11426 problem of ChangeLog records being skipped on the 2.10.8 server. There is only a single file difference in the target copy, which seems similar to that problem. If there were a more serious problem, dozens of files would differ, as is typically seen with lustre_rsync problems.
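
To make the single-difference observation easy to check against a saved suite_log, here is a minimal Python sketch that collects the "Only in" lines. It assumes the log contains lines in the format that GNU diff prints for a recursive directory comparison, as in the excerpt in the description. The function name only_in_entries is hypothetical and is not part of the Lustre test framework.

```python
import re

# Matches the lines a recursive diff prints for entries that exist on
# only one side, e.g. "Only in /some/dir: name".
ONLY_IN = re.compile(r"^Only in (?P<dir>.+?): (?P<name>.+)$")


def only_in_entries(log_text):
    """Return (directory, name) pairs for every 'Only in' line in log_text."""
    entries = []
    for line in log_text.splitlines():
        match = ONLY_IN.match(line.strip())
        if match:
            entries.append((match.group("dir"), match.group("name")))
    return entries


# Sample taken from the suite_log excerpt in this ticket.
sample = """\
Errors: 0
lustre_rsync took 168 seconds
Changelog records consumed: 7580
Only in /mnt/lustre/d2a.lustre-rsync-test/clients/client0: ~dmtmp
"""

entries = only_in_entries(sample)
print(len(entries), "entry present on one side only")
for directory, name in entries:
    print(directory, name)
```

A count of one, as here, matches the pattern described for skipped ChangeLog records. The sketch only counts entries missing from one side and does not detect files whose contents differ.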

Comment by Alexander Zarochentsev [ 12/Aug/20 ]

Recently seen outside of interop testing: https://testing.whamcloud.com/sub_tests/2a65106b-6525-459b-9933-f0548ebaf4e1

Generated at Sat Feb 10 02:56:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.