[LU-2866] Interop 2.1.4<->2.4 failure on test suite lustre-rsync-test test_1: Failure in replication; differences found Created: 26/Feb/13 Updated: 11/Jul/13 Resolved: 24/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | MB | ||
| Environment: |
client: 2.1.4 RHEL6 |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 6932 | ||||||||||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/2277b80e-7d52-11e2-a108-52540035b04c. The sub-test test_1 failed with the following error:
client-1 test log shows: == lustre-rsync-test test 1: Simple Replication == 04:47:54 (1361537274) CMD: client-24vm3 lctl --device lustre-MDT0000 changelog_register -n lustre-MDT0000: Registered changelog user cl1 CMD: client-24vm3 lctl get_param -n mdd.lustre-MDT0000.changelog_users Replication #1 Replication of operation failed(-21): 15 UNLNK (6) [0x200000400:0x13:0x0] [0x200000400:0xd:0x0] file4 Lustre filesystem: lustre MDT device: lustre-MDT0000 Source: /mnt/lustre Target: /tmp/target Target: /tmp/target2 Statuslog: /tmp/lustre_rsync.log Changelog registration: cl1 Starting changelog record: 0 Errors: 1 lustre_rsync took 1 seconds Changelog records consumed: 18 Replication #2 Lustre filesystem: lustre MDT device: lustre-MDT0000 Source: /mnt/lustre Target: /tmp/target Target: /tmp/target2 Statuslog: /tmp/lustre_rsync.log Changelog registration: cl1 Starting changelog record: 18 Errors: 0 lustre_rsync took 0 seconds Changelog records consumed: 4 Only in /tmp/target/d0.lustre-rsync-test/d1/d1: file4 Only in /mnt/lustre/d0.lustre-rsync-test/d1: d2 lustre-rsync-test test_1: @@@@@@ FAIL: Failure in replication; differences found. Dumping lctl log to /logdir/test_logs/2013-02-21/lustre-master-el6-x86_64-vs-lustre-b2_1-el6-x86_64--full--2_5_1__1269__-70264313904840-171517/lustre-rsync- client console shows: 04:48:00:Lustre: DEBUG MARKER: == lustre-rsync-test test 1: Simple Replication == 04:47:54 (1361537274) 04:48:00:LustreError: 23091:0:(lmv_obd.c:855:lmv_iocontrol()) error: iocontrol MDC lustre-MDT0000_UUID on MDTidx 0 cmd c0086696: err = -2 04:48:00:LustreError: 23091:0:(lmv_obd.c:855:lmv_iocontrol()) error: iocontrol MDC lustre-MDT0000_UUID on MDTidx 0 cmd c0086696: err = -2 04:48:00:Lustre: DEBUG MARKER: /usr/sbin/lctl mark lustre-rsync-test test_1: @@@@@@ FAIL: Failure in replication\; differences found. 04:48:00:Lustre: DEBUG MARKER: lustre-rsync-test test_1: @@@@@@ FAIL: Failure in replication; differences found. |
| Comments |
| Comment by Sarah Liu [ 26/Feb/13 ] |
|
Also found following errors in subtest test_2b == lustre-rsync-test test 2b: Replicate files changed by dbench. == 04:49:54 (1361537394) CMD: client-24vm3 lctl --device lustre-MDT0000 changelog_register -n lustre-MDT0000: Registered changelog user cl3 CMD: client-24vm3 lctl get_param -n mdd.lustre-MDT0000.changelog_users Other changelog users present (3) copying /usr/share/dbench/client.txt to /mnt/lustre/d0.lustre-rsync-test/d2/client.txt copying necessary libs to /mnt/lustre/d0.lustre-rsync-test/d2 lib64/libpopt.so.0 lib64/libc.so.6 lib64/ld-linux-x86-64.so.2 running 'dbench 2 -t 60' in /mnt/lustre/d0.lustre-rsync-test/d2 at Fri Feb 22 04:49:56 PST 2013 dbench PID=24008 dbench version 4.00 - Copyright Andrew Tridgell 1999-2004 Running for 60 seconds with load 'client.txt' and minimum warmup 12 secs 0 of 2 processes prepared for launch 0 sec 2 of 2 processes prepared for launch 0 sec releasing clients 2 453 31.81 MB/sec warmup 1 sec latency 14.306 ms 2 833 21.56 MB/sec warmup 2 sec latency 202.565 ms 2 960 14.56 MB/sec warmup 3 sec latency 741.947 ms 2 1450 13.01 MB/sec warmup 4 sec latency 79.735 ms 2 1977 10.58 MB/sec warmup 5 sec latency 154.946 ms 2 2451 9.37 MB/sec warmup 6 sec latency 76.113 ms 2 3097 9.66 MB/sec warmup 7 sec latency 87.938 ms 2 3686 9.68 MB/sec warmup 8 sec latency 29.801 ms 2 4002 8.90 MB/sec warmup 9 sec latency 13.199 ms 2 4303 8.09 MB/sec warmup 10 sec latency 106.175 ms 2 4618 7.64 MB/sec warmup 11 sec latency 127.999 ms 2 5662 1.07 MB/sec execute 1 sec latency 62.161 ms 2 6086 2.70 MB/sec execute 2 sec latency 118.249 ms 2 6839 6.15 MB/sec execute 3 sec latency 35.459 ms 2 7348 6.82 MB/sec execute 4 sec latency 14.272 ms 2 7648 5.59 MB/sec execute 5 sec latency 13.883 ms 2 7960 4.86 MB/sec execute 6 sec latency 126.676 ms PIDs: 24008 24009 24010 Stopping dbench Starting replication Replication of operation failed(-39): 5489 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a8:0x0] Replication of operation failed(-39): 5499 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a9:0x0] Replication of operation failed(-22): 5510 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a8:0x0] Replication of operation failed(-21): 5520 UNLNK (6) [0x200000400:0xa25:0x0] [0x200000400:0x9a8:0x0] PPTC321.TMP Replication of operation failed(-22): 5522 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a8:0x0] Replication of operation failed(-21): 5525 UNLNK (6) [0x200000400:0xa21:0x0] [0x200000400:0x9a8:0x0] PPTC321.TMP Replication of operation failed(-22): 5534 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99a:0x0] Replication of operation failed(-39): 5548 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99b:0x0] Replication of operation failed(-22): 5600 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9dd:0x0] Replication of operation failed(-22): 5610 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9dc:0x0] Replication of operation failed(-39): 5643 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a2:0x0] Replication of operation failed(-39): 5649 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a3:0x0] Replication of operation failed(-39): 5661 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9a2:0x0] Replication of operation failed(-39): 5710 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9cc:0x0] Replication of operation failed(-22): 5729 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9cc:0x0] Replication of operation failed(-21): 5776 UNLNK (6) [0x200000400:0xa9c:0x0] [0x200000400:0xa07:0x0] FASTENER.LDB Replication of operation failed(-22): 5799 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9b4:0x0] Replication of operation failed(-22): 5801 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9b4:0x0] Replication of operation failed(-22): 5814 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9b4:0x0] Replication of operation failed(-22): 5817 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9b4:0x0] Replication of operation failed(-21): 5819 UNLNK (6) [0x200000400:0xabd:0x0] [0x200000400:0x9b4:0x0] @@@CDRW.TMP Replication of operation failed(-39): 6087 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99a:0x0] Replication of operation failed(-39): 6093 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99a:0x0] Replication of operation failed(-39): 6095 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99a:0x0] Replication of operation failed(-39): 6105 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99b:0x0] Replication of operation failed(-39): 6111 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99b:0x0] Replication of operation failed(-39): 6113 RNMFM (8) [0:0x0:0x0] [0x200000400:0x99b:0x0] Replication of operation failed(-21): 6118 UNLNK (6) [0x200000400:0xb33:0x0] [0x200000400:0x99a:0x0] FSA4230.TMP Replication of operation failed(-39): 6153 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9dd:0x0] Replication of operation failed(-39): 6307 RNMFM (8) [0:0x0:0x0] [0x200000400:0x9cd:0x0] Replication of operation failed(-21): 6329 UNLNK (6) [0x200000400:0xb95:0x0] [0x200000400:0xa07:0x0] FASTENER.LDB Replication of operation failed(-21): 6336 UNLNK (6) [0x200000400:0xba4:0x0] [0x200000400:0x9b4:0x0] @@@CDRW.TMP Replication of operation failed(-21): 6340 UNLNK (6) [0x200000400:0xba7:0x0] [0x200000400:0x9b4:0x0] @@@CDRW.TMP Replication of operation failed(-21): 6343 UNLNK (6) [0x200000400:0xba9:0x0] [0x200000400:0x9b4:0x0] @@@CDRW.TMP Replication of operation failed(-21): 6371 UNLNK (6) [0x200000400:0xbb6:0x0] [0x200000400:0x9d4:0x0] PMC162.TMP Lustre filesystem: lustre MDT device: lustre-MDT0000 Source: /mnt/lustre Target: /tmp/target Target: /tmp/target2 Statuslog: /tmp/lustre_rsync.log Changelog registration: cl3 Starting changelog record: 0 Errors: 35 lustre_rsync took 108 seconds Changelog records consumed: 6602 Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp: COREL Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp: EXCEL Only in /tmp/target/d0.lustre-rsync-test/d2/clients/client0/~dmtmp/PM: PMC162.TMP Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp: PWRPNT Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp: WORD Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client0/~dmtmp: WORDPRO Only in /tmp/target/d0.lustre-rsync-test/d2/clients/client1/~dmtmp/ACCESS: FASTENER.LDB Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client1/~dmtmp: COREL Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client1/~dmtmp: EXCEL Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client1/~dmtmp: PWRPNT Only in /tmp/target/d0.lustre-rsync-test/d2/clients/client1/~dmtmp/WORD: FSA4230.TMP Only in /mnt/lustre/d0.lustre-rsync-test/d2/clients/client1/~dmtmp: WORDPRO lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found. |
| Comment by Sarah Liu [ 13/Mar/13 ] |
|
another instance: https://maloo.whamcloud.com/test_sets/bf2638e6-8906-11e2-b643-52540035b04c |
| Comment by Yang Sheng [ 13/Mar/13 ] |
|
I am still trying to reproduce it. |
| Comment by Yang Sheng [ 15/Mar/13 ] |
|
Hi, Sarah, have any special while the test running? I cannot reproduce it anyway. |
| Comment by Sarah Liu [ 17/Mar/13 ] |
|
Hi Yang Sheng, I can reproduce it on my test nodes, I just copy the test-framework.sh from server(2.4) to client(2.1.4), otherwise mkfs.lustre cannot work: |
| Comment by Yang Sheng [ 19/Mar/13 ] |
|
Ah, Obviously it isn't standarad operation, has a ticket for mkfs.lustre issue? |
| Comment by Sarah Liu [ 19/Mar/13 ] |
|
Yang Sheng, I think this is the ticket |
| Comment by Oleg Drokin [ 19/Mar/13 ] |
|
So, I see |
| Comment by Yang Sheng [ 20/Mar/13 ] |
|
I think this issue should cause by 2.1.4-rc2 hasn't included http://review.whamcloud.com/#change,3835 So rename changelog is different between 2.1 & 2.4. Just include this patch else we have to stop the rsync in interop situation. |
| Comment by Sarah Liu [ 21/Mar/13 ] |
|
verified with the latest b2_1 build #191 and it passed: https://maloo.whamcloud.com/test_sessions/f2de5456-9244-11e2-b7d5-52540035b04c |
| Comment by Peter Jones [ 24/Mar/13 ] |
|
ok so I am closing this as a duplicate of |