lustre-rsync-test test_2c: Failure in replication; differences found

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.16.0
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Timothy Day <timday@amazon.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/ed7b15d4-da4c-45e5-941d-171b43889c4c

      test_2c failed with the following error:

      Failure in replication; differences found.

      Output:

      Errors: 0
      lustre_rsync took 1 seconds
      Changelog records consumed: 173
      Running lustre_rsync
      Lustre filesystem: lustre
      MDT device: lustre-MDT0000
      Source: /mnt/lustre
      Target: /tmp/target
      Target: /tmp/target2
      Statuslog: /tmp/lustre_rsync.log
      Changelog registration: cl4
      Starting changelog record: 7634
      Clear changelog after use: no
      Errors: 0
      lustre_rsync took 1 seconds
      Changelog records consumed: 2
      Only in /tmp/target/d2c.lustre-rsync-test/clients/client0/~dmtmp/WORDPRO: BENCHS1.LWP
      Only in /tmp/target/d2c.lustre-rsync-test/clients/client0/~dmtmp/WORDPRO: NEWS1_1A.LWP
      lustre-rsync-test test_2c: @@@@@@ FAIL: Failure in replication; differences found. 
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:6549:error()
      = /usr/lib64/lustre/tests/lustre-rsync-test.sh:113:check_diff()
      = /usr/lib64/lustre/tests/lustre-rsync-test.sh:370:test_2c()
      = /usr/lib64/lustre/tests/test-framework.sh:6887:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:6937:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:6773:run_test()
      = /usr/lib64/lustre/tests/lustre-rsync-test.sh:377:main()
      Dumping lctl log to /autotest/autotest-1/2023-01-17/lustre-reviews_review-dne-zfs-part-5_91682_16_90396e57-368d-4a37-bcb7-90f078a84f20//lustre-rsync-test.test_2c.*.1673986148.log
      CMD: onyx-110vm6,onyx-110vm7,onyx-49vm6.onyx.whamcloud.com,onyx-49vm7,onyx-82vm6 /usr/sbin/lctl dk > /autotest/autotest-1/2023-01-17/lustre-reviews_review-dne-zfs-part-5_91682_16_90396e57-368d-4a37-bcb7-90f078a84f20//lustre-rsync-test.test_2c.debug_log.\$(hostname -s).1673986148.log;
      dmesg > /autotest/autotest-1/2023-01-17/lustre-reviews_review-dne-zfs-part-5_91682_16_90396e57-368d-4a37-bcb7-90f078a84f20//lustre-rsync-test.test_2c.dmesg.\$(hostname -s).1673986148.log
      CMD: onyx-110vm7 /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users
      lustre-MDT0003: clear the changelog for cl4 of all records
      CMD: onyx-110vm7 /usr/sbin/lctl --device lustre-MDT0003 changelog_deregister cl4
      lustre-MDT0003: Deregistered changelog user #4
      CMD: onyx-110vm7 /usr/sbin/lctl set_param mdd.lustre-MDT0003.changelog_mask='MARK ' -n
      CMD: onyx-110vm6 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.changelog_users
      lustre-MDT0002: clear the changelog for cl4 of all records
      CMD: onyx-110vm6 /usr/sbin/lctl --device lustre-MDT0002 changelog_deregister cl4
      lustre-MDT0002: Deregistered changelog user #4
      CMD: onyx-110vm6 /usr/sbin/lctl set_param mdd.lustre-MDT0002.changelog_mask='MARK ' -n
      CMD: onyx-110vm7 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users
      lustre-MDT0001: clear the changelog for cl4 of all records
      CMD: onyx-110vm7 /usr/sbin/lctl --device lustre-MDT0001 changelog_deregister cl4
      lustre-MDT0001: Deregistered changelog user #4
      CMD: onyx-110vm7 /usr/sbin/lctl set_param mdd.lustre-MDT0001.changelog_mask='MARK ' -n
      CMD: onyx-110vm6 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
      lustre-MDT0000: clear the changelog for cl4 of all records
      CMD: onyx-110vm6 /usr/sbin/lctl --device lustre-MDT0000 changelog_deregister cl4
      lustre-MDT0000: Deregistered changelog user #4
      CMD: onyx-110vm6 /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask='MARK ' -n
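The "Only in ..." lines in the output above have the shape of a recursive directory comparison between the source and a target. As a minimal sketch of that kind of check (not the code of check_diff() itself, which is not shown in this ticket), the following uses Python's standard filecmp module. The function name tree_differences is hypothetical, and note that filecmp.dircmp compares files shallowly (by type, size, and mtime first), so it is weaker than a byte-by-byte diff.

```python
import filecmp
import os


def tree_differences(source, target):
    """Return 'diff -r'-style messages for two directory trees.

    Illustrative only: dircmp treats files with an identical
    (type, size, mtime) signature as equal without reading them.
    """
    messages = []

    def walk(cmp):
        # Entries present on one side only.
        for name in sorted(cmp.left_only):
            messages.append(f"Only in {cmp.left}: {name}")
        for name in sorted(cmp.right_only):
            messages.append(f"Only in {cmp.right}: {name}")
        # Files present on both sides that compare as different.
        for name in sorted(cmp.diff_files):
            messages.append(
                f"Files {os.path.join(cmp.left, name)} and "
                f"{os.path.join(cmp.right, name)} differ"
            )
        # Recurse into directories common to both sides.
        for name in sorted(cmp.subdirs):
            walk(cmp.subdirs[name])

    walk(filecmp.dircmp(source, target))
    return messages
```

An empty result corresponds to a passing comparison; any message corresponds to the "differences found" failure reported here.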

      Seems similar to https://jira.whamcloud.com/browse/LU-10054?jql=text%20~%20%22Failure%20in%20replication%3B%20differences%20found.%22 or https://jira.whamcloud.com/browse/LU-4256?jql=text%20~%20%22Failure%20in%20replication%3B%20differences%20found.%22.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      lustre-rsync-test test_2c - Failure in replication; differences found.

          Activity

            [LU-16489] lustre-rsync-test test_2c: Failure in replication; differences found

            simmonsja James A Simmons added a comment - Still needs a proper fix.

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/58794/
            Subject: LU-16489 tests: disable lustre_rsync 2c test
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6358d54e6335f40bbfc90069fc364f2ecbba6e79

            gerrit Gerrit Updater added a comment -

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58794
            Subject: LU-16489 tests: disable lustre_rsync 2c test
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 588ef663a9c5ff3bb715ae2450baacb3ce47879a

            simmonsja James A Simmons added a comment - Oleg's test harness sees this all the time.
            bfaccini-nvda Bruno Faccini added a comment - +1 with recent master at https://testing.whamcloud.com/test_sets/474b5c2e-f68f-42b7-bb0e-2cf6aa6192ce .  
            yujian Jian Yu added a comment -

            The failure occurred frequently in the last 6 months.

            yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/0fda8ce6-6543-47f5-b424-3c6a373b9b30
            bobijam Zhenyu Xu added a comment - - edited

            There is something about this test failure that I don't understand. Take a Gerrit Janitor test failure as an example:

            Stopping dbench
            9440 9441 9442 stopped
            Starting replication
            Lustre filesystem: lustre
            MDT device: lustre-MDT0000
            Source: /mnt/lustre
            Target: /tmp/target
            Target: /tmp/target2
            Statuslog: /tmp/lustre_rsync.log
            
            Changelog registration: cl1
            Starting changelog record: 0
            Clear changelog after use: no
            Errors: 0
            lustre_rsync took 19 seconds
            Changelog records consumed: 2144
            changelog record number: before lrsync 2144, after 2144
            
            **** The 1st replication consumes changelog records 0 to 2144, and check_diff returns OK
            
            Resuming dbench
            Stopping dbench
            9440 9441 9442 stopped
            Starting replication
            Lustre filesystem: lustre
            MDT device: lustre-MDT0000
            Source: /mnt/lustre
            Target: /tmp/target
            Target: /tmp/target2
            Statuslog: /tmp/lustre_rsync.log
            
            Changelog registration: cl1
            Starting changelog record: 2144
            Clear changelog after use: no
            Errors: 0
            lustre_rsync took 2 seconds
            Changelog records consumed: 729
            changelog record number: before lrsync 2872, after 2872
            
            **** The 2nd replication consumes changelog records 2144 to 2872, then the following errors appear
            
            Files /mnt/lustre/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB and /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB differ
            Files /mnt/lustre/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB and /tmp/target/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB differ
            

            Records relating to the ANSWER.DB files, from the changelog received:

            707 01CREAT 12:07:02.380175623 2024.03.28 0x0 t=[0x200000402:0xcd:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x59:0x0] ANSWER.DB
            709 01CREAT 12:07:02.385925369 2024.03.28 0x0 t=[0x200000402:0xce:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x5a:0x0] ANSWER.DB
            1689 06UNLNK 12:07:07.706417623 2024.03.28 0x1 t=[0x200000402:0xcd:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x59:0x0] ANSWER.DB
            1690 06UNLNK 12:07:07.706507384 2024.03.28 0x1 t=[0x200000402:0xce:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x5a:0x0] ANSWER.DB
            2139 01CREAT 12:07:11.228556007 2024.03.28 0x0 t=[0x200000402:0x1c5:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x59:0x0] ANSWER.DB
            2140 01CREAT 12:07:11.228829638 2024.03.28 0x0 t=[0x200000402:0x1c6:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x5a:0x0] ANSWER.DB
               *** All ANSWER.DB changes fall within the 1st replication (recno 0 to 2144),
               *** and check_diff passed after the 1st replication. The 2nd replication
               *** (recno 2144 to 2872) should therefore contain no operations on the
               *** ANSWER.DB files, yet check_diff after the 2nd replication finds that
               *** their contents differ.
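The window argument above can be checked mechanically. The sketch below parses the six records quoted above and sorts them into the two replication windows. The regular expression is inferred only from the layout of those six lines, so it is an assumption rather than a general changelog parser, and the names ChangelogRecord, parse_record, and in_window are hypothetical. Whether the start record of a pass is inclusive or exclusive does not matter here, because every quoted record number is below 2144.

```python
import re
from typing import List, NamedTuple, Optional


class ChangelogRecord(NamedTuple):
    recno: int
    op: str
    tfid: str
    pfid: str
    name: str


# Layout inferred from the six records quoted in this comment; other
# record types may carry fields that this pattern does not handle.
RECORD_RE = re.compile(
    r"^(?P<recno>\d+)\s+\d{2}(?P<op>[A-Z0-9_]+)\s+"
    r"\S+\s+\S+\s+0x[0-9a-f]+\s+"
    r"t=(?P<tfid>\[[^\]]+\])"
    r".*?\bp=(?P<pfid>\[[^\]]+\])\s+(?P<name>.+)$"
)

SAMPLE = """\
707 01CREAT 12:07:02.380175623 2024.03.28 0x0 t=[0x200000402:0xcd:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x59:0x0] ANSWER.DB
709 01CREAT 12:07:02.385925369 2024.03.28 0x0 t=[0x200000402:0xce:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x5a:0x0] ANSWER.DB
1689 06UNLNK 12:07:07.706417623 2024.03.28 0x1 t=[0x200000402:0xcd:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x59:0x0] ANSWER.DB
1690 06UNLNK 12:07:07.706507384 2024.03.28 0x1 t=[0x200000402:0xce:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x5a:0x0] ANSWER.DB
2139 01CREAT 12:07:11.228556007 2024.03.28 0x0 t=[0x200000402:0x1c5:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x59:0x0] ANSWER.DB
2140 01CREAT 12:07:11.228829638 2024.03.28 0x0 t=[0x200000402:0x1c6:0x0] j=dbench.0 ef=0xf u=0:0 nid=192.168.201.22@tcp p=[0x200000402:0x5a:0x0] ANSWER.DB
"""


def parse_record(line: str) -> Optional[ChangelogRecord]:
    """Parse one changelog line, or return None if it does not match."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    return ChangelogRecord(
        int(match["recno"]), match["op"], match["tfid"],
        match["pfid"], match["name"],
    )


def in_window(records: List[ChangelogRecord], start: int, end: int):
    """Records with start < recno <= end (boundary choice is an assumption)."""
    return [r for r in records if start < r.recno <= end]


records = [parse_record(line) for line in SAMPLE.splitlines() if line.strip()]
first_pass = in_window(records, 0, 2144)
second_pass = in_window(records, 2144, 2872)
```

With the quoted records, first_pass holds all six entries and second_pass is empty, which is the observation the annotation makes.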
            

            The detailed lustre-rsync operation log relating to ANSWER.DB

            ***** Start 707 CREAT (1) [0x200000402:0xcd:0x0] [0x200000402:0x59:0x0] ANSWER.DB *****
            create: tfid [0x200000402:0xcd:0x0] not found on source-fs
            ##### End 707 CREAT (1) [0x200000402:0xcd:0x0] [0x200000402:0x59:0x0] ANSWER.DB rc=0 #####
            
            ***** Start 709 CREAT (1) [0x200000402:0xce:0x0] [0x200000402:0x5a:0x0] ANSWER.DB *****
            create: tfid [0x200000402:0xce:0x0] not found on source-fs
            ##### End 709 CREAT (1) [0x200000402:0xce:0x0] [0x200000402:0x5a:0x0] ANSWER.DB rc=0 #####
            
            ***** Start 1689 UNLNK (6) [0x200000402:0xcd:0x0] [0x200000402:0x59:0x0] ANSWER.DB *****
            remove: /tmp/target/.lustrerepl/[0x200000402:0xcd:0x0]; rc=-2, errno=2
            remove: /tmp/target/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB; rc1=-2, errno=2
            remove: /tmp/target2/.lustrerepl/[0x200000402:0xcd:0x0]; rc=-2, errno=2
            remove: /tmp/target2/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB; rc1=-2, errno=2
            ##### End 1689 UNLNK (6) [0x200000402:0xcd:0x0] [0x200000402:0x59:0x0] ANSWER.DB rc=-2 #####
            
            ***** Start 1690 UNLNK (6) [0x200000402:0xce:0x0] [0x200000402:0x5a:0x0] ANSWER.DB *****
            remove: /tmp/target/.lustrerepl/[0x200000402:0xce:0x0]; rc=-2, errno=2
            remove: /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB; rc1=-2, errno=2
            remove: /tmp/target2/.lustrerepl/[0x200000402:0xce:0x0]; rc=-2, errno=2
            remove: /tmp/target2/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB; rc1=-2, errno=2
            ##### End 1690 UNLNK (6) [0x200000402:0xce:0x0] [0x200000402:0x5a:0x0] ANSWER.DB rc=-2 ##### 
            
                ~~~~~~ It is unclear why the two deletions of ANSWER.DB (records 1689/1690) failed with ENOENT
            
            ***** Start 2139 CREAT (1) [0x200000402:0x1c5:0x0] [0x200000402:0x59:0x0] ANSWER.DB *****
            dest = d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB; savedpath = d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB
            mkfile(1) /tmp/target/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB
            Syncing data and attributes [0x200000402:0x1c5:0x0]
            llistxattr(/mnt/lustre/.lustre/fid/[0x200000402:0x1c5:0x0],0x1b92c10) returned 57, errno=0
            	(trusted.link,28916768) rc=0x33
            	lsetxattr(), rc=0, errno=0
            	(user.job,28916768) rc=0x8
            	lsetxattr(), rc=-1, errno=95
            	(trusted.lov,28916768) rc=0x38
            	lsetxattr(), rc=0, errno=95
            	(trusted.lma,28916768) rc=0x18
            	lsetxattr(), rc=0, errno=95
            	(lustre.lov,28916768) rc=0x38
            	lsetxattr(), rc=-1, errno=95
            setxattr: /mnt/lustre/.lustre/fid/[0x200000402:0x1c5:0x0] /tmp/target/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB
            mkfile(1) /tmp/target2/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB
            Syncing data and attributes [0x200000402:0x1c5:0x0]
            llistxattr(/mnt/lustre/.lustre/fid/[0x200000402:0x1c5:0x0],0x1b92c10) returned 57, errno=0
            	(trusted.link,28916768) rc=0x33
            	lsetxattr(), rc=0, errno=0
            	(user.job,28916768) rc=0x8
            	lsetxattr(), rc=-1, errno=95
            	(trusted.lov,28916768) rc=0x38
            	lsetxattr(), rc=0, errno=95
            	(trusted.lma,28916768) rc=0x18
            	lsetxattr(), rc=0, errno=95
            	(lustre.lov,28916768) rc=0x38
            	lsetxattr(), rc=-1, errno=95
            setxattr: /mnt/lustre/.lustre/fid/[0x200000402:0x1c5:0x0] /tmp/target2/d2b.lustre-rsync-test/clients/client1/~dmtmp/PARADOX/ANSWER.DB
            ##### End 2139 CREAT (1) [0x200000402:0x1c5:0x0] [0x200000402:0x59:0x0] ANSWER.DB rc=0 #####
            
            ***** Start 2140 CREAT (1) [0x200000402:0x1c6:0x0] [0x200000402:0x5a:0x0] ANSWER.DB *****
            dest = d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB; savedpath = d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB
            mkfile(1) /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB
            Syncing data and attributes [0x200000402:0x1c6:0x0]
            llistxattr(/mnt/lustre/.lustre/fid/[0x200000402:0x1c6:0x0],0x1b92c10) returned 57, errno=0
            	(trusted.link,28916768) rc=0x33
            	lsetxattr(), rc=0, errno=0
            	(user.job,28916768) rc=0x8
            	lsetxattr(), rc=-1, errno=95
            	(trusted.lov,28916768) rc=0x38
            	lsetxattr(), rc=0, errno=95
            	(trusted.lma,28916768) rc=0x18
            	lsetxattr(), rc=0, errno=95
            	(lustre.lov,28916768) rc=0x38
            	lsetxattr(), rc=-1, errno=95
            setxattr: /mnt/lustre/.lustre/fid/[0x200000402:0x1c6:0x0] /tmp/target/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB
            mkfile(1) /tmp/target2/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB
            Syncing data and attributes [0x200000402:0x1c6:0x0]
            llistxattr(/mnt/lustre/.lustre/fid/[0x200000402:0x1c6:0x0],0x1b92c10) returned 57, errno=0
            	(trusted.link,28916768) rc=0x33
            	lsetxattr(), rc=0, errno=0
            	(user.job,28916768) rc=0x8
            	lsetxattr(), rc=-1, errno=95
            	(trusted.lov,28916768) rc=0x38
            	lsetxattr(), rc=0, errno=95
            	(trusted.lma,28916768) rc=0x18
            	lsetxattr(), rc=0, errno=95
            	(lustre.lov,28916768) rc=0x38
            	lsetxattr(), rc=-1, errno=95
            setxattr: /mnt/lustre/.lustre/fid/[0x200000402:0x1c6:0x0] /tmp/target2/d2b.lustre-rsync-test/clients/client0/~dmtmp/PARADOX/ANSWER.DB
            ##### End 2140 CREAT (1) [0x200000402:0x1c6:0x0] [0x200000402:0x5a:0x0] ANSWER.DB rc=0 #####
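One possible reading of the ENOENT results for records 1689/1690, offered strictly as an assumption and not as a confirmed diagnosis: the log shows CREAT records 707/709 ending with "not found on source-fs" and rc=0, so if nothing was created on the target for those FIDs, a later UNLNK of the same FIDs would find nothing to remove. The toy model below replays the six records under that assumption. The function replay and its arguments are hypothetical and do not reflect the actual lustre_rsync implementation.

```python
import errno


def replay(records, source_fids):
    """Replay CREAT/UNLNK records against an initially empty target.

    records: iterable of (recno, op, tfid) tuples.
    source_fids: FIDs assumed to still exist on the source when the
        replication pass runs.
    Returns {recno: rc}, with rc either 0 or a negative errno value.
    """
    target = set()
    results = {}
    for recno, op, tfid in records:
        if op == "CREAT":
            # Assumption: a FID missing on the source is skipped, and the
            # record still completes with rc=0, as for 707/709 in the log.
            if tfid in source_fids:
                target.add(tfid)
            results[recno] = 0
        elif op == "UNLNK":
            if tfid in target:
                target.remove(tfid)
                results[recno] = 0
            else:
                # Nothing was ever created on the target for this FID.
                results[recno] = -errno.ENOENT
        else:
            raise ValueError(f"unsupported record type: {op}")
    return results


ANSWER_DB_RECORDS = [
    (707, "CREAT", "[0x200000402:0xcd:0x0]"),
    (709, "CREAT", "[0x200000402:0xce:0x0]"),
    (1689, "UNLNK", "[0x200000402:0xcd:0x0]"),
    (1690, "UNLNK", "[0x200000402:0xce:0x0]"),
    (2139, "CREAT", "[0x200000402:0x1c5:0x0]"),
    (2140, "CREAT", "[0x200000402:0x1c6:0x0]"),
]

# Assumption: only the re-created files exist when replication runs.
STILL_ON_SOURCE = {"[0x200000402:0x1c5:0x0]", "[0x200000402:0x1c6:0x0]"}

results = replay(ANSWER_DB_RECORDS, STILL_ON_SOURCE)
```

Under these assumptions the model yields the same pattern of return codes as the log excerpts (0 for the CREAT records, -ENOENT for the two UNLNK records). It does not explain why the file contents differ after the 2nd replication.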
            
            Logs referenced in the comment above (Gerrit Janitor run 41397):
            https://testing.whamcloud.com/gerrit-janitor/41397/testresults/lustre-rsync-test-special4-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/lustre-rsync-test.test_2b.test_log.oleg122-client.log (test log)
            https://testing.whamcloud.com/gerrit-janitor/41397/testresults/lustre-rsync-test-special4-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/lustre-rsync-test.test_2b.changelog.oleg122-client.log (changelog received)
            https://testing.whamcloud.com/gerrit-janitor/41397/testresults/lustre-rsync-test-special4-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/lustre-rsync-test.test_2b.lrsync_log.oleg122-client.log (lrsync operation log)
            gerrit Gerrit Updater added a comment - - edited

            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54384
            Subject: LU-16489 test: loop lustre_rsync until no new changelog record
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9e254f356fe70a95b640f5b7e8b3e6c7035bd3c0
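Only the subject line of this patch appears in the ticket, so the following is a guess at the general idea, not a description of the patch. It sketches "loop until no new changelog record" as: compare the last changelog record number before and after each replication pass, and stop once a pass leaves it unchanged. Both callables (run_replication, last_recno) and the max_rounds limit are hypothetical hooks introduced for illustration.

```python
from typing import Callable


def replicate_until_quiet(
    run_replication: Callable[[], None],
    last_recno: Callable[[], int],
    max_rounds: int = 10,
) -> int:
    """Repeat replication until a pass leaves the changelog unchanged.

    Returns the number of passes performed. Raises RuntimeError if the
    changelog is still growing after max_rounds passes.
    """
    for rounds in range(1, max_rounds + 1):
        before = last_recno()
        run_replication()
        after = last_recno()
        if after == before:
            # No record arrived during this pass, so the target should
            # now reflect everything the changelog has recorded.
            return rounds
    raise RuntimeError(f"changelog still growing after {max_rounds} passes")
```

The bound on the number of passes is a design choice in this sketch: without it, a workload that never stops writing would keep the loop running indefinitely.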

            adegremont_nvda Aurelien Degremont added a comment - +1 on master https://testing.whamcloud.com/test_sets/4c511a97-4629-4bb2-b4a5-7a25ba5753b9

            People

              wc-triage WC Triage
              maloo Maloo