LU-10978: lustre-rsync-test test 1A cannot replicate a hard link because file exists

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0

    Description

      In lustre-rsync-test test_1A, we perform some file system operations, including creating a hard link, and call lustre_rsync. We then perform more operations and call lustre_rsync again, using the status log (-l option) written by the first call. On the second lustre_rsync call, the client test_log shows an error:

      Replication #2
      Replication of operation failed(-1): 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2
      Lustre filesystem: lustre
      MDT device: lustre-MDT0000
      Source: /mnt/lustre
      Target: /tmp/target
      Target: /tmp/target2
      Statuslog: /tmp/lustre_rsync.log
      Changelog registration: cl1
      Starting changelog record: 0
      Clear changelog after use: no
      Errors: 1
      

      In the debug log, we see that there is no issue with replicating the hard link the first time we call lustre_rsync:

      ***** Start 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 *****
      	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
      link destination is /tmp/target/d1A.lustre-rsync-test/d1/link2
      	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
      link source is /tmp/target/d1A.lustre-rsync-test/d1/link1
      link: /tmp/target/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target/d1A.lustre-rsync-test/d1/link2; rc1=0 Success
      	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
      link destination is /tmp/target2/d1A.lustre-rsync-test/d1/link2
      	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
      link source is /tmp/target2/d1A.lustre-rsync-test/d1/link1
      link: /tmp/target2/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target2/d1A.lustre-rsync-test/d1/link2; rc1=0 Success
      ##### End 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 rc=0 #####
      

      On the second call to lustre_rsync, the same HLINK record fails with “File exists” on both targets:

      ***** Start 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 *****
      	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
      link destination is /tmp/target/d1A.lustre-rsync-test/d1/link2
      	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
      link source is /tmp/target/d1A.lustre-rsync-test/d1/link1
      link: /tmp/target/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target/d1A.lustre-rsync-test/d1/link2; rc1=-1 File exists
      	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
      link destination is /tmp/target2/d1A.lustre-rsync-test/d1/link2
      	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
      link source is /tmp/target2/d1A.lustre-rsync-test/d1/link1
      link: /tmp/target2/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target2/d1A.lustre-rsync-test/d1/link2; rc1=-1 File exists
      ##### End 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 rc=-1 #####
      

      The file does exist on both targets: the first lustre_rsync call already created link2 there.
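      The “File exists” result matches what link(2) does when a hard link is created twice with the same destination. The following is a minimal sketch of that behaviour, not code from lustre_rsync: the function name replay_hlink_twice and the temporary directory under /tmp are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create link1, then hard-link it to link2 twice,
 * the way a replayed HLINK record would. Returns 0 if the scenario could
 * be set up, -1 otherwise. On success, *first_rc holds the result of the
 * first link() call and *second_errno holds errno from the second call
 * (0 if the second call unexpectedly succeeded).
 */
int replay_hlink_twice(int *first_rc, int *second_errno)
{
        char dir[] = "/tmp/hlink-demo-XXXXXX";
        char src[128];
        char dst[128];
        int fd;

        if (mkdtemp(dir) == NULL)
                return -1;

        snprintf(src, sizeof(src), "%s/link1", dir);
        snprintf(dst, sizeof(dst), "%s/link2", dir);

        fd = open(src, O_CREAT | O_WRONLY, 0600);
        if (fd < 0) {
                rmdir(dir);
                return -1;
        }
        close(fd);

        /* First replication: link2 does not exist yet. */
        *first_rc = link(src, dst);

        /* Second replication of the same record: link2 is already there. */
        errno = 0;
        if (link(src, dst) == -1)
                *second_errno = errno;
        else
                *second_errno = 0;

        /* Clean up the temporary files and directory. */
        unlink(dst);
        unlink(src);
        rmdir(dir);
        return 0;
}
```

      On a file system that supports hard links, the first call is expected to succeed and the second to fail with EEXIST, which corresponds to the rc1=0 and rc1=-1 lines in the two debug logs.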

      Unfortunately, the test does not check the “Errors” line in the output of lustre_rsync, so test 1A does not fail.
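      One way to catch this would be to scan the lustre_rsync output for the “Errors: N” line and fail when N is nonzero. The sketch below shows the idea in C; count_reported_errors is a hypothetical helper, not part of the test suite or of lustre_rsync, and it assumes the line has the same form as in the output quoted above. The code quoted later in this ticket prints that line only when there are errors or verbose mode is on, so a missing line is treated as zero errors.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the count from the first "Errors: N" line
 * found in the captured output, or 0 if no such line is present.
 */
int count_reported_errors(const char *output)
{
        const char *line = output;

        while (line != NULL && *line != '\0') {
                int n;

                /* Skip indentation in front of the line. */
                while (*line == ' ' || *line == '\t')
                        line++;

                if (sscanf(line, "Errors: %d", &n) == 1)
                        return n;

                /* Advance to the start of the next line, if any. */
                line = strchr(line, '\n');
                if (line != NULL)
                        line++;
        }
        return 0;
}
```

      A caller would treat any nonzero return value as a test failure.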

      Here is a link to a test session where we get errors on the second replication:
      https://testing.hpdd.intel.com/test_sets/d4f17fde-4d03-11e8-b45c-52540065bddc

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.12


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32246/
            Subject: LU-10978 utils: preserve lustre_rsync state
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 75c0f9c701a7a5f1e9caeee1a6cd7164e6635dfb


            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/32246
            Subject: LU-10978 utils: preserve lustre_rsync state
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 16c6185336aa6b08f877d9e095b540b89b146435

            jhammond John Hammond added a comment -

            It looks like this is due to a bug that goes back to when lustre_rsync was first written.

            int lr_replicate()
            ...
                            if (debug) {
                                    bzero(info, sizeof(struct lr_info));
                                    bzero(ext, sizeof(struct lr_info));
                            }
                    }
            
                    llapi_changelog_fini(&changelog_priv);
            
                    if (errors || verbose)
                            printf("Errors: %d\n", errors);
            
                    /* Clear changelog records used so far */
                    lr_clear_cl(info, 1);
            ...
            

            If debugging is enabled, info is zeroed out before lr_clear_cl(info, 1) runs, so the state needed to clear the changelog properly on exit has already been lost.
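            The pattern can be reproduced in isolation. The sketch below is not the lustre_rsync source and not the merged patch: struct demo_info, its fields, and the demo_* functions are hypothetical stand-ins for struct lr_info and lr_clear_cl(), and saving and restoring one field is only one possible way to keep the state.

```c
#include <string.h>

/* Hypothetical stand-in for struct lr_info, reduced to two fields. */
struct demo_info {
        long long recno;   /* last changelog record processed */
        char name[64];     /* per-record scratch data */
};

/*
 * Hypothetical stand-in for lr_clear_cl(): report the record number up to
 * which the changelog would be cleared.
 */
long long demo_clear_upto(const struct demo_info *info)
{
        return info->recno;
}

/* Buggy pattern: in debug mode the whole struct is wiped, recno included. */
long long demo_replicate_buggy(int debug)
{
        struct demo_info info = { .recno = 25, .name = "link2" };

        if (debug)
                memset(&info, 0, sizeof(info));

        return demo_clear_upto(&info);
}

/* One possible repair: keep the state that the final clear step needs. */
long long demo_replicate_preserving(int debug)
{
        struct demo_info info = { .recno = 25, .name = "link2" };

        if (debug) {
                long long saved_recno = info.recno;

                memset(&info, 0, sizeof(info));
                info.recno = saved_recno;
        }

        return demo_clear_upto(&info);
}
```

            With debug set, demo_replicate_buggy() reports record 0 to the clear step while demo_replicate_preserving() still reports record 25.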


            People

              jhammond John Hammond
              jamesanunez James Nunez (Inactive)