[LU-10978] lustre-rsync-test test 1A cannot replicate a hard link because file exists Created: 01/May/18  Updated: 18/Dec/18  Resolved: 17/May/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In lustre-rsync-test test_1A, we perform some file system operations, including creating a hard link, and call lustre_rsync. We then perform more operations and call lustre_rsync again using the log (-l option) from the first call. On the second lustre_rsync call, the client test_log shows an error:

Replication #2
Replication of operation failed(-1): 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2
Lustre filesystem: lustre
MDT device: lustre-MDT0000
Source: /mnt/lustre
Target: /tmp/target
Target: /tmp/target2
Statuslog: /tmp/lustre_rsync.log
Changelog registration: cl1
Starting changelog record: 0
Clear changelog after use: no
Errors: 1

In the debug log, we see that there is no issue with replicating the hard link the first time we call lustre_rsync:

***** Start 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 *****
	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
link destination is /tmp/target/d1A.lustre-rsync-test/d1/link2
	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
link source is /tmp/target/d1A.lustre-rsync-test/d1/link1
link: /tmp/target/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target/d1A.lustre-rsync-test/d1/link2; rc1=0 Success
	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
link destination is /tmp/target2/d1A.lustre-rsync-test/d1/link2
	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
link source is /tmp/target2/d1A.lustre-rsync-test/d1/link1
link: /tmp/target2/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target2/d1A.lustre-rsync-test/d1/link2; rc1=0 Success
##### End 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 rc=0 #####

We get a failure on the second call to lustre_rsync:

***** Start 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 *****
	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
link destination is /tmp/target/d1A.lustre-rsync-test/d1/link2
	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
link source is /tmp/target/d1A.lustre-rsync-test/d1/link1
link: /tmp/target/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target/d1A.lustre-rsync-test/d1/link2; rc1=-1 File exists
	parent fid2path d1A.lustre-rsync-test/d1, link2, rc=0
link destination is /tmp/target2/d1A.lustre-rsync-test/d1/link2
	fid2path d1A.lustre-rsync-test/d1/link1, link2, 0 rc=0
link source is /tmp/target2/d1A.lustre-rsync-test/d1/link1
link: /tmp/target2/d1A.lustre-rsync-test/d1/link1 [to] /tmp/target2/d1A.lustre-rsync-test/d1/link2; rc1=-1 File exists
##### End 25 HLINK (3) [0x200012ce9:0x1c:0x0] [0x200012ce9:0x12:0x0] link2 rc=-1 #####

It is true that the file exists.
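
The "File exists" result is what link(2) returns whenever the destination name is already present, so replaying the same HLINK record against a target that already holds link2 fails this way. The sketch below reproduces that in isolation. The helper name replay_link_twice and the scratch directory under /tmp are assumptions made for illustration; none of this is lustre_rsync code.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of lustre_rsync): create a scratch
 * directory containing "link1", then call link(2) twice with the same
 * destination "link2".
 *
 * Returns the errno of the second link() call, 0 if the second call
 * unexpectedly succeeded, or -1 if the setup or the first link() failed.
 */
int replay_link_twice(void)
{
	char dir[] = "/tmp/hlink-replay-XXXXXX";
	char src[128], dst[128];
	int fd, rc, result = -1;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(src, sizeof(src), "%s/link1", dir);
	snprintf(dst, sizeof(dst), "%s/link2", dir);

	fd = open(src, O_CREAT | O_WRONLY, 0600);
	if (fd < 0) {
		rmdir(dir);
		return -1;
	}
	close(fd);

	/* First replication: the destination does not exist yet. */
	rc = link(src, dst);
	if (rc == 0) {
		/* Second replication of the same record. */
		rc = link(src, dst);
		result = (rc < 0) ? errno : 0;
	}

	/* Clean up the scratch files. */
	unlink(dst);
	unlink(src);
	rmdir(dir);
	return result;
}
```

Calling replay_link_twice() on a file system that supports hard links is expected to return EEXIST, which strerror() reports as "File exists".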

Unfortunately, we don’t check the “Errors” line in the output of lustre_rsync, so test 1A does not fail.

Here is a link to a test session where we get errors on the second replication:
https://testing.hpdd.intel.com/test_sets/d4f17fde-4d03-11e8-b45c-52540065bddc



 Comments   
Comment by John Hammond [ 01/May/18 ]

It looks like this is due to a bug that goes back to when lustre_rsync was first written.

int lr_replicate()
...
                if (debug) {
                        bzero(info, sizeof(struct lr_info));
                        bzero(ext, sizeof(struct lr_info));
                }
        }

        llapi_changelog_fini(&changelog_priv);

        if (errors || verbose)
                printf("Errors: %d\n", errors);

        /* Clear changelog records used so far */
        lr_clear_cl(info, 1);
...

If debugging is set, we zero out info, which prevents us from clearing the changelog properly on exit.
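
A minimal sketch of the pattern described above, under stated assumptions: struct rec_info, replay() and the preserve flag are hypothetical stand-ins, not the real struct lr_info or the change made in the patch. It only shows how zeroing per-record state inside the loop loses the record number that the exit path needs, and one possible way to keep it by copying the number aside first.

```c
#include <string.h>

/* Hypothetical stand-in for the per-record state; not the real struct lr_info. */
struct rec_info {
	long long recno;	/* changelog record number */
	int type;		/* record type, e.g. 25 for HLINK in the logs above */
};

/*
 * Process records 1..nrecs and return the record number that the exit
 * path would see for its final bookkeeping.
 *
 * With preserve == 0 the per-record state is zeroed in debug mode,
 * mirroring the quoted snippet. With preserve != 0 the last record
 * number is copied aside before the state is cleared.
 */
long long replay(long long nrecs, int debug, int preserve)
{
	struct rec_info info;
	long long last_recno = 0;
	long long i;

	memset(&info, 0, sizeof(info));
	for (i = 1; i <= nrecs; i++) {
		info.recno = i;
		info.type = 25;

		/* ... the record would be replicated here ... */

		if (preserve)
			last_recno = info.recno;
		if (debug)
			memset(&info, 0, sizeof(info));
	}

	return preserve ? last_recno : info.recno;
}
```

In this sketch, replay(25, 1, 0) returns 0 because the state was wiped after the last record, while replay(25, 1, 1) and replay(25, 0, 0) both return 25.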

Comment by Gerrit Updater [ 02/May/18 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/32246
Subject: LU-10978 utils: preserve lustre_rsync state
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 16c6185336aa6b08f877d9e095b540b89b146435

Comment by Gerrit Updater [ 17/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32246/
Subject: LU-10978 utils: preserve lustre_rsync state
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 75c0f9c701a7a5f1e9caeee1a6cd7164e6635dfb

Comment by Peter Jones [ 17/May/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:39:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.