[LU-11479] Error replicating xattr for /tmp/target/d8.lustre-rsync-test/d07/d073/b4: 2
Created: 07/Oct/18  Updated: 29/Oct/18  Resolved: 29/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug
Priority: Minor
Reporter: Andreas Dilger
Assignee: John Hammond
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11450 trusted.som xattr is logged in changelog Resolved
is related to LU-11466 DoM files should not need LSOM sync f... Resolved
is related to LU-9538 Size on MDT with guarantee of eventua... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The lustre-rsync-test test_8() is spewing thousands of messages:

Registered 1 changelog users: 'cl14'
lustre-MDT0000: Registered changelog user cl14
Error replicating  xattr for /tmp/target/d8.lustre-rsync-test/d01/d011/c0: 2
Error replicating  xattr for /tmp/target/d8.lustre-rsync-test/d01/d011/c0: 2
Error replicating  xattr for /tmp/target/d8.lustre-rsync-test/d01/d011/c0: 2
:
Error replicating  xattr for /tmp/target/d8.lustre-rsync-test/d07/d073/b4: 2
:
Source: /mnt/lustre
Target: /tmp/target
Statuslog: /tmp/lustre_rsync.log
Changelog registration: cl14
Starting changelog record: 0
Clear changelog after use: no
Errors: 8100
lustre_rsync took 324 seconds
Changelog records consumed: 5121

Each of the identical messages is printed 5x before the next file is listed, which makes it seem like lustre_rsync isn't working correctly, even though the test is not marked as failing.
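To quantify the repetition in a saved test log, a small script along these lines could count how often each message appears. This is only a sketch: the regular expression is derived from the messages quoted above, and `summarize_errors` is a hypothetical helper name, not part of lustre_rsync or the test suite.

```python
import re
from collections import Counter

# Pattern derived from the messages quoted in this ticket; the trailing
# number is captured as-is without interpreting it.
ERROR_RE = re.compile(r"^Error replicating\s+xattr for (?P<path>.+): (?P<code>\d+)$")


def summarize_errors(log_text):
    """Return a Counter mapping (path, code) to the number of times it was printed."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            counts[(match.group("path"), int(match.group("code")))] += 1
    return counts


def print_summary(log_text):
    """Print one line per distinct message with its repeat count."""
    for (path, code), n in summarize_errors(log_text).items():
        print(f"{n}x code={code} {path}")
```

Feeding the captured test output to `print_summary` gives one line per distinct file, which makes it easier to see whether every file is reported the same number of times.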

This looks like it started failing around 2018-07-31, but it is slow to track it back exactly because it involves looking at each passing test individually. I bisected the results to narrow it down to this date (+/- 1 day or so).

A good run looks like:

Starting changelog record: 0
Clear changelog after use: no
Errors: 0
lustre_rsync took 191 seconds
Changelog records consumed: 3501


 Comments   
Comment by Andreas Dilger [ 13/Oct/18 ]

The LSOM patch was landed on 2018-07-30, so it is likely to be the cause of this problem. It will hopefully go away when the patch for LU-11466 lands.

However, it isn't clear whether we should return an error from trying to set the trusted.som xattr or not. Some tools like "cp" and "tar" will try to copy all of the xattrs, and since trusted.som is listed, setting it will generate an error and lots of noise. Similarly, we silently eat any attempt to set trusted.lov directly on an existing file, so that tools don't complain.
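For illustration only, the tool-side view of this problem can be sketched as a copy loop that skips certain xattr names. This is not the server-side behaviour being discussed, and it is not how "cp", "tar", or lustre_rsync are implemented; the skip list and the helper names `xattrs_to_copy` and `copy_xattrs` are assumptions made for the example.

```python
import os

# Hypothetical skip list: the names come from the comment above, but the
# policy of skipping them in a copy tool is an assumption for illustration.
SKIP_XATTRS = frozenset({"trusted.som", "trusted.lov"})


def xattrs_to_copy(names, skip=SKIP_XATTRS):
    """Return the xattr names a copy tool would replicate, in original order."""
    return [name for name in names if name not in skip]


def copy_xattrs(src, dst, skip=SKIP_XATTRS):
    """Copy xattrs from src to dst (Linux only), skipping names in `skip`."""
    for name in xattrs_to_copy(os.listxattr(src), skip):
        os.setxattr(dst, name, os.getxattr(src, name))
```

A tool that copies every listed xattr without such a filter depends on the filesystem either accepting or silently ignoring the write, which is the trade-off raised above.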

Separately, it would be good to get a better error message in lustre_rsync, as we discussed.
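As a rough idea of what a more descriptive message could look like, the sketch below formats the same information with the error name and text included. It assumes that the trailing "2" in the current message is an errno value, which this ticket does not confirm, and `describe_replication_error` is a hypothetical name. lustre_rsync itself is written in C; Python is used here only to show the message shape.

```python
import errno
import os


def describe_replication_error(path, code):
    """Format an xattr replication error with the errno name and description.

    Assumes `code` is an errno value; unknown codes are labelled as such.
    """
    name = errno.errorcode.get(code, "unknown")
    return (
        f"Error replicating xattr for {path}: "
        f"{os.strerror(code)} ({name}, errno {code})"
    )
```

Under that assumption, code 2 would be reported as ENOENT rather than as a bare number, which would make the thousands of messages in the test log easier to interpret.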

Comment by Qian Yingjin [ 15/Oct/18 ]

After applying the patch for LU-11450 (trusted.som xattr is logged in changelog), the error messages were gone.

Comment by Gerrit Updater [ 15/Oct/18 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33373
Subject: LU-11479 rsync: replicate attributes of file in .lustrerepl
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7ec4297fd6ab1f0e8ae4199c7646960f6e047c46

Comment by John Hammond [ 15/Oct/18 ]

Please note that, even though LU-11450 makes the messages go away, there is still a bug in lustre_rsync which is fixed by https://review.whamcloud.com/33373. So let's leave this open until that change is landed.

Comment by Gerrit Updater [ 29/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33373/
Subject: LU-11479 rsync: replicate attributes of file in .lustrerepl
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 337f230565ea033d126653e8da01315211470665

Comment by Peter Jones [ 29/Oct/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:44:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.