PFL known issues tracking ticket (LU-9349)

[LU-9484] sanity test 17k fails with 'rsync failed with xattrs enabled' Created: 10/May/17  Updated: 09/Oct/17  Resolved: 10/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: Lustre 2.10.0

Type: Technical task Priority: Critical
Reporter: James Nunez (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: pfl

Attachments: pfl.txt, plain.txt
Issue Links:
Related
is related to LU-9335 sanity test 17l and 17k fail with ‘rs... Resolved

 Description   

sanity test_17k fails with the following error when a default composite layout is set on the Lustre mount point:

sanity test_17k: @@@@@@ FAIL: rsync failed with xattrs enabled

From the test log, we can see that lsetxattr() in rsync is failing:

== sanity test 17k: symlinks: rsync with xattrs enabled ============================================== 03:57:01 (1494388621)
striped dir -i1 -c2 /mnt/lustre/d17k.sanity
striped dir -i1 -c2 /mnt/lustre/d17k.sanity.new
sending incremental file list
./
f17k.sanity
f17k.sanity.lnk -> /mnt/lustre/d17k.sanity/f17k.sanity
rsync: rsync_xal_set: lsetxattr("/mnt/lustre/d17k.sanity.new/.f17k.sanity.qR0FHg","lustre.lov") failed: File exists (17)
rsync: rsync_xal_set: lsetxattr("/mnt/lustre/d17k.sanity.new/.f17k.sanity.qR0FHg","trusted.lov") failed: File exists (17)

sent 1628 bytes  received 48 bytes  1117.33 bytes/sec
total size is 35  speedup is 0.02
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]
 sanity test_17k: @@@@@@ FAIL: rsync failed with xattrs enabled
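As the log shows, each xattr is restored with its own lsetxattr() call, each failure is reported separately, and rsync then exits with code 23 (partial transfer). The C model below only mimics that sequence: `restore_xattrs()`, the `setxattr_fn` callback type and `mock_pfl_setxattr()` are hypothetical names, the mock simply returns -EEXIST for the two layout xattrs seen in the log, and none of it is rsync's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: returns 0 or a negative errno. */
typedef int (*setxattr_fn)(const char *path, const char *name);

/* Try to restore every xattr on path, printing one message per failed
 * call (like rsync_xal_set in the log). Returns the number of failures. */
static int restore_xattrs(const char *path, const char *const names[],
                          size_t count, setxattr_fn set)
{
    int failures = 0;

    assert(names != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int rc = set(path, names[i]);
        if (rc < 0) {
            fprintf(stderr, "lsetxattr(\"%s\",\"%s\") failed: %s (%d)\n",
                    path, names[i], strerror(-rc), -rc);
            failures++;
        }
    }
    return failures;
}

/* Hypothetical mock of the pre-fix PFL behaviour: the layout xattrs
 * already exist on the newly created file, so setting them fails. */
static int mock_pfl_setxattr(const char *path, const char *name)
{
    (void)path;
    if (strcmp(name, "lustre.lov") == 0 || strcmp(name, "trusted.lov") == 0)
        return -EEXIST;
    return 0;
}
```

With the names `user.foo`, `lustre.lov` and `trusted.lov`, `restore_xattrs()` prints two "File exists (17)" messages and returns 2; a non-zero count plays the role of rsync's "some files/attrs were not transferred".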

Logs for recent failures are at:
https://testing.hpdd.intel.com/test_sets/f5d085aa-357c-11e7-b0a8-5254006e85c2
https://testing.hpdd.intel.com/test_sets/2e962ed8-3534-11e7-814a-5254006e85c2



 Comments   
Comment by Joseph Gmitter (Inactive) [ 11/May/17 ]

Hi Bobijam,

Can you add the thoughts we discussed 1:1 today here?

Thanks.
Joe

Comment by Zhenyu Xu [ 12/May/17 ]

Hi Andreas,

I uploaded two strace outputs of the rsync command. The difference is that for plain.txt, /mnt/lustre/d17k.sanity/f17k.sanity is created as a plain file, while pfl.txt corresponds to a PFL file.

The strace difference shows that with a PFL file, rsync tries to lsetxattr() a temporary file, while lsetxattr() does not show up for the plain file.

rsync PFL file:
read(5, "rsync: rsync_xal_set: lsetxattr("..., 93) = 93
write(2, "rsync: rsync_xal_set: lsetxattr("..., 92rsync: rsync_xal_set: lsetxattr(".f17k.sanity.6HKfyd","lustre.lov") failed: File exists (17)) = 92

I don't know what causes it. Could it be that the PFL xattr is so big that it causes rsync to use a temporary file? I could not find the rsync source code to have a look.

Comment by Andreas Dilger [ 16/May/17 ]

rsync always uses a temporary file, so I don't think that is the difference. The main issue is that strace needs to be run with "-f" to trace all child processes across fork(), since the logs look incomplete (they don't show any file creates, etc.). The "lsetxattr()" that shows up in the "pfl" log comes only from the write() call that prints the message to the console, not from the actual lsetxattr() syscall. I think both logs are missing the actual operations on the target files.
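The write-to-temporary-then-rename pattern described here can be sketched in C. This is a simplified illustration, not rsync's code: the hypothetical `write_via_tmpfile()` creates a hidden `.name.XXXXXX` file in the destination directory (similar to the `.f17k.sanity.qR0FHg` name in the test log), writes the data, and renames it into place. The comment marks where a tool restoring xattrs would call lsetxattr(), i.e. after the file, and on Lustre its layout, already exists.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not rsync code: write data to "dir/.name.XXXXXX",
 * then rename it to "dir/name". Returns 0 or a negative errno. */
static int write_via_tmpfile(const char *dir, const char *name,
                             const char *data, size_t len)
{
    char tmp[PATH_MAX], dst[PATH_MAX];

    assert(dir != NULL && name != NULL);
    if (snprintf(tmp, sizeof(tmp), "%s/.%s.XXXXXX", dir, name) >= (int)sizeof(tmp) ||
        snprintf(dst, sizeof(dst), "%s/%s", dir, name) >= (int)sizeof(dst))
        return -ENAMETOOLONG;

    /* The temporary file is created here; on Lustre this is the point
     * where it would receive its (default) layout. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, data, len);
    if (n < 0 || (size_t)n != len) {
        int rc = n < 0 ? -errno : -EIO;   /* capture errno before cleanup */
        close(fd);
        unlink(tmp);
        return rc;
    }

    /* A copy tool restoring xattrs (e.g. lustre.lov) would call lsetxattr()
     * on tmp around here, when the layout already exists. */
    if (close(fd) < 0) {
        int rc = -errno;
        unlink(tmp);
        return rc;
    }
    if (rename(tmp, dst) < 0) {
        int rc = -errno;
        unlink(tmp);
        return rc;
    }
    return 0;
}
```

After a successful call such as `write_via_tmpfile(dir, "f17k.sanity", "hello", 5)`, only `dir/f17k.sanity` remains; the temporary name is visible only in traces or error messages, as in the log above.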

That said, setting trusted.lov and lustre.lov should be accepted from userspace even if they already exist, since they will normally be created when the file is first opened, and then also "restored" by rsync (or tar or cp) after the file is written (see the comments in ll_setstripe_ea_info()):

                /**
                 * b=10667: ignore error.
                 * Silently eat error on setting strusted.lov attribute for
                 * SuSE 9, it added default option to copy all attributes in
                 * 'cp' command.
                 */

I also see in that function:

        if (lump != NULL && lump->lmm_magic == LOV_USER_MAGIC_COMP_V1) {
                return_err = true;
                goto setstripe;
        }

so this is what causes PFL to fail in the rsync case but not in the plain file case. Is there a reason the PFL code needs to return an error here? This came from https://review.whamcloud.com/24851, so you may want to look at the commit history of the master or pfl branch versions of this patch to see why setxattr("trusted.lov") needs to return an error to userspace.
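The effect of the `return_err` branch can be modeled in a few lines of userspace C. Both functions below are hypothetical models, not the Lustre source: `rc` stands for the result of applying the layout (0 or a negative errno), `is_composite` for `lmm_magic == LOV_USER_MAGIC_COMP_V1`, the first function follows the pre-fix behaviour implied by the comment and snippet above, and the second follows only the policy named in the eventual fix's subject line ("eat -EEXIST on setting trusted.lov"). Whether other errors remained ignored for plain files after the fix is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the pre-fix behaviour: the error from applying the
 * layout is ignored (b=10667) unless the layout is composite, in which case
 * return_err is set and the error reaches userspace. */
static int old_setxattr_rc(int rc, bool is_composite)
{
    bool return_err = is_composite;

    assert(rc <= 0);
    return return_err ? rc : 0;
}

/* Hypothetical model of "eat -EEXIST": an already-existing layout is not
 * an error, while any other failure is still returned to userspace. */
static int new_setxattr_rc(int rc)
{
    assert(rc <= 0);
    return rc == -EEXIST ? 0 : rc;
}
```

Under this model, rsync restoring `trusted.lov` on a PFL file sees -EEXIST from `old_setxattr_rc(-EEXIST, true)` but 0 from `new_setxattr_rc(-EEXIST)`, while genuine failures such as -EINVAL are still reported.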

Comment by Andreas Dilger [ 16/May/17 ]

PS: I think this is pretty critical, since it will totally break rsync, cp, tar, or anything that is trying to restore the lustre.lov or trusted.lov xattrs.
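To make the impact concrete, here is a hedged sketch of what such a copy tool does for a single layout xattr. `copy_layout_xattr()` is a hypothetical helper; lgetxattr() and lsetxattr() are the standard Linux calls from <sys/xattr.h>. The explicit EEXIST tolerance is an assumption showing how a tool could work around the pre-fix behaviour, not what rsync, cp or tar actually do.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: copy one xattr (e.g. "lustre.lov") from src to dst
 * without following symlinks. Returns 0 or a negative errno. */
static int copy_layout_xattr(const char *src, const char *dst, const char *name)
{
    assert(src != NULL && dst != NULL && name != NULL);

    /* A size of 0 asks only for the length of the value. */
    ssize_t len = lgetxattr(src, name, NULL, 0);
    if (len < 0)
        return -errno;

    char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL)
        return -ENOMEM;

    len = lgetxattr(src, name, buf, (size_t)len);
    if (len < 0) {
        int rc = -errno;
        free(buf);
        return rc;
    }

    /* flags == 0 means "create or replace", so a generic filesystem would
     * not return EEXIST here; tolerate it for filesystems that reject
     * re-setting a layout that already exists. */
    int rc = 0;
    if (lsetxattr(dst, name, buf, (size_t)len, 0) < 0 && errno != EEXIST)
        rc = -errno;
    free(buf);
    return rc;
}
```

For example, `copy_layout_xattr(src, dst, "lustre.lov")` returns 0 when the destination already has a layout and the filesystem reports EEXIST, and a negative errno such as -ENOENT if the source does not exist.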

Comment by Gerrit Updater [ 16/May/17 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/27126
Subject: LU-9484 llite: eat error on setting trunsted.lov
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bd9956f9190c1b0aca37c47802e44a3c82eba9a6

Comment by Zhenyu Xu [ 16/May/17 ]

Yes, you are right; strace -f shows that rsync creates a temporary file in all cases.

Comment by Gerrit Updater [ 26/May/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/27311
Subject: LU-9484 llite: eat -EEXIST on setting trusted.lov
Project: fs/lustre-release
Branch: pfl
Current Patch Set: 1
Commit: 44bbbfed3513798638e01f1ae2447faf80f7c15e

Comment by Gerrit Updater [ 10/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27126/
Subject: LU-9484 llite: eat -EEXIST on setting trunsted.lov
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0e90e02ceddd60f24fac9709f3ab9e9421c80315

Comment by James A Simmons [ 10/Jun/17 ]

I see this landed already. I was testing it with my LU-9183 xattr patch and saw sanity 102a and 102n both failing. I thought it was due to my latest LU-9183 patch, but I traced it down to this patch. I'm looking into what the fix should be and will place it in patch https://review.whamcloud.com/#/c/27240. Strange that it passed testing in Maloo.

Comment by Peter Jones [ 10/Jun/17 ]

Landed for 2.10

Comment by James A Simmons [ 10/Jun/17 ]

Never mind. I see the sanity.sh tests were updated and my test suite was missing those changes. It does work.

Generated at Sat Feb 10 02:26:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.