PFL known issues tracking ticket
(LU-9349)
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Technical task | Priority: | Critical |
| Reporter: | James Nunez (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | pfl | ||
| Description |
|
sanity test_17k fails when a default composite layout is set for the Lustre mount point with

sanity test_17k: @@@@@@ FAIL: rsync failed with xattrs enabled

From the test log, we can see lsetxattr() in rsync is failing

== sanity test 17k: symlinks: rsync with xattrs enabled ============================================== 03:57:01 (1494388621)
striped dir -i1 -c2 /mnt/lustre/d17k.sanity
striped dir -i1 -c2 /mnt/lustre/d17k.sanity.new
sending incremental file list
./
f17k.sanity
f17k.sanity.lnk -> /mnt/lustre/d17k.sanity/f17k.sanity
rsync: rsync_xal_set: lsetxattr("/mnt/lustre/d17k.sanity.new/.f17k.sanity.qR0FHg","lustre.lov") failed: File exists (17)
rsync: rsync_xal_set: lsetxattr("/mnt/lustre/d17k.sanity.new/.f17k.sanity.qR0FHg","trusted.lov") failed: File exists (17)
sent 1628 bytes received 48 bytes 1117.33 bytes/sec
total size is 35 speedup is 0.02
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]
sanity test_17k: @@@@@@ FAIL: rsync failed with xattrs enabled
Logs for recent failures are at: |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 11/May/17 ] |
|
Hi Bobijam, Can you add your thoughts here that we discussed 1:1 today? Thanks. |
| Comment by Zhenyu Xu [ 12/May/17 ] |
|
Hi Andreas, I uploaded two strace outputs of the rsync command. The difference is that for plain.txt, /mnt/lustre/d17k.sanity/f17k.sanity is created as a plain file, while pfl.txt corresponds to a PFL file. Comparing the two traces shows that with the PFL file, rsync tries to lsetxattr a temporary file, while lsetxattr does not show up for the plain file.

rsync PFL file

read(5, "rsync: rsync_xal_set: lsetxattr("..., 93) = 93
write(2, "rsync: rsync_xal_set: lsetxattr("..., 92rsync: rsync_xal_set:
lsetxattr(".f17k.sanity.6HKfyd","lustre.lov") failed: File exists (17)) = 92
I don't know what causes it. Could it be that the PFL xattr is so big that it causes rsync to use a temporary file? I could not find where to get the rsync code to have a look. |
| Comment by Andreas Dilger [ 16/May/17 ] |
|
The temporary file is always being used by rsync, so I don't think that is different. The main issue is that strace needs to be run with "-f" to trace all the child processes across fork, since it looks like the logs are incomplete (they don't show any file creates, etc.). The "lsetxattr()" that shows up in the "pfl" log is only from when rsync is calling write() to print the message to the console, not the actual lsetxattr() syscall. I think both logs are missing the actual operations on the target files.

That said, setting trusted.lov and lustre.lov should be accepted from userspace even if they already exist, since they will normally be created when the file is first opened, and then will also be "restored" by rsync (or tar or cp) after the file is written. See the comment in ll_setstripe_ea_info():

/**
* b=10667: ignore error.
 * Silently eat error on setting trusted.lov attribute for
* SuSE 9, it added default option to copy all attributes in
* 'cp' command.
*/
I also see in that function:

    if (lump != NULL && lump->lmm_magic == LOV_USER_MAGIC_COMP_V1) {
            return_err = true;
            goto setstripe;
    }

so this is what is causing PFL to fail in the rsync case and not the plain file case. Is there a reason that the PFL code needs to return an error in this case?

This came from https://review.whamcloud.com/24851 so you may want to look at the commit history of the master or pfl branch versions of this patch to see why setxattr("trusted.lov") needs to return an error to userspace. |
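
To make the effect of that check concrete, here is a small Python model of the behavior described in this comment. It is not the kernel code. It assumes that for non-composite layouts the error from setting the layout xattr is swallowed (per the b=10667 comment), while the return_err branch lets it through for LOV_USER_MAGIC_COMP_V1. The helper name visible_setxattr_result is hypothetical, and the magic values are assumed to match lustre_user.h.

```python
import errno

# Layout magic numbers, assumed to match lustre_user.h.
LOV_USER_MAGIC_V1 = 0x0BD10BD0
LOV_USER_MAGIC_COMP_V1 = 0x0BD60BD0


def visible_setxattr_result(lmm_magic, rc):
    """Model what userspace sees when setting trusted.lov / lustre.lov.

    rc is the negative-errno result of the underlying setstripe attempt.
    Hypothetical helper: it only mirrors the return_err logic quoted above.
    """
    return_err = lmm_magic == LOV_USER_MAGIC_COMP_V1
    if rc < 0 and not return_err:
        return 0  # plain layout: the error is silently eaten (b=10667)
    return rc     # composite (PFL) layout: the error reaches userspace


# The file already has a layout, so the setstripe attempt yields -EEXIST.
print(visible_setxattr_result(LOV_USER_MAGIC_V1, -errno.EEXIST))       # 0
print(visible_setxattr_result(LOV_USER_MAGIC_COMP_V1, -errno.EEXIST))  # -EEXIST
```

Under these assumptions the plain-file case returns success to rsync, and the PFL case surfaces "File exists (17)", which matches the two behaviors seen in the test log.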
| Comment by Andreas Dilger [ 16/May/17 ] |
|
PS: I think this is pretty critical, since it will totally break rsync, cp, tar, or anything that is trying to restore the lustre.lov or trusted.lov xattrs. |
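
As a rough illustration of why this hits every xattr-preserving copy tool, the sketch below mimics the restore step in Python. It is a simplified stand-in, not rsync's actual code: restore_xattrs and fake_pfl_setxattr are hypothetical names, the temporary file name is made up, and the fake setter only simulates the EEXIST failure reported in this ticket so the logic can run without a Lustre mount.

```python
import errno
import os


def restore_xattrs(path, xattrs, setxattr=None):
    """Set each xattr on path without following symlinks, like lsetxattr().

    Returns [(name, errno), ...] for attributes that could not be set.
    `setxattr` is injectable so the logic can be exercised without Lustre.
    """
    if setxattr is None:
        setxattr = os.setxattr  # Linux-only in the standard library
    failures = []
    for name, value in xattrs.items():
        try:
            setxattr(path, name, value, follow_symlinks=False)
        except OSError as exc:
            failures.append((name, exc.errno))
    return failures


def fake_pfl_setxattr(path, name, value, follow_symlinks=True):
    """Stand-in for the failing case in this ticket (not a real Lustre call)."""
    if name in ("lustre.lov", "trusted.lov"):
        raise OSError(errno.EEXIST, os.strerror(errno.EEXIST), path)


failed = restore_xattrs(
    ".f17k.sanity.tmp",  # made-up temporary file name
    {"user.comment": b"ok", "lustre.lov": b"", "trusted.lov": b""},
    setxattr=fake_pfl_setxattr,
)
for name, err in failed:
    print(f"lsetxattr({name!r}) failed: {os.strerror(err)} ({err})")
```

A tool that treats any such failure as a partial transfer, as rsync does with exit code 23, will report an error on every PFL file even though the data itself was copied.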
| Comment by Gerrit Updater [ 16/May/17 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/27126 |
| Comment by Zhenyu Xu [ 16/May/17 ] |
|
Yes, you are right. strace -f shows that in all cases rsync creates a temporary file. |
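
The earlier confusion came from reading "lsetxattr(" inside the argument of a write() call as if it were the syscall itself. A minimal Python sketch of a check for that is shown below: it looks only at the syscall name at the start of each trace line, allowing for the pid prefix that strace -f adds. The helper name syscall_name is hypothetical, and the third sample line is an invented example of what a real call might look like, not output from the attached logs.

```python
import re

# Optional "[pid N] " or "N " prefix (added by strace -f), then the syscall name.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def syscall_name(line):
    """Return the syscall a strace line records, or None if it is not one."""
    match = SYSCALL_RE.match(line)
    return match.group(1) if match else None


lines = [
    'read(5, "rsync: rsync_xal_set: lsetxattr("..., 93) = 93',
    'write(2, "rsync: rsync_xal_set: lsetxattr("..., 92) = 92',
    # Hypothetical line showing what a real call could look like under -f:
    '[pid 4242] lsetxattr(".f17k.sanity.6HKfyd", "lustre.lov", "", 0, 0) = -1 EEXIST (File exists)',
]
for line in lines:
    print(syscall_name(line))
```

This is only a first filter: console output interleaved into the trace, as in the pfl log, can still begin a line with "lsetxattr(", so writing the trace to a file with strace's -o option is the more reliable way to keep the two apart.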
| Comment by Gerrit Updater [ 26/May/17 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/27311 |
| Comment by Gerrit Updater [ 10/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27126/ |
| Comment by James A Simmons [ 10/Jun/17 ] |
|
I see this landed already. I was testing this with my |
| Comment by Peter Jones [ 10/Jun/17 ] |
|
Landed for 2.10 |
| Comment by James A Simmons [ 10/Jun/17 ] |
|
Never mind. I see the sanity.sh tests were updated and my test suite was missing those changes. It does work.