  1. Lustre
  2. LU-9349 PFL known issues tracking ticket
  3. LU-9484

sanity test 17k fails with 'rsync failed with xattrs enabled'

Details

    • Technical task
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0
    • Lustre 2.10.0, Lustre 2.11.0

    Description

      sanity test_17k fails when a default composite layout is set for the Lustre mount point, with:

      sanity test_17k: @@@@@@ FAIL: rsync failed with xattrs enabled
      

      From the test log, we can see that lsetxattr() in rsync is failing:

      == sanity test 17k: symlinks: rsync with xattrs enabled ============================================== 03:57:01 (1494388621)
      striped dir -i1 -c2 /mnt/lustre/d17k.sanity
      striped dir -i1 -c2 /mnt/lustre/d17k.sanity.new
      sending incremental file list
      ./
      f17k.sanity
      f17k.sanity.lnk -> /mnt/lustre/d17k.sanity/f17k.sanity
      rsync: rsync_xal_set: lsetxattr("/mnt/lustre/d17k.sanity.new/.f17k.sanity.qR0FHg","lustre.lov") failed: File exists (17)
      rsync: rsync_xal_set: lsetxattr("/mnt/lustre/d17k.sanity.new/.f17k.sanity.qR0FHg","trusted.lov") failed: File exists (17)
      
      sent 1628 bytes  received 48 bytes  1117.33 bytes/sec
      total size is 35  speedup is 0.02
      rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]
       sanity test_17k: @@@@@@ FAIL: rsync failed with xattrs enabled
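
The failing operation can be looked at separately from rsync. The following is a minimal sketch, assuming a Linux client: it reads one xattr from a source path with lgetxattr() and sets it on a destination path with lsetxattr(), which is roughly what rsync does per attribute when xattrs are enabled. The helper name copy_xattr and the 64 KiB buffer size are assumptions made for illustration; they are not taken from rsync or from sanity.sh.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* Linux-specific: lgetxattr(), lsetxattr() */

/*
 * Hypothetical helper: copy the xattr `name` from `src` to `dst` without
 * following symlinks. Returns 0 on success or a negative errno value.
 */
int copy_xattr(const char *src, const char *dst, const char *name)
{
	char buf[65536];   /* assumed upper bound on the xattr size */
	ssize_t len;

	/* Read the attribute value from the source file. */
	len = lgetxattr(src, name, buf, sizeof(buf));
	if (len < 0)
		return -errno;

	/* flags == 0: create the attribute or replace an existing one. */
	if (lsetxattr(dst, name, buf, (size_t)len, 0) < 0)
		return -errno;

	return 0;
}
```

Judging by the log above, calling copy_xattr(src, dst, "lustre.lov") on a destination file that already has a layout under a default composite layout would be expected to return -EEXIST, the same "File exists (17)" that rsync reports.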
      

      Logs for recent failures are at:
      https://testing.hpdd.intel.com/test_sets/f5d085aa-357c-11e7-b0a8-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/2e962ed8-3534-11e7-814a-5254006e85c2

      Attachments

        1. pfl.txt
          19 kB
        2. plain.txt
          11 kB

          Activity

            [LU-9484] sanity test 17k fails with 'rsync failed with xattrs enabled'

            simmonsja James A Simmons added a comment -

            Never mind. I see that the sanity.sh tests were updated and my test suite was missing those changes. It does work.

            pjones Peter Jones added a comment -

            Landed for 2.10


            simmonsja James A Simmons added a comment -

            I see this has landed already. I was testing it with my LU-9183 xattr patch and saw sanity 102a and 102n both failing. I thought the failures were due to my latest LU-9183 patch, but I traced them to this patch. I'm looking into what the fix will be and will place it in patch https://review.whamcloud.com/#/c/27240. Strange that it passed testing in Maloo.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27126/
            Subject: LU-9484 llite: eat -EEXIST on setting trunsted.lov
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0e90e02ceddd60f24fac9709f3ab9e9421c80315

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/27311
            Subject: LU-9484 llite: eat -EEXIST on setting trusted.lov
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 44bbbfed3513798638e01f1ae2447faf80f7c15e

            bobijam Zhenyu Xu added a comment -

            Yes, you are right. strace -f shows that rsync creates a temporary file in all cases.

            gerrit Gerrit Updater added a comment -

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/27126
            Subject: LU-9484 llite: eat error on setting trunsted.lov
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bd9956f9190c1b0aca37c47802e44a3c82eba9a6

            adilger Andreas Dilger added a comment -

            PS: I think this is pretty critical, since it will totally break rsync, cp, tar, or anything else that tries to restore the lustre.lov or trusted.lov xattrs.

            adilger Andreas Dilger added a comment -

            The temporary file is always used by rsync, so I don't think that is the difference. The main issue is that strace needs to be run with "-f" to trace all the child processes across fork, since the logs look incomplete (they don't show any file creates, etc.). The "lsetxattr()" that shows up in the "pfl" log comes only from the write() call that prints the message to the console, not from the actual lsetxattr() syscall. I think both logs are missing the actual operations on the target files.

            That said, setting trusted.lov and lustre.lov should be accepted from userspace even if they already exist, since they are normally created when the file is first opened, and are then "restored" by rsync (or tar or cp) after the file is written (see the comment in ll_setstripe_ea_info()):

                            /**
                             * b=10667: ignore error.
                             * Silently eat error on setting trusted.lov attribute for
                             * SuSE 9, it added default option to copy all attributes in
                             * 'cp' command.
                             */

            I also see in that function:

                    if (lump != NULL && lump->lmm_magic == LOV_USER_MAGIC_COMP_V1) {
                            return_err = true;
                            goto setstripe;
                    }

            so this is what causes PFL to fail in the rsync case but not in the plain file case. Is there a reason that the PFL code needs to return an error here? This came from https://review.whamcloud.com/24851, so you may want to look at the commit history of the master or pfl branch versions of that patch to see why setxattr("trusted.lov") needs to return an error to userspace.

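
To make the difference concrete, here is a small userspace model of the error handling discussed above. It is only a sketch under stated assumptions and is not the llite code: the enum, both function names, and the exact "after" behaviour are hypothetical, inferred from the quoted return_err check and from the patch subject "eat -EEXIST on setting trusted.lov". The real patch may handle other cases differently.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the layout type carried in the xattr. */
enum layout_kind { LAYOUT_PLAIN, LAYOUT_COMPOSITE };

/*
 * Model of the behaviour described above: errors are silently eaten for
 * plain layouts, but returned to userspace for composite (PFL) layouts.
 */
int setstripe_rc_before(int rc, enum layout_kind kind)
{
	bool return_err = (kind == LAYOUT_COMPOSITE);

	return return_err ? rc : 0;
}

/*
 * Assumed behaviour after the fix: -EEXIST is eaten for every layout
 * kind, while other errors on composite layouts are still reported.
 */
int setstripe_rc_after(int rc, enum layout_kind kind)
{
	if (rc == -EEXIST)
		return 0;

	return kind == LAYOUT_COMPOSITE ? rc : 0;
}
```

In this model, an -EEXIST result on a composite layout is passed back to the caller by setstripe_rc_before(), which corresponds to the rsync failure, and is swallowed by setstripe_rc_after().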
            bobijam Zhenyu Xu added a comment -

            Hi Andreas,

            I uploaded two strace outputs of the rsync command. The difference is that for plain.txt, /mnt/lustre/d17k.sanity/f17k.sanity is created as a plain file, while pfl.txt corresponds to a PFL file.

            Comparing the two traces shows that with the PFL file, rsync tries to lsetxattr a temporary file, while lsetxattr does not show up for the plain file.

            rsync PFL file
            read(5, "rsync: rsync_xal_set: lsetxattr("..., 93) = 93
            write(2, "rsync: rsync_xal_set: lsetxattr("..., 92rsync: rsync_xal_set:
            lsetxattr(".f17k.sanity.6HKfyd","lustre.lov") failed: File exists (17)) = 92

            I don't know what causes it. Could it be that the PFL xattr is so big that rsync tries to use a temporary file? I could not find where to get the rsync code to have a look.

            People

              bobijam Zhenyu Xu
              jamesanunez James Nunez (Inactive)