[LU-2559] test: lustre-rsync-test test 1 - setfattr <file path>: Operation not supported

Details

    • Bug
    • Resolution: Won't Fix
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.3
    • 3
    • 5985

    Description

      == lustre-rsync-test test 1: Simple Replication ====================================================== 23:39:05 (1357025945)
      lustre-MDT0000: Registered changelog user cl1
      Replication #1
      Lustre filesystem: lustre
      MDT device: lustre-MDT0000
      Source: /mnt/nbp0-1
      Target: /var/acc-sm/target
      Target: /var/acc-sm/target2
      Statuslog: /var/acc-sm/lustre_rsync.log
      Changelog registration: cl1
      Starting changelog record: 0
      Clear changelog after use: no
      Errors: 0
      lustre_rsync took 0 seconds
      Changelog records consumed: 20
      setfattr: /mnt/nbp0-1/d0.lustre-rsync-test/d1/file5: Operation not supported
      Replication #2
      Replication of operation failed(-17): 20 SLINK (4) [0x200000400:0xe:0x0] [0x200000400:0x3:0x0] link3
      Lustre filesystem: lustre
      MDT device: lustre-MDT0000
      Source: /mnt/nbp0-1
      Target: /var/acc-sm/target
      Target: /var/acc-sm/target2
      Statuslog: /var/acc-sm/lustre_rsync.log
      Changelog registration: cl1
      Starting changelog record: 20
      Clear changelog after use: no
      Errors: 1
      lustre_rsync took 0 seconds
      Changelog records consumed: 4
      /var/acc-sm/target/d0.lustre-rsync-test/d1/file5: user.foo: No such attribute
      /var/acc-sm/target2/d0.lustre-rsync-test/d1/file5: user.foo: No such attribute
      lustre-rsync-test test_1: @@@@@@ FAIL: Error in replicating xattrs.
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:3643:error_noexit()
      = /usr/lib64/lustre/tests/test-framework.sh:3665:error()
      = /usr/lib64/lustre/tests/lustre-rsync-test.sh:193:test_1()
      = /usr/lib64/lustre/tests/test-framework.sh:3907:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:3937:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:3808:run_test()
      = /usr/lib64/lustre/tests/lustre-rsync-test.sh:205:main()
      Dumping lctl log to /var/acc-sm/test_logs/lustre-rsync-test.test_1.*.1357025946.log
      FAIL 1 (3s)

        Activity

          Peter, I'm pretty sure this will affect any newer client running against 2.1 servers. All MDS_MOUNT_OPTS are blank by default in the standard configs. What I'm not sure about is whether our test environment does something to work around this during interop tests. Since the problem can be masked entirely just by setting environment variables, our test setups on Toro may already do so. If we routinely run new clients on old servers in our regular interop testing, I don't see how we could have avoided seeing the problem without such a setup.

          bogl Bob Glossman (Inactive) added a comment

          Yes, Bob, it is OK to close it if you do not need to make any change. Thanks.

          jaylan Jay Lan (Inactive) added a comment
          pjones Peter Jones added a comment -

          Bob

          Before you close the ticket - will this issue affect 2.4 clients trying to interoperate with 2.1 servers? If so, we should land a change to master so that this does not create noise in our interop testing for 2.4.

          Peter

          Thanks for the confirmation, Jay.
          Is it OK then for me to go ahead and close this bug? Is it sufficient to know that setting extra cfg or environment variables is needed when interoperating with 2.3 clients on 2.1 servers?

          bogl Bob Glossman (Inactive) added a comment

          Bob, you are right again!

          After adding user_xattr to MDS_MOUNT_OPTS, it worked!

          I did not include lustre-rsync-test when I converted my test scripts from acc-sm to auster until recently. That explains why I did not get hit by the problem earlier.

          jaylan Jay Lan (Inactive) added a comment
          bogl Bob Glossman (Inactive) added a comment - - edited

          The flags reported by mount differ from those shown in /proc/mounts. Here are both.

          2.3 client:

          # mount
          centos2:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
          
          # cat /proc/mounts
          192.168.0.36@tcp:/lustre /mnt/lustre lustre rw,relatime,flock 0 0
          

          2.1 server:

          # mount
          /dev/sdb on /mnt/mds1 type lustre (rw)
          
          # cat /proc/mounts
          /dev/sdb /mnt/mds1 lustre ro 0 0
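
The check on the `mount` output above can be scripted. This is a minimal sketch, assuming the `mount` line format shown in this comment; the helper name `has_mount_opt` is hypothetical and not part of the Lustre test framework. Because the flags in /proc/mounts differ from those reported by `mount`, the sketch only handles the parenthesised `mount` format.

```shell
#!/bin/sh
# has_mount_opt LINE OPT  (hypothetical helper, for illustration only)
# Succeeds if OPT appears in the parenthesised option list of one line of
# `mount` output, e.g. "dev on /mnt type lustre (rw,user_xattr,flock)".
has_mount_opt() {
    line=$1
    opt=$2
    # Keep only the text between the last "(" and the closing ")".
    opts=${line##*\(}
    opts=${opts%\)*}
    # Wrap in commas so that only whole option names match.
    case ",$opts," in
        *",$opt,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# The two `mount` lines quoted in this comment.
client='centos2:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)'
server='/dev/sdb on /mnt/mds1 type lustre (rw)'

has_mount_opt "$client" user_xattr && echo "client: user_xattr set"
has_mount_opt "$server" user_xattr || echo "server: user_xattr missing"
```

On a live system the line could come from something like `mount | grep /mnt/mds1`, though the exact mount point is site-specific.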
          

          I think I see where you are going with this. It looks like the MGS/MDS gets mounted on the server without user_xattr set. Looking at the test scripts, I see a difference between 2.1 and 2.3: in 2.1, cfg/local.sh explicitly defines MDS_MOUNT_OPTS with user_xattr in it, while in 2.3 the opts are empty.

          I'm guessing that in the 2.3 timeframe server mounts always enable user_xattr by default and no longer require explicit flags. This breaks when script and cfg files from 2.3 are used against 2.1 servers.

          Jay, can you check this out by adding explicit
          MDS_MOUNT_OPTS="-o user_xattr,acl"
          to your cfg file on clients?
          If you just copied or modified the cfg/local.sh from the build these are set empty.
          Setting MDS_MOUNT_OPTS as an environment variable should work too.

          Try the test with this change.
          You can do the failing test alone with:

          auster -rv lustre-rsync-test --only 1
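
The cfg change suggested above can be sketched as a small helper that adds an option only when it is missing, so the same cfg file works whether or not the variable was already set in the environment. This is a sketch under assumptions: `add_mount_opt` is a hypothetical name, not something provided by test-framework.sh, and it assumes MDS_MOUNT_OPTS is either empty or has the form "-o opt1,opt2".

```shell
#!/bin/sh
# add_mount_opt OPTS OPT  (hypothetical helper, for illustration only)
# Prints OPTS (a string such as "-o acl" or "") with OPT added to the
# comma-separated list if it is not already present.
add_mount_opt() {
    opts=$1
    opt=$2
    list=${opts#-o }            # drop a leading "-o " if there is one
    case ",$list," in
        *",$opt,"*) ;;          # already present, leave the list alone
        ",,") list=$opt ;;      # empty list, OPT becomes the only entry
        *) list="$list,$opt" ;; # append to the existing list
    esac
    echo "-o $list"
}

# Start from whatever the environment provides; empty if unset,
# as in the 2.3 cfg/local.sh described above.
MDS_MOUNT_OPTS=${MDS_MOUNT_OPTS:-}
MDS_MOUNT_OPTS=$(add_mount_opt "$MDS_MOUNT_OPTS" user_xattr)
MDS_MOUNT_OPTS=$(add_mount_opt "$MDS_MOUNT_OPTS" acl)
export MDS_MOUNT_OPTS
echo "MDS_MOUNT_OPTS=\"$MDS_MOUNT_OPTS\""
```

When the variable starts out empty, the result is the same "-o user_xattr,acl" string given above.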

          Can you run "mount" on both the 2.1 server and the 2.3 client and paste the output, so we can check which flags are enabled when you hit the message "Lustre: Disabling user_xattr feature because it is not supported on the server"? Thanks.

          yong.fan nasf (Inactive) added a comment

          Bob, forget about the 2.1.3 client I mentioned in my earlier post on Jan 10. That was a beta version from SUSE. Since Peter said Intel would not support that version, it is irrelevant.

          But I did hit the problem with 2.3.0, as I first reported.

          jaylan Jay Lan (Inactive) added a comment

          Jay, are you sure you have a 2.1.3 client for sles11 sp2? I think the standard prebuilt rpms for sles11 are for sp1, not sp2. I haven't been able to build any client 2.1 for sp2. I've only succeeded in building client 2.3 and master for sp2.

          You could run lctl lustre_build_version on the client to double-check.

          bogl Bob Glossman (Inactive) added a comment

          I saw this error again when testing the SUSE version of lustre-2.1.3 client for sles11sp2. Note that my servers are still 2.1.3.

          jaylan Jay Lan (Inactive) added a comment

          nasf, adding you to the watcher list as Peter asked. This bug seems to be due to version interop problems with connect flags. Do you know of anything in the 2.3 timeframe that might be related? It was suggested you might know or have worked in this area. Just looking at the 2.1 vs. 2.3 code, I haven't been able to spot anything obvious; all the flag definitions and so forth look compatible.

          bogl Bob Glossman (Inactive) added a comment

          People

            bogl Bob Glossman (Inactive)
            jaylan Jay Lan (Inactive)