[LU-2559] test: lustre-rsync-test test 1 - setfattr <file path>: Operation not supported
Created: 02/Jan/13
Updated: 15/Oct/13
Resolved: 11/Jan/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.3
Fix Version/s: Lustre 2.5.0

Type: Bug
Priority: Blocker
Reporter: Jay Lan (Inactive)
Assignee: Bob Glossman (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: LB
Environment:

Server: 2.1.3-1nasS, centos 6.3, 2.6.32_279.2.1.el6
Client: 2.3.0-2nasC, sles11sp2, 3.0.42_0.7.3
mds: service337
oss: service361, service362
clients: service331, service332
Git source: https://github.com/jlan/lustre-nas


Attachments: File lustre-rsync-test_1.tgz    
Severity: 3
Rank (Obsolete): 5985

 Description   

== lustre-rsync-test test 1: Simple Replication ====================================================== 23:39:05 (1357025945)
lustre-MDT0000: Registered changelog user cl1
Replication #1
Lustre filesystem: lustre
MDT device: lustre-MDT0000
Source: /mnt/nbp0-1
Target: /var/acc-sm/target
Target: /var/acc-sm/target2
Statuslog: /var/acc-sm/lustre_rsync.log
Changelog registration: cl1
Starting changelog record: 0
Clear changelog after use: no
Errors: 0
lustre_rsync took 0 seconds
Changelog records consumed: 20
setfattr: /mnt/nbp0-1/d0.lustre-rsync-test/d1/file5: Operation not supported
Replication #2
Replication of operation failed(-17): 20 SLINK (4) [0x200000400:0xe:0x0] [0x200000400:0x3:0x0] link3
Lustre filesystem: lustre
MDT device: lustre-MDT0000
Source: /mnt/nbp0-1
Target: /var/acc-sm/target
Target: /var/acc-sm/target2
Statuslog: /var/acc-sm/lustre_rsync.log
Changelog registration: cl1
Starting changelog record: 20
Clear changelog after use: no
Errors: 1
lustre_rsync took 0 seconds
Changelog records consumed: 4
/var/acc-sm/target/d0.lustre-rsync-test/d1/file5: user.foo: No such attribute
/var/acc-sm/target2/d0.lustre-rsync-test/d1/file5: user.foo: No such attribute
lustre-rsync-test test_1: @@@@@@ FAIL: Error in replicating xattrs.
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:3643:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:3665:error()
= /usr/lib64/lustre/tests/lustre-rsync-test.sh:193:test_1()
= /usr/lib64/lustre/tests/test-framework.sh:3907:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:3937:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:3808:run_test()
= /usr/lib64/lustre/tests/lustre-rsync-test.sh:205:main()
Dumping lctl log to /var/acc-sm/test_logs/lustre-rsync-test.test_1.*.1357025946.log
FAIL 1 (3s)



 Comments   
Comment by Peter Jones [ 03/Jan/13 ]

Bob

Could you please look into this one?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 03/Jan/13 ]

So far I haven't been able to reproduce the failure. I have been trying to vary the client side only. It will take me a bit longer to bring up a 2.1 server in case it's the server side that causes the problem. I will keep trying to reproduce it.

Comment by Bob Glossman (Inactive) [ 03/Jan/13 ]

Looks like this is an interop problem between 2.3 clients and 2.1 servers. The key fact is the following error that shows up in client dmesg at mount time:

Lustre: Disabling user_xattr feature because it is not supported on the server

This happens with any 2.3 or 2.3+ client, sles11 or centos.
Once the client is mounted with user_xattr disabled, any setfattr command attempted by the client will fail.

Not sure exactly why the 2.3 client thinks the 2.1 server is incapable of doing user_xattr.
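
A quick way to check whether a client hit this condition is to search its kernel log for the message quoted above. The following is a minimal sketch, not part of the Lustre test framework; the function name xattr_disabled_in_log is hypothetical, and it reads log text from stdin so it can be fed from dmesg or from a saved log file.

```shell
# Hypothetical helper: exits 0 if the log text on stdin contains the
# client-side message quoted in this ticket, non-zero otherwise.
xattr_disabled_in_log() {
    grep -q "Disabling user_xattr feature because it is not supported on the server"
}

# Example use on a client (assumes dmesg is readable by the current user):
#   dmesg | xattr_disabled_in_log && echo "user_xattr is disabled on this mount"
```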

Comment by Bob Glossman (Inactive) [ 04/Jan/13 ]

nasf, adding you to the watcher list as Peter asked. This bug seems to be due to version interop problems with connect flags. Do you know of anything in the 2.3 timeframe that might be related? It was suggested you might know or have worked in this area. Just looking at the 2.1 vs. 2.3 code I haven't been able to spot anything obvious; all the flag definitions and so forth look compatible.

Comment by Jay Lan (Inactive) [ 10/Jan/13 ]

I saw this error again when testing the SUSE version of the lustre-2.1.3 client for sles11sp2. Note that my servers are still 2.1.3.

Comment by Bob Glossman (Inactive) [ 10/Jan/13 ]

Jay, are you sure you have a 2.1.3 client for sles11 sp2? I think the standard prebuilt rpms for sles11 are for sp1, not sp2. I haven't been able to build a 2.1 client for sp2. I've only succeeded in building 2.3 and master clients for sp2.

You could run lctl lustre_build_version on the client to double-check.

Comment by Jay Lan (Inactive) [ 10/Jan/13 ]

Bob, forget about the 2.1.3 client I mentioned in my earlier post on Jan 10. That was a beta version from SUSE. Since Peter said Intel would not support that version, it is irrelevant.

I did experience the problem with 2.3.0, though, as I first reported.

Comment by nasf (Inactive) [ 11/Jan/13 ]

Can you run "mount" on both the 2.1 server and the 2.3 client and paste the output, so we can check which flags were enabled when you hit the "Lustre: Disabling user_xattr feature because it is not supported on the server" message? Thanks.

Comment by Bob Glossman (Inactive) [ 11/Jan/13 ]

Flags reported by mount differ from those shown in /proc/mounts. Here are both.

2.3 client:

# mount
centos2:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)

# cat /proc/mounts
192.168.0.36@tcp:/lustre /mnt/lustre lustre rw,relatime,flock 0 0

2.1 server:

# mount
/dev/sdb on /mnt/mds1 type lustre (rw)

# cat /proc/mounts
/dev/sdb /mnt/mds1 lustre ro 0 0

I think I see where you are going with this. Looks like MGS/MDS gets mounted on the server without user_xattr set. Looking at the test scripts I see a difference between 2.1 and 2.3. 2.1 has MDS_MOUNT_OPTS explicitly defined with user_xattr in it in cfg/local.sh, 2.3 has empty opts.

I'm guessing that in the 2.3 timeframe server mounts always enable user_xattr by default and no longer require explicit flags. This breaks when script and cfg files from 2.3 are used with 2.1 servers.
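
To check a given mount point for the option without reading the output by eye, something like the sketch below could be used. It is an illustration only: the function name has_user_xattr is hypothetical, and it assumes that an enabled user_xattr appears in the fourth (options) field of a /proc/mounts-format line, which this ticket shows only indirectly, through its absence on the affected client.

```shell
# Hypothetical helper: reads /proc/mounts-format text on stdin and exits 0
# if the entry whose mount point equals $1 lists user_xattr in its options.
has_user_xattr() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "user_xattr") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example use:
#   has_user_xattr /mnt/lustre < /proc/mounts || echo "user_xattr not enabled"
```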

Jay, can you check this out by adding explicit
MDS_MOUNT_OPTS="-o user_xattr,acl"
to your cfg file on clients?
If you just copied or modified the cfg/local.sh from the build, these options are set empty.
Setting MDS_MOUNT_OPTS as an environment variable should work too.

Try the test with this change.
You can do the failing test alone with:

auster -rv lustre-rsync-test --only 1

Comment by Jay Lan (Inactive) [ 11/Jan/13 ]

Bob, you are right again!

After adding user_xattr to MDS_MOUNT_OPTS, it worked!

I only recently added lustre-rsync-test to the test scripts I converted from acc-sm to auster. That explains why I did not hit the problem earlier.

Comment by Bob Glossman (Inactive) [ 11/Jan/13 ]

Thanks for the confirmation, Jay.
Is it OK then for me to go ahead and close this bug? Is it sufficient to know that setting extra cfg or environment variables is needed when interoperating with 2.3 clients on 2.1 servers?

Comment by Peter Jones [ 11/Jan/13 ]

Bob

Before you close the ticket - will this issue affect 2.4 clients trying to interoperate with 2.1 servers? If so, we should land a change to master to avoid this creating noise in our interop testing for 2.4

Peter

Comment by Jay Lan (Inactive) [ 11/Jan/13 ]

Yes, Bob, it is OK to close it if you do not need to make any change. Thanks.

Comment by Bob Glossman (Inactive) [ 11/Jan/13 ]

Peter, I'm pretty sure this will affect any newer client version running against 2.1 servers. All MDS_MOUNT_OPTS are blank by default in standard configs. What I'm not sure about is whether our test environment does something to work around this during interop tests. Since it can easily be masked entirely by just setting environment variables, our test setups may already do so on Toro. If we routinely run new clients on old servers during our regular interop testing, I don't see how we could have avoided seeing the problem without such setup.

Comment by Peter Jones [ 11/Jan/13 ]

Bob

You should be able to find recent interop test runs to see whether we are handling this already. If not then please could you open a TT ticket outlining the changes that we need to make to avoid this unnecessary noise in our test results.

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 11/Jan/13 ]

Peter
Sure looks like our test setup takes care of it. In a full run from 12/19 of a master client on a 2.1 server, lustre-rsync-test passed fine.

Looking in the log of the provisioning phase, I see:

21:01:04:MDS_MOUNT_OPTS=${MDS_MOUNT_OPTS:-"-o user_xattr,acl"}

Later on those opts show up explicitly on the command line of the MDS mount command:

21:01:38:CMD: client-20-ib mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl /dev/lvm-MDS/P1 /mnt/mds1

I don't think anything needs to be done here.
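
For reference, the provisioning line above relies on the shell's ${VAR:-default} expansion, which substitutes the default when the variable is unset or empty. That is why it also covers a cfg file that sets MDS_MOUNT_OPTS to an empty string. A small sketch of the behavior; the function name effective_mds_opts is hypothetical and the default string is the one shown in the log:

```shell
# Hypothetical helper: prints the mount options that would take effect
# given the current MDS_MOUNT_OPTS value passed as $1.
effective_mds_opts() {
    default_opts="-o user_xattr,acl"
    # ":-" falls back to the default when $1 is unset OR empty.
    printf '%s\n' "${1:-$default_opts}"
}

effective_mds_opts ""          # empty value: the default is used
effective_mds_opts "-o acl"    # explicit value: kept as-is
```

Note that an explicitly set value that omits user_xattr is kept as-is, so the default only helps when nothing else has been configured.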

Comment by Peter Jones [ 11/Jan/13 ]

ok then close away!
