[LU-279] Test failure on test suite sanity, subtest test_17k Created: 04/May/11  Updated: 22/Apr/13  Resolved: 14/May/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.1.0, Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5038

 Description   

This issue was created by maloo for Prakash Surya <surya1@llnl.gov>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/899e29aa-76a6-11e0-a1b3-52540025f9af.

The sub-test test_17k failed with the following error:

rsync failed with xattrs enabled

Rsync is failing when xattrs are enable due to the lgetxattr function return inconsistent information about the xattr size.

According to lgetxattr's man page:

An empty buffer of size zero can be passed into these calls to return the current size of the named extended attribute

On success, a positive number is returned indicating the size of the extended attribute value

It doesn't seem like these statements are being upheld.

The behavior I am seeing is:

        // ... 

        // path is a pointer to a sting with the pathname to a directory
        // on a lustre filesystem

        ssize_t ret1 = lgetxattr(path, "lustre.lov", NULL, 0); // returns 96

        char *buf = (char *)malloc(ret1);
        ssize_t ret2 = lgetxattr(path, "lustre.lov", buf, ret1); // returns 56

        // Thus at this point ret1 equals 96 and ret2 equals 56
        // Shouldn't ret1 and ret2 be the same value??

        // ...

Shouldn't both invocations of lgetxattr return the same value?



 Comments   
Comment by Andreas Dilger [ 04/May/11 ]

I don't know the details, but I suspect that the first call is somehow just returning the generic maximum size of the lustre.lov xattr, while the second call is returning the actual size of the lustre.lov xattr for that particular file.

lmmsize = 96 = 32 + 3 * 24, so it looks like your filesystem has 3 OSTs, while lmmsize = 56 = 32 + 1 * 24, so the file actually has 1 stripe allocated.

I don't know if this should be considered a bug or not to return an actual xattr size that is smaller than the reported size, though I agree it is a bit surprising, and unfortunate that rsync really cares that much about it. Having rsync call lgetxattr() twice on every file it is processing, for every xattr on the file seems quite inefficient, and regardless of whether this bug is fixed in Lustre or not, I'd also recommend filing a bug with the rsync folks to just have them allocate a decent-sized buffer for reading all xattrs (can be enlarged if they every get ERANGE back) and then malloc() and memcpy() the returned xattr in userspace, which is probably much more efficient than calling into the kernel (and in some cases over the network) twice for each xattr.

Comment by Peter Jones [ 06/May/11 ]

Bobijam

Could you please review this in more detail

Thanks

Peter

Comment by Zhenyu Xu [ 08/May/11 ]

I tried rsync (version 2.6.8), it does not call lgetxattr() twice on every file, and it just use a 100 length buffer for reading xattrs.

And lustre handles xattr retrival in this way: if size 0 is passed, lustre just returns the maximum buffer size which could hold the value.

For "lustre.lov", it is 32(basic lmm) + 16(pool name) + max_stripecount * 24(lov ost data).

Checking the lgetxattr manual, it states

An empty buffer of size zero can be passed into these calls to return the current size of the named extended attribute,
which can be used to estimate the size of a buffer which is sufficiently large to hold the value associated with the
extended attribute.

and since the purpose of this behaviour is to estimate sufficient buffer size, IMHO Lustre's xattr handling does not count as break the statement.

Comment by Prakash Surya (Inactive) [ 09/May/11 ]

I tried rsync (version 2.6.8), it does not call lgetxattr() twice on every file, and it just use a 100 length buffer for reading xattrs.

Here is what I am seeing on my end using rsync v3.0.6 on a RHEL6 based system ...

# /mnt/lustre > rsync --version | head -1
rsync  version 3.0.6  protocol version 30
# /mnt/lustre > ls -lR
.:
total 0
# /mnt/lustre > mkdir -p ./dir ./dir.new
# /mnt/lustre > touch ./dir/file
# /mnt/lustre > ln -s file ./dir/file.lnk
# /mnt/lustre > ls -lR
.:
total 8
drwx------ 2 root root 4096 May  9 11:35 dir
drwx------ 2 root root 4096 May  9 11:35 dir.new

./dir:
total 0
-rw------- 1 root root 0 May  9 11:35 file
lrwxrwxrwx 1 root root 4 May  9 11:35 file.lnk -> file

./dir.new:
total 0
# /mnt/lustre > strace rsync -av -X ./dir/ ./dir.new/ > strace.txt 2>&1
# /mnt/lustre > ls -lR
.:
total 8
drwx------ 2 root root  4096 May  9 11:35 dir
drwx------ 2 root root  4096 May  9 11:35 dir.new
-rw------- 1 root root 10941 May  9 11:36 strace.txt

./dir:
total 0
-rw------- 1 root root 0 May  9 11:35 file
lrwxrwxrwx 1 root root 4 May  9 11:35 file.lnk -> file

./dir.new:
total 0
# /mnt/lustre > cat strace.txt

# ... Output abbreviated to highlight lgetxattr calls ...

llistxattr(".", 0x6797f0, 1024)         = 36
lgetxattr(".", "trusted.lma", 0x0, 0)   = 64
lgetxattr(".", "trusted.lma", "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x02\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 64) = 64
lgetxattr(".", "trusted.link", 0x0, 0)  = 45
lgetxattr(".", "trusted.link", "\xdf\xf1\xea\x11\x01\x00\x00\x00-\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x13\x03D\x8d\x8d\x00\x00\x00\x00dir", 45) = 45
lgetxattr(".", "lustre.lov", 0x0, 0)    = 96
lgetxattr(".", "lustre.lov", "\xd0\x0b\xd1\x0b\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x00\x00\x10\x00\x01\x00\xff\xff", 96) = 32

# ...

# /mnt/lustre > rsync -av -X ./dir/ ./dir.new/
sending incremental file list
get_xattr_data: lgetxattr(".","lustre.lov",96) returned 32

sent 18 bytes  received 12 bytes  60.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
# /mnt/lustre > ls -lR
.:
total 20
drwx------ 2 root root 4096 May  9 11:35 dir
drwx------ 2 root root 4096 May  9 11:35 dir.new
-rw------- 1 root root  287 May  9 11:46 strace.txt

./dir:
total 0
-rw------- 1 root root 0 May  9 11:35 file
lrwxrwxrwx 1 root root 4 May  9 11:35 file.lnk -> file

./dir.new:
total 0

Looking at the output from strace, it definitely appears to be calling lgetxattr twice for every xattr.

And from digging into the v3.0.8 rsync source, I believe the relevant code is found in xattrs.c

static char *get_xattr_data(const char *fname, const char *name, size_t *len_ptr,
                            int no_missing_error)
{
        size_t datum_len = sys_lgetxattr(fname, name, NULL, 0);

        // ...

        if (datum_len) {
                size_t len = sys_lgetxattr(fname, name, ptr, datum_len);
                if (len != datum_len) {
                        if (len == (size_t)-1) {
                                // ...
                        } else {
                                rprintf(FERROR_XFER,
                                    "get_xattr_data: lgetxattr(\"%s\",\"%s\",%ld)"
                                    " returned %ld\n", full_fname(fname), name,
                                    (long)datum_len, (long)len);
                        }

                        // ...
Comment by Zhenyu Xu [ 09/May/11 ]

confirmed that rsync v3.0.x changed its behavior.

patch tracking at http://review.whamcloud.com/524 to return exact xattr value size when passing 0 size for determing buffer size to hold the xattr value.

Comment by Prakash Surya (Inactive) [ 10/May/11 ]

Great!

Maloo results for test 17k with patch 524: https://maloo.whamcloud.com/test_sets/e51aa7e4-7b4d-11e0-b5bf-52540025f9af

Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » i686,client,el5,ofa #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » i686,server,el5,ofa #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Build Master (Inactive) [ 13/May/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #119
LU-279 Test failure on test suite sanity, subtest test_17k

Oleg Drokin : 701627cb7c17dfd206ea65aaadab3a109047a4fd
Files :

  • lustre/llite/xattr.c
Comment by Peter Jones [ 14/May/11 ]

Landed for 2.1. Please reopen if issue reoccurs or more work is required

Generated at Sat Feb 10 01:05:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.