[LU-908] multi-block xattr support Created: 11/Dec/11  Updated: 11/Apr/18  Resolved: 11/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Jian Yu Assignee: Jian Yu
Resolution: Incomplete Votes: 0
Labels: None

Issue Links:
Related
is related to LU-80 Wide striping support Resolved
is related to LU-9724 update ext4-large-eas.patch to match ... Resolved
is related to LU-1732 enable wide striping by default Resolved
is related to LU-6220 push ext4/ldiskfs patches upstream if... Open
Rank (Obsolete): 10287

 Description   

The existing large xattr patch for ext4 in http://review.whamcloud.com/1708 saves the xattr value in an external inode if the value is larger than the blocksize. This degrades performance when working with large xattrs, because an extra inode must be created for any xattr over 4kB. To improve performance, we would implement "mid-sized" xattrs (up to 64kB) by referencing them directly from the xattr block pointer. This was requested by the upstream ext4 maintainer before accepting the large xattr patches into mainline Linux.



 Comments   
Comment by Jian Yu [ 11/Dec/11 ]

There are some comments and discussions for this feature in LU-80. Here are some important ones:

Comment from Yu Jian on 09/Nov/11:
-------------------------------------------------
Comments and instructions from Andreas:

As per bug 4424 comment #112 and comment #272, the following improvements need to be made to the existing large EA patch for ext4 and e2fsprogs so that the changes can be accepted into the upstream kernel ext4 and e2fsprogs:

1) EA value will be stored in an external inode if value_size > blocksize instead of 1/2 blocksize.
There is no point in moving the EA value to an external inode if it would fit into the xattr block. This is doubly true with the next change, which allows multiple blocks for xattrs without having to resort to a separate inode.

2) Increase EA space by allocating a single chunk of up to 64kB/blocksize contiguous xattr blocks and storing the block count in ext4_xattr_header->h_blocks.
To decide how many blocks to allocate at once, Andreas suggested allocating just the required number of xattr blocks. In the rare case where the xattr space needs to grow, we can try to reallocate, or simply fail with ENOSPC. This is no worse than the current code when it runs out of space in the single xattr block.
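
The placement rule described in 1) and 2) can be sketched as follows. This is only an illustration under stated assumptions: the function and constant names are hypothetical, xattr header and entry overhead is ignored, and values over 64kB are assumed to fall back to the existing external inode.

```python
# 64kB cap on the contiguous xattr chunk, as proposed in this ticket.
MAX_MULTIBLOCK_BYTES = 64 * 1024


def choose_xattr_storage(value_size, blocksize=4096):
    """Return (placement, block_count) for an xattr value.

    Hypothetical helper, not the ext4/ldiskfs implementation.
    Header and entry overhead inside the xattr block is ignored.
    """
    if value_size <= blocksize:
        # Fits in the ordinary single xattr block.
        return ("xattr_block", 1)
    if value_size <= MAX_MULTIBLOCK_BYTES:
        # Allocate only the required number of contiguous blocks;
        # this count is what would be stored in h_blocks.
        blocks = -(-value_size // blocksize)  # ceiling division
        return ("multi_block", blocks)
    # Assumption: anything larger still uses the external xattr inode.
    return ("external_inode", 0)
```

With a 4kB blocksize the multi-block range covers 2 to 16 blocks, so a 4097-byte value needs two blocks and a 64kB value needs sixteen.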

The change to the xattr blocks should be completely transparent to Lustre. ldiskfs can continue to advertise a maximum xattr size larger than the 64kB that fits into the external block range, but for most of our uses we will not need more than 64kB. Even the wide striping patch currently needs only 32kB for the 1350-stripe limit. Avoiding the external inode will likely improve performance significantly.
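
The 32kB figure for 1350 stripes can be checked with a little arithmetic. The sketch below assumes a 48-byte layout header and a 24-byte entry per stripe; those sizes are not stated in this ticket, and the names are hypothetical.

```python
# Assumed sizes of the striping layout stored in the xattr.
# Neither value is given in this ticket; treat both as assumptions.
LAYOUT_HEADER_BYTES = 48
PER_STRIPE_BYTES = 24


def layout_xattr_bytes(stripe_count):
    """Approximate size of the layout xattr for a given stripe count."""
    return LAYOUT_HEADER_BYTES + stripe_count * PER_STRIPE_BYTES


def blocks_needed(nbytes, blocksize=4096):
    """Number of whole blocks required to hold nbytes (ceiling division)."""
    return -(-nbytes // blocksize)


size = layout_xattr_bytes(1350)
print(size, blocks_needed(size))
```

Under these assumptions a 1350-stripe layout comes to 32448 bytes, which fits in eight 4kB blocks and stays well inside the 64kB multi-block range.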

As for testing, the normal way to handle this is to add support to ext4, create a small test filesystem, mount it with ext4/ldiskfs (nojournal to save space), set a large xattr with setfattr to produce a multi-block xattr, and then use getfattr to verify that it works.
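
The same set-then-verify check can also be scripted. The sketch below uses Python's os.setxattr and os.getxattr (Linux only) in place of setfattr and getfattr. The helper names and the "user.multiblock_test" attribute name are hypothetical, and the file passed in is assumed to live on the test filesystem mounted above.

```python
import os


def make_payload(size):
    """Build a deterministic, non-uniform payload of exactly size bytes."""
    pattern = bytes(range(256))
    return (pattern * (size // 256 + 1))[:size]


def roundtrip_xattr(path, size, name="user.multiblock_test"):
    """Set an xattr of the given size on path and read it back.

    Returns True if the value read back matches what was written.
    Raises OSError if the filesystem rejects the xattr (for example
    ENOSPC or E2BIG when the value does not fit).
    """
    payload = make_payload(size)
    os.setxattr(path, name, payload)
    return os.getxattr(path, name) == payload
```

Calling roundtrip_xattr on a file in the test filesystem with sizes just above one block and just below 64kB exercises both ends of the multi-block range.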

3) Then, add support to e2fsprogs and e2fsck in order to check that the multi-block xattr is correct, save the image to an e2fsprogs subdirectory (something like tests/f_xattr_blocks/image.gz), and run the e2fsprogs e2fsck test script on it.

The e2fsprogs "f_xattr_blocks" image should include several different inodes:

  • xattr blocks with size nearly 64kB (minus xattr headers; should pass)
  • xattr blocks with multiple xattrs > 4kB (should pass)
  • xattr blocks with many small xattrs (should pass)
  • xattr block that points to a completely incorrect block (no header/magic)
    (should clear xattr block from inode)
  • xattr blocks not set in bitmap (should validate xattr, set blocks in bitmap)
  • xattr block that points to the correct block number, with some blocks shared with a file
    (should invoke duplicate block pass 1b to clone blocks)
  • xattr block overlapping with inode table (should clone in pass 1b)
  • any other failure cases you can think of

There is already a test case for the external xattr inode; that should stay.

Comment from Yu Jian on 21/Nov/11:
-------------------------------------------------

It would be possible to handle this in the code by handling xattr values that span multiple buffers, which is ignored in the common case because the buffer size should be <= blocksize. However, I don't know how complex this would be. It might also be worthwhile to ask on the ext4 list if anyone has a good idea of how to solve this. I hope their solution is not to require limiting the single xattr size to 4kB or less...

Threads on the linux-ext4 list:
http://www.spinics.net/lists/linux-ext4/msg29059.html
http://www.spinics.net/lists/linux-ext4/msg29061.html

The solution is to support handling xattr data among non-contiguous block buffers.
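
To illustrate what handling xattr data across separate block buffers involves, here is a minimal sketch that gathers one value spanning several buffers. It is not the ext4 code; the function name and the list-of-buffers representation are assumptions made for the example.

```python
def read_spanning_value(buffers, blocksize, offset, length):
    """Copy length bytes starting at a logical offset out of a list of
    per-block buffers that need not be adjacent in memory.

    buffers[i] holds the contents of the i-th xattr block (blocksize bytes).
    """
    out = bytearray()
    while length > 0:
        # Locate the buffer and the position inside it for this offset.
        index, within = divmod(offset, blocksize)
        # Copy up to the end of the current buffer, then move to the next.
        chunk = min(length, blocksize - within)
        out += buffers[index][within:within + chunk]
        offset += chunk
        length -= chunk
    return bytes(out)
```

For example, with three 8-byte buffers, reading 12 bytes from offset 6 takes the last two bytes of the first buffer, all of the second, and the first two bytes of the third.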

Comment by Andreas Dilger [ 22/Oct/15 ]

Note: limiting the individual xattr size to < 4KB would make this enhancement totally useless for Lustre, so that is not an acceptable solution for us.

Comment by Andreas Dilger [ 17/Apr/16 ]

Note: the xattr_inode feature has been around long enough in Lustre that adding multi-block 64KB xattrs will likely need a new feature flag.

Comment by Andreas Dilger [ 16/Sep/17 ]

The xattr_inode feature has been included upstream in e2fsprogs-1.44 and kernel 4.14.

That said, there are a few changes to the on-disk format (made in a forward compatible way) that should be included into our patches, so that filesystems are maximally compatible when we start using those newer kernels.

Comment by Andreas Dilger [ 11/Apr/18 ]

The ea_inode feature is now available in e2fsprogs-1.44 and later. The on-disk format is largely the same as the feature implemented for Lustre, and there are compatibility hooks in the ext4 and e2fsprogs code to handle files using the Lustre-formatted xattr inode.

As such, there is currently not much need for this enhancement to be implemented.
