[LU-11868] ZFS ea size limited to 32K Created: 16/Jan/19 Updated: 07/Feb/20 Resolved: 30/Apr/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The ZFS OSD limits the EA size to DXATTR_MAX_ENTRY_SIZE, which defaults to 32K. This happens where ddp_max_ea_size is set: param->ddp_max_ea_size = DXATTR_MAX_ENTRY_SIZE;
Per Alex Z., this is probably incorrect, since ZFS can use dedicated objects for EAs. This was discovered and confirmed while testing overstriping, specifically in test 27ci: 32768 / 24 bytes per stripe = 1365 stripes. Allowing a little for the rest of the layout EA, this matches.
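As a sanity check on that arithmetic, here is a minimal sketch in C. It assumes a plain (non-composite) v1 layout EA: a 32-byte lov_mds_md_v1 header followed by one 24-byte lov_ost_data_v1 entry per stripe. The helper names lov_ea_size and max_stripes are hypothetical, not Lustre functions, and composite (PFL) layouts carry extra headers, so their ceiling would be lower.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for a plain v1 layout EA (hypothetical constants):
 * a 32-byte lov_mds_md_v1 header plus one 24-byte
 * lov_ost_data_v1 entry per stripe. */
#define LOV_V1_HEADER_SIZE 32u
#define LOV_OST_DATA_SIZE  24u

/* Size in bytes of the layout EA for a given stripe count. */
static size_t lov_ea_size(unsigned int stripes)
{
    /* lmm_stripe_count is a 16-bit field */
    assert(stripes <= UINT16_MAX);
    return LOV_V1_HEADER_SIZE + (size_t)stripes * LOV_OST_DATA_SIZE;
}

/* Largest stripe count whose layout EA fits in max_ea_size bytes. */
static unsigned int max_stripes(size_t max_ea_size)
{
    if (max_ea_size < LOV_V1_HEADER_SIZE)
        return 0;
    return (unsigned int)((max_ea_size - LOV_V1_HEADER_SIZE) /
                          LOV_OST_DATA_SIZE);
}
```

Under these assumptions max_stripes(32768) is 1364, one fewer than the bare quotient because the header takes up space, and lov_ea_size(2720) is 65312 bytes, which matches the getxattr result for the 2720-stripe file later in this issue.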
So 32K is too small, especially if we increase the stripe limit to 10K, as a later patch in the overstriping series does. The question is what the limit should be.
As a possible value, I would suggest the limit currently defined in the ldiskfs OSD, which is clearly not ldiskfs-specific. I'm curious to get feedback here. |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 16/Jan/19 ] |
|
The problem is that if we do this, I think we'll hit the same OOM issues in autotest that we see when we try to enable ea_inode on ldiskfs. A comment there explains a bit: I have a theory on this that I'm hoping to be able to confirm from the dump. ea_inode has been the configuration on many (most?) deployed Cray systems for a few years, with no OOM issues. I think we're seeing OOM not because of a bug, but because various buffers are allocated at the maximum EA size, which with ea_inode is ~1 MiB for ldiskfs. I think we're just running the VMs out of memory because of this. The autotest VMs are tiny, ~1.6 GB. Not sure what to do about that. I'll look at the buffers and see if it's possible to change how they're allocated; a number of them depend indirectly on ea_size. |
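The allocation concern can be illustrated with a small userspace sketch: size a buffer to the EA actually expected, and fall back to the maximum EA size only when that size is unknown. With ea_inode the maximum is ~1 MiB, so a buffer that always reserves the maximum costs about 1 MiB even for a file whose layout is a few hundred bytes. The helper ea_buf_alloc and its "expected size" input are hypothetical; this is a sketch of the idea, not Lustre code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an EA buffer sized to the expected
 * EA size rather than always reserving max_ea_size. On success the
 * chosen size is stored in *bufsize. */
static void *ea_buf_alloc(size_t expected, size_t max_ea_size,
                          size_t *bufsize)
{
    assert(max_ea_size > 0);
    /* Unknown size (0) falls back to the maximum; anything larger
     * than the maximum is clamped to it. */
    size_t size = (expected == 0 || expected > max_ea_size)
                      ? max_ea_size
                      : expected;
    void *buf = malloc(size);
    if (buf != NULL)
        *bufsize = size;
    return buf;
}
```

The design choice is simply to keep the worst case (max_ea_size) as a fallback rather than the default, so memory use scales with the layouts actually in flight.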
| Comment by Patrick Farrell (Inactive) [ 16/Jan/19 ] |
|
One other possibility would be to limit OSD_MAX_EA_SIZE to some smaller value. 10,000 stripes require ~240K of EA, so we could probably limit it to 256 KiB. But that's arbitrary, not directly connected to any specific limitation. |
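The 256 KiB figure follows from rounding the 10,000-stripe EA size up to the next power of two. A minimal sketch, again assuming a plain v1 layout (32-byte header plus 24 bytes per stripe); the function names layout_ea_bytes and roundup_pow2 are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed plain v1 layout: 32-byte header + 24 bytes per stripe. */
static size_t layout_ea_bytes(unsigned int stripes)
{
    return 32u + (size_t)stripes * 24u;
}

/* Round n up to the next power of two (n must be > 0 and small
 * enough that the result fits in size_t). */
static size_t roundup_pow2(size_t n)
{
    assert(n > 0);
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

Here layout_ea_bytes(10000) is 240032 bytes (about 234 KiB), and roundup_pow2 of that is 262144 bytes, i.e. 256 KiB.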
| Comment by Andreas Dilger [ 16/Jan/19 ] |
|
Can you please verify that tools like getfattr, setfattr, cp, rsync, tar, etc. can work with xattrs larger than 32KB or 64KB? AFAIR, there is a hard limit in the kernel on the xattr size that the VFS will even accept, so allowing files with a larger layout internally may cause a lot of problems later. |
| Comment by Patrick Farrell (Inactive) [ 17/Jan/19 ] |
|
Good to know; I'll check on that. I suspect we've got a 64 KiB limit. It is in /usr/include/linux/limits.h: #define XATTR_SIZE_MAX 65536 /* size of an extended attribute value (64k) */ and is used (inconsistently) in the ACL code as an upper limit. That raises a challenging question, then, if we wish to raise the stripe count much beyond 2K (even 2K stripes need more than 32 KiB of EA, so...). Interesting; I'll noodle on it. |
| Comment by Patrick Farrell (Inactive) [ 17/Jan/19 ] |
|
Yes, trying to access an EA larger than 64K causes E2BIG from getfattr (specifically, from the getxattr syscall). Here's an example with 2,730 stripes, where trusted.lov is 65792 bytes. With 2720 stripes it's fine: getxattr("2720file", "trusted.lov", [...], 65536) = 65312. I'm going to assume this limit applies to at least some of the other tools as well, so we'll have to respect it. Just to confirm: this matters for manual editing and backups of MDTs, right? |
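The E2BIG behaviour can be reproduced from userspace with the usual size-query pattern: call getxattr(2) with a zero-length buffer to learn the value size, then allocate and read. When the value is larger than XATTR_SIZE_MAX (64 KiB), the kernel refuses to return it and getxattr fails with E2BIG. This is a minimal sketch assuming Linux and glibc's <sys/xattr.h>; read_xattr is a hypothetical helper, and reading trusted.* attributes such as trusted.lov also requires CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: read an xattr of unknown size.
 * Returns a malloc'd buffer (caller frees) and stores its length in
 * *len, or returns NULL with errno set (e.g. E2BIG when the value
 * exceeds XATTR_SIZE_MAX, ENOENT when the path does not exist). */
static void *read_xattr(const char *path, const char *name, ssize_t *len)
{
    for (;;) {
        /* A zero-length call only reports the current value size. */
        ssize_t size = getxattr(path, name, NULL, 0);
        if (size < 0)
            return NULL;

        void *buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf == NULL)
            return NULL;

        ssize_t got = getxattr(path, name, buf, (size_t)size);
        if (got >= 0) {
            *len = got;
            return buf;
        }

        int saved = errno;
        free(buf);
        errno = saved;
        /* ERANGE means the value grew between the two calls: retry.
         * Anything else, including E2BIG, is returned to the caller. */
        if (saved != ERANGE)
            return NULL;
    }
}
```

Unlike getfattr, which passes a fixed 64 KiB buffer, this sizes the buffer from the query; either way a value over XATTR_SIZE_MAX cannot be read back through the xattr interface.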
| Comment by Gerrit Updater [ 17/Jan/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34058 |
| Comment by Gerrit Updater [ 17/Jan/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34059 |
| Comment by Gerrit Updater [ 30/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34059/ |
| Comment by Gerrit Updater [ 30/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34058/ |
| Comment by Peter Jones [ 30/Apr/19 ] |
|
Landed for 2.13 |