[LU-16972] optimize e2fsck ea_refcount processing Created: 20/Jul/23 Updated: 09/Aug/23 Resolved: 09/Aug/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Dongyang Li |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | e2fsck | ||
| Environment: |
Exascaler version: 5.2.5 |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Filesystems with a large number of external xattr blocks, such as MDTs with large PFL layouts that overflow the in-inode xattr space, cause a significant slowdown during pass1 inode processing because of inefficient management of the refcount_extra list that stores the refcounts for shared external xattr blocks. The refcount_extra structure is implemented as a linear array of ea_refcount_el elements:
struct ea_refcount_el {
/* ea_key could either be an inode number or block number. */
ea_key_t ea_key;
ea_value_t ea_value;
};
During element insertion, a binary search is used in the array to find the element, and if not found, the higher elements in the array are memmove()'d upward to make room for the new element in the list. If the currently allocated array is full, then it is realloc()'d with 100 more elements, and the whole array is copied to the new memory. If there are a large number of shared xattr blocks then the array can grow very large and the processing of the linear array will be very inefficient and CPU intensive due to frequent large memory allocations and copies. |
| Comments |
| Comment by Gerrit Updater [ 21/Jul/23 ] |
|
"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/tools/e2fsprogs/+/51729 |
| Comment by Andreas Dilger [ 21/Jul/23 ] |
|
I was wondering whether this code could use the "icount" structure used elsewhere in e2fsck (eg. duplicate block handling). This is optimized for a count of 0/1 in a bitmap, and puts entries into an array only for count > 1. The drawback is that it has to allocate a full bitmap of entries for all blocks counted, even though the usage would be very sparse. For an 10TiB MDT filesystem, the icount bitmap would be 2.5B bits, or 320MiB of RAM per icount. The current ea_refcount struct is 2x__u64=16 bytes per element, which means the break-even point would be 20M xattr blocks, which is quite high for most filesystems. I think the better alternative is to expand on what is in the current patch, namely that the block bitmap is used to track xattr blocks with count = 1 and then only add the ea_refcount for xattr blocks with refcount > 1. The one complexity is whether this will catch xattr blocks that are duplicated with regular file blocks, but I think it would be handled correctly anyway. |
| Comment by Gerrit Updater [ 26/Jul/23 ] |
|
"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/tools/e2fsprogs/+/51763 |
| Comment by Gerrit Updater [ 01/Aug/23 ] |
|
"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51827 |
| Comment by Gerrit Updater [ 07/Aug/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/tools/e2fsprogs/+/51763/ |
| Comment by Gerrit Updater [ 07/Aug/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/tools/e2fsprogs/+/51729/ |
| Comment by Gerrit Updater [ 07/Aug/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/tools/e2fsprogs/+/51885 |
| Comment by Gerrit Updater [ 07/Aug/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/tools/e2fsprogs/+/51885/ |
| Comment by Andreas Dilger [ 09/Aug/23 ] |
|
Fix included in 1.47.0-wc4 |