The upstream ext4 directory shrink patches have been refreshed:
Most of the complexity will be in integrating the "shrink directories on dentry delete" patches with ext4-pdirop.patch, especially the locking order as levels of the htree are removed. We will also need to disable the htree dx_root removal in make_unindexed(), in the same way we do for ext4_update_dx_flag(), because this would break htree locking and is of marginal benefit. Once all objects in a {SEQ}/d*/ directory tree have been removed on an OST, we can just delete the whole sequence directory tree rather than worry about the few remaining blocks for dx_root.
These patches will mostly shrink the directory only when it is almost completely empty, but for LU-11912 this would still help reduce space usage as old objects are removed. There still needs to be a patch that merges adjacent htree blocks when they are nearly empty. My proposal for a possible implementation of htree leaf block merging was in this linux-ext4 thread on an earlier version of the patch:
On Mar 25, 2020, at 3:37 AM, Harshad Shirwadkar <harshadshirwadkar@gmail.com> wrote:
> But note that most of the shrinking happens during last 1-2% deletions
> in an average case. Therefore, the next step here is to merge dx nodes
> when possible. That can be achieved by storing the fullness index in
> htree nodes. But that's an on-disk format change. We can instead build
> on tooling added by this patch to perform reverse lookup on a dx
> node and then reading adjacent nodes to check their fullness.
As for storing the fullness on disk changing the on-disk format... That is
true, but the original htree implementation anticipated this and reserved
space in the htree index to store the fullness, so it would not break the
ability of older kernels to access directories with the fullness information.
I think if you used just a few bits (maybe just 2) to store:
0 = unset (every directory today)
1 = under 20% full
2 = under 40% full
3 = under 60% full
or similar. It doesn't matter if they are more full, since they won't be
candidates for merging. If the htree index fullness is then updated lazily
as entries are removed, this will simplify the shrinking process, and will
avoid the need to repeatedly scan the leaf blocks to see if they are empty
enough for merging. It wouldn't be any worse than not storing these values
on disk: the fullness can be set the first time a "0 = unset" entry is found
and not merged, or set on the merged block if it is merged, and running
"e2fsck -D" can easily update the fullness values.
The benefit of using 20%, 40%, and 60% as the fullness markers is that it
is possible to either merge adjacent 60% and 40% blocks or alternately a
60% and two adjacent 20% blocks. Also, since these values are very coarse
they would not need to be updated frequently. If the values are slightly
outdated, then it is again not worse than the "always scan" model (one scan
and the fullness would be updated), but more efficient than repeat scanning.
Using only two bits for fullness also leaves two bits free for future use.
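As a rough illustration of the encoding and merge check described above, here is a minimal sketch in Python (not kernel code). All names in it (fullness_code, lazy_update, can_merge) are hypothetical. It assumes that blocks at or above 60% full are simply left as "0 = unset", and it ignores any per-block overhead when summing the upper bounds.

```python
# Hypothetical 2-bit fullness codes, following the table in the mail above.
UNSET, UNDER_20, UNDER_40, UNDER_60 = 0, 1, 2, 3

# Upper bound (percent of a block) implied by each non-zero code.
UPPER_BOUND_PCT = {UNDER_20: 20, UNDER_40: 40, UNDER_60: 60}


def fullness_code(used_bytes, block_size):
    """Map leaf block usage to a 2-bit code.

    Assumption: blocks that are 60% full or more map to UNSET, since
    they are not merge candidates anyway.
    """
    if block_size <= 0 or not 0 <= used_bytes <= block_size:
        raise ValueError("used_bytes must be within [0, block_size]")
    for code, pct in UPPER_BOUND_PCT.items():  # ascending: 20, 40, 60
        if used_bytes * 100 < block_size * pct:
            return code
    return UNSET


def lazy_update(stored_code, used_bytes, block_size):
    """Return (new_code, needs_write); write only when the bucket changes."""
    new_code = fullness_code(used_bytes, block_size)
    return new_code, new_code != stored_code


def can_merge(codes):
    """True if adjacent blocks with these codes are guaranteed to fit in one.

    Uses only the coarse upper bounds, so no leaf block scan is needed.
    Any UNSET block makes the answer unknown, which is treated as "no".
    """
    if len(codes) < 2:
        return False
    total = 0
    for code in codes:
        if code not in UPPER_BOUND_PCT:
            return False
        total += UPPER_BOUND_PCT[code]
    return total <= 100
```

With these buckets, can_merge([UNDER_60, UNDER_40]) and can_merge([UNDER_60, UNDER_20, UNDER_20]) both hold, matching the two combinations mentioned above, while two adjacent "under 60%" blocks are rejected without scanning either one.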