<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:49:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12051] ldiskfs directory shrink</title>
                <link>https://jira.whamcloud.com/browse/LU-12051</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Shrinking directories in ldiskfs would be desirable for cases where a large number of files were created in a directory but have since been deleted, leaving the directory empty so that its blocks could be deallocated.&lt;/p&gt;

&lt;p&gt;There is a patch submitted to upstream ext4 that is the start of the support for this functionality, but it is not very aggressive about removing directory blocks: &lt;a href=&quot;https://patchwork.ozlabs.org/patch/1048658/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://patchwork.ozlabs.org/patch/1048658/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additional work is planned in this area to improve the directory shrinking functionality.&lt;/p&gt;

&lt;p&gt;In addition to the directory shrinking, removal of old OST object directory trees (&lt;tt&gt;O/&amp;#42;/d&amp;#42;&lt;/tt&gt;) is also useful, and could potentially be a substitute for having online directory shrink once the directories are completely empty.&lt;/p&gt;</description>
                <environment></environment>
        <key id="55092">LU-12051</key>
            <summary>ldiskfs directory shrink</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                            <label>ldiskfs</label>
                    </labels>
                <created>Thu, 7 Mar 2019 23:57:22 +0000</created>
                <updated>Mon, 23 Oct 2023 23:59:58 +0000</updated>
                                            <version>Lustre 2.12.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="244283" author="adilger" created="Wed, 20 Mar 2019 07:23:16 +0000"  >&lt;p&gt;I suspect that there isn&apos;t a lot of work we &lt;em&gt;need&lt;/em&gt; to do in this area, but some review and testing of the upstream patch linked in the description (with feedback directly to &lt;tt&gt;linux-ext4@vger.kernel.org&lt;/tt&gt; and the author) would probably speed things up.  After the code is landed upstream, or is at least showing good benefits and is robust, we could backport it to &lt;tt&gt;ldiskfs/kernel_patches&lt;/tt&gt; for use until we catch up with a newer kernel.&lt;/p&gt;</comment>
                            <comment id="267122" author="adilger" created="Wed, 8 Apr 2020 06:12:28 +0000"  >&lt;p&gt;The upstream ext4 directory shrink patches have been refreshed:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;a href=&quot;https://patchwork.ozlabs.org/patch/1267257/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PATCH v2,1/3 ext4: return lblk from ext4_find_entry&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://patchwork.ozlabs.org/patch/1267259/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PATCH v2,2/3 ext4: shrink directories on dentry delete&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://patchwork.ozlabs.org/patch/1267258/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PATCH v2,3/3 ext4: reimplement ext4_empty_dir() using is_dirent_block_empty&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Most of the complexity will be in integrating the &quot;&lt;tt&gt;shrink directories on dentry delete&lt;/tt&gt;&quot; patch with &lt;tt&gt;ext4-pdirop.patch&lt;/tt&gt;, especially the locking order as levels of the htree are removed.  We will also need to disable the htree &lt;tt&gt;dx_root&lt;/tt&gt; removal in &lt;tt&gt;make_unindexed()&lt;/tt&gt; in the same way we do for &lt;tt&gt;ext4_update_dx_flag()&lt;/tt&gt;, because removing it would break htree locking and is of marginal benefit.  At the point where all objects in a &lt;tt&gt;{SEQ}/d&amp;#42;/&lt;/tt&gt; directory tree have been removed on an OST, we can simply delete the whole sequence directory tree rather than worry about the few remaining blocks for &lt;tt&gt;dx_root&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;These patches will mostly shrink the directory only when it is almost completely empty, but for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11912&quot; title=&quot;reduce number of OST objects created per MDS Sequence&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11912&quot;&gt;&lt;del&gt;LU-11912&lt;/del&gt;&lt;/a&gt; this would still help reduce space usage as old objects are removed.  There still needs to be a patch that merges adjacent htree blocks when they are nearly empty.  My proposal for a possible implementation of htree leaf block merging was in &lt;a href=&quot;https://lore.kernel.org/linux-ext4/04F44879-15DE-42EE-B87A-0690E9B13BB2@dilger.ca/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this linux-ext4 thread&lt;/a&gt; on an earlier version of the patch:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;On Mar 25, 2020, at 3:37 AM, Harshad Shirwadkar &amp;lt;harshadshirwadkar@gmail.com&amp;gt; wrote:&lt;br/&gt;
&amp;gt; But note that most of the shrinking happens during last 1-2% deletions&lt;br/&gt;
&amp;gt; in an average case. Therefore, the next step here is to merge dx nodes&lt;br/&gt;
&amp;gt; when possible. That can be achieved by storing the fullness index in&lt;br/&gt;
&amp;gt; htree nodes. But that&apos;s an on-disk format change. We can instead build&lt;br/&gt;
&amp;gt; on tooling added by this patch to perform reverse lookup on a dx&lt;br/&gt;
&amp;gt; node and then reading adjacent nodes to check their fullness.&lt;/p&gt;

&lt;p&gt;As for storing the fullness on disk changing the on-disk format...  That is&lt;br/&gt;
true, but the original htree implementation anticipated this and reserved&lt;br/&gt;
space in the htree index to store the fullness, so it would not break the&lt;br/&gt;
ability of older kernels to access directories with the fullness information.&lt;/p&gt;

&lt;p&gt;I think if you used just a few bits (maybe just 2) to store:&lt;br/&gt;
0 = unset (every directory today)&lt;br/&gt;
1 = under 20% full&lt;br/&gt;
2 = under 40% full&lt;br/&gt;
3 = under 60% full&lt;/p&gt;

&lt;p&gt;or similar.  It doesn&apos;t matter if they are more full since they won&apos;t be&lt;br/&gt;
candidates for merging, and then lazily update the htree index fullness&lt;br/&gt;
as entries are removed, this will simplify the shrinking process, and will&lt;br/&gt;
avoid the need to repeatedly scan the leaf blocks to see if they are empty&lt;br/&gt;
enough for merging.  It wouldn&apos;t be any worse &lt;b&gt;not&lt;/b&gt; to store these values&lt;br/&gt;
on disk after the first time a &quot;0 = unset&quot; entry was found and not merged,&lt;br/&gt;
or setting the fullness on the merged block if it is merged, and running&lt;br/&gt;
&quot;e2fsck -D&quot; can easily update the fullness values.&lt;/p&gt;

&lt;p&gt;The benefit of using 20%, 40%, and 60% as the fullness markers is that it&lt;br/&gt;
is possible to either merge adjacent 60% and 40% blocks or alternately a&lt;br/&gt;
60% and two adjacent 20% blocks.  Also, since these values are very coarse&lt;br/&gt;
they would not need to be updated frequently.  If the values are slightly&lt;br/&gt;
outdated, then it is again not worse than the &quot;always scan&quot; model (one scan&lt;br/&gt;
and the fullness would be updated), but more efficient than repeat scanning.&lt;/p&gt;

&lt;p&gt;Using only two bits for fullness also leaves two bits free for future use.&lt;/p&gt;&lt;/blockquote&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="54735">LU-11912</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="38564">LU-8465</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00cx3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>