<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:51:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5391] osd-zfs: ZAP objects use 4K blocks for both indirect and leaf blocks</title>
                <link>https://jira.whamcloud.com/browse/LU-5391</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;For example, on an MDS, an oi.xx directory:&lt;br/&gt;
Object  lvl   iblk   dblk  dsize  lsize   %full   type&lt;br/&gt;
       44   4    4K     4K 1.06M  4.04M 99.71 ZFS directory&lt;/p&gt;

&lt;p&gt;The code in __osd_zap_create():&lt;br/&gt;
oid = zap_create_flags(uos-&amp;gt;os, 0, flags | ZAP_FLAG_HASH64, DMU_OT_DIRECTORY_CONTENTS, 12, 12, DMU_OT_SA, DN_MAX_BONUSLEN, tx);&lt;/p&gt;

&lt;p&gt;This seemed inefficient. In fact, the default leaf block size for a fat ZAP is 16K:&lt;br/&gt;
int fzap_default_block_shift = 14; /* 16k blocksize */&lt;/p&gt;

&lt;p&gt;I changed the block sizes to 16K indirect and 128K leaf, and saw a 10% increase in mds-survey creation rates.&lt;/p&gt;</description>
                <environment></environment>
        <key id="25685">LU-5391</key>
            <summary>osd-zfs: ZAP objects use 4K blocks for both indirect and leaf blocks</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="isaac">Isaac Huang</assignee>
                                    <reporter username="isaac">Isaac Huang</reporter>
                        <labels>
                            <label>prz</label>
                            <label>zfs</label>
                    </labels>
                <created>Tue, 22 Jul 2014 05:29:12 +0000</created>
                <updated>Fri, 26 Sep 2014 04:20:49 +0000</updated>
                            <resolved>Tue, 26 Aug 2014 15:24:13 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="89708" author="isaac" created="Tue, 22 Jul 2014 05:32:27 +0000"  >&lt;p&gt;Brian or Alex, can you comment?&lt;/p&gt;</comment>
                            <comment id="89717" author="bzzz" created="Tue, 22 Jul 2014 09:31:31 +0000"  >&lt;p&gt;4K does seem small, of course, but I don&apos;t think 128K is good either. It might be OK for relatively small directories, but when a directory is big, an evenly distributed load will touch many different blocks, leading to very low write density - we&apos;d have to do 128K I/O to modify very few entries. I&apos;d suggest staying with the default 16K.&lt;/p&gt;</comment>
                            <comment id="89743" author="isaac" created="Tue, 22 Jul 2014 16:37:54 +0000"  >&lt;p&gt;I agree that 128K was too aggressive. I changed to 16K/16K and still got 7.4% and 14% increases (over the current 4K/4K) in mds-survey creation and destroy rates, respectively.&lt;/p&gt;

&lt;p&gt;I&apos;d suggest increasing the indirect block size to 16K, which is the default indirect block size used by ZPL directories (e.g. / on an MDT), to reduce the levels of indirection, and increasing the leaf block size to 16K, which matches fzap_default_block_shift.&lt;/p&gt;</comment>
                            <comment id="89749" author="isaac" created="Tue, 22 Jul 2014 16:58:08 +0000"  >&lt;p&gt;Patch pushed to &lt;a href=&quot;http://review.whamcloud.com/#/c/11182/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/11182/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="89775" author="behlendorf" created="Tue, 22 Jul 2014 18:48:03 +0000"  >&lt;p&gt;This is going to be a trade-off between memory usage, bandwidth, and IO/s.  I think adopting the ZPL default of 16K strikes a good balance, but I&apos;d be careful about drawing any performance conclusions.  This may help creation performance, but it may hurt other workloads.&lt;/p&gt;

&lt;p&gt;For example, I&apos;d expect that increasing the OI leaf block size on the MDS would improve performance as long as the entire OI can be cached (small filesystems).  But once the OI size is significantly larger than memory (large filesystems), I see two downsides: 1) pulling in a larger block takes slightly longer, and 2) because the FIDs are hashed uniformly over the leaves, this effectively reduces the cached working set for a given set of FIDs by 4x.&lt;/p&gt;</comment>
                            <comment id="89794" author="isaac" created="Tue, 22 Jul 2014 22:01:37 +0000"  >&lt;p&gt;Brian, thanks for the comment:&lt;br/&gt;
1. The single-node mds-survey benchmark was very simplistic and was not intended to support any performance conclusion.&lt;br/&gt;
2. When you said &quot;effectively reduces by 4x the cached working set size&quot;, did you mean: the FIDs in the working set are so sparse that each leaf block holds only one FID in the set, so for a fixed cache size the cached working set is reduced by 4x?&lt;/p&gt;</comment>
                            <comment id="89799" author="behlendorf" created="Tue, 22 Jul 2014 22:28:39 +0000"  >&lt;p&gt;&amp;gt; the FIDs in the working set are so sparse that each leaf block holds only one...&lt;/p&gt;

&lt;p&gt;Yes, exactly.  From everything I&apos;ve seen, the FIDs are distributed very uniformly over the leaves.  This is both a good thing and a bad thing.  So let&apos;s say you have a set of 1024 files, all of which are being frequently accessed, so the OI leaf blocks all end up on the ARC&apos;s MFU list.  If the OI fat ZAP has enough total entries, my expectation would be that each entry would hash to a different leaf.  So with 4K leaf blocks you&apos;ll consume roughly 4M of memory.  With 16K leaf blocks you&apos;re looking at 16M for the same workload.  This may still be a reasonable trade-off to make, but it&apos;s something which should be considered.&lt;/p&gt;</comment>
                            <comment id="92435" author="pjones" created="Tue, 26 Aug 2014 15:24:13 +0000"  >&lt;p&gt;Landed for 2.7&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwryv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15006</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>