<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:23:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16017] Suboptimal dnode size used for ZFS</title>
                <link>https://jira.whamcloud.com/browse/LU-16017</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When allocating new objects for a ZFS-based server the dnode should be sized large enough to store all of the base Lustre extended attributes (trusted.lma, trusted.fid, trusted.version) in order to avoid requiring a spill block.&#160; In practice, this means a minimum size of 1K is required to accommodate the packed xattr nvlist in the bonus area.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;&amp;gt; zdb -e -p /tmp/ -dddd lustre-ost1/ost1 1500&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;Dataset lustre-ost2/ost2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ZPL&amp;#93;&lt;/span&gt;, ID 391, cr_txg 8, 18.8M, 614 objects, rootbp DVA&lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt;=&amp;lt;0:16440a00:200&amp;gt; DVA&lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt;=&amp;lt;0:ada00:200&amp;gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;L0 DMU objset&amp;#93;&lt;/span&gt; fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=89L/89P fill=614 cksum=dd5e644f1:50403505c9e:f29fa6075ec6:1fd1540cabc5f5&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;&#160; &#160; Object &#160;lvl &#160; iblk &#160; dblk &#160;dsize &#160;dnsize &#160;lsize &#160; %full &#160;type&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;&#160; &#160; &#160; 1500 &#160; &#160;1 &#160; 128K &#160; &#160;64K &#160; &#160;64K &#160; &#160; &#160;1K &#160; &#160;64K &#160;100.00 &#160;ZFS plain file&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;356 &#160; bonus &#160;System attributes&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;&#160; &#160; &#160; &#160; dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;&#160;&#160;&#160;&#160;&#160;&#160;&#160; ...&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;&#160; &#160; &#160; &#160; SA xattrs: 204 bytes, 3 entries&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; trusted.lma = \010\000\000\000\000\000\000\000\000\000\001\000\001\000\000\000\360\000\000\000\000\000\000\000&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; trusted.fid = \001\004\000\000\002\000\000\000\335\001\000\000\000\000\000\000\000\000\020\000\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; trusted.version = \357\000\000\000\001\000\000\000&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;However, by default 512-byte dnodes are created, forcing a spill block to be allocated for each object, which significantly increases the required storage and reduces performance due to the additional I/O.&lt;/p&gt;

&lt;p&gt;It appears the issue is caused by OSD_BASE_EA_IN_BONUS being set incorrectly.&#160; This value needs to account not just for the data size (which it does), but also for the xattr key text and XDR encoding overhead for the packed nvlist.&#160; After everything is taken into account the correct size is 204 bytes according to zdb.&lt;/p&gt;
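
&lt;p&gt;The 204-byte figure can be reproduced by hand. The following is a minimal sketch, assuming the packed nvlist uses the usual libnvpair XDR layout (a 12-byte list header, 20 fixed bytes per pair, names and byte-array values padded to 4 bytes, and an 8-byte terminator). Those layout constants and the helper names are assumptions for illustration, not taken from the Lustre source; the value lengths are counted from the zdb dump above.&lt;/p&gt;

```python
# Sketch of the packed-nvlist (XDR) size arithmetic for the base xattrs.
# The layout constants are ASSUMPTIONS about the libnvpair XDR encoding;
# the helper names are hypothetical and exist only for this illustration.

def align4(n):
    """Round n up to the next multiple of 4 (XDR pads to 4-byte units)."""
    return (n + 3) // 4 * 4

NVLIST_HEADER = 4 + 4 + 4          # encoding header, nvl_version, nvl_nvflag
NVLIST_TERMINATOR = 4 + 4          # zero encoded size, zero decoded size
NVPAIR_FIXED = 4 + 4 + 4 + 4 + 4   # encoded size, decoded size, name length, type, nelem

def packed_pair_size(name, value_len):
    """Packed size of one byte-array nvpair: fixed fields, padded name, padded value."""
    return NVPAIR_FIXED + align4(len(name)) + align4(value_len)

def packed_nvlist_size(xattrs):
    """Packed size of the whole nvlist for a {name: value length} mapping."""
    pairs = sum(packed_pair_size(name, vlen) for name, vlen in xattrs.items())
    return NVLIST_HEADER + pairs + NVLIST_TERMINATOR

# Value lengths in bytes, counted from the zdb output in this ticket.
BASE_XATTRS = {"trusted.lma": 24, "trusted.fid": 52, "trusted.version": 8}

if __name__ == "__main__":
    for name, vlen in BASE_XATTRS.items():
        print(name, packed_pair_size(name, vlen))
    print("raw value bytes:", sum(BASE_XATTRS.values()))
    print("packed nvlist bytes:", packed_nvlist_size(BASE_XATTRS))
```

&lt;p&gt;Under these assumptions the raw values account for only 84 of the 204 bytes; the remaining 120 bytes are key text, padding, and XDR framing, which is the overhead the current definition leaves out.&lt;/p&gt;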

&lt;p&gt;It seems to me the correct fix here is to update OSD_BASE_EA_IN_BONUS accordingly.&#160; Making the following change fixes the issue in my testing.&#160; It&apos;d be great if someone else could review this change to make sure additional changes aren&apos;t needed.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;+/*&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ * The base extended attribute SA size including the keys, values,&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ * and XDR encoding overhead as reported by zdb.&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ *&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ * SA xattrs: 204 bytes, 3 entries&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ * &#160; &#160; trusted.lma = \000\000\000\000...&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ * &#160; &#160; trusted.fid = \000\000\000\000...&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ * &#160; &#160; trusted.version = \000\000\000...&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+ */&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;+#define OSD_BASE_EA_IN_BONUS &#160; (ZFS_SA_BASE_ATTR_SIZE + 204)&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;Until a fix is merged a reasonable workaround for this is to explicitly set the dnodesize property for the ZFS datasets to 1K on pools containing OSTs, and possibly larger on pools containing MDTs.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;zfs set dnodesize=1k pool/ost&lt;/tt&gt;&lt;/p&gt;</description>
                <environment>Any Lustre filesystem using ZFS-based servers.</environment>
        <key id="71170">LU-16017</key>
            <summary>Suboptimal dnode size used for ZFS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="behlendorf">Brian Behlendorf</assignee>
                                    <reporter username="behlendorf">Brian Behlendorf</reporter>
                        <labels>
                            <label>llnl</label>
                            <label>zfs</label>
                    </labels>
                <created>Fri, 15 Jul 2022 21:07:02 +0000</created>
                <updated>Fri, 23 Sep 2022 23:54:05 +0000</updated>
                                            <version>Upstream</version>
                    <version>Lustre 2.15.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>13</watches>
                                                                            <comments>
                            <comment id="340622" author="ofaaland" created="Fri, 15 Jul 2022 23:56:36 +0000"  >&lt;p&gt;Brian, I believe that sites with large layouts, due to striping over large OST counts or due to use of PFL (we use it), will need much more space than that.  I&apos;m gathering a sample from one of our newer OCF file systems to get a sense of it.&lt;/p&gt;</comment>
                            <comment id="340623" author="ofaaland" created="Sat, 16 Jul 2022 00:44:20 +0000"  >&lt;p&gt;A very brief sample from one of our file systems found that zdb reports between 1028 and 1076 bytes used for SA xattrs, for about 95% of objects of type &quot;ZFS plain file&quot;, on an MDT. Some of those could be objects created and used internally by Lustre, but I believe most are not.&lt;/p&gt;

&lt;p&gt;Sampled from MDT0001 on CZlustre2&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;  count   dnodesize
 650274 1K
1157816 512
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most commonly seen sizes for SA xattrs were:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;count    size
 257173 SA xattrs bytes 1064
 238096 SA xattrs bytes 1052
 210535 SA xattrs bytes 1040
 198597 SA xattrs bytes 1044
 145498 SA xattrs bytes 1060
 134837 SA xattrs bytes 1056
 128782 SA xattrs bytes 1036
 116194 SA xattrs bytes 1048
  96032 SA xattrs bytes 1072
  71828 SA xattrs bytes 1068
  51614 SA xattrs bytes 1032
  51206 SA xattrs bytes 1028
  35392 SA xattrs bytes 1076
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I haven&apos;t looked at an OST yet.&lt;/p&gt;</comment>
                            <comment id="340624" author="behlendorf" created="Sat, 16 Jul 2022 00:50:53 +0000"  >&lt;p&gt;Assuming a 1k dnode it looks like we&apos;ve got 476 bytes of bonus space still available for an OST regular file object and 336 bytes available for an MDT regular file object.&#160; How much space does a PFL layout typically need?&lt;/p&gt;
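
&lt;p&gt;For reference, the OST figure can be checked with a rough sketch. It assumes the usual ZFS dnode layout, in which the bonus area is the dnode size minus a 64-byte core header and one 128-byte block pointer; both constants and the function names are assumptions for illustration, not values from this ticket. The 356-byte usage is the bonus length zdb reported for object 1500 in the description.&lt;/p&gt;

```python
# Rough check of bonus-area headroom for a given dnode size.
# DNODE_CORE_SIZE and BLKPTR_SIZE are ASSUMED ZFS layout constants;
# the function names are hypothetical and exist only for this illustration.

DNODE_CORE_SIZE = 64   # fixed dnode header, in bytes
BLKPTR_SIZE = 128      # one embedded block pointer, in bytes

def bonus_capacity(dnodesize):
    """Bytes available for the bonus buffer in a dnode of the given size."""
    return dnodesize - DNODE_CORE_SIZE - BLKPTR_SIZE

def bonus_headroom(dnodesize, bonus_used):
    """Bytes left after bonus_used; a negative result means it does not fit."""
    return bonus_capacity(dnodesize) - bonus_used

OST_BONUS_USED = 356   # "356 bonus System attributes" in the zdb output

if __name__ == "__main__":
    for size in (512, 1024):
        print(size, bonus_capacity(size), bonus_headroom(size, OST_BONUS_USED))
```

&lt;p&gt;With these assumptions a 1K dnode leaves 832 - 356 = 476 bytes, matching the OST number above, while a 512-byte dnode offers only 320 bytes of bonus space, so the same attributes cannot fit without a spill block.&lt;/p&gt;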

&lt;p&gt;Then it sounds like 1k dnodes are sized reasonably for the OSTs to avoid wasting space, but we probably want to go larger by default for the MDTs.&#160; The code automatically sizes the dnode based on the expected xattr size so it seems we&apos;ll need to teach it about those additional MDT xattrs.&lt;/p&gt;</comment>
                            <comment id="340636" author="adilger" created="Sat, 16 Jul 2022 16:34:00 +0000"  >&lt;p&gt;Brian, it would probably be best for review if you pushed a patch that embodied your change. &lt;/p&gt;

&lt;p&gt;That said, it would be preferable IMHO if the change used &quot;&lt;tt&gt;sizeof(foo)&lt;/tt&gt;&quot; instead of a fixed number, possibly with a fudge-factor (2x or 7/4 or whatever) to compensate for encoding and other overhead. &lt;/p&gt;</comment>
                            <comment id="340894" author="behlendorf" created="Tue, 19 Jul 2022 17:06:21 +0000"  >&lt;p&gt;Right, I&apos;ll push a proper patch for review in the next couple of days.&#160; I&apos;m happy to structure it as you suggested.&lt;/p&gt;</comment>
                            <comment id="340932" author="ofaaland" created="Tue, 19 Jul 2022 23:14:52 +0000"  >&lt;p&gt;For my reference, our local ticket is TOSS5732&lt;/p&gt;</comment>
                            <comment id="344968" author="aboyko" created="Mon, 29 Aug 2022 13:52:56 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=behlendorf&quot; class=&quot;user-hover&quot; rel=&quot;behlendorf&quot;&gt;behlendorf&lt;/a&gt;, I see that you were going to prepare a fix.  Was it pushed? I don&apos;t see any link here.&lt;br/&gt;
Thanks.&lt;/p&gt;</comment>
                            <comment id="345018" author="ofaaland" created="Mon, 29 Aug 2022 21:33:49 +0000"  >&lt;p&gt;Hi Alexander, I took this over from Brian and haven&apos;t pushed a patch yet.  Probably later today.&lt;/p&gt;</comment>
                            <comment id="347693" author="spitzcor" created="Fri, 23 Sep 2022 15:48:37 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ofaaland&quot; class=&quot;user-hover&quot; rel=&quot;ofaaland&quot;&gt;ofaaland&lt;/a&gt;, were you still planning to push a patch?&lt;/p&gt;</comment>
                            <comment id="347800" author="gerrit" created="Fri, 23 Sep 2022 23:44:09 +0000"  >&lt;p&gt;&quot;Olaf Faaland &amp;lt;faaland1@llnl.gov&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48646&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48646&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16017&quot; title=&quot;Suboptimal dnode size used for ZFS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16017&quot;&gt;LU-16017&lt;/a&gt; osd-zfs: OSD_BASE_EA_IN_BONUS should include names&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 010a36889fc430d498b3b0dfb41838aaf8aae024&lt;/p&gt;</comment>
                            <comment id="347801" author="ofaaland" created="Fri, 23 Sep 2022 23:54:05 +0000"  >&lt;p&gt;Cory and Alexander, sorry that took so long.&#160; Reviews appreciated.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>metadata</label>
            <label>performance</label>
            <label>server</label>
            <label>zfs</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>zfs</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02uqf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>