<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:14:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8068] Large ZFS Dnode support</title>
                <link>https://jira.whamcloud.com/browse/LU-8068</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Unlanded patches exist in upstream ZFS to increase the dnode size which need to be evaluated for their impact (hopefully improvement) for Lustre metadata performance on ZFS MDTs:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/3542&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/3542&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="31621">LU-8068</key>
            <summary>Large ZFS Dnode support</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Fri, 21 Aug 2015 22:14:25 +0000</created>
                <updated>Mon, 27 Feb 2017 22:55:17 +0000</updated>
                            <resolved>Thu, 2 Jun 2016 11:45:19 +0000</resolved>
                                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>19</watches>
                                                                            <comments>
                            <comment id="124837" author="adilger" created="Fri, 21 Aug 2015 22:57:11 +0000"  >&lt;p&gt;Any performance testing of this patch should be done using a Lustre MDT rather than just a local ZFS filesystem with a standard create/stat/unlink workload.  Otherwise, the large dnodes will just slow down metadata performance due to increased IO, and the overhead of xattrs and external spill blocks used by Lustre will not be measured.&lt;/p&gt;

&lt;p&gt;It &lt;em&gt;may&lt;/em&gt; be possible to use &lt;tt&gt;mdsrate &amp;#45;&amp;#45;create &amp;#45;&amp;#45;setxattr&lt;/tt&gt; changes from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6483&quot; title=&quot;Add xattrset to mdsrate&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6483&quot;&gt;&lt;del&gt;LU-6483&lt;/del&gt;&lt;/a&gt; (or equivalent) to test on a local ZFS filesystem, but this still needs an enhancement to allow storing smaller xattrs since &amp;#45;&amp;#45;setxattr currently only stores a 4000&amp;#45;byte xattr.  It needs to be enhanced to allow &lt;tt&gt;&amp;#45;&amp;#45;setxattr=&amp;lt;size&amp;gt;&lt;/tt&gt; to store an xattr of a specific size, say 384 bytes for ZFS with 1024&amp;#45;byte dnodes.&lt;/p&gt;</comment>
                            <comment id="124912" author="jgmitter" created="Mon, 24 Aug 2015 17:27:14 +0000"  >&lt;p&gt;Hi Jinshan,&lt;br/&gt;
Can you have a look at this topic?&lt;br/&gt;
Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="127124" author="jay" created="Fri, 11 Sep 2015 18:48:06 +0000"  >&lt;p&gt;It looks like this feature has some conflicts with storing the XATTR as a system attribute, which blocks further performance benchmarking. I&apos;m waiting for the author&apos;s response to move forward.&lt;/p&gt;</comment>
                            <comment id="146448" author="bzzz" created="Tue, 22 Mar 2016 14:31:07 +0000"  >&lt;p&gt;I tried this patch with createmany on a directly mounted ZFS. It degrades create performance from ~29K/sec to ~20K/sec. I&apos;m not sure how quickly this degradation can be addressed, but in general the large dnode patch looks very important. To simulate it I tweaked the code to shrink the LOVEA to just a few bytes so that we fit in the bonus buffer, and this brought the creation rate from ~13K to ~20K in mds-survey.&lt;/p&gt;</comment>
                            <comment id="146473" author="adilger" created="Tue, 22 Mar 2016 16:23:04 +0000"  >&lt;p&gt;Alex, could you please post on the patch in GitHub so the LLNL folks can see it?&lt;/p&gt;

&lt;p&gt;Also, it isn&apos;t clear what the difference is between your two tests. In the first case you wrote the create rate is &lt;em&gt;down&lt;/em&gt; from 29k to 20k, is that for ZPL create rate?  I don&apos;t expect this feature to help the non-Lustre case, since ZPL doesn&apos;t use SAs that can fit into the large dnode space, so it is just overhead. &lt;/p&gt;

&lt;p&gt;In the second case you wrote the create rate is &lt;em&gt;up&lt;/em&gt; from 13k to 20k when you shrink the LOVEA, so presumably this is Lustre, but without the large dnode patch?&lt;/p&gt;

&lt;p&gt;What is the performance with Lustre with normal LOVEA size (1-4 stripes) + large dnodes?  Presumably that would be 13k +/- some amount, not 29k +/- some amount?&lt;/p&gt;

&lt;p&gt;Also, my (vague) understanding of this patch is that it dynamically allocates space for the dnode, possibly using up space for dnode numbers following it?  Does this fail if the dnode is not declared large enough for all future SAs during the initial allocation?  IIRC, the osd-zfs code stores the layout and link xattrs to the dnode in a separate operation, which may make the large dnode patch ineffective.  It may also have problems with multiple threads allocating dnodes from the same block in parallel, since it doesn&apos;t know at dnode allocation time how large the SA space Lustre eventually needs. Maybe my understanding of how this feature was implemented is wrong?&lt;/p&gt;</comment>
                            <comment id="146478" author="bzzz" created="Tue, 22 Mar 2016 16:29:57 +0000"  >&lt;p&gt;Andreas, I&apos;ve already made a comment on GitHub, no reply so far. I hope Ned has seen it.&lt;/p&gt;

&lt;p&gt;So far I&apos;ve tested large dnodes with ZPL only and noticed significant degradation, so I took a timeout hoping to see comments from Ned.&lt;br/&gt;
I haven&apos;t tested Lustre with large dnodes.&lt;/p&gt;

&lt;p&gt;The patch allows asking for a dnode of a specific size, and I think we can do this given that we declare everything (including a LOVEA of known size) ahead of time.&lt;br/&gt;
We can easily track this in the OSD.&lt;/p&gt;
</comment>
                            <comment id="146479" author="adilger" created="Tue, 22 Mar 2016 16:35:40 +0000"  >&lt;p&gt;Rereading the large dnode patch, it seems that the caller can specify the dnode size on a per-dnode basis, so ideally we can add support for this to the osd-zfs code, but if not specified it will take the dataset property. Is 1KB large enough to hold the dnode + LOVEA + linkea + FID?&lt;/p&gt;</comment>
                            <comment id="146481" author="adilger" created="Tue, 22 Mar 2016 16:42:16 +0000"  >&lt;p&gt;Alex, your comment is on an old version of the patch, and not on the main pull request (&lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/3542&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/3542&lt;/a&gt;), so I don&apos;t think Ned will be looking there? Also, hopefully you are not using this old version of the patch (8f9fdb228), but rather the newest patch (ba39766)?&lt;/p&gt;</comment>
                            <comment id="146482" author="bzzz" created="Tue, 22 Mar 2016 16:44:21 +0000"  >&lt;p&gt;Yes, I was about to play with the code, but got confused by that performance issue. And yes, 1K should be more than enough: the LinkEA would be 48+ bytes, the LOVEA is something like 56+, then LMA and VBR (which I&apos;d hope we can put into the ZPL dnode, but in the worst case it&apos;s another 24+8 bytes).&lt;/p&gt;</comment>
                            <comment id="146483" author="bzzz" created="Tue, 22 Mar 2016 16:49:47 +0000"  >&lt;p&gt;Hmm, I was using the old version... let me try the new one. This will take some time - the patch doesn&apos;t apply to 0.6.5.&lt;/p&gt;</comment>
                            <comment id="146510" author="bzzz" created="Tue, 22 Mar 2016 18:08:33 +0000"  >&lt;p&gt;clean zfs/master:&lt;br/&gt;
Created 1000000 in 29414ms in 1 threads - 33997/sec&lt;br/&gt;
Created 1000000 in 20045ms in 2 threads - 49887/sec&lt;br/&gt;
Created 1000000 in 19259ms in 4 threads - 51923/sec&lt;br/&gt;
Created 1000000 in 17284ms in 8 threads - 57856/sec&lt;/p&gt;

&lt;p&gt;zfs/master + large dnodes:&lt;br/&gt;
Created 1000000 in 40618ms in 1 threads - 24619/sec&lt;br/&gt;
Created 1000000 in 28142ms in 2 threads - 35534/sec&lt;br/&gt;
Created 1000000 in 25731ms in 4 threads - 38863/sec&lt;br/&gt;
Created 1000000 in 25244ms in 8 threads - 39613/sec&lt;/p&gt;</comment>
                            <comment id="146516" author="bzzz" created="Tue, 22 Mar 2016 19:21:19 +0000"  >&lt;p&gt;I tried Lustre with that patch (on top of master ZFS):&lt;br/&gt;
before:&lt;br/&gt;
mdt 1 file  500000 dir    1 thr    1 create 21162.48 [ 18998.73, 22999.10] &lt;br/&gt;
after:&lt;br/&gt;
mdt 1 file  500000 dir    1 thr    1 create 18019.70 [ 15999.09, 19999.20] &lt;/p&gt;

&lt;p&gt;osd-zfs/ was modified to ask for 1K dnodes, verified with zdb:&lt;br/&gt;
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type&lt;br/&gt;
     10000    1    16K    512      0      1K    512    0.00  ZFS plain file&lt;/p&gt;

&lt;p&gt;Notice the zero dsize, meaning no spill block was allocated.&lt;/p&gt;</comment>
                            <comment id="147014" author="bzzz" created="Mon, 28 Mar 2016 08:27:59 +0000"  >&lt;p&gt;Ned refreshed the patch to address that performance issue and now it&apos;s doing much better.&lt;br/&gt;
First of all, I&apos;m now able to complete some tests where I was previously getting OOM (because of the huge memory consumption by the 8K spill blocks, I guess).&lt;br/&gt;
Now it makes sense to benchmark on real storage, as the amount of IO with this patch is a few times smaller:&lt;br/&gt;
1K vs (512-byte dnode + 8K spill) per dnode, OR 976MB vs 8300MB per 1M dnodes.&lt;/p&gt;
</comment>
                            <comment id="150275" author="adilger" created="Tue, 26 Apr 2016 18:06:47 +0000"  >&lt;p&gt;The large dnode patch is blocked behind &lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/4460&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/4460&lt;/a&gt; which is the performance problem that Alex and Ned identified, but currently that patch is only a workaround and needs to be improved before landing.  I&apos;ve described in that ticket what seems to be a reasonable approach for making a production-ready solution, but to summarize:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;by default the dnode allocator should just use a counter that continues at the next file offset (as in the existing 4460 patch)&lt;/li&gt;
	&lt;li&gt;if dnodes are being unlinked, a (per-cpu?) counter of unlinked dnodes and the minimum unlinked dnode number should be tracked (these values could be racy since it isn&apos;t critical that their values be 100% accurate)&lt;/li&gt;
	&lt;li&gt;when the unlinked dnode counter exceeds some threshold (e.g. 4x number of inodes created in previous TXG, or 64x the number of dnodes that fit into a leaf block, or some tunable number of unlinked dnodes specified by userspace) then scanning should restart at the minimum unlinked dnode number instead of &quot;0&quot; to avoid scanning a large number of already-allocated dnode blocks&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Alex, in order to move the large dnode patch forward, could you or Nathaniel work on an updated 4460 patch so that we can get on with landing the large dnode patch?&lt;/p&gt;</comment>
                            <comment id="153055" author="nedbass" created="Fri, 20 May 2016 20:46:56 +0000"  >&lt;p&gt;We&apos;re currently testing with the following patch to mitigate the performance impact of metadnode backfilling.  It uses a naive heuristic (rescan after 4096 unlinks at most once per txg) but this is simple and probably achieves 99% of the performance to be gained here.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/LLNL/zfs/commit/050b0e69&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/LLNL/zfs/commit/050b0e69&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="153082" author="gerrit" created="Fri, 20 May 2016 23:03:01 +0000"  >&lt;p&gt;Ned Bass (bass6@llnl.gov) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/20367&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/20367&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8068&quot; title=&quot;Large ZFS Dnode support&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8068&quot;&gt;&lt;del&gt;LU-8068&lt;/del&gt;&lt;/a&gt; osd-zfs: large dnode compatibility&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f0d8afaec213a7f471c3b22b9940de5c5cd192e3&lt;/p&gt;</comment>
                            <comment id="154391" author="gerrit" created="Thu, 2 Jun 2016 04:45:48 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/20367/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/20367/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8068&quot; title=&quot;Large ZFS Dnode support&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8068&quot;&gt;&lt;del&gt;LU-8068&lt;/del&gt;&lt;/a&gt; osd-zfs: large dnode support&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 9765c6174ef580fb4deef4e7faea6d5ed634b00f&lt;/p&gt;</comment>
                            <comment id="154410" author="pjones" created="Thu, 2 Jun 2016 11:45:19 +0000"  >&lt;p&gt;Landed for 2.9&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10120">
                    <name>Blocker</name>
                                            <outwardlinks description="is blocking">
                                        <issuelink>
            <issuekey id="35506">LU-7895</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is blocked by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="38318">LU-8424</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="29600">LU-6483</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="36378">LU-8124</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxl2f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>