<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:16:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8342] ZFS dnodesize and recordsize should be set at file system creation</title>
                <link>https://jira.whamcloud.com/browse/LU-8342</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;ZFS dnodesize and recordsize should be set to appropriate defaults upon dataset creation at filesystem creation time. We can set dnodesize=auto and recordsize=1M by default if the installed version of zfs supports it.&lt;/p&gt;</description>
                <environment></environment>
        <key id="37879">LU-8342</key>
            <summary>ZFS dnodesize and recordsize should be set at file system creation</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="dinatale2">Giuseppe Di Natale</assignee>
                                    <reporter username="dinatale2">Giuseppe Di Natale</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Tue, 28 Jun 2016 21:07:01 +0000</created>
                <updated>Wed, 13 Sep 2017 03:49:46 +0000</updated>
                            <resolved>Wed, 13 Sep 2017 03:49:46 +0000</resolved>
                                                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="157195" author="adilger" created="Tue, 28 Jun 2016 21:43:00 +0000"  >&lt;p&gt;See &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8042&quot; title=&quot;mkfs.lustre should set ashift=12 recordsize=1024k compression=lz4&amp;quot; by default for new ZFS OSTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8042&quot;&gt;&lt;del&gt;LU-8042&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/19892&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19892&lt;/a&gt; for most of this. I just haven&apos;t had time to finish that up, if you wanted to use my patch as a starting point. I don&apos;t think it has the dnode_size option yet as that needs some configure and runtime detection. &lt;/p&gt;</comment>
                            <comment id="157211" author="dinatale2" created="Tue, 28 Jun 2016 23:49:09 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;I have a different way of setting the recordsize and dnodesize properties which will avoid the runtime detection. I&apos;ll submit it shortly so it&apos;s out there and can be commented on.&lt;/p&gt;

&lt;p&gt;I did have a question. You set the recordsize property only on OSTs in your version. Why not on MDTs as well? Are there performance concerns? From my understanding, the recordsize property is more of a maximum. I also was looking at object creation in osd-zfs, and ultimately zfs object allocation is called with a blocksize of 0, which results in a minimum-sized block being allocated. Seems like no harm in having recordsize set to 1M on the MDTs as well.&lt;/p&gt;</comment>
                            <comment id="157212" author="gerrit" created="Tue, 28 Jun 2016 23:49:29 +0000"  >&lt;p&gt;Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/21055&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21055&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8342&quot; title=&quot;ZFS dnodesize and recordsize should be set at file system creation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8342&quot;&gt;&lt;del&gt;LU-8342&lt;/del&gt;&lt;/a&gt; utils: Set dnodesize and recordsize at dataset creation&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 408424c909552b599340953fa92e31b54948249b&lt;/p&gt;</comment>
                            <comment id="157214" author="adilger" created="Tue, 28 Jun 2016 23:59:23 +0000"  >&lt;p&gt;The reason for not setting the recordsize on the MDT is that there are few reasons to have such large IOs on the MDT, and because almost all files on the MDT are modified in small chunks so having a large blocksize could potentially hurt metadata performance significantly (eg. log files, directories (though they have a separate tunable), config files, etc.) so I&apos;d rather avoid that complexity and risk. &lt;/p&gt;

&lt;p&gt;We&apos;ve always tuned ldiskfs differently for the MDT and OST for exactly this reason, for example not having extent-mapped files on the MDT, having different ratios of space to inodes, etc. &lt;/p&gt;</comment>
                            <comment id="157215" author="adilger" created="Wed, 29 Jun 2016 00:02:58 +0000"  >&lt;p&gt;Joe, are you going to also set ashift=12 for ZFS? I think there are no good reasons to have ashift=9 on the OSTs, and it can have a significant negative performance impact if the sector size is not auto-detected correctly, as well as blocking the ability to update to 4KB drives in the future. &lt;/p&gt;</comment>
                            <comment id="157325" author="dinatale2" created="Wed, 29 Jun 2016 21:06:27 +0000"  >&lt;p&gt;Are you suggesting that I set ashift=12 by default? I talked with Brian about the ashift property. Based on that discussion, zfs attempts to pick the right ashift based on the hardware. Is there certain hardware you&apos;re experiencing issues with?&lt;/p&gt;</comment>
                            <comment id="157326" author="dinatale2" created="Wed, 29 Jun 2016 21:07:47 +0000"  >&lt;p&gt;Andreas, what is your opinion on setting dnodesize on both OSTs and MDTs as well?&lt;/p&gt;</comment>
                            <comment id="157333" author="adilger" created="Wed, 29 Jun 2016 21:39:44 +0000"  >&lt;p&gt;There will definitely be more xattrs on the MDT, but there are also some on the OST. AFAIK there shouldn&apos;t be any harm in setting dnodesize=auto for the OSTs, even if they only use the minimum dnode size today. There are definitely a few more xattrs that will be stored on the OST objects in the future in order to improve LFSCK support with composite file layouts. &lt;/p&gt;

&lt;p&gt;For the MDT, it definitely makes sense to enable at least dnodesize=auto, but it might make sense to reserve more space if the median xattr size is larger. IMHO it wouldn&apos;t be terrible to track the size of xattrs stored on a dnode in a histogram of percpu counters (to avoid contention) so that there are enough dnode slots reserved for each new dnode even if the setxattr doesn&apos;t happen atomically with the create.  Alex was working on a patch to do this by accumulating the size of xattrs declared during the file create (&lt;a href=&quot;http://review.whamcloud.com/19101&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19101&lt;/a&gt;), which could use some review. &lt;/p&gt;</comment>
                            <comment id="157395" author="dinatale2" created="Thu, 30 Jun 2016 15:14:07 +0000"  >&lt;p&gt;Ok, then I will go ahead and set dnodesize=auto at file system creation time. If the dnode size needs to be different that can be handled after the fact.&lt;/p&gt;

&lt;p&gt;Brian had also mentioned that in the future, dnodesize=auto could be a bit more intelligent and choose the appropriate dnode size if one wasn&apos;t specified. But, currently I believe auto just results in dnode sizes of 1K.&lt;/p&gt;

&lt;p&gt;Andreas, can you please comment on the ashift questions above? I just want to make sure I understand the ashift changes proposed.&lt;/p&gt;</comment>
                            <comment id="157436" author="adilger" created="Thu, 30 Jun 2016 17:22:26 +0000"  >&lt;p&gt;Based on past postings on the ZFS mailing lists, users have reported terrible performance when ZFS doesn&apos;t auto-detect the 4KB sector size correctly (usually because drives are advertising 512-byte sectors for &quot;maximum compatibility&quot; even when they have 4KB sectors internally).  Not only does this hurt performance, it can potentially lead to data reliability problems if sectors are being modified that do not belong to the current block.&lt;/p&gt;

&lt;p&gt;Also, even if drives are correctly reporting 512-byte sectors, there is a long-term maintenance problem if those drives need to be replaced by newer drives, because all newer/larger drives have 4KB sectors and it isn&apos;t possible to replace any drives in a 512-byte sector VDEV with 4096-byte sector drives without a full backup/restore.  That makes maintenance more problematic, and also prevents VDEV &quot;autoexpand&quot; from working if existing drives are replaced with larger drives.&lt;/p&gt;

&lt;p&gt;While I understand Brian&apos;s concern about changing the OpenZFS default to ashift=12 (despite repeated requests to change it), since it would increase space usage for some workloads, this is less of a concern for Lustre OSTs.  From a support and &quot;best performance out of the box&quot; point of view I&apos;d prefer setting ashift=12 on OSTs by default.&lt;/p&gt;</comment>
                            <comment id="158136" author="ofaaland" created="Fri, 8 Jul 2016 16:51:53 +0000"  >&lt;p&gt;Even if these pool and dataset settings are correct for all cases for now, they may become incorrect with future changes in either lustre or zfs.  Furthermore, there are likely unusual cases (e.g. testing, see Jinshan&apos;s concern about test data often being highly compressible) where one or more of these settings are undesirable.&lt;/p&gt;

&lt;p&gt;Both &lt;a href=&quot;http://review.whamcloud.com/#/c/19892/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/19892/2&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/#/c/21055/3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/21055/3&lt;/a&gt; set the desired settings by hard-coding specific values into mkfs.lustre.  How about putting the settings themselves into a configuration file, e.g. /etc/sysconfig/lustre or distro-specific equivalent which is parsed by mkfs.lustre?  Then they are visible to the user, can easily be changed when appropriate, and the defaults can be changed with a trivial patch that is easy to review.&lt;/p&gt;</comment>
                            <comment id="158137" author="ofaaland" created="Fri, 8 Jul 2016 16:54:59 +0000"  >&lt;p&gt;I see that lustre/conf/lustre already has ZPOOL_IMPORT_DIR and ZPOOL_IMPORT_ARGS.  So perhaps ZPOOL_CREATE_ARGS and ZFS_CREATE_ARGS?&lt;/p&gt;</comment>
                            <comment id="158158" author="ofaaland" created="Fri, 8 Jul 2016 18:38:34 +0000"  >&lt;p&gt;Perhaps lustre/conf/lustre is not the right place for such settings; I see there is an /etc/mke2fs.conf with ini-style contents, and some userspace apps seem to use /etc/default/foo.  Anyway, the basic proposal that these settings be put in a config file, instead of in the code, still stands.&lt;/p&gt;</comment>
                            <comment id="158197" author="adilger" created="Fri, 8 Jul 2016 22:02:22 +0000"  >&lt;p&gt;There are already &lt;tt&gt;--mkfsoptions&lt;/tt&gt; and &lt;tt&gt;--mountfsoptions&lt;/tt&gt; that can be used to pass extra options to &lt;tt&gt;mkfs.lustre&lt;/tt&gt; and to the internal mount command for the back-end filesystem.  They should be able to override the default options specified internally by &lt;tt&gt;mkfs.lustre&lt;/tt&gt;.  My goal in specifying these options internally is that the majority of users should get the best performance out of the box if possible, rather than having to specify extra options.&lt;/p&gt;</comment>
                            <comment id="158200" author="ofaaland" created="Fri, 8 Jul 2016 22:25:02 +0000"  >&lt;p&gt;Andreas,&lt;br/&gt;
I understand.  I&apos;m suggesting that if those good defaults are encoded in a config file instead of in code, they (a) are visible to the user and (b) require only trivial code review to change.  Also, the existing options do not distinguish between pool properties and dataset properties.&lt;/p&gt;</comment>
                            <comment id="158490" author="adilger" created="Tue, 12 Jul 2016 16:19:51 +0000"  >&lt;p&gt;Olaf, I&apos;m not against that, but it would definitely be more work than the current patch, and likely push the change out into 2.10.&lt;/p&gt;</comment>
                            <comment id="179088" author="cengku9660" created="Wed, 28 Dec 2016 01:58:10 +0000"  >&lt;p&gt;Hi Giuseppe,&lt;/p&gt;

&lt;p&gt;Any update on the patch &lt;a href=&quot;http://review.whamcloud.com/21055&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21055&lt;/a&gt;?&lt;/p&gt;</comment>
                            <comment id="179595" author="dinatale2" created="Wed, 4 Jan 2017 18:26:43 +0000"  >&lt;p&gt;Currently no updates to report. I will try to revisit this soon.&lt;/p&gt;</comment>
                            <comment id="208200" author="gerrit" created="Wed, 13 Sep 2017 03:37:19 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/21055/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/21055/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8342&quot; title=&quot;ZFS dnodesize and recordsize should be set at file system creation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8342&quot;&gt;&lt;del&gt;LU-8342&lt;/del&gt;&lt;/a&gt; utils: Set dnodesize/recordsize at zfs dataset create&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 1617b8f6b6cdd0f5b74d7bfb8166d74b63cfed81&lt;/p&gt;</comment>
                            <comment id="208211" author="pjones" created="Wed, 13 Sep 2017 03:49:46 +0000"  >&lt;p&gt;Landed for 2.11&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="36252">LU-8042</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyg3r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>