<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2435] inode accounting in osd-zfs is racy</title>
                <link>https://jira.whamcloud.com/browse/LU-2435</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Observed on the MDT during weekend testing with a mixed I/O workload.  I agree failure here shouldn&apos;t be fatal, however I&apos;m not sure we should even bother telling the administrator.  If it&apos;s true there&apos;s nothing that needs to be (or can be) done, then there&apos;s no point in logging this to the console.  Changing it to a CDEBUG() would be preferable.  If however the administrator needs to do something (which I don&apos;t believe is the case) that should be part of the message.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2012-05-04 17:12:17 LustreError: 14242:0:(osd_handler.c:997:osd_object_destroy()) lcz-MDT0000: failed to remove [0x2000023a1:0xf3ed:0x0] from accounting ZAP for grp 36427 (-2)
2012-05-04 17:12:17 LustreError: 14242:0:(osd_handler.c:997:osd_object_destroy()) Skipped 1 previous similar message
2012-05-04 17:16:22 LustreError: 14244:0:(osd_handler.c:991:osd_object_destroy()) lcz-MDT0000: failed to remove [0x20000239a:0x991a:0x0] from accounting ZAP for usr 36427 (-2)
2012-05-04 17:16:30 LustreError: 5618:0:(osd_handler.c:2003:osd_object_create()) lcz-MDT0000: failed to add [0x20000239a:0x9d46:0x0] to accounting ZAP for usr 36427 (-2)
2012-05-04 17:16:34 LustreError: 14241:0:(osd_handler.c:991:osd_object_destroy()) lcz-MDT0000: failed to remove [0x2000013ca:0x1952b:0x0] from accounting ZAP for usr 36427 (-2)
2012-05-04 17:16:53 LustreError: 6778:0:(osd_handler.c:997:osd_object_destroy()) lcz-MDT0000: failed to remove [0x2000013ca:0x19e3f:0x0] from accounting ZAP for grp 36427 (-2)
2012-05-04 17:31:21 LustreError: 6778:0:(osd_handler.c:2003:osd_object_create()) lcz-MDT0000: failed to add [0x200002393:0x1ee18:0x0] to accounting ZAP for usr 36427 (-2)
2012-05-04 17:31:32 LustreError: 6770:0:(osd_handler.c:991:osd_object_destroy()) lcz-MDT0000: failed to remove [0x200002395:0x167e0:0x0] from accounting ZAP for usr 36427 (-2)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="14304">LU-2435</key>
            <summary>inode accounting in osd-zfs is racy</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="behlendorf">Brian Behlendorf</reporter>
                        <labels>
                            <label>RZ_LS</label>
                            <label>llnl</label>
                            <label>quota</label>
                    </labels>
                <created>Mon, 7 May 2012 13:42:47 +0000</created>
                <updated>Sun, 15 Apr 2018 00:13:57 +0000</updated>
                            <resolved>Thu, 30 Mar 2017 22:07:27 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                    <fixVersion>Lustre 2.10.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>15</watches>
                                                                            <comments>
                            <comment id="38804" author="johann" created="Tue, 15 May 2012 03:20:10 +0000"  >&lt;p&gt;Brian, i guess you have all those messages because you have an &quot;old&quot; filesystem which didn&apos;t have the accounting ZAPs when it was created.&lt;br/&gt;
I&apos;m fine with &quot;hiding&quot; those errors, but the problem is that accounting will be broken and the administrator won&apos;t be notified ...&lt;/p&gt;</comment>
                            <comment id="38826" author="behlendorf" created="Tue, 15 May 2012 12:13:04 +0000"  >&lt;p&gt;Actually this was on a newly formatted filesystem.  While I haven&apos;t looked closely, I was attributing this to perhaps another concurrent destroy operation or some other race.  Unfortunately I don&apos;t have full logs; when I see it again I&apos;ll try and grab some debugging.&lt;/p&gt;</comment>
                            <comment id="40982" author="johann" created="Thu, 21 Jun 2012 08:05:42 +0000"  >&lt;p&gt;the problem is that zap_increment() has no locking (it first looks up the value and then updates it), so inode accounting is currently racy with concurrent destroy/create, you are right &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
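The lookup-then-update window johann describes in the comment above can be sketched in plain userspace C. Everything here is illustrative (these are not ZFS functions); the point is that an unlocked read-modify-write loses concurrent deltas, while holding a lock across the whole sequence does not:

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Sketch of the race: zap_increment() effectively does "look up value,
 * add delta, write value back" with no lock, so two concurrent
 * create/destroy threads can both read the same old value and one
 * delta is lost. All names below are illustrative stand-ins.
 */
static int64_t zap_value;                 /* stands in for a per-uid ZAP entry */
static pthread_mutex_t zap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy pattern (the bug): lookup, then update, with no lock held. */
void zap_increment_racy(int64_t delta)
{
	int64_t v = zap_value;   /* lookup */
	zap_value = v + delta;   /* update: another thread may have run in between */
}

/* Safe pattern: hold a lock across the whole read-modify-write. */
void zap_increment_locked(int64_t delta)
{
	pthread_mutex_lock(&zap_lock);
	zap_value += delta;
	pthread_mutex_unlock(&zap_lock);
}

int64_t zap_read(void)
{
	pthread_mutex_lock(&zap_lock);
	int64_t v = zap_value;
	pthread_mutex_unlock(&zap_lock);
	return v;
}
```

Single-threaded the two variants behave the same; only under concurrency does the racy one drop updates, which matches the intermittent ENOENT/miscount symptoms in the description.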
                            <comment id="49463" author="johann" created="Wed, 19 Dec 2012 16:02:00 +0000"  >&lt;p&gt;Brian, do you still see those warnings?&lt;/p&gt;</comment>
                            <comment id="88260" author="johann" created="Mon, 7 Jul 2014 13:50:15 +0000"  >&lt;p&gt;The race should be addressed by &lt;a href=&quot;http://review.whamcloud.com/#/c/7157&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7157&lt;/a&gt; which got reverted but will hopefully be re-landed soon.&lt;br/&gt;
Unfortunately, we have no way to fix on-disk inode accounting (ZFS has no fsck), but inode estimate can be enabled via:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# lctl set_param osd-zfs.*-MDT*.quota_iused_estimate=1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="88683" author="johann" created="Thu, 10 Jul 2014 09:34:32 +0000"  >&lt;p&gt;Brian, Alex, according to you, what is the best way to fix on-disk inode accounting? Is there an easy way in ZFS to iterate over all dnodes? Maybe we could implement a quiescent quota check in ZFS OSD to fix accounting when it goes wrong like in this case.&lt;/p&gt;</comment>
                            <comment id="88975" author="behlendorf" created="Mon, 14 Jul 2014 20:10:46 +0000"  >&lt;p&gt;Johann, you can use the dmu_object_next() function to walk all of the dnodes in an object set.  It will be a little tricky to do this online because as Alex noticed in another issue objects being actively allocated will be skipped.  I wouldn&apos;t be opposed to us adding a little debug patch or writing a utility to do this.  But I think the take-away from this is we need to be more ruthless about letting these kinds of defects in.&lt;/p&gt;</comment>
                            <comment id="88978" author="behlendorf" created="Mon, 14 Jul 2014 20:44:05 +0000"  >&lt;p&gt;Alternately, since the Lustre code uses the DMU_GROUPUSED_OBJECT and the DMU_USERUSED_OBJECT we could probably fix this with a file-level backup.  By mounting each server through the Posix layer and rsync&apos;ing it to a new dataset we&apos;d effectively regenerate the correct quota accounting.  This would strip off all the fids in the directories but a subsequent lfsck &lt;em&gt;should&lt;/em&gt; fix it.  That would be a nice way to verify lfsck actually works correctly.&lt;/p&gt;

&lt;p&gt;There are other ways to tackle this as well, extending &apos;zpool scrub&apos; or &apos;zfs send/recv&apos;, but I really don&apos;t like any of them.  Better to just fix the root cause and ensure it never comes back than add ugly code to handle this case and have to live with it forever.&lt;/p&gt;</comment>
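The dmu_object_next()-style walk Brian suggests is a cursor iteration over allocated object numbers. A toy model of that pattern (the real ZFS function takes an objset_t and handles holes and TXGs; this self-contained stand-in walks a small allocation map, and all names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>
#include <errno.h>

/*
 * Toy model of a dmu_object_next()-style cursor: given the last object
 * visited in *objp, advance to the next allocated object, or return
 * ESRCH when none remain. Object 0 is never returned, which loosely
 * mirrors the meta-dnode being special in a real object set.
 */
#define NOBJ 16
static bool allocated[NOBJ];   /* stand-in for the dnode allocation map */

int object_next(uint64_t *objp)
{
	for (uint64_t o = *objp + 1; o < NOBJ; o++) {
		if (allocated[o]) {
			*objp = o;
			return 0;
		}
	}
	return ESRCH;   /* no more allocated objects */
}

/* Walk every allocated object, e.g. to rebuild per-uid dnode counts. */
uint64_t count_all_objects(void)
{
	uint64_t obj = 0, n = 0;
	while (object_next(&obj) == 0)
		n++;
	return n;
}
```

As Brian notes, the hard part online is not the loop but objects allocated concurrently with the walk, which a cursor like this will simply skip.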
                            <comment id="90282" author="johann" created="Tue, 29 Jul 2014 14:25:52 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Johann, you can use the dmu_object_next() function to walk all of the dnodes in an object set. It will be a little tricky to do this online because as Alex noticed in another issue objects being actively allocated will be skipped. I wouldn&apos;t be opposed to us adding a little debug patch or writing a utility to do this.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;ok, i will have a look. Thanks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But I think the take-away from this is we need to be more ruthless about letting these kinds of defects in.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Agreed, although i have always been opposed to maintaining space accounting inside the osd. It would have been simpler if ZFS was doing inode accounting in the same way as block accounting.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Alternately, since the Lustre code uses the DMU_GROUPUSED_OBJECT and the DMU_USERUSED_OBJECT we could probably fix this with a file-level backup. By mounting each server through the Posix layer and rsync&apos;ing it to a new dataset we&apos;d effectively regenerate the correct quota accounting.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;DMU_{USER,GROUP}USED_OBJECT are used for block accounting which works fine. The issue here is with inode accounting which is maintained by osd-zfs since zfs does not support this.&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="90286" author="bzzz" created="Tue, 29 Jul 2014 14:42:36 +0000"  >&lt;p&gt;Johann, the patch does quite the same as ZFS - in-core structure to track delta, then apply delta at sync.&lt;/p&gt;</comment>
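The "in-core structure to track delta, then apply delta at sync" scheme bzzz mentions can be sketched as follows. This is an illustrative structure, not the actual LU-2435 patch; all names are invented:

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Sketch of delta-at-sync accounting: create/destroy only touch a cheap
 * in-memory counter under a lock, and the stable "on-disk" value (the
 * ZAP entry in the real code) is updated once per sync. This removes
 * the per-operation lookup-then-update race entirely.
 */
struct acct {
	pthread_mutex_t lock;
	int64_t delta;     /* net creates minus destroys since last sync */
	int64_t on_disk;   /* stands in for the ZAP entry written at sync */
};

void acct_init(struct acct *a)
{
	pthread_mutex_init(&a->lock, NULL);
	a->delta = 0;
	a->on_disk = 0;
}

/* Called on object create (+1) / destroy (-1); no ZAP I/O here. */
void acct_change(struct acct *a, int64_t d)
{
	pthread_mutex_lock(&a->lock);
	a->delta += d;
	pthread_mutex_unlock(&a->lock);
}

/* Called once at TXG sync: fold the accumulated delta into the value. */
void acct_sync(struct acct *a)
{
	pthread_mutex_lock(&a->lock);
	a->on_disk += a->delta;
	a->delta = 0;
	pthread_mutex_unlock(&a->lock);
}
```

The design choice this illustrates: contention moves from every create/destroy to a single fold per sync, which is the same shape ZFS uses for its own used-space accounting.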
                            <comment id="90323" author="behlendorf" created="Tue, 29 Jul 2014 16:57:04 +0000"  >&lt;p&gt;&amp;gt; It would have been simpler if ZFS was doing inode accounting in the same way as block accounting.&lt;/p&gt;

&lt;p&gt;I&apos;m not completely opposed to adding inode accounting into the ZFS code.  That would allow us to integrate better with the existing Linux quota utilities, which would be nice.&lt;/p&gt;</comment>
                            <comment id="90521" author="johann" created="Thu, 31 Jul 2014 07:51:13 +0000"  >&lt;blockquote&gt;
&lt;p&gt;I&apos;m not completely opposed to adding inode accounting into the ZFS code. That would allow us to integrate better with the existing Linux quota utilities, which would be nice.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Great, i will provide a patch against the ZFS tree and then change ZFS OSD to use ZFS inode accounting when available. Thanks.&lt;/p&gt;</comment>
                            <comment id="91095" author="johann" created="Thu, 7 Aug 2014 19:46:19 +0000"  >&lt;p&gt;ZFS patch pending review:&lt;br/&gt;
&lt;a href=&quot;https://github.com/zfsonlinux/zfs/issues/2576&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/issues/2576&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="97823" author="johann" created="Wed, 29 Oct 2014 13:58:40 +0000"  >&lt;p&gt;Reassign to Isaac as per our discussion by email&lt;/p&gt;</comment>
                            <comment id="100073" author="adilger" created="Tue, 25 Nov 2014 19:04:25 +0000"  >&lt;p&gt;Johann, Isaac, it looks like the patch got a review from Richard Yao on the pull request &lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/2577:&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/2577:&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;/blockquote&gt;
&lt;p&gt;I do not have time to do a full review, but I can give you some comments. First, there are some style issues with this:&lt;/p&gt;

&lt;p&gt;./lib/libzfs/libzfs_dataset.c: 2606: continuation line not indented by 4 spaces&lt;br/&gt;
./lib/libzfs/libzfs_dataset.c: 2608: line &amp;gt; 80 characters&lt;br/&gt;
./lib/libzfs/libzfs_dataset.c: 2608: continuation line not indented by 4 spaces&lt;br/&gt;
./lib/libzfs/libzfs_dataset.c: 2742: spaces instead of tabs&lt;br/&gt;
./module/zfs/zfs_vfsops.c: 473: spaces instead of tabs&lt;br/&gt;
./module/zfs/zfs_vfsops.c: 548: spaces instead of tabs&lt;br/&gt;
./module/zfs/zfeature_common.c: 223: indent by spaces instead of tabs&lt;br/&gt;
./module/zfs/zfeature_common.c: 224: indent by spaces instead of tabs&lt;br/&gt;
./module/zfs/zfeature_common.c: 225: indent by spaces instead of tabs&lt;br/&gt;
./module/zfs/dnode_sync.c: 582: line &amp;gt; 80 characters&lt;br/&gt;
./module/zfs/dmu_objset.c: 1136: spaces instead of tabs&lt;br/&gt;
./module/zfs/dmu_objset.c: 1137: spaces instead of tabs&lt;br/&gt;
./module/zfs/dmu_objset.c: 1163: continuation line not indented by 4 spaces&lt;br/&gt;
./module/zfs/dmu_objset.c: 1165: continuation line not indented by 4 spaces&lt;br/&gt;
./module/zfs/dmu_objset.c: 1167: line &amp;gt; 80 characters&lt;br/&gt;
./module/zfs/dmu_objset.c: 1171: line &amp;gt; 80 characters&lt;br/&gt;
./module/zfs/dmu_objset.c: 1215: line &amp;gt; 80 characters&lt;br/&gt;
./cmd/zfs/zfs_main.c: 2807: spaces instead of tabs&lt;br/&gt;
./cmd/zfs/zfs_main.c: 2810: line &amp;gt; 80 characters&lt;br/&gt;
./cmd/zfs/zfs_main.c: 2810: spaces instead of tabs&lt;/p&gt;

&lt;p&gt;Second, I see that you bumped the ZFS version instead of using feature flags. That is likely the logical choice, but unfortunately, Solaris has already used that and reusing it will make an unfortunate situation with respect to platform incompatibility worse. This is why feature flags were designed, but so far, they have only been implemented for the zpool version because we had nothing that needed it for the ZFS version. This would merit feature flags on the ZFS version, so that would need to be done before it could be merged, provided that there are no other potential issues and this is portable enough that other Open ZFS implementations could merge it so that we remain compatible with them.&lt;/p&gt;

&lt;p&gt;Can you update the patch to move this code forward?&lt;/p&gt;</comment>
                            <comment id="100111" author="johann" created="Wed, 26 Nov 2014 06:15:52 +0000"  >&lt;p&gt;Yup, i am aware and discussed this a bit with Brian some time ago. Isaac has agreed to take over the patch and will be updating it.&lt;/p&gt;</comment>
                            <comment id="100372" author="behlendorf" created="Mon, 1 Dec 2014 23:07:38 +0000"  >&lt;p&gt;During the OpenZFS summit Isaac and I had a chance to discuss this.  We came up with a nice clean solution based on Johann&apos;s initial patch. If Isaac has the time to do the heavy lifting on this I&apos;m happy to work with him to get the patch reviewed and into a form where it can be merged.  There&apos;s a fair bit of work remaining to get it where it needs to be.  But it should be pretty straightforward:&lt;/p&gt;

&lt;p&gt;Required functionality:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Add a feature flag.&lt;/li&gt;
	&lt;li&gt;Rework the code so this feature can be enabled without rewriting every dnode.&lt;/li&gt;
	&lt;li&gt;Rework the code to use sync tasks so it can be resumed across pool import/exports.&lt;/li&gt;
	&lt;li&gt;Update the zpool status command to provide status information while the feature is being enabled.&lt;/li&gt;
	&lt;li&gt;Add the ioctl() handlers so utilities such as repquota work.&lt;/li&gt;
	&lt;li&gt;Update send/recv to handle this change.&lt;/li&gt;
	&lt;li&gt;Update the zfs(8) and zpool-features(5) man pages.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="100390" author="isaac" created="Tue, 2 Dec 2014 07:02:59 +0000"  >&lt;p&gt;Two notes about the implementation:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;May add two new special ZAP objects for the dnode accounting, so that the feature can be easily disabled by removing the objects.&lt;/li&gt;
	&lt;li&gt;One way to enable it without rewriting every dnode is to take a snapshot and iterate over all objects in the snapshot while doing incremental accounting for changes after the snapshot was created:&lt;br/&gt;
    1. Return -EAGAIN to any query before objects in the snapshot are all counted&lt;br/&gt;
    2. Remove the snapshot when counting is done&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I&apos;ll be able to begin the work later this week.&lt;/p&gt;</comment>
                            <comment id="100435" author="adilger" created="Tue, 2 Dec 2014 17:54:55 +0000"  >&lt;p&gt;It isn&apos;t clear how the snapshot will help. There will continue to be changes beyond the snapshot that would need to be accounted, so I don&apos;t think that will help.&lt;/p&gt;

&lt;p&gt;There is already a process for iterating all the dnodes to enable regular quota, so hopefully the same process can be used for inode accounting? As long as accounting is done incrementally and (IIRC) inodes are flagged so they are not counted twice, this can proceed while the filesystem is in use. Perhaps an auxiliary bitmap could be used while the initial accounting is enabled, and deleted when finished?&lt;/p&gt;</comment>
                            <comment id="100436" author="behlendorf" created="Tue, 2 Dec 2014 18:01:45 +0000"  >&lt;p&gt;&amp;gt; One way to enable it without rewriting every dnode is to take a snapshot&lt;/p&gt;

&lt;p&gt;This is a good idea but we can do better.  If we were to use full snapshots there are some significant downsides.&lt;/p&gt;

&lt;p&gt;1. We&apos;d need to take a snapshot per dataset and it&apos;s not uncommon for pools to have 1000s, 10,000s, or 100,000s of datasets.  Doubling this just to enable the feature is a bit heavy handed.&lt;br/&gt;
2. While the snapshots exist we can&apos;t free any data in the pool since it will be referenced by a snapshot.  This would be problematic for pools which are already near capacity.&lt;br/&gt;
3. I suspect cleanly handling all the possible failure modes will be fairly complicated.  You&apos;ll need to do the creation of all the snapshots in a sync task and be able to unwind them all in the event of a failure.  You&apos;ll also want to do it in a single tx so that either all the snapshots exist or none of them do.  When we&apos;re talking about a large number of snapshots this may take a significant amount of time (several seconds).&lt;br/&gt;
4. Doing any operation which spans datasets complicates things considerably.  If this could be avoided it would greatly simplify the problem.&lt;/p&gt;

&lt;p&gt;Luckily, for this specific case a full snapshot isn&apos;t needed.  Storing the TXG number in which the feature was enabled and the per-dataset dnode number for the traversal is enough.  This is possible because every dnode already stores the TXG number it was originally allocated in (dn-&amp;gt;dn_allocated_txg).  We can also leverage the fact that the traversal will strictly happen from lowest to highest numbered dnodes.  Which means we can split the problem up like this:&lt;/p&gt;

&lt;p&gt;1. Newly allocated dnodes always update the quota ZAP&lt;/p&gt;

&lt;p&gt;2. Freed dnodes update the quota ZAP as follows&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;if (dn-&amp;gt;dn_object &amp;lt; dataset-&amp;gt;scan_object): dnode has been traversed by the scan; update the quota ZAP&lt;/li&gt;
	&lt;li&gt;if (dn-&amp;gt;dn_object &amp;gt;= dataset-&amp;gt;scan_object): dnode has NOT been traversed by the scan; no update needed&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;3. Dnode traversal scan, this can be done with the existing dnode iterator&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;if (dn-&amp;gt;dn_allocated_txg &amp;lt; feature_txg): dnode is not accounted for; update the quota ZAP&lt;/li&gt;
	&lt;li&gt;if (dn-&amp;gt;dn_allocated_txg &amp;gt;= feature_txg): dnode is new and was accounted for during create; no update needed&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The traversal part of this would need to be done in small batches in a sync task.  This would allow us to transactionally update the dataset-&amp;gt;scan_object on disk so the traversal can be resumed, and it simplifies concurrency concerns.  Doing it this way addresses my concerns above and nicely simplifies the logic and the amount of code needed.  For example, all that&apos;s needed to abort the entire operation or disable the feature is to stop the traversal sync task (if running) and remove the two new ZAPs.&lt;/p&gt;</comment>
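The two decision rules in Brian's scheme reduce to simple comparisons. A literal transcription (function and parameter names are hypothetical; scan_object is the per-dataset resume cursor, feature_txg the TXG in which the feature was enabled):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Transcription of the freed-dnode and scan rules from the comment
 * above. Newly allocated dnodes always update the quota ZAP, so they
 * need no predicate.
 */

/* Freed dnode: only decrement the quota ZAP if the scan already
 * counted this object (its number is below the resume cursor). */
bool free_needs_zap_update(uint64_t dn_object, uint64_t scan_object)
{
	return dn_object < scan_object;
}

/* Scanned dnode: only count it if it predates the feature; dnodes
 * allocated after feature_txg were already counted at create time. */
bool scan_needs_zap_update(uint64_t dn_allocated_txg, uint64_t feature_txg)
{
	return dn_allocated_txg < feature_txg;
}
```

Together the predicates guarantee each dnode is counted exactly once, whichever of the scan or a create/free reaches it first.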
                            <comment id="118597" author="gerrit" created="Mon, 15 Jun 2015 22:07:56 +0000"  >&lt;p&gt;Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/15294&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15294&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2435&quot; title=&quot;inode accounting in osd-zfs is racy&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2435&quot;&gt;&lt;del&gt;LU-2435&lt;/del&gt;&lt;/a&gt; osd-zfs: use zfs native dnode accounting&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 3a44dfc5b64383dddf5b18a683ab98ce5b7cd4da&lt;/p&gt;</comment>
                            <comment id="118603" author="jay" created="Mon, 15 Jun 2015 22:27:24 +0000"  >&lt;p&gt;I pushed patch 15294 to gerrit to use ZFS dnode accounting for Lustre inode accounting. This is phase one of the work. The second phase is to launch a thread to iterate all dnodes in the objset of the MDT to repair accounting of a legacy file system.&lt;/p&gt;</comment>
                            <comment id="118715" author="jay" created="Tue, 16 Jun 2015 19:27:20 +0000"  >&lt;p&gt;I&apos;m going to start the 2nd phase of this work. When the dnode accounting feature is enabled, it will check if the file system is a legacy one. In that case, it will launch a thread to iterate all dnodes in all datasets and repair dnode accounting. I have two concerns for this task:&lt;/p&gt;

&lt;p&gt;1. this piece of code will become dead once the scanning task is done, and new FS won&apos;t need it at all. Will ZFS upstream team accept this code? Based on the same concern, I will make an independent patch based on &lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/2577&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/2577&lt;/a&gt;;&lt;br/&gt;
2. does it need to do anything special for snapshot and clone?&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="118755" author="adilger" created="Wed, 17 Jun 2015 07:04:41 +0000"  >&lt;p&gt;There is already existing code to do the dnode iteration to enable user/group block quota on an existing filesystem, and I suspect that this is still part of the ZFS code, and may never be deleted.  There hopefully shouldn&apos;t be the need for much new code beyond the existing block quota iterator.&lt;/p&gt;</comment>
                            <comment id="118830" author="behlendorf" created="Wed, 17 Jun 2015 16:57:07 +0000"  >&lt;p&gt;To my knowledge there is no existing code for dnode accounting in ZFS.  How to implement it in a reasonable way was described above but no one has yet done that work.  However once it is done and merged we&apos;ll have to maintain it forever since old filesystems may always need to be upgraded.&lt;/p&gt;</comment>
                            <comment id="119555" author="jay" created="Thu, 25 Jun 2015 00:37:30 +0000"  >&lt;p&gt;The proposal made by Andreas worked. This is what I have done:&lt;/p&gt;

&lt;p&gt;0. prepare an &apos;old&apos; lustre zfs backend and copy some files into the file system with user &apos;tstusr&apos;;&lt;br/&gt;
1. apply patch &lt;a href=&quot;http://review.whamcloud.com/15180&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15180&lt;/a&gt; to zfs repo and patch 15294 to lustre;&lt;br/&gt;
2. recompile and install zfs and lustre;&lt;br/&gt;
3. upgrade pool by &apos;zpool upgrade lustre-mdt1&apos;;&lt;br/&gt;
4. mount Lustre;&lt;br/&gt;
5. upgrade mdt to use native dnode accounting by &apos;lctl set_param osd-zfs.lustre-MDT0000.quota_native_dnused_upgrade&apos;;&lt;br/&gt;
6. and it worked. I can get correct dnode use accounting by &apos;lfs quota -u tstusr /mnt/lustre&apos;;&lt;/p&gt;</comment>
                            <comment id="121167" author="pjones" created="Mon, 13 Jul 2015 17:17:12 +0000"  >&lt;p&gt;Jinshan&lt;/p&gt;

&lt;p&gt;The link #15180 does not seem to work. Could you please confirm whether that is correct?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="121172" author="jay" created="Mon, 13 Jul 2015 17:27:20 +0000"  >&lt;p&gt;I will work on this.&lt;/p&gt;</comment>
                            <comment id="138570" author="adilger" created="Mon, 11 Jan 2016 21:50:59 +0000"  >&lt;p&gt;The latest version of the patch is actually &lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/3983&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/3983&lt;/a&gt; and not 3723.&lt;/p&gt;</comment>
                            <comment id="147004" author="bzzz" created="Sun, 27 Mar 2016 06:47:47 +0000"  >&lt;p&gt;each ZAP declaration adds ~0.5MB to the space/memory reservation, so dnode accounting adds ~1MB, which is 20% of a single-stripe object creation.&lt;/p&gt;</comment>
                            <comment id="147060" author="jay" created="Mon, 28 Mar 2016 17:21:21 +0000"  >&lt;p&gt;Is ~0.5MB the worst case or on average? Can you please describe this in detail?&lt;/p&gt;</comment>
                            <comment id="147064" author="bzzz" created="Mon, 28 Mar 2016 17:45:19 +0000"  >&lt;p&gt;depends on how exactly we call dmu_tx_hold_zap(): the more specific, the fewer credits. Unfortunately, when the key is specified, the declaration becomes very expensive due to the hash lock/lookup. Also, the final calculation depends on the inflation size:&lt;br/&gt;
	asize = spa_get_asize(tx-&amp;gt;tx_pool-&amp;gt;dp_spa, towrite + tooverwrite);&lt;/p&gt;

&lt;p&gt;spa_get_asize(spa_t *spa, uint64_t lsize)&lt;br/&gt;
{&lt;br/&gt;
	return (lsize * spa_asize_inflation);&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;int spa_asize_inflation = 24;&lt;/p&gt;
</comment>
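For scale, the quoted spa_get_asize() logic is a single multiplication by the inflation factor (24 per the comment above). A self-contained sketch, with illustrative numbers only (the real lsize comes from the dmu_tx_hold_zap() credit estimate):

```c
#include <stdint.h>

/* Mirror of the quoted spa_get_asize() logic: the declared logical
 * write size is multiplied by the inflation factor. 24 is the default
 * cited in the comment above. */
#define SPA_ASIZE_INFLATION 24

uint64_t spa_get_asize_sketch(uint64_t lsize)
{
	return lsize * SPA_ASIZE_INFLATION;
}
```

For example, a hold whose estimated logical write is ~21 KiB reserves 21 KiB × 24 ≈ 0.5 MiB, and two such ZAP holds (user + group) ≈ 1 MiB, consistent with bzzz's figures above.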
                            <comment id="188973" author="gerrit" created="Mon, 20 Mar 2017 16:27:14 +0000"  >&lt;p&gt;Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/26090&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/26090&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2435&quot; title=&quot;inode accounting in osd-zfs is racy&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2435&quot;&gt;&lt;del&gt;LU-2435&lt;/del&gt;&lt;/a&gt; osd-zfs: for test only&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: eac70d59649d2ab3bd5b7d0acc94cbf67a430251&lt;/p&gt;</comment>
                            <comment id="190073" author="gerrit" created="Thu, 30 Mar 2017 03:54:51 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/15294/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/15294/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2435&quot; title=&quot;inode accounting in osd-zfs is racy&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2435&quot;&gt;&lt;del&gt;LU-2435&lt;/del&gt;&lt;/a&gt; osd-zfs: use zfs native dnode accounting&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 11afef00b6af407b8987076bd4f1ec9bc77eb75e&lt;/p&gt;</comment>
                            <comment id="190222" author="pjones" created="Thu, 30 Mar 2017 22:07:27 +0000"  >&lt;p&gt;Landed for 2.10&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10120">
                    <name>Blocker</name>
                                            <outwardlinks description="is blocking">
                                        <issuelink>
            <issuekey id="35506">LU-7895</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="35896">LU-7991</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="37822">LU-8326</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="31378">LU-6965</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="42333">LU-8927</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="44569">LU-9192</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="17170">LU-2619</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="26634">LU-5638</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="35896">LU-7991</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 6 Apr 2016 13:42:47 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>server</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzux0f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3010</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 7 May 2012 13:42:47 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>