<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:57:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6177] LFSCK 4: namespace LFSCK scalability</title>
                <link>https://jira.whamcloud.com/browse/LU-6177</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Currently, for a routine namespace LFSCK check with no inconsistencies to repair, the best aggregate performance is reached with the 4-MDT configuration. As more MDTs are added, performance decreases. This is contrary to our expectations and should be resolved.&lt;/p&gt;</description>
                <environment></environment>
        <key id="28446">LU-6177</key>
            <summary>LFSCK 4: namespace LFSCK scalability</summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="29081">LU-6361</parent>
                                    <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="yong.fan">nasf</reporter>
                        <labels>
                    </labels>
                <created>Thu, 29 Jan 2015 01:39:39 +0000</created>
                <updated>Fri, 1 May 2015 03:57:53 +0000</updated>
                            <resolved>Fri, 1 May 2015 03:57:53 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="105063" author="adilger" created="Thu, 29 Jan 2015 02:27:35 +0000"  >&lt;p&gt;I don&apos;t think it is only a matter of performance going down after 4 MDTs. The biggest issue is that aggregate performance isn&apos;t scaling at all when new MDTs are added. With only a small percentage of cross-MDT and hard-linked objects, most of the MDT namespace scanning should be local to the MDT and the aggregate scanning performance should scale almost linearly with the addition of each MDT. &lt;/p&gt;

&lt;p&gt;Since the performance was flat for 2-6 MDTs then either:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;the performance results are actually per-MDT and not aggregate&lt;/li&gt;
	&lt;li&gt;there is some kind of bottleneck or too much communication between MDTs that is preventing scaling.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="106580" author="yong.fan" created="Wed, 11 Feb 2015 02:55:32 +0000"  >&lt;p&gt;The main reason for the poor aggregate namespace LFSCK performance is that the method used to calculate performance is unsuitable. After studying the test data, I found that MDT0 always scanned more objects than the other MDTs. As a result, the other MDTs had to wait for MDT0 to finish its first-stage scanning, and their reported performance became very low because of the long time spent waiting for MDT0.&lt;/p&gt;

&lt;p&gt;In fact, the real performance of each MDT should be calculated as the number of scanned objects divided by the scanning time, excluding the time spent waiting after the first-stage scanning. With this new calculation method, the real performance of each MDT is approximately equal. I will make a patch for that and re-test the performance.&lt;/p&gt;</comment>
                            <comment id="106751" author="adilger" created="Thu, 12 Feb 2015 02:13:49 +0000"  >&lt;p&gt;Shouldn&apos;t the number of files per MDT be about the same?  Should the test config create balanced file creation?  I thought the top-level directories are spread across all MDTs and then all the files are created in those directories?&lt;/p&gt;</comment>
                            <comment id="106792" author="yong.fan" created="Thu, 12 Feb 2015 13:32:06 +0000"  >&lt;p&gt;It should be, but unfortunately, because of a test script issue, the master MDT-object of each striped directory is always created on MDT0, so the object counts on the MDTs are unexpectedly unbalanced.&lt;/p&gt;

&lt;p&gt;On the other hand, we should not assume that every MDT has the same processing capability. We still need to adjust the performance calculation method.&lt;/p&gt;</comment>
                            <comment id="106793" author="bzzz" created="Thu, 12 Feb 2015 13:36:09 +0000"  >&lt;p&gt;Even so, that should give us performance multiplied by (#MDTs-1); it shouldn&apos;t stop scaling?&lt;/p&gt;</comment>
                            <comment id="106794" author="yong.fan" created="Thu, 12 Feb 2015 13:57:07 +0000"  >&lt;p&gt;As the number of MDTs increased, the waiting time (as described above) also increased, so the aggregate performance does not scale as expected.&lt;/p&gt;</comment>
                            <comment id="106846" author="adilger" created="Thu, 12 Feb 2015 19:48:38 +0000"  >&lt;blockquote&gt;
&lt;p&gt;It should be, but unfortunately, because of a test script issue, the master MDT-object of each striped directory is always created on MDT0, so the object counts on the MDTs are unexpectedly unbalanced.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Is that because all of the striped directories are created at the top level directory (on MDT0)?  Otherwise, I would think that the master MDT object should be on the same MDT as the parent directory.  If not, I think that is a bug in the DNE code.&lt;/p&gt;

&lt;p&gt;Secondly, even if the master MDT object of each striped directory is on MDT0, this should only be a few thousand more objects, but the actual files created inside the striped directories should be balanced evenly across all MDTs, or again this would be a bug in the DNE code.&lt;/p&gt;</comment>
                            <comment id="107111" author="yong.fan" created="Tue, 17 Feb 2015 14:41:40 +0000"  >&lt;p&gt;The striped directories were created under each sub-directory. The master MDT-object of a striped directory should reside on the same MDT as its parent directory, but because of a test script issue, it was always created on MDT0. In addition, the test scripts did not handle remote sub-directories properly, which left the remote sub-directories unbalanced among the MDTs as well. I have fixed the test scripts so that both are now balanced.&lt;/p&gt;</comment>
                            <comment id="109205" author="gerrit" created="Mon, 9 Mar 2015 15:12:16 +0000"  >&lt;p&gt;Fan Yong (fan.yong@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14014&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14014&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6177&quot; title=&quot;LFSCK 4: namespace LFSCK scalability&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6177&quot;&gt;&lt;del&gt;LU-6177&lt;/del&gt;&lt;/a&gt; lfsck: calculate the phase2 time correctly&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: edf9f948ad9f5c86ddf1a891dae8ce0cdde07593&lt;/p&gt;</comment>
                            <comment id="109207" author="yong.fan" created="Mon, 9 Mar 2015 15:17:36 +0000"  >&lt;p&gt;The above patch fixes a serious issue that causes the reported phase2 time to be longer than the time actually used by the second-stage scanning.&lt;/p&gt;</comment>
                            <comment id="113964" author="gerrit" created="Fri, 1 May 2015 03:22:17 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/14014/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14014/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6177&quot; title=&quot;LFSCK 4: namespace LFSCK scalability&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6177&quot;&gt;&lt;del&gt;LU-6177&lt;/del&gt;&lt;/a&gt; lfsck: calculate the phase2 time correctly&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 0f4875343e22bcdfe18708806e172aa234da23a6&lt;/p&gt;</comment>
                            <comment id="113979" author="yong.fan" created="Fri, 1 May 2015 03:57:53 +0000"  >&lt;p&gt;The related patches have landed on master.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx567:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>17278</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>