<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:00:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13287] DNE2 - Shared directory performance does not scale and starts to plateau beyond 2 MDTs</title>
                <link>https://jira.whamcloud.com/browse/LU-13287</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing in an environment with a single parent directory followed by one shared subdirectory for all client mdtest ranks, we observe very little scaling when moving to more than 2 MDTs. Results for 0K file creates, with 1 million objects per MDT:&lt;/p&gt;

&lt;p&gt;1 MDT - 83,948&lt;br/&gt;
2 MDTs - 115,929&lt;br/&gt;
3 MDTs - 123,186&lt;br/&gt;
4 MDTs - 130,846&lt;/p&gt;
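
&lt;p&gt;To make the plateau concrete, the speedup and scaling efficiency relative to the single-MDT result can be derived from the figures above. The following is only a minimal Python sketch: the function and variable names are illustrative, and the figures are treated as plain rates because the unit is not stated here.&lt;/p&gt;

```python
# Create results from the list above, keyed by MDT count.
# The unit is not stated in the report; the values are treated as plain rates.
create_rates = {1: 83948, 2: 115929, 3: 123186, 4: 130846}


def scaling_table(rates):
    """Return {mdt_count: (speedup, efficiency)} relative to the smallest MDT count."""
    base_count = min(rates)
    base_rate = rates[base_count]
    table = {}
    for count in sorted(rates):
        # Speedup is the measured rate divided by the baseline rate.
        speedup = rates[count] / base_rate
        # Efficiency is the speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (count / base_count)
        table[count] = (speedup, efficiency)
    return table


for count, (speedup, efficiency) in scaling_table(create_rates).items():
    print(f"{count} MDTs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

&lt;p&gt;With these figures, 4 MDTs deliver roughly 1.56x the single-MDT create rate, which is under 40% of ideal linear scaling.&lt;/p&gt;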

&lt;p&gt;Stats and deletes show similar results. Performance does not scale linearly but instead plateaus. We also do not appear to be the only ones to observe this: a recent Cambridge University IO-500 presentation includes a slide with very similar results (fourth from the bottom):&#160;&lt;a href=&quot;https://www.eofs.eu/_media/events/lad19/03_matt_raso-barnett-io500-cambridge.pdf&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.eofs.eu/_media/events/lad19/03_matt_raso-barnett-io500-cambridge.pdf&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="58168">LU-13287</key>
            <summary>DNE2 - Shared directory performance does not scale and starts to plateau beyond 2 MDTs</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="koutoupis">Petros Koutoupis</reporter>
                        <labels>
                            <label>dne2</label>
                            <label>llnl</label>
                            <label>performance</label>
                    </labels>
                <created>Fri, 21 Feb 2020 20:39:08 +0000</created>
                <updated>Wed, 20 May 2020 00:00:14 +0000</updated>
                            <resolved>Wed, 20 May 2020 00:00:06 +0000</resolved>
                                    <version>Lustre 2.13.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="263846" author="spitzcor" created="Fri, 21 Feb 2020 22:41:40 +0000"  >&lt;p&gt;Possibly also reported and related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9436&quot; title=&quot;DNE2 - performance improvement with wide stripping directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9436&quot;&gt;LU-9436&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="263847" author="adilger" created="Fri, 21 Feb 2020 23:22:00 +0000"  >&lt;p&gt;Is this with one MDT per MDS, or are all four MDTs on the same MDS?  If all MDTs are on the same MDS, then this is totally expected, as there just isn&apos;t enough unused CPU/network on the MDS to double or quadruple the performance on that node.&lt;/p&gt;

&lt;p&gt;How many clients are being used for this test?  Does the performance improve when there are additional clients added for the 3/4 MDT test cases?  Having the actual test command line included in the problem description would make this report a lot more useful.&lt;/p&gt;

&lt;p&gt;Looking at the referenced slide from the Cambridge presentation (attached), it actually shows almost linear scaling for additional MDTs (one per MDS) up to 48, excluding the 48-MDT stat test.  I suspect in that case they didn&apos;t have enough clients to drive the aggregate MDT performance to saturation.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;a id=&quot;34327_thumb&quot; href=&quot;https://jira.whamcloud.com/secure/attachment/34327/34327_Screen+Shot+2020-02-21+at+16.11.52.png&quot; title=&quot;Screen Shot 2020-02-21 at 16.11.52.png&quot; file-preview-type=&quot;image&quot; file-preview-id=&quot;34327&quot; file-preview-title=&quot;Screen Shot 2020-02-21 at 16.11.52.png&quot;&gt;&lt;img src=&quot;https://jira.whamcloud.com/secure/thumbnail/34327/_thumb_34327.png&quot; style=&quot;border: 0px solid black&quot; role=&quot;presentation&quot;/&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;</comment>
                            <comment id="263910" author="koutoupis" created="Mon, 24 Feb 2020 16:32:48 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;&amp;gt; Is this with one MDT per MDS, or are all four MDTs on the same MDS?&#160;&lt;/p&gt;

&lt;p&gt;One MDT per MDS.&lt;/p&gt;

&lt;p&gt;&amp;gt; How many clients are being used for this test?&lt;/p&gt;

&lt;p&gt;It was 60 clients.&lt;/p&gt;

&lt;p&gt;&amp;gt; Does the performance improve when there are additional clients added for the 3/4 MDT test cases? &lt;/p&gt;

&lt;p&gt;We have not added more clients than this.&lt;/p&gt;

&lt;p&gt;&amp;gt; Having the actual test command line included in the problem description would make this report a lot more useful.&#160;&lt;/p&gt;

&lt;p&gt;We had four MDTs:&lt;/p&gt;

&lt;p&gt;lfs mkdir -c 4 &amp;lt;remote directory&amp;gt; [-D]&lt;/p&gt;

&lt;p&gt;mdtest -i 3 -p 30 -F -C -E -T -r -n $(( (1048576 / $PROCS) * Num_MDTs )) -v -d $&lt;remote directory/OUTDIR&gt;&lt;/p&gt;

&lt;p&gt;We use 1 million objects per MDT, so with 4 MDTs this test used 4 million objects.&#160;Again, 60 clients.&lt;/p&gt;
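
&lt;p&gt;As a rough illustration of the -n arithmetic in the mdtest command above, the per-process item count can be sketched in Python. The process count of 240 is purely hypothetical (the rank count per client is not given here), and the function name is illustrative. Integer division mirrors shell arithmetic, so the total can fall slightly short of 1 million objects per MDT when the process count does not divide 1048576 evenly.&lt;/p&gt;

```python
def items_per_process(num_mdts, procs, objects_per_mdt=1048576):
    """Mirror the shell arithmetic (objects_per_mdt / procs) * num_mdts.

    Integer division is used because shell arithmetic truncates.
    """
    return (objects_per_mdt // procs) * num_mdts


# Hypothetical process count: 60 clients x 4 ranks each. The thread gives
# the client count (60) but not the number of ranks per client.
procs = 240
per_proc = items_per_process(num_mdts=4, procs=procs)
total = per_proc * procs

print(f"items per process: {per_proc}")
print(f"total items across all processes: {total}")
```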

&lt;p&gt;Running the same mdtest with the -u flag, we see good scaling with 4 MDTs. When we remove the -u flag, so that ranks no longer operate in unique per-rank directories (i.e. a shared directory), the lack of scaling appears. We also tried mdtest with and without the -g flag [in the latest mainline builds], with the same behavior.&lt;/p&gt;

&lt;p&gt;&amp;gt; Looking at the referenced slide from the Cambridge presentation (attached), it actually shows almost linear scaling for additional MDTs (one per MDS) up to 48, excluding the 48-MDT stat test.&#160;&lt;/p&gt;

&lt;p&gt;The scaling in the presentation is very minimal as it was in some of our older tests with larger MDT/client counts (up to 512 clients). Is this to be expected?&lt;/p&gt;</comment>
                            <comment id="263923" author="spitzcor" created="Mon, 24 Feb 2020 21:21:24 +0000"  >&lt;p&gt;&amp;gt;&amp;gt; Andreas wrote:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Looking at the referenced slide from the Cambridge presentation (attached), it actually shows almost linear scaling for additional MDTs (one per MDS) up to 48, excluding the 48-MDT stat test. &lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&amp;gt; Petros wrote:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The scaling in the presentation is very minimal as it was in some of our older tests with larger MDT/client counts (up to 512 clients). Is this to be expected?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;More specifically, the scaling in the chart is about the easy mdtest (shared dir?) and stats.  I think the focus of the problem is the scaling/performance of create in a single shared directory.&lt;/p&gt;</comment>
                            <comment id="264039" author="ofaaland" created="Tue, 25 Feb 2020 18:39:23 +0000"  >&lt;p&gt;We haven&apos;t tested this with recent 2.12 or master, but we also saw cases of poor DNE2 scaling in the past.&lt;/p&gt;</comment>
                            <comment id="264253" author="koutoupis" created="Fri, 28 Feb 2020 14:13:53 +0000"  >&lt;p&gt;@Olaf Faaland,&lt;/p&gt;

&lt;p&gt;I tested master from about 3 weeks ago and the results are the same. Part of the challenge I am facing here is: how much of this minimal scaling is expected, and how much room do we have to make it better? Earlier presentations posted online show some scaling for mknod tests between 1 and 4 MDTs, but those ran an older build of Lustre, and our single-MDT performance has improved substantially since then. Today, when I run mknod tests, the scaling results are no different from my creates.&lt;/p&gt;</comment>
                            <comment id="264887" author="koutoupis" created="Mon, 9 Mar 2020 14:07:02 +0000"  >&lt;p&gt;I have attached some flamegraphs and perf reports (MDT-DNE2-shareddir_flamegraphs.zip)&#160;for a 2 MDT and then a 4 MDT configuration under load. I can provide any other data or traces upon request. I have also run the same tests using normal creates and then again with mknod, with the same scaling results. Any thoughts or ideas?&lt;/p&gt;</comment>
                            <comment id="264890" author="koutoupis" created="Mon, 9 Mar 2020 14:27:21 +0000"  >&lt;p&gt;I also attached&#160;shared-directory_mdt-perf.tar.gz, which contains the flamegraphs from the original test that correspond to the numbers posted in the description. Note that in the tarball, mdt0-1total contains the single-MDT test results, while the remaining subdirectories each correspond to one MDT in a 4 MDT configuration.&lt;/p&gt;</comment>
                            <comment id="270574" author="koutoupis" created="Tue, 19 May 2020 16:54:42 +0000"  >&lt;p&gt;Added the tarball&#160;archive_smaller_inodes-tests.tar.gz and an accompanying PowerPoint&#160;archive_smaller_inodes-tests.pptx, which highlights DNE2 single shared directory scaling on the large Moon cluster at LANL. We were able to drive load from 512 clients and, starting from a single server, doubled the server count at each iteration until we reached 32 MDTs. With enough clients, there appears to be a reasonable amount of scaling, and this issue becomes much less of a concern. I will close this ticket unless there are objections.&lt;/p&gt;</comment>
                            <comment id="270575" author="koutoupis" created="Tue, 19 May 2020 16:56:28 +0000"  >&lt;p&gt;@Andreas Dilger,&lt;/p&gt;

&lt;p&gt;It seems that I do not have the proper rights to close this ticket. Please advise.&lt;/p&gt;</comment>
                            <comment id="270576" author="pjones" created="Tue, 19 May 2020 16:57:59 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=koutoupis&quot; class=&quot;user-hover&quot; rel=&quot;koutoupis&quot;&gt;koutoupis&lt;/a&gt; try again&lt;/p&gt;</comment>
                            <comment id="270579" author="koutoupis" created="Tue, 19 May 2020 17:05:12 +0000"  >&lt;p&gt;@Peter Jones&lt;/p&gt;

&lt;p&gt;Still cannot close the ticket. It is not even an option.&lt;/p&gt;</comment>
                            <comment id="270581" author="pjones" created="Tue, 19 May 2020 17:19:11 +0000"  >&lt;p&gt;Sorry about that - two similarly named groups lured me into an error. Please have another go - I think I got it this time&lt;/p&gt;</comment>
                            <comment id="270582" author="koutoupis" created="Tue, 19 May 2020 17:48:03 +0000"  >&lt;p&gt;It would seem that, with enough client load, we are able to drive proper DNE2 single shared directory scaling.&lt;/p&gt;</comment>
                            <comment id="270626" author="adilger" created="Tue, 19 May 2020 23:59:49 +0000"  >&lt;p&gt;Reopen to change resolution.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="45824">LU-9436</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="34422" name="MDT-DNE2-shareddir_flamegraphs.zip" size="12013861" author="koutoupis" created="Mon, 9 Mar 2020 14:04:42 +0000"/>
                            <attachment id="34327" name="Screen Shot 2020-02-21 at 16.11.52.png" size="65790" author="adilger" created="Fri, 21 Feb 2020 23:12:52 +0000"/>
                            <attachment id="34936" name="archive_smaller_inodes-tests.pptx" size="641845" author="koutoupis" created="Tue, 19 May 2020 16:52:07 +0000"/>
                            <attachment id="34935" name="archive_smaller_inodes-tests.tar.gz" size="170141" author="koutoupis" created="Tue, 19 May 2020 16:50:03 +0000"/>
                            <attachment id="34423" name="shared-directory_mdt-perf.tar.gz" size="7105149" author="koutoupis" created="Mon, 9 Mar 2020 14:26:07 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00u8n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>