<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:14:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1167] Poor mdtest unlink performance with multiple processes per node</title>
                <link>https://jira.whamcloud.com/browse/LU-1167</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have noticed in testing that running multiple mdtest processes per node severely degrades unlink performance.&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Lustre mounted once per client; not multimount.&lt;/li&gt;
	&lt;li&gt;shared directory case&lt;br/&gt;
This can be seen at a wide range of node counts (5-128) and backends, to varying degrees.&lt;br/&gt;
Interestingly scaling the client count up does not seem to have nearly the same negative performance impact; only the ppn seems to matter.&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;nodes	total jobs	unlink kops
8	8	11.4
8	16	9.7
8	32	8.6
8	64	7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We see the same issue with 1.8.6 clients against the server; we do not see it with 1.8.6 servers.&lt;/p&gt;



</description>
                <environment>SL6.1,&lt;br/&gt;
2.6.32-131.12.1.el6.lustre.20.x86_64</environment>
        <key id="13418">LU-1167</key>
            <summary>Poor mdtest unlink performance with multiple processes per node</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="nrutman">Nathan Rutman</reporter>
                        <labels>
                    </labels>
                <created>Fri, 2 Mar 2012 20:09:22 +0000</created>
                <updated>Fri, 10 May 2013 08:08:44 +0000</updated>
                                            <version>Lustre 2.1.1</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="30367" author="nrutman" created="Fri, 2 Mar 2012 20:10:06 +0000"  >&lt;p&gt;tabs didn&apos;t come through; the columns are nodes, total jobs, and unlink kops.&lt;/p&gt;

&lt;p&gt;Anyone else seen this behavior?&lt;/p&gt;</comment>
                            <comment id="30423" author="adilger" created="Sun, 4 Mar 2012 02:14:06 +0000"  >&lt;p&gt;Nathan,&lt;br/&gt;
there are a couple of things worth trying out here:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Lustre 2.2 has pdirops on the MDS, so if there is directory contention at the server this would be reduced or eliminated.  Presumably this is not a regression from 2.1.0 server performance (i.e. you only compared 1.8.6 and 2.1.1, right)?&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;the patch that Liang made in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-933&quot; title=&quot;allow disabling the mdc_rpc_lock for performance testing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-933&quot;&gt;&lt;del&gt;LU-933&lt;/del&gt;&lt;/a&gt; (&lt;a href=&quot;http://review.whamcloud.com/2084&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/2084&lt;/a&gt;) allows &lt;em&gt;testing&lt;/em&gt; concurrent modifying metadata RPCs from the same client (breaks recovery, so NOT suitable for real world usage), but since you indicate the same problem happens with both 1.8 and 2.1 clients against 2.1.1 servers, I suspect that the problem is on the server side&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="30443" author="nrutman" created="Sun, 4 Mar 2012 14:27:18 +0000"  >&lt;blockquote&gt;&lt;p&gt;- Lustre 2.2 has pdirops on the MDS, so if there is directory contention at the server this would be reduced or eliminated. &lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There definitely is directory contention (I get much better rates with -u), but I&apos;m still wondering why it should change so dramatically depending on the number of threads per client, and not with the number of clients.  Why should having more than one thread on a client have any effect on the overall rate, assuming there are enough clients to saturate the MDS?&lt;br/&gt;
Pdirops patch and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-933&quot; title=&quot;allow disabling the mdc_rpc_lock for performance testing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-933&quot;&gt;&lt;del&gt;LU-933&lt;/del&gt;&lt;/a&gt; are definitely something I will investigate, but I&apos;d still like to understand the current behavior.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;- Presumably this is not a regression from 2.1.0 server performance (i.e. you only compared 1.8.6 and 2.1.1, right)?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Right.&lt;/p&gt;
</comment>
                            <comment id="31054" author="nrutman" created="Tue, 13 Mar 2012 14:40:56 +0000"  >&lt;p&gt;Here are some Hyperion results for 8 ppn on 100 nodes, performance also pretty poor.&lt;br/&gt;
shared dir&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/9bda4762-6740-11e1-a671-5254004bbbd3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/9bda4762-6740-11e1-a671-5254004bbbd3&lt;/a&gt;&lt;br/&gt;
000:    File creation     :  10373.125   7962.209   9318.090   1006.979&lt;br/&gt;
000:    File removal      :   2532.009   2325.865   2402.659     91.998&lt;/p&gt;

&lt;p&gt;unique dir&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/9bf3f5f4-6740-11e1-a671-5254004bbbd3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/9bf3f5f4-6740-11e1-a671-5254004bbbd3&lt;/a&gt;&lt;br/&gt;
000:    File creation     :  10543.279  10210.804  10392.223    137.420&lt;br/&gt;
000:    File removal      :   4178.868   3979.381   4093.730     84.019&lt;/p&gt;
</comment>
                            <comment id="33467" author="nrutman" created="Wed, 4 Apr 2012 13:41:29 +0000"  >&lt;p&gt;Continuing to pursue this, there are four identified bottlenecks for unlink performance:&lt;/p&gt;

&lt;p&gt;a. Parent directory mutex in the Linux kernel VFS (in do_unlinkat). This greatly affects shared-directory operations within a single client (avg latency: 1 ppn=0 microsecs, 4 ppn=6460, while the Lustre unlink RPC=250, constant for 1 and 4 ppn). Measured with dir-per-client, to remove the MDT shared-directory ldiskfs lock and shared-lock ldlm callbacks.&lt;/p&gt;

&lt;p&gt;b. Single MDC rpc-in-flight (rpc_lock) serializing rpcs. This also greatly affects multiple ppn operations (1ppn=0, 2ppn=180, 4ppn=1080, 8ppn=3530). Measured with dir-per-process to avoid parent mutex, but sharing the same dirs between clients to include ldiskfs and ldlm effects. While it may be possible to remove this restriction (MRP-59), doing so may be very complex due to ordering issues. (Note &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-933&quot; title=&quot;allow disabling the mdc_rpc_lock for performance testing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-933&quot;&gt;&lt;del&gt;LU-933&lt;/del&gt;&lt;/a&gt; has a patch to remove this in an unsafe way, for testing.)&lt;/p&gt;

&lt;p&gt;c. Shared-dir MDT ldlm lock. Lock callback time increases slowly with increased ppn (1ppn=130, 8ppn=240), possibly due to increasing client context switching time. Measured as in b above. Not expected to increase with client count. Possibly could be eliminated by having the MDS act as a proxy lock holder for multiple shared-dir clients, but not much gain possible here.&lt;/p&gt;

&lt;p&gt;d. Shared-dir ldiskfs lock. Contention for the same lock on the MDT directory, mostly independent of ppn but will increase with client count (subtracting out lock callback latency, 1ppn=90, 8ppn=130, measured as in b with 8 clients). Pdirops (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-50&quot; title=&quot;pdirops patch for ldiskfs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-50&quot;&gt;&lt;del&gt;LU-50&lt;/del&gt;&lt;/a&gt;) in Lustre 2.2.0 should help with this.&lt;/p&gt;


&lt;p&gt;Cliff, I was wondering if you have any 1 ppn mdtest results on Hyperion for different client counts?  I can&apos;t find any on Maloo (and indeed the links above seem to have lost their logs as well).  Also, can you tell me if the results above included pdirops (LU-50, Lustre 2.2.0)?&lt;/p&gt;</comment>
                            <comment id="33470" author="spitzcor" created="Wed, 4 Apr 2012 14:04:21 +0000"  >&lt;p&gt;Nathan, I agree with the bottlenecks that you&apos;ve identified, but I don&apos;t think that any of them are regressions.  But, maybe I&apos;m wrong wrt c.?  In the description you wrote, &quot;We see the same issue with 1.8.6 clients against the server; we do not see it with 1.8.6 servers.&quot;  Shouldn&apos;t we first focus on the regression from 1.8.6 to 2.x?&lt;/p&gt;</comment>
                            <comment id="33486" author="nrutman" created="Wed, 4 Apr 2012 16:11:01 +0000"  >&lt;p&gt;Hmm, that&apos;s a good point Cory.  It gets a little obfuscated with the number of variations in client count, server type, and storage layout.  Our 1.8.6 5-client test didn&apos;t show the drop with increasing ppn; it remained constant at 5kops.  Our 2.1 8-client test did show the decrease, but started from 13kops and went down to 9kops.  So it&apos;s a little hard to call this a clear regression, when the 2.1 numbers are all above the 1.8 numbers.&lt;/p&gt;</comment>
                            <comment id="33487" author="spitzcor" created="Wed, 4 Apr 2012 16:24:28 +0000"  >&lt;p&gt;If the HW was fixed for the comparison then that doesn&apos;t sound like a regression, just a major improvement with different dynamics.&lt;/p&gt;</comment>
                            <comment id="34563" author="nrutman" created="Wed, 11 Apr 2012 15:21:45 +0000"  >&lt;p&gt;Cory: yes, but it&apos;s a little unclear. &lt;br/&gt;
I&apos;m having trouble finding some old performance numbers at scale; I was really hoping Cliff had some older Hyperion numbers using 1.8.x&lt;/p&gt;</comment>
                            <comment id="34756" author="spitzcor" created="Sun, 15 Apr 2012 01:16:59 +0000"  >&lt;p&gt;We could run some apples-to-apples numbers on Cray gear between 1.8.6 and 2.1.1.  What kind of scale do you need?&lt;/p&gt;</comment>
                            <comment id="56203" author="mmansk" created="Fri, 12 Apr 2013 15:37:44 +0000"  >&lt;p&gt;Excel sheet with comparisons of 1.8.6 to 2.3 &amp;amp; 2.4 using metabench.  Shows the performance drop from 1.8.6 for deletes in the same directory on 2.x.&lt;/p&gt;</comment>
                            <comment id="56204" author="mmansk" created="Fri, 12 Apr 2013 15:39:36 +0000"  >&lt;p&gt;Data from metabench runs for 1.8.6, 2.3 &amp;amp; 2.4.&lt;/p&gt;</comment>
                            <comment id="57930" author="spitzcor" created="Wed, 8 May 2013 18:38:45 +0000"  >&lt;p&gt;This bug should be marked as affecting all 2.x versions.&lt;/p&gt;</comment>
                            <comment id="57948" author="mmansk" created="Wed, 8 May 2013 21:24:32 +0000"  >&lt;p&gt;metabench source&lt;/p&gt;</comment>
                            <comment id="58110" author="spitzcor" created="Fri, 10 May 2013 03:31:15 +0000"  >&lt;p&gt;It looks as though the comments that were made on 04/Apr/12 were about right.  &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3308&quot; title=&quot;large readdir chunk size slows unlink/&amp;quot;rm -r&amp;quot; performance&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3308&quot;&gt;LU-3308&lt;/a&gt; was opened to look into the regression aspect (which is seemingly unrelated to multiple processes per node).&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="18787">LU-3308</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="15321">LU-1695</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="12649" name="license.txt" size="2405" author="mmansk" created="Wed, 8 May 2013 21:18:52 +0000"/>
                            <attachment id="12509" name="metabench-compare.xlsx" size="10568" author="mmansk" created="Fri, 12 Apr 2013 15:37:44 +0000"/>
                            <attachment id="12510" name="metabench-comparison.txt" size="24122" author="mmansk" created="Fri, 12 Apr 2013 15:39:36 +0000"/>
                            <attachment id="12650" name="metabench.tar" size="4608000" author="mmansk" created="Wed, 8 May 2013 21:24:32 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>metadata</label>
            <label>performance</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvnx3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7703</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>