<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:20:51 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15736] Commit for LU-14792 introduces client side mdtest file create/remove regression and high std dev</title>
                <link>https://jira.whamcloud.com/browse/LU-15736</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing 2.15 and comparing it to our 2.12 branch, I observed a noticeable regression in the following:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;client side file create regression&lt;/li&gt;
	&lt;li&gt;client side 32K file remove regression&lt;/li&gt;
	&lt;li&gt;and the high std dev that we have been experiencing for file creates/removes&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;A git bisect revealed that this commit is the root cause (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14792&quot; title=&quot;DNE3: enable filesystem-wide default LMV&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14792&quot;&gt;&lt;del&gt;LU-14792&lt;/del&gt;&lt;/a&gt;):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;b9c4dc3c33 LU-14792 llite: enable filesystem-wide default LMV &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;More details:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit b9c4dc3c33fe87ecaa79a290190524ea21b7fa8a
Author: Lai Siyao &amp;lt;lai.siyao@whamcloud.com&amp;gt;
Date: &#160; Mon Jun 21 11:52:01 2021 +0800
&#160;
&#160;
&#160; &#160; LU-14792 llite: enable filesystem-wide default LMV
&#160;&#160; &#160;
&#160; &#160; This change includes three parts:
&#160; &#160; 1. save dir depth to ROOT after lookup on client side.
&#160; &#160; 2. once space balanced default LMV is set on ROOT, and
&#160;&#160; &#160; &#160; max-inherit/max-inherit-rr is unlimited or not less than directory
&#160;&#160; &#160; &#160; depth, new directory will be created in QOS or roundrobin mode.
&#160; &#160; 3. set ROOT default LMV max-inherit unlimited, and max-inherit-rr to
&#160;&#160; &#160; &#160; 3, and increase the ratio to create subdirectory on local MDT with
&#160;&#160; &#160; &#160; the directory depth to ROOT, so that new directories will be
&#160;&#160; &#160; &#160; created by space usage, and the deeper it&apos;s located it&apos;s more
&#160;&#160; &#160; &#160; likely to create on local MDTs; and the top 3 layer will be created
&#160;&#160; &#160; &#160; in roundrobin mode if system is balanced.
&#160;&#160; &#160;
&#160; &#160; Set default LMV in mkdir_on_mdt() to make sure its subdirectories are
&#160; &#160; created on the same MDT. Add sanity 413d.
&#160;&#160; &#160;
&#160; &#160; Create a test directory on MDT0 for pjdfstest, because cross-MDT
&#160; &#160; rename of symlink will migrate symlink to target MDT, which will cause
&#160; &#160; inode change (LU-11631).&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;All commits before this look great. All commits after this exhibit the above symptoms.&lt;/p&gt;

&lt;p&gt;git log on master:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;4668283cd1 LU-14806 o2iblnd: clear fatal error on successful failover
---&amp;gt; introduces regression b9c4dc3c33 LU-14792 llite: enable filesystem-wide default LMV
---&amp;gt; looks good b7bd4e3422 LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()
3e04b0fd6c LU-13417 mdd: set default LMV on ROOT
4e05f3b70b (tag: v2_14_53, tag: 2.14.53) New tag 2.14.53&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Testing b7bd4e3422 (before patch):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;SUMMARY rate: (of 5 iterations)
&#160;&#160; Operation&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; Max&#160; &#160; &#160; &#160; &#160; &#160; Min &#160; &#160; &#160; &#160; &#160; Mean&#160; &#160; &#160; &#160; Std Dev
&#160;&#160; ---------&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; ---&#160; &#160; &#160; &#160; &#160; &#160; --- &#160; &#160; &#160; &#160; &#160; ----&#160; &#160; &#160; &#160; -------
&#160;&#160; Directory creation&#160; &#160; &#160; &#160; : &#160; &#160; 109280.683 &#160; &#160; 100961.554 &#160; &#160; 105818.622 &#160; &#160; &#160; 3136.705
&#160;&#160; Directory stat&#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 410841.732 &#160; &#160; 388930.761 &#160; &#160; 404696.344 &#160; &#160; &#160; 7969.689
&#160;&#160; Directory removal &#160; &#160; &#160; &#160; : &#160; &#160; 220323.614 &#160; &#160; 150785.433 &#160; &#160; 181709.288&#160; &#160; &#160; 25249.587
&#160;&#160; File creation &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 154658.972 &#160; &#160; 143961.530 &#160; &#160; 149709.807 &#160; &#160; &#160; 4125.522
&#160;&#160; File stat &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 700893.743 &#160; &#160; 685670.701 &#160; &#160; 692684.713 &#160; &#160; &#160; 6583.956
&#160;&#160; File read &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 271890.920 &#160; &#160; 183951.839 &#160; &#160; 205427.555&#160; &#160; &#160; 33679.583
&#160;&#160; File removal&#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 147697.301 &#160; &#160; 135354.855 &#160; &#160; 140847.877 &#160; &#160; &#160; 4338.359
&#160;&#160; Tree creation &#160; &#160; &#160; &#160; &#160; &#160; :&#160; &#160; &#160; &#160; 275.553&#160; &#160; &#160; &#160; 170.019&#160; &#160; &#160; &#160; 248.261 &#160; &#160; &#160; &#160; 39.874
&#160;&#160; Tree removal&#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; &#160; &#160; 99.770 &#160; &#160; &#160; &#160; 85.408 &#160; &#160; &#160; &#160; 91.795&#160; &#160; &#160; &#160; &#160; 5.479
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Testing b9c4dc3c33 (after patch):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;SUMMARY rate: (of 5 iterations)
&#160;&#160; Operation&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; Max&#160; &#160; &#160; &#160; &#160; &#160; Min &#160; &#160; &#160; &#160; &#160; Mean&#160; &#160; &#160; &#160; Std Dev
&#160;&#160; ---------&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; ---&#160; &#160; &#160; &#160; &#160; &#160; --- &#160; &#160; &#160; &#160; &#160; ----&#160; &#160; &#160; &#160; -------
&#160;&#160; Directory creation&#160; &#160; &#160; &#160; : &#160; &#160; 108068.523 &#160; &#160; 102899.926 &#160; &#160; 105606.738 &#160; &#160; &#160; 2004.020
&#160;&#160; Directory stat&#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 428322.427 &#160; &#160; 395826.681 &#160; &#160; 404486.906&#160; &#160; &#160; 12222.136
&#160;&#160; Directory removal &#160; &#160; &#160; &#160; : &#160; &#160; 236153.570 &#160; &#160; 146400.162 &#160; &#160; 179968.138&#160; &#160; &#160; 32242.271
&#160;&#160; File creation &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 156681.218 &#160; &#160; 101096.295 &#160; &#160; 122707.414&#160; &#160; &#160; 23848.521
&#160;&#160; File stat &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 689022.637 &#160; &#160; 677108.079 &#160; &#160; 683537.598 &#160; &#160; &#160; 4706.503
&#160;&#160; File read &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 276963.750 &#160; &#160; 184493.079 &#160; &#160; 241172.371&#160; &#160; &#160; 30923.700
&#160;&#160; File removal&#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; 148977.883 &#160; &#160; 100569.361 &#160; &#160; 123812.878&#160; &#160; &#160; 18654.554
&#160;&#160; Tree creation &#160; &#160; &#160; &#160; &#160; &#160; :&#160; &#160; &#160; &#160; 280.232&#160; &#160; &#160; &#160; &#160; 0.994&#160; &#160; &#160; &#160; 142.324&#160; &#160; &#160; &#160; 123.201
&#160;&#160; Tree removal&#160; &#160; &#160; &#160; &#160; &#160; &#160; : &#160; &#160; &#160; &#160; 99.952 &#160; &#160; &#160; &#160; 20.766 &#160; &#160; &#160; &#160; 57.230 &#160; &#160; &#160; &#160; 35.277
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Again, every test run at b9c4dc3c33 and later continues to exhibit the regressions and high deviations noted above. The size of the regression varies from run to run, but I can get regressions of 15% or more for both file creates and file removes.&lt;/p&gt;
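&lt;p&gt;As a rough cross-check of the two summaries above, the relative drop in mean rate can be computed as in the sketch below. It is illustrative only and not part of the mdtest run: the helper name percent_drop is hypothetical, and the inputs are the Mean values for file creation and file removal copied from the b7bd4e3422 and b9c4dc3c33 tables.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
# Mean rates (ops/sec) copied from the two mdtest summaries in this report.
before = {"File creation": 149709.807, "File removal": 140847.877}  # b7bd4e3422
after = {"File creation": 122707.414, "File removal": 123812.878}   # b9c4dc3c33


def percent_drop(old, new):
    """Return the drop from old to new as a percentage of old (hypothetical helper)."""
    return (old - new) / old * 100.0


# Relative drop in the mean rate for each operation.
drops = {op: percent_drop(before[op], after[op]) for op in before}

for op, drop in drops.items():
    print(f"{op}: {drop:.1f}% lower mean rate")
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For this single pair of runs the mean file creation rate comes out roughly 18% lower and the mean file removal rate roughly 12% lower; as noted, the size of the drop varies from run to run.&lt;/p&gt;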

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;mdtest script:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#!/bin/bash
&#160;
&#160;
NODES=21
PPN=16
PROCS=$(( $NODES * $PPN ))
MDT_COUNT=1
PAUSED=120
&#160;
&#160;
# Unique directory #
srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -u -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +&quot;%Y%m%d.%H%M%S&quot;` 2&amp;gt;&amp;amp;1 |&amp;amp; tee f_mdt0_0k_ost_uniq.out
&#160;
srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -u -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +&quot;%Y%m%d.%H%M%S&quot;` 2&amp;gt;&amp;amp;1 |&amp;amp; tee f_mdt0_32k_ost_uniq.out 


# Shared directory #
srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +&quot;%Y%m%d.%H%M%S&quot;` 2&amp;gt;&amp;amp;1 |&amp;amp; tee f_mdt0_0k_ost_shared.out

srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +&quot;%Y%m%d.%H%M%S&quot;` 2&amp;gt;&amp;amp;1 |&amp;amp; tee f_mdt0_32k_ost_shared.out&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="69696">LU-15736</key>
            <summary>Commit for LU-14792 introduces client side mdtest file create/remove regression and high std dev</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="koutoupis">Petros Koutoupis</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 Apr 2022 13:28:06 +0000</created>
                <updated>Fri, 24 Jun 2022 13:04:05 +0000</updated>
                            <resolved>Fri, 24 Jun 2022 13:04:05 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="331696" author="adilger" created="Tue, 12 Apr 2022 14:58:25 +0000"  >&lt;p&gt;There may be some imbalance in the directory creation because it is the clients which decide which MDT to use at mkdir time. They initially start on different MDTs (essentially &quot;NID % MDTCOUNT&quot;), but this can become unsync&apos;d if the clients are doing different things. &lt;/p&gt;

&lt;p&gt;On the flip side, mdtest runs for &quot;unique dir&quot; no longer need to manually set the directory layout for the output directory to use multiple MDTs.  This would be $MDTCOUNT times faster for user applications where they do not manually set their own directory layout (i.e. all of them, because users don&apos;t know about this).&lt;/p&gt;</comment>
                            <comment id="331750" author="adilger" created="Tue, 12 Apr 2022 22:48:47 +0000"  >&lt;p&gt;Petros, just to clarify, does the filesystem only have a single MDT, or is &quot;&lt;tt&gt;MDTCOUNT=1&lt;/tt&gt;&quot; in the test config because the test directory &quot;&lt;tt&gt;/mnt/kjlmo13/pkoutoupis/mdt0&lt;/tt&gt;&quot; is only using MDT0000, but there are actually multiple MDTs in the filesystem?  Have you done any performance comparisons with multiple MDTs?&lt;/p&gt;

&lt;p&gt;It would be useful to collect the &quot;&lt;tt&gt;lfs getdirstripe&lt;/tt&gt;&quot; and &quot;&lt;tt&gt;lfs getdirstripe -D&lt;/tt&gt;&quot; output for that directory, and then, during the test run (or after a run with the &quot;unlink&quot; &quot;&lt;tt&gt;-r&lt;/tt&gt;&quot; phase disabled), check the distribution of the per-thread directories across MDTs.&lt;/p&gt;</comment>
                            <comment id="331794" author="koutoupis" created="Wed, 13 Apr 2022 12:52:51 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;The test directory &quot;/mnt/kjlmo13/pkoutoupis/mdt0&quot; is only tied to a single MDT and yes, there are two on the system.&lt;br/&gt;
Ex. lfs mkdir -i 0 /mnt/kjlmo13/`whoami`/mdt0&lt;/p&gt;

&lt;p&gt;And yes, we have tested the second MDT and it shows worse performance in these areas although I am not entirely sure it is related (yet).&lt;/p&gt;

&lt;p&gt;I can gather the rest of that information and post it shortly.&lt;/p&gt;</comment>
                            <comment id="336093" author="koutoupis" created="Thu, 26 May 2022 14:41:57 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;Unfortunately, I am unable to reproduce this client-side issue. I previously observed it on two separate systems, but both of those systems have since been reformatted and have gone through other reconfiguration changes. This implies that certain conditions need to be met (on the server side) in order to experience the regression I noted above in the description. Before these changes, the reproducibility was so consistent that I was able to root cause the issue to:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;b9c4dc3c33 LU-14792 llite: enable filesystem-wide default LMV  &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Anyway, we have been working internally to understand:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;What server-side conditions could have caused us to observe this regression in the first place?&lt;/li&gt;
	&lt;li&gt;And why are we not able to see it anymore?&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="338025" author="koutoupis" created="Fri, 17 Jun 2022 13:02:02 +0000"  >&lt;p&gt;&lt;b&gt;Update -&lt;/b&gt; Despite seeing it on two separate systems at separate moments, I am now unable to reproduce the same issue once again (please refer to my previous post for details). I have been working with our internal architectural team to get a better understanding of why that is and experimenting with some of their suggestions in the hopes of resurfacing the original issue.&lt;/p&gt;</comment>
                            <comment id="338664" author="koutoupis" created="Fri, 24 Jun 2022 13:04:05 +0000"  >&lt;p&gt;Unfortunately, I am unable to reproduce the original issue. If/when I do, I will reopen the ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="64857">LU-14792</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02mvj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>