<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:34:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10400] Reduced stat performance with lustre 2.10</title>
                <link>https://jira.whamcloud.com/browse/LU-10400</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have been noticing decreased performance in any stat-intensive operation on Lustre 2.10.0 and 2.10.2 compared to 2.7. The difference is larger when testing on HDDs than on SSDs, but we see it on both. Between runs I drop caches on the client, MDS, and OSS with &quot;echo 3 &amp;gt; /proc/sys/vm/drop_caches&quot;.&lt;/p&gt;
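&lt;p&gt;A minimal sketch of this measurement loop in Python 3, assuming it runs as root on the client (root is needed to write /proc/sys/vm/drop_caches). The helper names drop_caches and mean_runtime are hypothetical, averaging over 3 runs is an assumption, and caches on the MDS and OSS still have to be dropped on those hosts separately:&lt;/p&gt;

```python
import os
import statistics
import subprocess
import time


def drop_caches():
    # Flush dirty data, then drop the page cache, dentries and inodes
    # (value 3), like running sync and then writing 3 to drop_caches.
    # Requires root on the node where it runs.
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def mean_runtime(cmd, runs=3, before_each=None):
    """Run cmd (an argv list) several times; return mean wall time in seconds.

    before_each, if given, is called before every run (e.g. drop_caches).
    The command's stdout is discarded so terminal output does not skew timing.
    """
    samples = []
    for _ in range(runs):
        if before_each is not None:
            before_each()
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```

&lt;p&gt;For example, mean_runtime([&quot;du&quot;, &quot;-s&quot;, test_dir], before_each=drop_caches), where test_dir is a hypothetical path to the 100000-file directory on the Lustre mount.&lt;/p&gt;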

&lt;p&gt;For example, in a single directory containing 100000 files:&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Client Version&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.10.2&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.10.2&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.7&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.7&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Server Version&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.10.2&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.7&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.10.2&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;2.7&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;ls -l time (seconds)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;54&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;5&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;53&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;du -s time (seconds)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;150&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;22&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;150&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;We are running the 2.10 servers on CentOS 7 and the 2.7 servers on RHEL 6.6.&lt;/p&gt;</description>
                <environment></environment>
        <key id="49769">LU-10400</key>
            <summary>Reduced stat performance with lustre 2.10</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="standan">Saurabh Tandan</assignee>
                                    <reporter username="mcmult">Tim McMullan</reporter>
                        <labels>
                    </labels>
                <created>Fri, 15 Dec 2017 18:51:46 +0000</created>
                <updated>Thu, 22 Mar 2018 19:36:01 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>12</watches>
                                                                            <comments>
                            <comment id="216766" author="pjones" created="Tue, 19 Dec 2017 19:14:56 +0000"  >&lt;p&gt;Saurabh,&lt;/p&gt;

&lt;p&gt;Can you please see whether you can reproduce these results?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="216768" author="adilger" created="Tue, 19 Dec 2017 19:20:34 +0000"  >&lt;p&gt;Hi Tim, are there any tunables, formatting options, or default file striping used at your site? We&#8217;d like to reproduce this locally to debug the problem, but want to make sure that what we test matches your setup.&lt;/p&gt;</comment>
                            <comment id="216851" author="mcmult" created="Wed, 20 Dec 2017 16:02:33 +0000"  >&lt;p&gt;Hey Andreas,&lt;br/&gt;
The physical setup is 1 MDS/MGS and 1 OSS with 4 OSTs (2 SSD, 2 HDD). The stripe size is 1MB, and we tested with a stripe count of 2 so that every file hits both OSTs of the same type (SSD and HDD are in separate pools). The files we used were all 2MB. All of the testing above was done on the HDDs.&lt;br/&gt;
&#160;&lt;br/&gt;
I&apos;m setting the following on the MGS, but otherwise the setup is default for both 2.7 and 2.10&lt;br/&gt;
&lt;tt&gt;lctl set_param -P llite.*.lazystatfs=1&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;lctl set_param -P osc.*.max_rpcs_in_flight=32&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;lctl set_param -P osc.*.max_dirty_mb=256&lt;/tt&gt;&lt;br/&gt;
&#160;&lt;br/&gt;
I&apos;m formatting with the following:&lt;br/&gt;
MDS: the MGS and MDT are on the same host and share a drive; the underlying device is a RAID1 of 10k RPM disks.&lt;br/&gt;
&lt;tt&gt;mkfs.lustre --fsname=${name} --mgs /dev/sdc1&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;mkfs.lustre --fsname=${name} --mdt --mgsnode=${mgs_ip}@o2ib --index=0 /dev/sdc2&lt;/tt&gt;&lt;br/&gt;
&#160;&lt;br/&gt;
OSS:&lt;br/&gt;
SSDs, no RAID:&lt;br/&gt;
&lt;tt&gt;mkfs.lustre --fsname=${name} --ost --mgsnode=${mgs_ip}@o2ib --index=0 /dev/sdb&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;mkfs.lustre --fsname=${name} --ost --mgsnode=${mgs_ip}@o2ib --index=1 /dev/sdc&lt;/tt&gt;&lt;br/&gt;
&#160;&lt;br/&gt;
10k RPM disks, no RAID:&lt;br/&gt;
&lt;tt&gt;mkfs.lustre --fsname=${name} --ost --mgsnode=${mgs_ip}@o2ib --index=2 /dev/sdd&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;mkfs.lustre --fsname=${name} --ost --mgsnode=${mgs_ip}@o2ib --index=3 /dev/sde&lt;/tt&gt;&lt;br/&gt;
&#160;&lt;br/&gt;
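A rough sketch (Python 3) of how an equivalent test directory could be prepared from a client: a directory with stripe count 2 and 1MB stripe size, optionally in a pool, filled with 100000 files of 2MB each. The function names, file names, and pool name are hypothetical, and zero-filled file contents are an assumption:&lt;br/&gt;

```python
import os
import subprocess


def setstripe_cmd(directory, count=2, size="1M", pool=None):
    """Build an 'lfs setstripe' argv setting the default layout of a directory.

    Files created in the directory afterwards inherit this layout.
    """
    cmd = ["lfs", "setstripe", "-c", str(count), "-S", size]
    if pool is not None:
        cmd += ["-p", pool]
    return cmd + [directory]


def create_files(directory, n=100000, size_bytes=2 * 1024 * 1024):
    """Create n zero-filled files of size_bytes each in directory."""
    block = bytes(size_bytes)
    for i in range(n):
        path = os.path.join(directory, "file%06d" % i)
        with open(path, "wb") as f:
            f.write(block)


def prepare_test_dir(directory, pool=None, n=100000):
    # Create the directory, set its default striping, then populate it.
    os.makedirs(directory, exist_ok=True)
    subprocess.run(setstripe_cmd(directory, pool=pool), check=True)
    create_files(directory, n)
```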
Thank you!&lt;br/&gt;
--Tim&lt;/p&gt;</comment>
                            <comment id="217604" author="standan" created="Fri, 5 Jan 2018 17:39:46 +0000"  >&lt;p&gt;Hi Tim,&lt;br/&gt;
Could you please clarify whether you are using EL7 or EL7.4 with 2.10.x?&lt;br/&gt;
Also, are you using two separate systems, one with 2.7 and one with 2.10.x, or did you upgrade from 2.7 to 2.10.x?&lt;br/&gt;
The statement that the 2.10 server runs on CentOS 7 and 2.7 on RHEL 6.6 is a bit confusing.&lt;br/&gt;
Could you describe your Lustre version setup in a bit more detail?&lt;/p&gt;</comment>
                            <comment id="217633" author="allen.todd@sig.com" created="Fri, 5 Jan 2018 21:02:48 +0000"  >&lt;p&gt;The lustre 2.10.x system is running:  CentOS Linux release 7.4.1708 (Core)&lt;br/&gt;
The lustre 2.7 system is running: RedHatEnterpriseServer 6.6&lt;/p&gt;

&lt;p&gt;Both filesystems are new builds in a lab with no preexisting data.&lt;/p&gt;</comment>
                            <comment id="222734" author="standan" created="Wed, 7 Mar 2018 19:50:26 +0000"  >&lt;p&gt;I tried to reproduce the performance drop between Lustre 2.7.19.6 and 2.10.0 using the same kernel, but could not find any large difference between the two, either when creating 100000 files or when stat-ing them afterwards with &apos;time ls -l&apos;. The numbers below are the average of 3 runs each. We will continue to investigate this issue and see whether we can identify anything.&lt;br/&gt;
File creation using touch:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Build                Version   real (s)  user (s)  sys (s)  Kernel
b_ieel3_0 build 159  2.7.19.6  85.389    0.33      14.625   kernel-3.10.0-514.el7
b2_10 build 5        2.10.0    99.75     0.325     18.363   kernel-3.10.0-514.el7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;time ls -l on the files created with touch:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Build                Version   real (s)  user (s)  sys (s)  Kernel
b_ieel3_0 build 159  2.7.19.6  4.444     0.835     2.098    kernel-3.10.0-514.el7
b2_10 build 5        2.10.0    3.848     0.824     2.338    kernel-3.10.0-514.el7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;File creation using mcreate:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Build                Version   real (s)  user (s)  sys (s)  Kernel
b_ieel3_0 build 159  2.7.19.6  183.02    38.133    137.687  kernel-3.10.0-514.el7
b2_10 build 5        2.10.0    196.111   38.003    152.28   kernel-3.10.0-514.el7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;time ls -l on the files created with mcreate:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Build                Version   real (s)  user (s)  sys (s)  Kernel
b_ieel3_0 build 159  2.7.19.6  3.266     0.76      1.464    kernel-3.10.0-514.el7
b2_10 build 5        2.10.0    3.27      0.738     1.782    kernel-3.10.0-514.el7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="224305" author="mcmult" created="Thu, 22 Mar 2018 19:03:04 +0000"  >&lt;p&gt;Thanks for checking it out! After your test I decided to run the same Lustre version on the EL6 and EL7 kernels. I used Lustre 2.8 on RHEL 6 and RHEL 7, since that is easy to do with the released packages. The results are below; the times differ significantly between the two kernels.&lt;/p&gt;

&lt;p&gt;time ls -l&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Kernel                             real   user   sys
2.6.32-573.12.1.el6_lustre.x86_64  2.848  0.824  1.808
3.10.0-693.11.6.el7_lustre.x86_64  4.322  0.832  2.188&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;time du -s&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Kernel                             real   user   sys
2.6.32-573.12.1.el6_lustre.x86_64  20.450 0.188  5.280
3.10.0-693.11.6.el7_lustre.x86_64  34.830 0.192  5.448&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I&apos;ll keep looking and see what else I can find. Thanks!&lt;/p&gt;</comment>
                            <comment id="224312" author="paf" created="Thu, 22 Mar 2018 19:14:50 +0000"  >&lt;p&gt;Tim,&lt;/p&gt;

&lt;p&gt;That version of CentOS 7 includes the KPTI/Meltdown fix, and that version of CentOS 6 does not.&#160; That&apos;s a huge difference, and should account for the differences you&apos;re seeing, unless you&apos;ve specifically disabled KPTI.&lt;/p&gt;</comment>
                            <comment id="224314" author="mcmult" created="Thu, 22 Mar 2018 19:36:01 +0000"  >&lt;p&gt;Sorry, Patrick, my mistake: I grabbed the output from the wrong host.&lt;/p&gt;

&lt;p&gt;Here is the run on 3.10.0-327.3.1.el7_lustre.x86_64 (the kernel packaged with Lustre 2.8):&lt;/p&gt;


&lt;p&gt;time ls -l&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Kernel                             real   user   sys
2.6.32-573.12.1.el6_lustre.x86_64  2.848  0.824  1.808
3.10.0-327.3.1.el7_lustre.x86_64   3.391  0.820  1.876&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;time du -s&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Kernel                             real   user   sys
2.6.32-573.12.1.el6_lustre.x86_64  20.450 0.188  5.280
3.10.0-327.3.1.el7_lustre.x86_64   32.417 0.252  5.272&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
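&lt;p&gt;A small sketch (Python 3) that puts a number on the gap by dividing the el7 real time by the el6 real time from the two tables above; the dictionary layout and the function name slowdown are illustrative only:&lt;/p&gt;

```python
# Real (wall-clock) times in seconds, copied from the tables above.
REAL_TIMES = {
    "ls -l": {"el6": 2.848, "el7": 3.391},
    "du -s": {"el6": 20.450, "el7": 32.417},
}


def slowdown(times):
    """Return el7 real time divided by el6 real time, per command."""
    return {cmd: round(t["el7"] / t["el6"], 2) for cmd, t in times.items()}


print(slowdown(REAL_TIMES))
```

&lt;p&gt;This prints {&apos;ls -l&apos;: 1.19, &apos;du -s&apos;: 1.59}: on these runs ls -l takes about 19% longer on the el7 kernel and du -s about 59% longer.&lt;/p&gt;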
</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzpkv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>