<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:10:59 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14580] Lustre 2.12.6 performance regression</title>
                <link>https://jira.whamcloud.com/browse/LU-14580</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Between the Lustre 2.12.5 and 2.12.6 releases, we see a major performance regression on large systems with multiple interfaces.&lt;/p&gt;

&lt;p&gt;On a 16 socket Superdome Flex with 8 EDR interfaces functioning as a client of a filesystem capable of &amp;gt; 100 GB/s, using Lustre 2.12.5, we are measuring up to 50 GB/s.  With Lustre 2.12.6, the same test is down to 13 GB/s.&lt;/p&gt;

&lt;p&gt;Reverting commit ID f92c7a161242c478658af09159a127bc21cba611 restores performance.&lt;/p&gt;</description>
                <environment>16 Socket Superdome Flex w/8 EDR IB interfaces</environment>
        <key id="63640">LU-14580</key>
            <summary>Lustre 2.12.6 performance regression</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="neilb">Neil Brown</assignee>
                                    <reporter username="schamp">Stephen Champion</reporter>
                        <labels>
                            <label>performance</label>
                    </labels>
                <created>Fri, 2 Apr 2021 15:40:46 +0000</created>
                <updated>Sun, 4 Jul 2021 14:52:42 +0000</updated>
                                            <version>Lustre 2.12.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="297701" author="schamp" created="Fri, 2 Apr 2021 16:01:13 +0000"  >&lt;p&gt;FWIW, the test we are using is IOR w/ POSIX Direct IO:&lt;br/&gt;
+ mpirun sdf-1 240 /jet/home/champios/ior/bin/ior -i 3 -t 16m -b 13120m -s 1 -a POSIX --posix.odirect -e -F -w -k -E -D 30 -o /ocean/neocortex/tests/IOR-files/iorfile&lt;/p&gt;</comment>
                            <comment id="297705" author="pjones" created="Fri, 2 Apr 2021 16:47:28 +0000"  >&lt;p&gt;Thanks for the heads-up Steve!&lt;/p&gt;</comment>
                            <comment id="297732" author="jhammond" created="Fri, 2 Apr 2021 20:55:36 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=schamp&quot; class=&quot;user-hover&quot; rel=&quot;schamp&quot;&gt;schamp&lt;/a&gt; instead of reverting the entire f92c7a1 change, would it be possible to just revert the following hunk and rerun your benchmark?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;diff --git a/lustre/include/obd.h b/lustre/include/obd.h
index 2b5fb97..8545fd4 100644
--- a/lustre/include/obd.h
+++ b/lustre/include/obd.h
@@ -204,7 +204,6 @@ struct client_obd {
        /* the grant values are protected by loi_list_lock below */
        unsigned long            cl_dirty_pages;      /* all _dirty_ in pages */
        unsigned long            cl_dirty_max_pages;  /* allowed w/o rpc */
-       unsigned long            cl_dirty_transit;    /* dirty synchronous */
        unsigned long            cl_avail_grant;   /* bytes of credit for ost */
        unsigned long            cl_lost_grant;    /* lost credits (trunc) */
        /* grant consumed for dirty pages */
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="297737" author="neilb" created="Sat, 3 Apr 2021 01:27:33 +0000"  >&lt;p&gt;This looks to be a badly ported patch.&lt;/p&gt;

&lt;p&gt;Before the patch, osc_consume_write_grant() increments obd_dirty_pages.&#160; After the patch it doesn&apos;t.&lt;/p&gt;

&lt;p&gt;There are two callers of osc_consume_write_grant().&lt;/p&gt;

&lt;p&gt;One of them, in osc_enter_cache_try(), now increments obd_dirty_pages before calling osc_consume_write_grant().&lt;/p&gt;

&lt;p&gt;The other, in osc_queue_sync_pages(), hasn&apos;t been updated.&#160; Presumably it needs an increment, though there is already an increment after the call.&lt;/p&gt;

&lt;p&gt;So the patch makes a behaviour change which wasn&apos;t intended.&#160; I don&apos;t know what the correct behaviour is.&lt;/p&gt;

&lt;p&gt;@mpershin you did the backport: can you check it?&lt;/p&gt;</comment>
                            <comment id="298209" author="tappro" created="Thu, 8 Apr 2021 07:41:03 +0000"  >&lt;p&gt;Sure, I am checking that. These changes are actually a combination of two patches: the first ID is Ia047affc33fb9277e6c28a8f6d7d088c385b51a8 and the second is the one already referred to, f92c7a161242c478658af09159a127bc21cba611.&lt;/p&gt;

&lt;p&gt;Their ports to the 2.12 base differed a bit from the master branch versions, so I am checking whether some functionality was broken.&lt;/p&gt;</comment>
                            <comment id="298225" author="tappro" created="Thu, 8 Apr 2021 12:24:22 +0000"  >&lt;p&gt;I don&apos;t see any problems with the patch itself. The increment in &lt;tt&gt;osc_consume_write_grant()&lt;/tt&gt; was removed because it is now done by &lt;tt&gt;atomic_long_add_return()&lt;/tt&gt; outside that call, in both places where it is called. But maybe the patch &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12687&quot; title=&quot;Fast ENOSPC on direct I/O&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12687&quot;&gt;&lt;del&gt;LU-12687&lt;/del&gt;&lt;/a&gt; osc: consume grants for direct I/O&quot; itself causes the slowdown? Grants are now taken for Direct IO as well, so the regression may be related to a shortage of grants or a similar problem. Are there any complaints about grants on the client during the IOR run?&lt;/p&gt;</comment>
                            <comment id="301802" author="schamp" created="Mon, 17 May 2021 21:16:45 +0000"  >&lt;p&gt;@John Hammond&#160;Good idea!  I tried re-introducing the padding, to no effect on x86_64.  Architectures with 128-byte cache lines may show different results.&lt;/p&gt;</comment>
                            <comment id="303025" author="pjones" created="Fri, 28 May 2021 18:15:11 +0000"  >&lt;p&gt;Did you get any updated results on this, Steve?&lt;/p&gt;</comment>
                            <comment id="304497" author="paf0186" created="Mon, 14 Jun 2021 20:57:14 +0000"  >&lt;p&gt;I&apos;m pretty confident this and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14055&quot; title=&quot;Write performance regression caused by an commit from LU-13344&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14055&quot;&gt;&lt;del&gt;LU-14055&lt;/del&gt;&lt;/a&gt; are the same issue, so I&apos;m snagging both of these.&lt;/p&gt;

&lt;p&gt;I&apos;m going to leave this one open for now, but I think &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14055&quot; title=&quot;Write performance regression caused by an commit from LU-13344&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14055&quot;&gt;&lt;del&gt;LU-14055&lt;/del&gt;&lt;/a&gt; is the better place to continue the discussion.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="61287">LU-14055</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="57028">LU-12820</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>