<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:25:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16311] Lustre 2.15 IOR rewrite IOPS are lower than 2.12 (after fix for LU-13013)</title>
                <link>https://jira.whamcloud.com/browse/LU-16311</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A simple IOR script that tests buffered I/O (BIO) rewrite performance (see the example script below) shows a very noticeable regression when comparing anything from the 2.15 family to 2.12. The regression is only observable once the target systems (specifically, the OSS nodes) are saturated with load; for instance, I am running 64 processes per node across 21 clients. On my system, this amounts to approximately a &lt;ins&gt;25% performance loss&lt;/ins&gt;.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;2.12 64 PPN BIO random rewrites (70 - 85% CPU util):&lt;/b&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)
write         748.31     748.31     748.31       0.00  191566.59  191566.59  191566.59
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;2.15 64 PPN BIO random rewrites (85 - 95% CPU util):&lt;/b&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)
write         575.92     575.92     575.92       0.00  147434.55  147434.55  147434.55 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
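&lt;p&gt;As a quick cross-check of the two summaries above, the relative drop can be computed directly from the Mean(OPs) column. This is only an illustrative sketch: the constant and function names are hypothetical, and the two values are copied from the single runs shown, so the result describes these runs only.&lt;/p&gt;

```python
# Mean(OPs) values copied from the two IOR summaries above.
# The names below are hypothetical and exist only for this illustration.
BASELINE_IOPS_2_12 = 191566.59
REGRESSED_IOPS_2_15 = 147434.55


def percent_loss(baseline, measured):
    """Return the relative drop from baseline to measured, as a percentage."""
    return (baseline - measured) / baseline * 100.0


loss = percent_loss(BASELINE_IOPS_2_12, REGRESSED_IOPS_2_15)
print(f"IOPS loss, 2.12 to 2.15: {loss:.1f}%")
```

&lt;p&gt;For these two runs the script reports a loss of about 23%, in line with the rough figure quoted above.&lt;/p&gt;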
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Perf captures and ftrace data (with the help of git blame) have led me down the path to a root cause:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit 17ed9ed24bffd038e2cd116012f9a40d09afc9fc
Author: Alex Zhuravlev &amp;lt;bzzz@whamcloud.com&amp;gt;
Date: &#160; Tue Nov 26 16:24:55 2019 +0300
&#160;
&#160; &#160; LU-13013 osd: do not count credits for mapped blocks
&#160;
&#160; &#160; this should help to save credits if an application
&#160; &#160; overwrites using many tiny fragments.&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, commit 17ed9ed24b cannot be reverted on its own; at a minimum, the following two commits must be reverted as well:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit 42cda8781f94ad1138afac2d23180ea48f3c3450
Author: Wang Shilong &amp;lt;wshilong@ddn.com&amp;gt;
Date: &#160; Wed Jun 2 09:52:39 2021 +0800
&#160;
&#160; &#160; LU-14729 osd-ldiskfs: declare dirty block groups correctly
&#160;
&#160; &#160; Calculate dirty block groups only include estimated extents,
&#160; &#160; indirect blocks and extent node/leaf blocks are missed, this
&#160; &#160; could make us short of credits.
&#160;


commit e1bd38e27a810bad7a25813ebc1ca0535c9d7228
Author: Wang Shilong &amp;lt;wshilong@ddn.com&amp;gt;
Date: &#160; Wed Nov 11 14:51:09 2020 +0800
&#160;
&#160; &#160; LU-14131 osd-ldiskfs: reduce credits for overwritting
&#160;
&#160; &#160; If all blocks are mapped which means this is overwritting
&#160; &#160; case or space has been allocated by fallocate.
&#160;
&#160; &#160; There is no need to modify exten tree, and we only
&#160; &#160; need 1 credits for inode.&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because of this dependency, all three commits must be reverted to eliminate the IOR rewrite regression. With them reverted, 2.15 reports an improved 207K OPs with IOR.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Note -&lt;/b&gt; There is likely more code that is affected, so reverting only what is mentioned above may not be the correct solution.&lt;/p&gt;

&lt;p&gt;----------------&lt;/p&gt;

&lt;p&gt;Example IOR script:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#!/bin/bash
&#160;
NODES=21
PPN=64
MDT_COUNT=1
DEST=kjlmo2
DIR=flash
IOR=ior-3.3.0-CentOS-8.2/install/bin/ior
DIO=&quot;--posix.odirect&quot;
&#160;
&#160;
# DIO writes (prefill)
sudo pdsh -w c-lmo[1004-1024] sh /home/bloewe/bin/sudo-flush.sh
srun --mpi=pmi2 -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/$IOR -F -w -t 64m -k $DIO -b 8g -vv -o /mnt/$DEST/pkoutoupis/$DIR/test.01 2&amp;gt;&amp;amp;1 |&amp;amp; tee flash_write_dio_01.out
sleep 30
# Rewrites
sudo pdsh -w c-lmo[1004-1024] sh /home/bloewe/bin/sudo-flush.sh
srun --mpi=pmi2 -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/$IOR -F -w -t 4k -E -k -D 180 -b 8g -vv -z -o /mnt/$DEST/pkoutoupis/$DIR/test.01 2&amp;gt;&amp;amp;1 |&amp;amp; tee flash_rewrite_bio_01.out
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;^ &lt;b&gt;Note&lt;/b&gt; that the flush script only drops caches on the clients.&lt;/p&gt;</description>
                <environment></environment>
        <key id="73255">LU-16311</key>
            <summary>Lustre 2.15 IOR rewrite IOPS are lower than 2.12 (after fix for LU-13013)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="koutoupis">Petros Koutoupis</reporter>
                        <labels>
                    </labels>
                <created>Mon, 14 Nov 2022 20:23:50 +0000</created>
                <updated>Mon, 14 Nov 2022 20:23:50 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i035p3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>