<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:35:19 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10463] Poor write performance periodically on repeated test runs</title>
                <link>https://jira.whamcloud.com/browse/LU-10463</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I&apos;m running an IOR test (IOR-2.10.3) that writes 1GB files to one dataset/directory, then writes 3GB files to another dataset/directory, then reads back the first dataset. This test sequence is run 25 times. My filesystem is able to do 14-16GB/sec writes, and most iterations of this test produce that bandwidth. The problem is that out of the 25 iterations, a few turn in significantly lower results, often in the 5-10GB/sec range.&lt;/p&gt;

&lt;p&gt;I initially suspected hardware issues, but testing of components, including each individual disk drive, showed everything working properly, and I&apos;ve seen nothing in the logs reporting any problem when running the test above. So I started building and testing various combinations of Lustre and ZFS. The hardware, clients, and server OS have been constant for each of the tests; only SPL/ZFS and Lustre on the server have changed from test to test.&lt;/p&gt;

&lt;p&gt;The problem appears to have been introduced in the Lustre 2.10.x branch. I have not seen it occur in the Lustre 2.9 builds I&apos;ve done: I&apos;ve built Lustre 2.9 with ZFS 0.7.3 and seen no issue, while I&apos;ve built Lustre 2.10.x with ZFS 0.6.5.7 and do observe the issue. Every build I&apos;ve done with Lustre 2.10.x (several) showed the issue.&lt;/p&gt;</description>
                <environment>Centos 7.4, various Lustre and ZFS versions tested. Lustre clients are 2.10.2_RC2.</environment>
        <key id="50067">LU-10463</key>
            <summary>Poor write performance periodically on repeated test runs</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Fri, 5 Jan 2018 18:24:47 +0000</created>
                <updated>Fri, 9 Feb 2018 21:58:52 +0000</updated>
                            <resolved>Sat, 20 Jan 2018 16:29:47 +0000</resolved>
                                    <version>Lustre 2.11.0</version>
                    <version>Lustre 2.10.2</version>
                                    <fixVersion>Lustre 2.11.0</fixVersion>
                    <fixVersion>Lustre 2.10.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="217615" author="adilger" created="Fri, 5 Jan 2018 19:02:31 +0000"  >&lt;p&gt;After narrowing it down to between 2.9.58 and 2.9.59, of the 87 patches, the possible candidates that affect the server (excluding ldiskfs) are:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;42bf19a573a5 LU-8703 libcfs: make tolerant to offline CPUs and empty NUMA nodes
e711370e13dc LU-9448 lnet: handle empty CPTs
8c9c1f59d99c LU-9090 ofd: increase default OST BRW size to 4MB
03f24e6f7864 LU-2049 grant: Fix grant interop with pre-GRANT_PARAM clients
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of those patches, it seems that 8c9c1f59d99c is very likely the culprit for this, since it is the only patch that directly affects the IO path.  It would be possible to verify this by setting &quot;&lt;tt&gt;lctl set_param osc.&amp;#42;.max_pages_per_rpc=1M&lt;/tt&gt;&quot; on the clients for a 2.9.59/2.10.0 client.&lt;/p&gt;</comment>
                            <comment id="217619" author="rgunlock" created="Fri, 5 Jan 2018 19:39:10 +0000"  >&lt;p&gt;Looks like setting max_pages_per_rpc=1M has done the trick. I&apos;ve tested both 2.9.59 and 2.10.2 using ZFS 0.7.3 with consistent write results. I didn&apos;t see any significant performance degradation using this setting. Using 2.10.2, for writes I averaged 16,053 MiB/s with a spread of 15,070-16,729 MiB/s across 80 test runs, which seems pretty typical for my hardware.&lt;/p&gt;

&lt;p&gt;Short of figuring out how to get ZFS OSDs to take advantage of the larger default max_pages_per_rpc, I do think a patch to default to 1M for ZFS OSDs would be a good idea.&lt;/p&gt;

&lt;p&gt;Thanks for the prompt attention to this issue, and I&apos;m happy that there is a simple solution.  I&apos;ve attached a spreadsheet that shows my write results with default and 1M max_pages_per_rpc.&lt;/p&gt;</comment>
                            <comment id="217653" author="gerrit" created="Sat, 6 Jan 2018 01:46:47 +0000"  >&lt;p&gt;Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/30757&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/30757&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10463&quot; title=&quot;Poor write performance periodically on repeated test runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10463&quot;&gt;&lt;del&gt;LU-10463&lt;/del&gt;&lt;/a&gt; osd-zfs: use 1MB RPC size by default&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 6848d1ad26d00ade658e85e608c4a83a9a7747cd&lt;/p&gt;</comment>
                            <comment id="218730" author="gerrit" created="Sat, 20 Jan 2018 06:19:17 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/30757/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/30757/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10463&quot; title=&quot;Poor write performance periodically on repeated test runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10463&quot;&gt;&lt;del&gt;LU-10463&lt;/del&gt;&lt;/a&gt; osd-zfs: use 1MB RPC size by default&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: af34a876d2ebde2b4717c920683c7fc8b5eae1cf&lt;/p&gt;</comment>
                            <comment id="218745" author="pjones" created="Sat, 20 Jan 2018 16:29:47 +0000"  >&lt;p&gt;Landed for 2.11&lt;/p&gt;</comment>
                            <comment id="218808" author="gerrit" created="Mon, 22 Jan 2018 15:28:08 +0000"  >&lt;p&gt;Minh Diep (minh.diep@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/30969&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/30969&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10463&quot; title=&quot;Poor write performance periodically on repeated test runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10463&quot;&gt;&lt;del&gt;LU-10463&lt;/del&gt;&lt;/a&gt; osd-zfs: use 1MB RPC size by default&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 79f3e1a4fa0ed94ee3958c955471d3ba67050a60&lt;/p&gt;</comment>
                            <comment id="220619" author="gerrit" created="Fri, 9 Feb 2018 18:13:00 +0000"  >&lt;p&gt;John L. Hammond (john.hammond@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/30969/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/30969/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10463&quot; title=&quot;Poor write performance periodically on repeated test runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10463&quot;&gt;&lt;del&gt;LU-10463&lt;/del&gt;&lt;/a&gt; osd-zfs: use 1MB RPC size by default&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f119ec3196eb3e7773eeb4dcb3d825d7f8725a9c&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10324">
                    <name>Cloners</name>
                                            <outwardlinks description="Clones">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="32420">LU-9090</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="50077">LU-10465</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="29093" name="ior-results-1Mvs4M.xlsx" size="14184" author="rgunlock" created="Fri, 5 Jan 2018 19:37:41 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzqkv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>