<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4507] Server hangs and terrible performance - ZFS IOR</title>
                <link>https://jira.whamcloud.com/browse/LU-4507</link>
                <project id="10000" key="LU">Lustre</project>
<description>&lt;p&gt;For some time now we have been observing terrible read performance when running ZFS IOR file-per-process. The system sees ~7 GB/s reads with ldiskfs; at higher client counts, the ZFS read performance on this test drops to ~400 MB/s, which is roughly single-client level. &lt;br/&gt;
Observing the OSTs, we typically see one or two of the 12 OSTs under very high load while the rest sit idle. The busy OST will then time out, frequently evicting several clients, and move forward. Stack dumps and errors from two servers are attached. These tests are ongoing; please advise what further data needs to be collected.&lt;/p&gt;</description>
                <environment>Hyperion/LLNL</environment>
        <key id="22795">LU-4507</key>
            <summary>Server hangs and terrible performance - ZFS IOR</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="isaac">Isaac Huang</assignee>
                                    <reporter username="cliffw">Cliff White</reporter>
                        <labels>
                    </labels>
                <created>Fri, 17 Jan 2014 19:09:56 +0000</created>
                <updated>Sat, 9 Oct 2021 05:55:58 +0000</updated>
                            <resolved>Sat, 9 Oct 2021 05:55:58 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="75352" author="cliffw" created="Tue, 21 Jan 2014 17:01:07 +0000"  >&lt;p&gt;ZFS is also demonstrating terrible mdtest performance. I compared ZFS and ldiskfs while increasing the number of files. Increasing the file count on ZFS from 500k to 750k increased the test run time 4-6x; no such impact was observed on the ldiskfs side. Spreadsheet attached.&lt;/p&gt;</comment>
                            <comment id="75487" author="adilger" created="Thu, 23 Jan 2014 04:53:52 +0000"  >&lt;p&gt;Brian, Isaac is starting to dig into ZFS debugging and development.  The stack traces that Cliff attached here look pretty clearly like a lockup (or at least heavy contention) in ZFS:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ll_ost_io02_0 D 0000000000000008     0 10402      2 0x00000000
Call Trace:
 [&amp;lt;ffffffff81510ad5&amp;gt;] rwsem_down_failed_common+0x95/0x1d0
 [&amp;lt;ffffffff81510c33&amp;gt;] rwsem_down_write_failed+0x23/0x30
 [&amp;lt;ffffffff81283cb3&amp;gt;] call_rwsem_down_write_failed+0x13/0x20
 [&amp;lt;ffffffff81510132&amp;gt;] ? down_write+0x32/0x40
 [&amp;lt;ffffffffa04e970a&amp;gt;] dmu_zfetch+0x6fa/0xd70 [zfs]
 [&amp;lt;ffffffffa04d3281&amp;gt;] dbuf_read+0x6a1/0x750 [zfs]
 [&amp;lt;ffffffffa04db122&amp;gt;] dmu_buf_hold_array_by_dnode+0x162/0x560 [zfs]
 [&amp;lt;ffffffffa04dc167&amp;gt;] dmu_read+0x97/0x180 [zfs]
 [&amp;lt;ffffffffa0d0891f&amp;gt;] osd_read+0x18f/0x240 [osd_zfs]
 [&amp;lt;ffffffffa0d08b53&amp;gt;] osd_read_prep+0x183/0x240 [osd_zfs]
 [&amp;lt;ffffffffa0dc3ba3&amp;gt;] ofd_preprw_read+0x253/0x7f0 [ofd]
 [&amp;lt;ffffffffa0dc48ba&amp;gt;] ofd_preprw+0x77a/0x1480 [ofd]
 [&amp;lt;ffffffffa0d58e11&amp;gt;] obd_preprw+0x121/0x390 [ost]
 [&amp;lt;ffffffffa0d60569&amp;gt;] ost_brw_read+0xd29/0x1350 [ost]
 [&amp;lt;ffffffffa0d676d8&amp;gt;] ost_handle+0x24a8/0x44d0 [ost]
 [&amp;lt;ffffffffa08a00e5&amp;gt;] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [&amp;lt;ffffffffa08a144d&amp;gt;] ptlrpc_main+0xaed/0x1740 [ptlrpc]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Any pointers for how to proceed on this?  Looks like the threads are blocked for a long time reading a dnode from disk, or there is some sort of lock deadlock.&lt;/p&gt;</comment>
                            <comment id="75512" author="behlendorf" created="Thu, 23 Jan 2014 17:49:02 +0000"  >&lt;p&gt;My suggestion would be to start with the latest source from Github if you&apos;re doing any sort of performance work.  We&apos;ve made some major performance improvements in the last 6 months you&apos;ll definitely benefit from.  We try very hard to keep what&apos;s on the master branch stable so I would track it for performance testing.&lt;/p&gt;

&lt;p&gt;  &lt;a href=&quot;https://github.com/zfsonlinux/zfs/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the major improvements made post 0.6.2 is that the ZFS write throttle code has been completely reworked.  The previous design was causing considerable I/O starvation/contention just like you&apos;ve described above.  The updated code smooths things out considerably; we&apos;re seeing more consistent I/O times and improved throughput.  Here are some additional links describing this work.&lt;/p&gt;

&lt;p&gt;  &lt;a href=&quot;http://open-zfs.org/wiki/Features#Smoother_Write_Throttle&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://open-zfs.org/wiki/Features#Smoother_Write_Throttle&lt;/a&gt;&lt;br/&gt;
  &lt;a href=&quot;http://dtrace.org/blogs/ahl/2013/12/27/zfs-fundamentals-the-write-throttle/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://dtrace.org/blogs/ahl/2013/12/27/zfs-fundamentals-the-write-throttle/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another thing we&apos;ve been working on is improving the ARC hit rate.  We&apos;ve observed that particularly with metadata-heavy workloads (which is all the MDS does) the ARC performance degrades over time and we end up needing to read from disk more.  You can see this behavior pretty easily by running the arcstat.py script, which among other things can show you the current cache hit rate.  Prakash has been investigating this and has proposed some promising patches which help a lot.  But we&apos;re still reviewing and testing them to ensure they work as expected and don&apos;t introduce regressions for other workloads.  We&apos;d love for you to give them a spin and see how much they help your testing.&lt;/p&gt;

&lt;p&gt;  &lt;a href=&quot;https://github.com/zfsonlinux/zfs/pull/1967&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/pull/1967&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It&apos;s also worth running the master branch because it adds some useful tools and more entries to proc to improve visibility.  My favorite new tool is dbufstat.py.  It allows you to dump all the cached dbufs and show which pool, dataset, and object they belong to.  You can also see extended information about each buffer, which often allows you to infer why it&apos;s being kept in the cache.  For example, for Lustre it clearly shows all the spill blocks we&apos;re forced to use because of the 512 byte dnode size.  That makes it quite clear that increasing the dnode size to 1k could halve the number of I/Os we need to do for lookups.  It&apos;s nice to be able to easily see that.&lt;/p&gt;

&lt;p&gt;There are also some new entries in /proc/spl/kstat/zfs/.  They let you get a handle on how long it&apos;s taking to assign a TXG, or exactly what I/O we are issuing to disk when we get a cache miss.&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;dbufs                - Stats for all dbufs in the dbuf_hash&lt;/li&gt;
	&lt;li&gt;&amp;lt;pool&amp;gt;/txgs          - Stats for the last N txgs synced to disk&lt;/li&gt;
	&lt;li&gt;&amp;lt;pool&amp;gt;/reads         - Stats for the last N reads issued by the ARC&lt;/li&gt;
	&lt;li&gt;&amp;lt;pool&amp;gt;/dmu_tx_assign - Histogram of tx assign times&lt;/li&gt;
	&lt;li&gt;&amp;lt;pool&amp;gt;/io            - Total I/O issued for the pool&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Basically, we&apos;ve been thinking about performance with ZFS too.  And now that things are running well we&apos;ve been getting the tools in place so we can clearly understand exactly what needs to be improved.  I&apos;d hoped to get an 0.6.3 tag out with all these improvements in January but that&apos;s slipped.  One of the two major blockers is convincing ourselves that Prakash&apos;s ARC changes work as designed and help the expected workloads.  Once again, if you guys could help test them that would be very helpful!&lt;/p&gt;</comment>
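<!--
A minimal, hedged sketch (not part of the original ticket) of how the ARC hit
rate discussed in the comment above can be computed from the `hits` and
`misses` counters exposed in /proc/spl/kstat/zfs/arcstats, which is what
arcstat.py reads. The sample kstat text here is illustrative, not captured
from a real system.

```python
# Sample of the kstat format: two header lines, then "name type data" rows.
# `hits` and `misses` are real arcstats counter names; the values are made up.
SAMPLE = """\
13 1 0x01 96 4608 4294967296 8589934592
name                            type data
hits                            4    907458
misses                          4    92542
"""

def parse_kstat(text):
    """Parse the 'name type data' rows of a kstat dump into a dict of ints."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        parts = line.split()
        if len(parts) == 3:
            stats[parts[0]] = int(parts[2])
    return stats

def arc_hit_rate(stats):
    """ARC hit rate as a percentage of total lookups (hits + misses)."""
    total = stats["hits"] + stats["misses"]
    return 100.0 * stats["hits"] / total if total else 0.0

print(round(arc_hit_rate(parse_kstat(SAMPLE)), 1))
```

On a live system one would read the text from /proc/spl/kstat/zfs/arcstats
instead of SAMPLE; sampling the counters twice and differencing gives the
interval hit rate that arcstat.py reports.
-->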
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="23474">LU-4716</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="14000" name="MDTEST performance.xlsx" size="35550" author="cliffw" created="Tue, 21 Jan 2014 17:01:07 +0000"/>
                            <attachment id="13997" name="h-agb15.errors.txt" size="8979" author="cliffw" created="Fri, 17 Jan 2014 19:09:57 +0000"/>
                            <attachment id="13996" name="h-agb15.log.dump.txt" size="1738099" author="cliffw" created="Fri, 17 Jan 2014 19:09:57 +0000"/>
                            <attachment id="13995" name="h-agb21.zfs.read.txt" size="1477227" author="cliffw" created="Fri, 17 Jan 2014 19:09:57 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwd53:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12333</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>