<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:22:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2078] sync_on_lock_cancel too aggressive</title>
                <link>https://jira.whamcloud.com/browse/LU-2078</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While running some single shared file IOR tests I noticed, with &apos;zpool iostat&apos;, that there were a lot of writes going on during the read phase.  This was absolutely destroying performance and it wasn&apos;t at all clear to me why there should be any writes at all.  IOR is just reading during this phase, no writes.&lt;/p&gt;

&lt;p&gt;On a hunch I set &apos;sync_on_lock_cancel=never&apos; and the writes stopped entirely and performance improved at least 6x.  I haven&apos;t looked at the code yet but I think there might be a few issues here.&lt;/p&gt;

&lt;p&gt;1) Why were so many lock revocations occurring during the read phase?  I would have thought that each client would end up with a single concurrent read lock rather quickly.  However, this behavior persisted for the 30 minutes it took to read all the data back.&lt;/p&gt;

&lt;p&gt;2) When revoking a read lock, or even a write lock which isn&apos;t actually covering any dirty data, there&apos;s no reason to issue the sync.  I suspect this was never noticed because under ldiskfs, if there&apos;s no dirty data to sync, performing the sync is basically free.  Under ZFS, however, calling sync will force the current txg to be written out.  That will result in, at a minimum, the vdev labels at the beginning and end of each disk being written.  Lots of syncs will result in each disk head basically seeking from the beginning to the end of the disk repeatedly.  Very, very bad.&lt;/p&gt;</description>
                <environment></environment>
        <key id="14142">LU-2078</key>
            <summary>sync_on_lock_cancel too aggressive</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="behlendorf">Brian Behlendorf</reporter>
                        <labels>
                            <label>server</label>
                            <label>topsequoia</label>
                    </labels>
                <created>Thu, 19 Apr 2012 14:09:03 +0000</created>
                <updated>Thu, 26 Dec 2013 17:21:00 +0000</updated>
                            <resolved>Mon, 8 Oct 2012 01:21:25 +0000</resolved>
                                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="41594" author="bzzz" created="Mon, 9 Jul 2012 04:55:01 +0000"  >&lt;p&gt;Please give this trivial patch a run:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,3355&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3355&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="41770" author="adilger" created="Thu, 12 Jul 2012 18:19:56 +0000"  >&lt;p&gt;Per comments in Skype, it might make sense to remove the sync from the DLM level entirely, and implement a &quot;commit on share&quot; style handling of write data.  That means the writes should not be committed to disk when the lock is cancelled, but rather only if the data is being read again.  This should allow the initial writes to go faster, because a larger number of writes can be aggregated into a single commit.  If the data is not immediately being read back from the object, it may not even need a transaction commit.&lt;/p&gt;

&lt;p&gt;There is no data consistency problem from doing this, because until the data is read by another client it is in a Schr&#246;dinger&apos;s box, and whether it was committed to disk or not is irrelevant.  Using the same &quot;commit on share&quot; mechanism also brings the OST data handling more in line with COS on the MDS.&lt;/p&gt;

&lt;p&gt;The most simplistic mechanism for this is to move the &quot;inode version &amp;gt; last_committed&quot; conditional sync from change 3355 from the DLM cancel into the OST read code.  This will improve the common case of many page-aligned writers on the same file that do not need to do read-modify-write of the file data.  A more sophisticated mechanism would be to track the dirty extents of the file (via ZIL?), and only sync the inode and data if reading dirty pages.&lt;/p&gt;</comment>
                            <comment id="41995" author="bzzz" created="Thu, 19 Jul 2012 04:05:40 +0000"  >&lt;p&gt;Brian, do you remember any details of the test? How many stripes? Arguments to IOR, etc.? Thanks in advance.&lt;/p&gt;</comment>
                            <comment id="42028" author="prakash" created="Thu, 19 Jul 2012 21:54:56 +0000"  >&lt;p&gt;Alex, I was able to run a few single shared file tests today. Here&apos;s a summary of my results.&lt;/p&gt;

&lt;p&gt;I ran 4 separate tests. Two tests with our 55chaos tag (&lt;b&gt;without&lt;/b&gt; &lt;a href=&quot;http://review.whamcloud.com/3355&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3355&lt;/a&gt;), one with sync_on_lock_cancel=always and another with sync_on_lock_cancel=never. Also, two tests were run with our 56chaos tag (&lt;b&gt;with&lt;/b&gt; &lt;a href=&quot;http://review.whamcloud.com/3355&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3355&lt;/a&gt;), again one with &quot;always&quot; and another with &quot;never&quot;.&lt;/p&gt;

&lt;p&gt;Also, each test was run using 64 client nodes, 1 task per node, striped over a single OST, using these IOR options &quot;-a POSIX -CegkY&quot;.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Single Shared File Read Performance (MiB/s)

         |  always |   never |
       --+---------+---------+
 55chaos |  469.19 | 1304.03 |
       --+---------+---------+
 56chaos |  988.26 | 1275.59 |
       --+---------+---------+
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So just from the above data, your patch seems to provide a 2x improvement for reads!&lt;/p&gt;

&lt;p&gt;Also, I&apos;ll attach a tarball with the IOR output and logs from the MDS and OSS with &apos;+dlmtrace&apos; enabled during each test&apos;s read phase.&lt;/p&gt;</comment>
                            <comment id="42029" author="prakash" created="Thu, 19 Jul 2012 21:59:10 +0000"  >&lt;p&gt;IOR output for the read and write phase of each test, and MDS and OSS server logs for the read phase of each test with &apos;+dlmtrace&apos; enabled.&lt;/p&gt;</comment>
                            <comment id="42195" author="bzzz" created="Tue, 24 Jul 2012 12:14:48 +0000"  >&lt;p&gt;Thanks for the data, Prakash. I&apos;m still looking at the dlmtrace; it&apos;s still not clear why we don&apos;t reach&lt;br/&gt;
the peak performance: the very first PR lock should commit the whole object and all subsequent reads&lt;br/&gt;
should skip osd_sync(). So here is a debug patch: &lt;a href=&quot;http://review.whamcloud.com/#change,3456&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3456&lt;/a&gt;&lt;br/&gt;
Please try it; just current Orion + this patch with sync_on_lock_cancel=always should be enough. A log with&lt;br/&gt;
+dlmtrace +rpctrace would be very useful. Thanks in advance.&lt;/p&gt;</comment>
                            <comment id="42219" author="prakash" created="Tue, 24 Jul 2012 20:10:45 +0000"  >&lt;p&gt;This has the IOR output from the read and write phase of the test, along with Lustre logs from the OSS with +dlmtrace and +rpctrace enabled.&lt;/p&gt;</comment>
                            <comment id="42232" author="bzzz" created="Wed, 25 Jul 2012 00:58:12 +0000"  >&lt;p&gt;Thanks. Here is another patch: &lt;a href=&quot;http://review.whamcloud.com/#change,3456&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3456&lt;/a&gt; &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt; Hope this time we&apos;ll get some improvement. Again, a log with +dlmtrace +rpctrace is very welcome. Thanks in advance.&lt;/p&gt;</comment>
                            <comment id="42269" author="prakash" created="Wed, 25 Jul 2012 16:35:34 +0000"  >&lt;p&gt;Same logs as before, except with our 58chaos tag (reverted debug patch-set 1 and applied debug patch-set 2 as requested).&lt;/p&gt;

&lt;p&gt;With this tag, I saw the best performance numbers of all my tests run so far (roughly equal to the sync_on_lock_cancel=never tests):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Single Shared File Read Performance (MiB/s)

        |  always |   never  |
      --+---------+----------+
58chaos | 1311.51 | untested |
      --+---------+----------+
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So my question is, can we safely drop the write lock as is done in the debug patch? Or is this just a proof of concept?&lt;/p&gt;

&lt;p&gt;Also, what exactly is that lock protecting, and why?&lt;/p&gt;</comment>
                            <comment id="42275" author="prakash" created="Wed, 25 Jul 2012 16:52:56 +0000"  >&lt;p&gt;I went ahead and ran the same test with `sync_on_lock_cancel=never` as well. Here&apos;s the logs and performance numbers.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Single Shared File Read Performance (MiB/s)

        |  always |  never  |
      --+---------+---------+
58chaos | 1311.51 | 1265.46 |
      --+---------+---------+
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="46070" author="morrone" created="Fri, 5 Oct 2012 17:43:11 +0000"  >&lt;p&gt;Patch landed on master: &lt;a href=&quot;http://review.whamcloud.com/4117&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4117&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="46116" author="ian" created="Mon, 8 Oct 2012 01:21:25 +0000"  >&lt;p&gt;Patch landed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="13017">LU-2085</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="11718" name="57chaos-always.tar.bz2" size="4777469" author="prakash" created="Tue, 24 Jul 2012 20:10:45 +0000"/>
                            <attachment id="11720" name="58chaos-always.tar.bz2" size="4725274" author="prakash" created="Wed, 25 Jul 2012 16:35:34 +0000"/>
                            <attachment id="11721" name="58chaos-never.tar.bz2" size="4715309" author="prakash" created="Wed, 25 Jul 2012 16:52:56 +0000"/>
                            <attachment id="11705" name="ori642.tar.bz2" size="7425565" author="prakash" created="Thu, 19 Jul 2012 21:59:10 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>server</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10070" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Project</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10031"><![CDATA[Orion]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzuwwf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2992</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>