<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:03:33 UTC 2024
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6823] Performance regression on servers with LU-5264</title>
                <link>https://jira.whamcloud.com/browse/LU-6823</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Since the introduction of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;del&gt;LU-5264&lt;/del&gt;&lt;/a&gt;, we have hit a large performance regression on our filesystem with some user applications that perform lots of IOPS. This has a huge impact on the MDS and, as a consequence, on all the Lustre clients. On the MDS, the ptlrpcd, mdt and ldlm threads are overloaded, spinning in _spin_lock().&lt;/p&gt;

&lt;p&gt;A perf report recorded during a slowdown of the filesystem is attached (perf.report.gz and perf.report-dso). Here is a sample from perf.report-dso:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;#
# Overhead             Shared Object
# ........  ........................
#
    98.08%  [kernel.kallsyms]
            |
            --- _spin_lock
               |
               |--97.33%-- 0xffffffffa05c12dc
               |          |
               |          |--53.24%-- 0xffffffffa075b629
               |          |          kthread
               |          |          child_rip
               |          |
               |          |--45.63%-- 0xffffffffa075b676
               |          |          kthread
               |          |          child_rip
               |          |
               |           --1.12%-- 0xffffffffa076aa58
               |                     kthread
               |                     child_rip
               |
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Caller addresses resolved with crash:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;crash&amp;gt; kmem 0xffffffffa05c12dc
ffffffffa05c12dc (t) lu_context_exit+188 [obdclass] 

   VM_STRUCT                 ADDRESS RANGE               SIZE
ffff881877c91a40  ffffffffa0571000 - ffffffffa06a5000  1261568

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0055920d28 1872df3000                0  e4d1079  1 140000000000000
crash&amp;gt; kmem 0xffffffffa075b676
ffffffffa075b676 (t) ptlrpc_main+2806 [ptlrpc] ../debug/lustre-2.5.3.90/lustre/ptlrpc/service.c: 2356

   VM_STRUCT                 ADDRESS RANGE               SIZE
ffff881877c91400  ffffffffa06fb000 - ffffffffa0895000  1679360

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea004b685228 158b853000                0      110  1 140000000000000
crash&amp;gt; kmem 0xffffffffa075b629
ffffffffa075b629 (t) ptlrpc_main+2729 [ptlrpc] ../debug/lustre-2.5.3.90/lustre/ptlrpc/service.c: 2534

   VM_STRUCT                 ADDRESS RANGE               SIZE
ffff881877c91400  ffffffffa06fb000 - ffffffffa0895000  1679360

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea004b685228 158b853000                0      110  1 140000000000000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This _spin_lock() call was introduced by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;del&gt;LU-5264&lt;/del&gt;&lt;/a&gt; (see &lt;a href=&quot;http://review.whamcloud.com/#/c/13103/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/13103/&lt;/a&gt;). I can&apos;t find any backport of this patch to b2_5 in Gerrit, but it seems that our engineering team got approval to use it with Lustre 2.5.&lt;/p&gt;

&lt;p&gt;For now, as a workaround, we have reverted the two patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;del&gt;LU-5264&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6049&quot; title=&quot;General Protection Fault at echo_session_key_fini+0xa9&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6049&quot;&gt;&lt;del&gt;LU-6049&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My guess is that the same issue should also occur on the OSSs.&lt;/p&gt;

&lt;p&gt;Could you help us fix this? Are we missing some patches on top of 2.5.3.90? Is there another conflicting patch?&lt;/p&gt;

&lt;p&gt;I have also attached debug logs (dump_log.tgz) collected on the MDS while the issue was occurring, in case they help.&lt;/p&gt;

&lt;p&gt;If you need further information, please let me know.&lt;/p&gt;</description>
                <environment>RHEL 6.6 w/ Bull kernel 2.6.32-504.16.2.el6.Bull.74.x86_64&lt;br/&gt;
Lustre 2.5.3.90 with additional patches:&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6471&quot; title=&quot;Unexpected Lustre Client LBUG in llog_write()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6471&quot;&gt;&lt;strike&gt;LU-6471&lt;/strike&gt;&lt;/a&gt; obdclass: fix llog_cat_cleanup() usage on client&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6392&quot; title=&quot;short read/write with stripe count &amp;gt; 1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6392&quot;&gt;&lt;strike&gt;LU-6392&lt;/strike&gt;&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;strike&gt;LU-6389&lt;/strike&gt;&lt;/a&gt; llite: restart short read/write for normal IO&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5740&quot; title=&quot;Kernel upgrade [RHEL6.6 2.6.32-504.el6]&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5740&quot;&gt;&lt;strike&gt;LU-5740&lt;/strike&gt;&lt;/a&gt; kernel upgrade [RHEL6.6 2.6.32-504.el6]&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4582&quot; title=&quot;After failing over Lustre MGS node to the secondary, client mount fails with -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4582&quot;&gt;&lt;strike&gt;LU-4582&lt;/strike&gt;&lt;/a&gt; mgc: replace hard-coded MGC_ENQUEUE_LIMIT value&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5678&quot; title=&quot;kernel crash due to NULL pointer dereference in kiblnd_pool_alloc_node()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5678&quot;&gt;&lt;strike&gt;LU-5678&lt;/strike&gt;&lt;/a&gt; o2iblnd: connection refcount fix for kiblnd_post_rx&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5393&quot; title=&quot;LBUG: (ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5393&quot;&gt;&lt;strike&gt;LU-5393&lt;/strike&gt;&lt;/a&gt; osd-ldiskfs: read i_size once to protect against race &lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3727&quot; title=&quot;LBUG (llite_nfs.c:281:ll_get_parent()) ASSERTION(body-&amp;gt;valid &amp;amp; OBD_MD_FLID) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3727&quot;&gt;&lt;strike&gt;LU-3727&lt;/strike&gt;&lt;/a&gt; nfs: fix ll_get_parent() LBUG caused by permission&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4528&quot; title=&quot;osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4528&quot;&gt;&lt;strike&gt;LU-4528&lt;/strike&gt;&lt;/a&gt; llog: dont write llog in 3 steps&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5522&quot; title=&quot;ofd_prolong_extent_locks()) ASSERTION( lock-&amp;gt;l_flags &amp;amp; 0x0000000000000020ULL ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5522&quot;&gt;&lt;strike&gt;LU-5522&lt;/strike&gt;&lt;/a&gt; ldlm: remove expired lock from per-export list&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;strike&gt;LU-5264&lt;/strike&gt;&lt;/a&gt; obdclass: fix race during key quiescency&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6049&quot; title=&quot;General Protection Fault at echo_session_key_fini+0xa9&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6049&quot;&gt;&lt;strike&gt;LU-6049&lt;/strike&gt;&lt;/a&gt; obdclass: Add synchro in lu_context_key_degister()&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6084&quot; title=&quot;Tests are failed due to &amp;#39;recovery is aborted by hard timeout&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6084&quot;&gt;&lt;strike&gt;LU-6084&lt;/strike&gt;&lt;/a&gt; ptlrpc: prevent request timeout grow due to recovery&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5764&quot; title=&quot;Crash of MDS on &amp;quot;apparent buffer overflow&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5764&quot;&gt;&lt;strike&gt;LU-5764&lt;/strike&gt;&lt;/a&gt; proc: crash of mds on apparent buffer overflow&lt;br/&gt;
&lt;br/&gt;
1 MDT, 480 OSTs, 5000+ clients.</environment>
        <key id="31024">LU-6823</key>
            <summary>Performance regression on servers with LU-5264</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="bruno.travouillon">Bruno Travouillon</reporter>
                        <labels>
                    </labels>
                <created>Thu, 9 Jul 2015 15:08:23 +0000</created>
                <updated>Fri, 10 Jul 2015 18:27:26 +0000</updated>
                            <resolved>Fri, 10 Jul 2015 18:27:26 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="120824" author="bfaccini" created="Thu, 9 Jul 2015 15:15:11 +0000"  >&lt;p&gt;Hello Bruno,&lt;br/&gt;
This is likely a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6800&quot; title=&quot;Significant performance regression with patch LU-5264&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6800&quot;&gt;&lt;del&gt;LU-6800&lt;/del&gt;&lt;/a&gt;, and my &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;del&gt;LU-5264&lt;/del&gt;&lt;/a&gt; patch is the culprit here.&lt;br/&gt;
A first fix (changing the spinlock into a rwlock) is currently being tested, as are other possible ways to fix or improve this.&lt;br/&gt;
I will let you know as soon as possible how it goes.&lt;br/&gt;
Also, thanks for your &quot;real-world&quot; profiling information.&lt;/p&gt;</comment>
                            <comment id="120826" author="bruno.travouillon" created="Thu, 9 Jul 2015 15:21:05 +0000"  >&lt;p&gt;Indeed, this is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6800&quot; title=&quot;Significant performance regression with patch LU-5264&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6800&quot;&gt;&lt;del&gt;LU-6800&lt;/del&gt;&lt;/a&gt;. Ga&#235;tan is working on this issue with me; feel free to contact him if you need further information. A crash dump is available at the customer site if you need more &quot;real-world&quot; data.&lt;/p&gt;</comment>
                            <comment id="121021" author="adilger" created="Fri, 10 Jul 2015 18:27:26 +0000"  >&lt;p&gt;Closing as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6800&quot; title=&quot;Significant performance regression with patch LU-5264&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6800&quot;&gt;&lt;del&gt;LU-6800&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="30925">LU-6800</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="18407" name="dump_log.tgz" size="3351699" author="bruno.travouillon" created="Thu, 9 Jul 2015 15:08:23 +0000"/>
                            <attachment id="18405" name="perf.report-dso" size="1769732" author="bruno.travouillon" created="Thu, 9 Jul 2015 15:08:23 +0000"/>
                            <attachment id="18406" name="perf.report.gz" size="2568510" author="bruno.travouillon" created="Thu, 9 Jul 2015 15:08:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxhtr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>