<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:07:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7330] spinlock lockup on ldlm blp_lock in ldlm_bl_* threads</title>
                <link>https://jira.whamcloud.com/browse/LU-7330</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Our system administrators have reported that Lustre clients on our big BG/Q system are locking up due to Lustre.  There are some soft lockups, and then the kernel reports a &quot;spinlock lockup&quot; and dumps backtraces for many processes that are all stuck spinning on the same ldlm blp_lock.&lt;/p&gt;

&lt;p&gt;Many ldlm_bl_* threads are all spinning here:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;_raw_spin_lock
_spin_lock
ldlm_bl_get_work
ldlm_bl_thread_main
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Meanwhile, many sysiod processes (the I/O node portion of the I/O forwarding system on BG/Q) are stuck in either a read or write system call path that goes through shrink_slab() and ends up stuck on the same spin lock.  Here is an example backtrace for a read case:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;_raw_spin_lock
_spin_lock
_ldlm_bl_to_thread
ldlm_bl_to_thread
ldlm_cancel_lru
ldlm_cli_pool_shrink
ldlm_pool_shrink
ldlm_pools_shrink
shrink_slab
do_try_free_pages
try_free_pages
__alloc_pages_nodemask
generic_file_aio_read
vvp_io_read_start
cl_io_start
cl_io_loop
ll_file_io_generic
ll_file_aio_read
ll_file_read
vfs_read
sys_read
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Crash dumps are unfortunately not available on this system.  That is about all the information that I can currently get.&lt;/p&gt;

&lt;p&gt;Those clients are running lustre 2.5.4-4chaos.&lt;/p&gt;</description>
                <environment></environment>
        <key id="32781">LU-7330</key>
            <summary>spinlock lockup on ldlm blp_lock in ldlm_bl_* threads</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>llnl</label>
                            <label>llnlfixready</label>
                    </labels>
                <created>Fri, 23 Oct 2015 00:39:57 +0000</created>
                <updated>Fri, 27 Jan 2017 22:38:46 +0000</updated>
                            <resolved>Wed, 11 Nov 2015 18:14:48 +0000</resolved>
                                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="131393" author="adilger" created="Fri, 23 Oct 2015 17:12:27 +0000"  >&lt;p&gt;When the clients lock up, do you know how many &lt;tt&gt;ldlm_bl_*&lt;/tt&gt; threads are running? There was another issue reported recently about too many ldlm_bl threads being started when a process is interrupted; setting &lt;tt&gt;options ptlrpc ldlm_num_threads=16&lt;/tt&gt; avoided that problem.&lt;/p&gt;</comment>
                            <comment id="131449" author="morrone" created="Fri, 23 Oct 2015 23:25:24 +0000"  >&lt;p&gt;The &quot;spinlock lockup&quot; backtraces are showing up in 50+ threads, and the thread numbers run up into the 120s.  I am also seeing multiple ldlm_bl_* threads with duplicate names.&lt;/p&gt;

&lt;p&gt;On one node that is not currently locked up, there are 160 ldlm_bl_* threads with only 72 unique names between them.&lt;/p&gt;</comment>
                            <comment id="131462" author="adilger" created="Sat, 24 Oct 2015 07:22:22 +0000"  >&lt;p&gt;It sounds as if this may be a similar problem. The thread names are not important for their functionality, but the duplicate names do imply that many of the threads started around the same time. You could try the module option, but that won&apos;t take effect until the next remount.&lt;/p&gt;</comment>
                            <comment id="131582" author="pjones" created="Mon, 26 Oct 2015 17:21:31 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please assist with this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="131615" author="adilger" created="Mon, 26 Oct 2015 22:22:47 +0000"  >&lt;p&gt;[ Comments from the other ticket, which cannot be made public ]&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Users run jobs interactively, and when they kill their jobs with Ctrl-C, they find that some nodes end up with 100-200 LDLM processes. For comparison, freshly rebooted nodes have only 15-25 ldlm processes. The user checked and confirmed that after the Ctrl-C there are no more processes from the user&apos;s job remaining, nor are fuser and lsof reporting any open files. It is therefore not clear what is driving up the LDLM process count.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There is one problem that is shown by the stack traces, which may only be cosmetic, but may also be a sign of a larger problem. There are a large number of threads named ldlm_bl_38, and in fact most of the ldlm_bl_* threads have duplicate names. That in itself is harmless (though confusing while debugging), but may indicate that there is some race condition when starting up these threads that causes so many to be started in such a short time.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ldlm_bl_thread_start(struct ldlm_bl_pool *blp)
{
        struct ldlm_bl_thread_data bltd = { .bltd_blp = blp };
        struct task_struct *task;

        init_completion(&amp;amp;bltd.bltd_comp);
        bltd.bltd_num = atomic_read(&amp;amp;blp-&amp;gt;blp_num_threads);
        snprintf(bltd.bltd_name, sizeof(bltd.bltd_name) - 1,
                &lt;span class=&quot;code-quote&quot;&gt;&quot;ldlm_bl_%02d&quot;&lt;/span&gt;, bltd.bltd_num);
        task = kthread_run(ldlm_bl_thread_main, &amp;amp;bltd, bltd.bltd_name);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (IS_ERR(task)) {
                CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;cannot start LDLM thread ldlm_bl_%02d: rc %ld\n&quot;&lt;/span&gt;,
                       atomic_read(&amp;amp;blp-&amp;gt;blp_num_threads), PTR_ERR(task));
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; PTR_ERR(task);
        }
        wait_for_completion(&amp;amp;bltd.bltd_comp);

        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The naming problem is clearly in the code above - it picks the thread name before actually starting the thread and incrementing the counter.&lt;/p&gt;

&lt;p&gt;I counted about 200 threads started on this node, none of which were doing anything. It appears that the check for the number of running threads is also racy:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-comment&quot;&gt;/* Not fatal &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; racy and have a few too many threads */&lt;/span&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ldlm_bl_thread_need_create(struct ldlm_bl_pool *blp,
                                      struct ldlm_bl_work_item *blwi)
{
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; busy = atomic_read(&amp;amp;blp-&amp;gt;blp_busy_threads);

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (busy &amp;gt;= blp-&amp;gt;blp_max_threads)
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (busy &amp;lt; atomic_read(&amp;amp;blp-&amp;gt;blp_num_threads))
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;
        :
        :
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and could be improved.  I agree that a few too many threads is not harmful, but it seems that this can go badly wrong in some cases.&lt;/p&gt;

&lt;p&gt;There is a module parameter that could be used to limit the number of threads, to add in &lt;tt&gt;/etc/modprobe.d/lustre.conf&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;options ptlrpc ldlm_num_threads=16
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;which should reduce the number of threads significantly, but shouldn&apos;t cause any problems for clients since they don&apos;t handle many of these RPCs at one time.  Note that this tunable may only fix the &quot;symptom&quot; (many LDLM threads) and not the root cause of the hang. It&apos;s hard to know for sure what is causing this problem since there are no error messages and the threads themselves are not doing anything either.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After implementing the &lt;tt&gt;options ptlrpc ldlm_num_threads=16&lt;/tt&gt; module option, the site has not experienced any further problems.&lt;/p&gt;&lt;/blockquote&gt;</comment>
                            <comment id="131618" author="adilger" created="Mon, 26 Oct 2015 22:51:53 +0000"  >&lt;p&gt;I think there are three items that need to be fixed here:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;figure out what is triggering so many ldlm_bl_&amp;#42; threads being started.&lt;/li&gt;
	&lt;li&gt;fix the thread naming problem.  Creating the thread name before calling kthread_run() is no longer needed; it should be possible to pass the format string and thread index to kthread_run() directly, if that didn&apos;t cause a racy startup.  It may be that we don&apos;t really want thread forking to be &lt;em&gt;so&lt;/em&gt; efficient&lt;/li&gt;
	&lt;li&gt;limit the maximum number of &lt;tt&gt;ldlm_bl_&amp;#42;&lt;/tt&gt; threads on client namespaces to a lower number like 16 instead of 128, since clients don&apos;t handle many blocking callbacks compared to servers, and these numbers are per CPT.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="132431" author="niu" created="Tue, 3 Nov 2015 04:57:37 +0000"  >&lt;p&gt;I scrutinized the code, and it looks like the only possible reason for so many bl threads being started is the racy thread-starting code itself.&lt;/p&gt;

&lt;p&gt;When there are lots of unused locks cached on the client and the kernel senses memory pressure, lots of bl work items can be generated and processed by the bl threads, and if several bl threads run into ldlm_bl_thread_need_create() in parallel, many more threads can be created by mistake (exceeding blp_max_threads). That probably caused the spinlock lockup problem as well (too many threads contending on the blp_lock).&lt;/p&gt;

&lt;p&gt;I&apos;ll first post a patch to fix the problem of forking too many threads beyond the limit (along with the wrong thread names). That should fix the spinlock lockup too.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;limit the maximum number of ldlm_bl_* threads on client namespaces to a lower number like 16 instead of 128, since clients don&apos;t handle many blocking callbacks compared to servers, and these numbers are per CPT.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Andreas, usually the server sends blocking callbacks to the client, so the client should have more bl callbacks to handle, right? In addition, the bl threads on the client need to handle unused lock cancellation, so I don&apos;t think we should decrease the maximum thread number for the client. (It&apos;s a per-module parameter now, and the bl threads are shared by all namespaces.)&lt;/p&gt;</comment>
                            <comment id="132443" author="gerrit" created="Tue, 3 Nov 2015 07:04:39 +0000"  >&lt;p&gt;Niu Yawei (yawei.niu@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/17026&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17026&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7330&quot; title=&quot;spinlock lockup on ldlm blp_lock in ldlm_bl_* threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7330&quot;&gt;&lt;del&gt;LU-7330&lt;/del&gt;&lt;/a&gt; ldlm: fix race of starting bl threads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 187c285188130bb670c9f4daa38a67c35f064046&lt;/p&gt;</comment>
                            <comment id="133252" author="gerrit" created="Wed, 11 Nov 2015 15:53:52 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/17026/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17026/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7330&quot; title=&quot;spinlock lockup on ldlm blp_lock in ldlm_bl_* threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7330&quot;&gt;&lt;del&gt;LU-7330&lt;/del&gt;&lt;/a&gt; ldlm: fix race of starting bl threads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ebda41d8de7956f19fd27f86208c668e43c6957c&lt;/p&gt;</comment>
                            <comment id="133280" author="jgmitter" created="Wed, 11 Nov 2015 18:14:49 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                            <comment id="147151" author="adilger" created="Tue, 29 Mar 2016 05:33:12 +0000"  >&lt;p&gt;Just to restate the workaround for this problem (so that it is more easily visible): add &lt;tt&gt;options ptlrpc ldlm_num_threads=16&lt;/tt&gt; to &lt;tt&gt;/etc/modprobe.d/lustre.conf&lt;/tt&gt; to avoid the problem until upgrading to a release in which it has been fixed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxr6v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>