<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:45:52 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4790] CPU soft lockups (60+ seconds) in ptlrpcd threads running IOR resulting in eviction</title>
                <link>https://jira.whamcloud.com/browse/LU-4790</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I&apos;m able to easily reproduce an eviction by running a simple IOR on one of our PPC64 clusters (Vulcan). It looks like this is due to CPU soft lockups of the ptlrpcd threads in conjunction with threads being incorrectly bound to a single CPU.&lt;/p&gt;

&lt;p&gt;The reproducer can be as simple as a file-per-process IOR using a 1M transfer size:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;srun -- ~/ior/src/ior -o /p/lustre/file -v -w -k -t1m -b1g -F
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When using a 512 compute node allocation, this will result in 128 threads on each of 4 IO nodes performing the actual IO on behalf of the function-shipped compute nodes. So as far as the Lustre client is concerned, 128 threads are writing to separate files with a 1M transfer size.&lt;/p&gt;

&lt;p&gt;The CPU soft lockups that result from this look like so:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2014-03-19 13:38:45.928533 {RMP19Ma080953563} [mmcs]{2}.0.0: BUG: soft lockup - CPU#0 stuck for 66s! [ptlrpcd_1:3201]
2014-03-19 13:38:45.928973 {RMP19Ma080953563} [mmcs]{2}.0.0: Modules linked in: lmv(U) mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) bgvrnic bgmudm
2014-03-19 13:38:45.929415 {RMP19Ma080953563} [mmcs]{2}.0.0: NIP: 80000000003515d0 LR: 80000000003517cc CTR: 000000000000007d
2014-03-19 13:38:45.929697 {RMP19Ma080953563} [mmcs]{2}.0.0: REGS: c0000003c34d6b30 TRAP: 0901   Not tainted  (2.6.32-358.11.1.bgq.2llnl.V1R2M1.bl2.1_0.ppc64)
2014-03-19 13:38:45.930094 {RMP19Ma080953563} [mmcs]{2}.0.0: MSR: 0000000080029000 &amp;lt;EE,ME,CE&amp;gt;  CR: 44224424  XER: 00000000
2014-03-19 13:38:45.930449 {RMP19Ma080953563} [mmcs]{2}.0.0: TASK = c0000003c318f500[3201] &apos;ptlrpcd_1&apos; THREAD: c0000003c34d4000 CPU: 0
2014-03-19 13:38:45.930893 {RMP19Ma080953563} [mmcs]{2}.0.0: GPR00: 000000001ea5240a c0000003c34d6db0 8000000000395550 00000000000015b0 
2014-03-19 13:38:45.931227 {RMP19Ma080953563} [mmcs]{2}.0.0: GPR04: c0000003ad9dd8e0 000000001ea8bb55 000000000003974c 0000000000000053 
2014-03-19 13:38:45.931607 {RMP19Ma080953563} [mmcs]{2}.0.0: GPR08: 000000000000009c c0000003ad9de6c0 0000000000000029 00000000000000ff 
2014-03-19 13:38:45.931929 {RMP19Ma080953563} [mmcs]{2}.0.0: GPR12: 0000000000001170 c0000000007c2500 
2014-03-19 13:38:45.932301 {RMP19Ma080953563} [mmcs]{2}.0.0: NIP [80000000003515d0] .__adler32+0xb0/0x220 [libcfs]
2014-03-19 13:38:45.932648 {RMP19Ma080953563} [mmcs]{2}.0.0: LR [80000000003517cc] .adler32_update+0x1c/0x40 [libcfs]
2014-03-19 13:38:45.933093 {RMP19Ma080953563} [mmcs]{2}.0.0: Call Trace:
2014-03-19 13:38:45.933443 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d6db0] [c0000003c34d6e40] 0xc0000003c34d6e40 (unreliable)
2014-03-19 13:38:45.933807 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d6e30] [c0000000002041c4] .crypto_shash_update+0x4c/0x60
2014-03-19 13:38:45.934089 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d6ea0] [c000000000204218] .shash_compat_update+0x40/0x80
2014-03-19 13:38:45.934450 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d6f60] [800000000035072c] .cfs_crypto_hash_update_page+0x8c/0xb0 [libcfs]
2014-03-19 13:38:45.934778 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7020] [8000000001244f94] .osc_checksum_bulk+0x1c4/0x890 [osc]
2014-03-19 13:38:45.935105 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7170] [80000000012463c0] .osc_brw_prep_request+0xd60/0x1d80 [osc]
2014-03-19 13:38:45.935373 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7330] [800000000125931c] .osc_build_rpc+0xc8c/0x2320 [osc]
2014-03-19 13:38:45.935676 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d74e0] [800000000127f2f4] .osc_send_write_rpc+0x594/0xb40 [osc]
2014-03-19 13:38:45.935955 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d76c0] [800000000127ff74] .osc_check_rpcs+0x6d4/0x1770 [osc]
2014-03-19 13:38:45.936251 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d78f0] [800000000128131c] .osc_io_unplug0+0x30c/0x6b0 [osc]
2014-03-19 13:38:45.936528 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7a10] [800000000125b174] .brw_interpret+0x7c4/0x1940 [osc]
2014-03-19 13:38:45.936881 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7b50] [8000000000e9a964] .ptlrpc_check_set+0x3a4/0x4d30 [ptlrpc]
2014-03-19 13:38:45.937166 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7d20] [8000000000ee9a9c] .ptlrpcd_check+0x66c/0x870 [ptlrpc]
2014-03-19 13:38:45.937456 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7e40] [8000000000ee9f58] .ptlrpcd+0x2b8/0x4c0 [ptlrpc]
2014-03-19 13:38:45.937733 {RMP19Ma080953563} [mmcs]{2}.0.0: [c0000003c34d7f90] [c00000000001b9a8] .kernel_thread+0x54/0x70
2014-03-19 13:38:45.938068 {RMP19Ma080953563} [mmcs]{2}.0.0: Instruction dump:
2014-03-19 13:38:45.938421 {RMP19Ma080953563} [mmcs]{2}.0.0: 89490000 88a90001 88c90002 89090003 88e90004 7c005214 89490005 7ca02a14 
2014-03-19 13:38:45.938714 {RMP19Ma080953563} [mmcs]{2}.0.0: 7c005a14 89690006 7cc53214 7ca02a14 &amp;lt;88090008&amp;gt; 7d064214 7cc53214 7ce83a14 
2014-03-19 13:38:45.939210 {RMP19Ma080953563} [mmcs]{2}.13.0: ^GMessage from syslogd@(none) at Mar 19 13:38:45 ...
2014-03-19 13:38:45.939508 {RMP19Ma080953563} [mmcs]{2}.13.0:  kernel:BUG: soft lockup - CPU#0 stuck for 66s! [ptlrpcd_1:3201]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While &lt;em&gt;most&lt;/em&gt; of the stacks that have dumped are in the &lt;tt&gt;__adler32&lt;/tt&gt; function, not all of them are. I think the ptlrpcd thread is processing requests just fine; it&apos;s just failing to schedule itself, and thus consuming an absurd amount of CPU.&lt;/p&gt;

&lt;p&gt;As far as I can tell, &lt;b&gt;all&lt;/b&gt; of the ptlrpcd threads are being pinned to CPU 0 along with the ll_ping thread. Thus, when one of the ptlrpcd threads sits on CPU 0 for so long, it starves the ll_ping thread which leads to the evictions I am seeing. The fact that all these threads are being pinned to CPU 0 is probably a bug in and of itself, but even with that behavior, a ptlrpcd thread should never sit on a core for 60+ seconds.&lt;/p&gt;
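
&lt;p&gt;For what it&apos;s worth, one way to confirm that pinning from userspace (a hypothetical diagnostic I&apos;m sketching here, not something from the logs above) is to query a thread&apos;s affinity mask with sched_getaffinity(2), passing a ptlrpcd thread&apos;s PID such as the 3201 in the trace:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Hypothetical diagnostic: print which CPUs a given thread may run on.
 * Pass a thread PID as the first argument; with no argument it reports
 * the calling process itself. */
#define _GNU_SOURCE
#include &amp;lt;sched.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

int main(int argc, char **argv)
{
        pid_t pid = (argc &amp;gt; 1) ? (pid_t)atoi(argv[1]) : 0;  /* 0 = self */
        cpu_set_t mask;

        if (sched_getaffinity(pid, sizeof(mask), &amp;amp;mask) != 0) {
                perror(&quot;sched_getaffinity&quot;);
                return 1;
        }
        for (int cpu = 0; cpu &amp;lt; CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &amp;amp;mask))
                        printf(&quot;%d &quot;, cpu);
        printf(&quot;\n&quot;);
        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If every ptlrpcd thread reports only CPU 0, that would confirm the pinning described above.&lt;/p&gt;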

&lt;p&gt;My theory is that async write requests are being queued up by the writing threads on the client node, and it&apos;s just taking the ptlrpcd thread an absurd amount of time to process all of them. Unfortunately, I can&apos;t seem to get a grip on how the client is supposed to work, so I haven&apos;t gathered any evidence to support this claim. Looking at the logs and the unstable page stats in proc, it does look like pages are being processed by the system just fine. If it wasn&apos;t for the soft lockup messages and evictions, I probably wouldn&apos;t have even noticed anything was wrong.&lt;/p&gt;

&lt;p&gt;Keep in mind, the client nodes on this system have 68 cores, are PPC64, and have 16K pages. The ptlrpcd threads seem to run only on core 0, but the threads submitting the write requests run on all cores.&lt;/p&gt;

&lt;p&gt;Can anybody give me a lead on what might be going wrong here? Is it possible that requests keep getting stuffed into the ptlrpcd thread&apos;s queue, keeping that thread running on the core without it ever scheduling? Looking at the ptlrpcd thread code, I don&apos;t see anyplace where it would schedule except for within the ptlrpcd function itself. So if the request list it is processing is &lt;b&gt;very&lt;/b&gt; long, is it conceivable that it would sit on a core for a &lt;b&gt;very&lt;/b&gt; long time processing the requests without scheduling? The client code is so difficult to wade through, it&apos;s hard to understand exactly how all the machinery is &lt;b&gt;supposed&lt;/b&gt; to work, let alone what might be going wrong.&lt;/p&gt;
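
&lt;p&gt;To illustrate the failure mode I&apos;m asking about (a minimal userspace sketch with invented names, not the actual ptlrpcd code): a drain loop whose only yield point is the empty-queue case will hold its core for as long as the list it is walking keeps it busy:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Minimal sketch (invented names, not the actual ptlrpcd code). */
#include &amp;lt;sched.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

struct request { struct request *next; };

static unsigned long drain_loop(struct request *head)
{
        unsigned long handled = 0;

        while (head != NULL) {          /* no yield anywhere in this loop */
                struct request *req = head;
                head = req-&amp;gt;next;
                handled++;              /* stand-in for real RPC work */
                /* A periodic cond_resched()-style yield here would bound
                 * time on-CPU even when the list is very long. */
        }
        sched_yield();                  /* only yields once the list is empty */
        return handled;
}

int main(void)
{
        struct request reqs[4];

        for (size_t i = 0; i + 1 &amp;lt; 4; i++)
                reqs[i].next = &amp;amp;reqs[i + 1];
        reqs[3].next = NULL;
        printf(&quot;handled %lu requests\n&quot;, drain_loop(&amp;amp;reqs[0]));
        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If ptlrpcd&apos;s request set can be refilled while it is being walked, a loop of this shape would explain a thread sitting on a core for 60+ seconds.&lt;/p&gt;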

&lt;p&gt;EDIT: I should also note, this is running our local 2.4.0-15chaos tag.&lt;/p&gt;</description>
                <environment></environment>
        <key id="23809">LU-4790</key>
            <summary>CPU soft lockups (60+ seconds) in ptlrpcd threads running IOR resulting in eviction</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                    </labels>
                <created>Thu, 20 Mar 2014 17:29:03 +0000</created>
                <updated>Tue, 15 Dec 2015 22:28:16 +0000</updated>
                            <resolved>Mon, 31 Mar 2014 21:33:47 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="79897" author="prakash" created="Thu, 20 Mar 2014 17:31:49 +0000"  >&lt;p&gt;CC&apos;ing Jinshan since, afaik, he&apos;s the only one who understand the client code in detail.&lt;/p&gt;</comment>
                            <comment id="79902" author="pjones" created="Thu, 20 Mar 2014 17:40:58 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please advise with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="79972" author="bobijam" created="Fri, 21 Mar 2014 09:52:49 +0000"  >&lt;p&gt;There is a refcount issue in ptlrpcd_queue_work() (referring to calling path osc_io_unplug_async()&lt;del&gt;&amp;gt;osc_io_unplug0()&lt;/del&gt;&amp;gt;ptlrpcd_queue_work(cli-&amp;gt;cl_writeback_work))&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;ptlrpcd_queue_work()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;...
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (cfs_atomic_inc_return(&amp;amp;req-&amp;gt;rq_refcount) &amp;gt; 2) { &lt;span class=&quot;code-comment&quot;&gt;/* race */&lt;/span&gt;
                cfs_atomic_dec(&amp;amp;req-&amp;gt;rq_refcount);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -EBUSY;
        }
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and the -&amp;gt;rq_refcount is not decreased in its -&amp;gt;rq_interpret_reply, i.e. work_interpreter(), so the threads handling async IO no longer accept requests; all requests fall back to the sync-handling ptlrpcd thread, which uses the PDL_POLICY_SAME policy, meaning the work runs on the same CPU ptlrpcd itself uses (referring to calling paths osc_setup()-&amp;gt;brw_queue_work()-&amp;gt;osc_io_unplug(env, cli, NULL, PDL_POLICY_SAME) and brw_interpret()-&amp;gt;osc_io_unplug(env, cli, NULL, PDL_POLICY_SAME)).&lt;/p&gt;
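
&lt;p&gt;As a standalone illustration (a sketch with invented names, not the actual Lustre code): if the completion path never drops the extra reference taken when the work request is queued, the refcount sticks at 2 and every later queue attempt hits the race check and bails out:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Standalone sketch with invented names (not the actual Lustre code). */
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stdatomic.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

struct fake_request { atomic_int rq_refcount; };

static int queue_work(struct fake_request *req)
{
        /* Same shape as the snippet above: inc-and-test the refcount. */
        if (atomic_fetch_add(&amp;amp;req-&amp;gt;rq_refcount, 1) + 1 &amp;gt; 2) {
                atomic_fetch_sub(&amp;amp;req-&amp;gt;rq_refcount, 1);  /* race */
                return -EBUSY;
        }
        return 0;                                 /* queued successfully */
}

/* Buggy completion: forgets atomic_fetch_sub(&amp;amp;req-&amp;gt;rq_refcount, 1),
 * mirroring the missing decrement in work_interpreter() described above. */
static void work_done_buggy(struct fake_request *req) { (void)req; }

int main(void)
{
        struct fake_request req = { .rq_refcount = 1 };

        printf(&quot;first queue:  %d\n&quot;, queue_work(&amp;amp;req));  /* 0: ok */
        work_done_buggy(&amp;amp;req);                            /* leaks the ref */
        printf(&quot;second queue: %d\n&quot;, queue_work(&amp;amp;req));  /* -EBUSY forever */
        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Once queue_work() can only return -EBUSY, every caller falls back to the synchronous path, matching the PDL_POLICY_SAME pile-up on one CPU described above.&lt;/p&gt;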

&lt;p&gt;The master branch has already fixed this issue as a side effect of patch &lt;a href=&quot;http://review.whamcloud.com/#/c/8922/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8922/&lt;/a&gt;; I back-ported it here: &lt;a href=&quot;http://review.whamcloud.com/9747&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9747&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="80023" author="prakash" created="Fri, 21 Mar 2014 18:15:10 +0000"  >&lt;p&gt;Thanks! Initial testing is looking good. I wish I could better understand the problem and fix though.&lt;/p&gt;

&lt;p&gt;I&apos;m still seeing CPU soft lockups in ptlrpcd threads, but I&apos;m pretty sure we&apos;ve always seen them on these machines. With that said, they&apos;re no longer pinned to CPU 0 and the evictions have gone away (at least in my testing so far).&lt;/p&gt;

&lt;p&gt;Appreciate the help.&lt;/p&gt;</comment>
                            <comment id="80047" author="liang" created="Sat, 22 Mar 2014 04:04:43 +0000"  >&lt;p&gt;bobijam, I think it would be helpful if you can also add description of this bug into the patch, thanks.&lt;/p&gt;</comment>
                            <comment id="80660" author="pjones" created="Mon, 31 Mar 2014 21:33:47 +0000"  >&lt;p&gt;Seems to be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4509&quot; title=&quot;clio can be stuck in osc_extent_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4509&quot;&gt;&lt;del&gt;LU-4509&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="22805">LU-4509</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwi1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13189</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>