<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:17 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5644] CPU stalls/heartbeat loss in cl_locks_prune</title>
                <link>https://jira.whamcloud.com/browse/LU-5644</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Over the last several months we&apos;ve had intermittent reports of CPU stalls and heartbeat failures where the offending process was executing somewhere in cl_locks_prune. Given enough time, the stuck process eventually terminated. We finally captured a dump where the node was STONITH&apos;d at the time of heartbeat loss. I think the dump explains the stalls.&lt;/p&gt;

&lt;p&gt;The system as a whole was experiencing network problems that prevented the client node from connecting to an OSS. At the time of the dump, the job had completed and the only thing running was a node health test against the Lustre file system. ps output shows the test had been on the runqueue for an excessive amount of time.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;gt; crash&amp;gt; bt
&amp;gt; PID: 34321  TASK: ffff880fe1d517f0  CPU: 15  COMMAND: &quot;nhc_fs_test&quot;
&amp;gt;  #0 [ffff880fe1e2d6d8] schedule at ffffffff8141d9f7
&amp;gt;  #1 [ffff880fe1e2d6e0] cl_env_info at ffffffffa0387b55 [obdclass]
&amp;gt;  #2 [ffff880fe1e2d6f0] cl_lock_counters at ffffffffa038fa66 [obdclass]
&amp;gt;  #3 [ffff880fe1e2d710] cl_lock_mutex_tail at ffffffffa038fae1 [obdclass]
&amp;gt;  #4 [ffff880fe1e2d730] cl_lock_mutex_get at ffffffffa0390732 [obdclass]
&amp;gt;  #5 [ffff880fe1e2d760] cl_locks_prune at ffffffffa0392001 [obdclass]
&amp;gt;  #6 [ffff880fe1e2d800] lov_delete_raid0 at ffffffffa07a3d31 [lov]
&amp;gt;  #7 [ffff880fe1e2d8b0] lov_object_delete at ffffffffa07a30f9 [lov]
&amp;gt;  #8 [ffff880fe1e2d8d0] lu_object_free at ffffffffa037fd75 [obdclass]
&amp;gt;  #9 [ffff880fe1e2d940] lu_object_put at ffffffffa03802ae [obdclass]
&amp;gt; #10 [ffff880fe1e2d9a0] cl_object_put at ffffffffa03890ae [obdclass]
&amp;gt; #11 [ffff880fe1e2d9b0] cl_inode_fini at ffffffffa087b56d [lustre]
&amp;gt; #12 [ffff880fe1e2da80] ll_clear_inode at ffffffffa0842210 [lustre]
&amp;gt; #13 [ffff880fe1e2dab0] ll_delete_inode at ffffffffa08428fd [lustre]
&amp;gt; #14 [ffff880fe1e2dae0] evict at ffffffff81171c11
&amp;gt; #15 [ffff880fe1e2db10] iput at ffffffff81172082
&amp;gt; #16 [ffff880fe1e2db40] ll_d_iput at ffffffffa0812af6 [lustre]
&amp;gt; #17 [ffff880fe1e2db80] d_kill at ffffffff8116c96b
&amp;gt; #18 [ffff880fe1e2dbb0] dput at ffffffff8116fd21
&amp;gt; #19 [ffff880fe1e2dbe0] fput at ffffffff8115a5a7
&amp;gt; #20 [ffff880fe1e2dc20] filp_close at ffffffff811563f3
&amp;gt; #21 [ffff880fe1e2dc50] put_files_struct at ffffffff81051e24
&amp;gt; #22 [ffff880fe1e2dc90] exit_files at ffffffff81051ed3
&amp;gt; #23 [ffff880fe1e2dcc0] do_exit at ffffffff810538ec
&amp;gt; #24 [ffff880fe1e2dd60] do_group_exit at ffffffff810540dc
&amp;gt; #25 [ffff880fe1e2dda0] get_signal_to_deliver at ffffffff81065463
&amp;gt; #26 [ffff880fe1e2de40] do_notify_resume at ffffffff81002360
&amp;gt; #27 [ffff880fe1e2df50] int_signal at ffffffff81427ff0
&amp;gt;     RIP: 00007ffff7b33300  RSP: 00007fffffffe908  RFLAGS: 00000246
&amp;gt;     RAX: fffffffffffffffc  RBX: 0000000000000000  RCX: ffffffffffffffff
&amp;gt;     RDX: 0000000000000060  RSI: 00000000004037c0  RDI: 0000000000000003
&amp;gt;     RBP: 0000000000000000   R8: 000000000060aaa0   R9: 746c61656865646f
&amp;gt;     R10: 00007fffffffe6b0  R11: 0000000000000246  R12: 0000000000000003
&amp;gt;     R13: 000000000060aaa0  R14: 00000000ffffffff  R15: 00000000ffffffff
&amp;gt;     ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

&amp;gt; crash&amp;gt; ps -l | grep &apos;\^&apos;
&amp;gt; ^[47137739341345] [IN]  PID: 459    TASK: ffff8810335b67f0  CPU: 12  COMMAND: &quot;kworker/12:1&quot;
&amp;gt; ^[47137739340142] [IN]  PID: 458    TASK: ffff8810335b6040  CPU: 13  COMMAND: &quot;kworker/13:1&quot;
[skip]
&amp;gt; ^[47136767282677] [IN]  PID: 462    TASK: ffff88083366e040  CPU: 9   COMMAND: &quot;kworker/9:1&quot;
&amp;gt; ^[47093087128394] [RU]  PID: 34321  TASK: ffff880fe1d517f0  CPU: 15  COMMAND: &quot;nhc_fs_test&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Dump diving yields these values for various variables:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;cl_env = ffff8810372a1e58
cl_object = ffff881037443f50
cl_object_header = ffff881037443ed0
cl_lock = ffff880fe3733df8
osc_lock = ffff8810395dcef8
cancel = 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Of particular interest are the cll_holds and cll_flags fields.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; cl_lock.cll_flags ffff880fe3733df8
   cll_flags = 6      /* 6 = CLF_CANCELPEND | CLF_DOOMED */
crash&amp;gt; cl_lock.cll_holds ffff880fe3733df8
   cll_holds = 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;CLF_CANCELPEND is set by cl_lock_cancel when cll_holds != 0.&lt;br/&gt;
CLF_DOOMED is set by cl_lock_delete when cll_holds != 0.&lt;br/&gt;
Because these flags are set, we can infer that the while loop in cl_locks_prune has executed more than once for this particular lock. cl_locks_prune won&apos;t exit the loop until cll_holds becomes 0. Looping here for an extended period would certainly explain the heartbeat fault, since nothing in the loop yields the CPU.&lt;/p&gt;

&lt;p&gt;cll_holds is incremented in osc_lock_enqueue (among other places). Scanning the ptlrpc_request_set lists shows there&apos;s an LDLM_ENQUEUE RPC waiting to be sent (Lustre is waiting for the connection to recover after the network errors).&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;gt; crash&amp;gt; load ptlrpc.sial
&amp;gt; crash&amp;gt; ptlrpc
&amp;gt; 
&amp;gt; Sent RPCS: ptlrpc_request_set.set_requests-&amp;gt;rq_set_chain
&amp;gt; thread        ptlrpc_request      pid xid                   nid                opc  phase  bulk  sent/deadline
&amp;gt; ===============================================================================================
&amp;gt; ptlrpcd_1:    ffff881038cfdc00      0 x1475337000695248     10.10.100.15@o2ib1    6 RPC    0:0 0/0
&amp;gt; ptlrpcd_29:   ffff880fe380e000      0 x1475337000694880     10.10.100.15@o2ib1  101 RPC    0:0 0/0
&amp;gt; ===============================================================================================
opcode 101 == LDLM_ENQUEUE

&amp;gt; crash&amp;gt; ptlrpc_request.rq_async_args ffff880fe380e000
&amp;gt;   rq_async_args = {
&amp;gt;     pointer_arg = {0xffff88103bbffc00, 0xffff8810395dcf60, 0xffffffffa0716bc0 &amp;lt;osc_lock_upcall&amp;gt;, 0xffff8810395dcef8, 0xffff8810395dcf28, 0xffff8810395dcf68, 0xffff8810395dcf70, 0x0, 0x0, 0x0, 0x0}, 
&amp;gt;     space = {18446612202036132864, 18446612201996144480, 18446744072106372032, 18446612201996144376, 18446612201996144424, 18446612201996144488, 18446612201996144496}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note the osc_lock arg following the upcall arg in the LDLM_ENQUEUE request matches the osc_lock referenced by the cl_lock in cl_locks_prune. So cl_locks_prune is looping on the CPU, waiting for an RPC to complete and release its hold on the cl_lock.&lt;/p&gt;
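&lt;p&gt;For reference, here is a simplified pseudocode sketch of the loop described above (paraphrased from the b2_5 code path, not a verbatim copy of the source):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Pseudocode sketch of the loop in cl_locks_prune(); not verbatim. */
while (!list_empty(&amp;amp;head-&amp;gt;coh_locks)) {
        lock = /* first lock on head-&amp;gt;coh_locks */;
        cl_lock_mutex_get(env, lock);
        if (cancel)
                cl_lock_cancel(env, lock); /* sets CLF_CANCELPEND while cll_holds != 0 */
        cl_lock_delete(env, lock);         /* sets CLF_DOOMED while cll_holds != 0 */
        cl_lock_mutex_put(env, lock);
        /* While cll_holds != 0 the lock stays on coh_locks, so the next
         * iteration picks up the same lock again without ever yielding
         * the CPU. Proposed fix: cond_resched() here when cll_holds != 0. */
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;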

&lt;p&gt;I think the solution is simply to call cond_resched in cl_locks_prune when cll_holds != 0. Patch in progress.&lt;/p&gt;</description>
                <environment>Client: SLES11SP2/SP3 Lustre 2.5.0/2.5.1 </environment>
        <key id="26663">LU-5644</key>
            <summary>CPU stalls/heartbeat loss in cl_locks_prune</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="amk">Ann Koehler</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Fri, 19 Sep 2014 21:54:18 +0000</created>
                <updated>Tue, 22 Dec 2015 03:32:52 +0000</updated>
                            <resolved>Wed, 17 Jun 2015 23:39:07 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.5.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="94557" author="amk" created="Fri, 19 Sep 2014 22:00:18 +0000"  >&lt;p&gt;Dump uploaded to ftp.whamcloud.com:/uploads/LU-5644/LU-5644_cl_locks_prune.tgz&lt;/p&gt;</comment>
                            <comment id="94628" author="amk" created="Mon, 22 Sep 2014 15:20:16 +0000"  >&lt;p&gt;Forgot to mention that cll_users == 0 so l_wait_event is not being executed.&lt;/p&gt;

&lt;p&gt;crash&amp;gt;  cl_lock.cll_users ffff880fe3733df8&lt;br/&gt;
  cll_users = 0&lt;/p&gt;</comment>
                            <comment id="94756" author="amk" created="Tue, 23 Sep 2014 18:01:37 +0000"  >&lt;p&gt;Patch: &lt;a href=&quot;http://review.whamcloud.com/12023&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12023&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of doing a cond_resched when cll_holds != 0, I added the cll_holds condition to the l_wait_event already in cl_locks_prune. No point trying to delete the lock until both cll_users and cll_holds == 0.&lt;/p&gt;

&lt;p&gt;But I wonder, is the real bug that cll_users is 0 when cll_holds is not? &lt;/p&gt;</comment>
                            <comment id="94843" author="amk" created="Wed, 24 Sep 2014 16:04:47 +0000"  >&lt;p&gt;Adding a cll_holds condition to the l_wait_event doesn&apos;t work because cl_lock_hold_release deletes the cl_lock when it decrements the hold count. If cl_locks_prune were woken up at this point, it would reference a freed cl_lock structure.&lt;/p&gt;</comment>
                            <comment id="95100" author="jay" created="Fri, 26 Sep 2014 22:54:44 +0000"  >&lt;p&gt;Hi Ann,&lt;/p&gt;

&lt;p&gt;It&apos;s fine to wait for cll_holds to become zero and it won&apos;t reference a freed cl_lock because cl_locks_prune() holds a refcount to cl_lock.&lt;/p&gt;

&lt;p&gt;please try patch: &lt;a href=&quot;http://review.whamcloud.com/12080&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12080&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="95194" author="amk" created="Mon, 29 Sep 2014 15:56:36 +0000"  >&lt;p&gt;Hi Jinshan,  &lt;/p&gt;

&lt;p&gt;Thanks for the patch. I appreciate the help. I&apos;ll get it added to our builds so it can be tested but I think it should fix the problem.&lt;/p&gt;</comment>
                            <comment id="95403" author="adilger" created="Wed, 1 Oct 2014 09:01:48 +0000"  >&lt;p&gt;Should &lt;a href=&quot;http://review.whamcloud.com/12023&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12023&lt;/a&gt; be abandoned in light of &lt;a href=&quot;http://review.whamcloud.com/12080&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12080&lt;/a&gt; or are they both needed?&lt;/p&gt;</comment>
                            <comment id="95411" author="amk" created="Wed, 1 Oct 2014 13:29:43 +0000"  >&lt;p&gt;I abandoned 12023. We&apos;re going with the 12080 patch by itself.&lt;/p&gt;</comment>
                            <comment id="100284" author="adilger" created="Mon, 1 Dec 2014 08:08:30 +0000"  >&lt;p&gt;Jinshan, can you please make a master version of &lt;a href=&quot;http://review.whamcloud.com/12080&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12080&lt;/a&gt; since it doesn&apos;t cherry-pick cleanly from b2_5.  &lt;/p&gt;</comment>
                            <comment id="100385" author="jay" created="Tue, 2 Dec 2014 05:49:03 +0000"  >&lt;p&gt;This patch isn&apos;t needed in master because of the cl_lock refactoring patch.&lt;/p&gt;</comment>
                            <comment id="118849" author="amk" created="Wed, 17 Jun 2015 18:20:58 +0000"  >&lt;p&gt;This bug can be closed. The patch fixes the problem in b2_5 and is not needed in master.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwwrz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15816</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>