<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5695] watchdog dispatch thread disappears</title>
                <link>https://jira.whamcloud.com/browse/LU-5695</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Sometimes lc_watchdogd disappears without any messages, and Lustre logs are not dumped after the watchdog is triggered.&lt;/p&gt;

&lt;p&gt;The correct behaviour should look like this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LNet: Service thread pid 7096 was inactive for 10.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 7096, comm: lctl

Call Trace:
 [&amp;lt;ffffffff81528eb2&amp;gt;] schedule_timeout+0x192/0x2e0
 [&amp;lt;ffffffff81084220&amp;gt;] ? process_timeout+0x0/0x10
 [&amp;lt;ffffffffa0380df7&amp;gt;] proc_trigger_watchdog+0x67/0x80 [libcfs]
 [&amp;lt;ffffffff811fd8e7&amp;gt;] proc_sys_call_handler+0x97/0xd0
 [&amp;lt;ffffffff811fd934&amp;gt;] proc_sys_write+0x14/0x20
 [&amp;lt;ffffffff81188f68&amp;gt;] vfs_write+0xb8/0x1a0
 [&amp;lt;ffffffff81189861&amp;gt;] sys_write+0x51/0x90
 [&amp;lt;ffffffff8152b2be&amp;gt;] ? do_device_not_available+0xe/0x10
 [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b

LustreError: dumping log to /tmp/lustre-log.1411548646.7096
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;And this is how the kernel logs may look when Lustre logs are not dumped:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: == sanity test 242: Check that watchdog causes kernel log dump == 09:19:38 (1411550378)
LNet: Service thread pid 12742 stopped after 20.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Lustre: DEBUG MARKER: sanity test_242: @@@@@@ FAIL: Lustre log wasn&apos;t dumped
Lustre: DEBUG MARKER: == sanity test complete, duration 29 sec == 09:20:01 (1411550401)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="26799">LU-5695</key>
            <summary>watchdog dispatch thread disappears</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="simmonsja">James A Simmons</assignee>
                                    <reporter username="zam">Alexander Zarochentsev</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Wed, 1 Oct 2014 13:26:47 +0000</created>
                <updated>Tue, 27 Feb 2018 04:28:31 +0000</updated>
                            <resolved>Tue, 27 Feb 2018 04:28:31 +0000</resolved>
                                                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="95412" author="zam" created="Wed, 1 Oct 2014 13:34:55 +0000"  >&lt;p&gt;A closer look at libcfs/libcfs/watchdog.c shows that the &lt;b&gt;LCW_FLAG_STOP&lt;/b&gt; flag in the &lt;b&gt;lcw_flags&lt;/b&gt; variable is only ever set and never cleared.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 17:33:12 ] $ git grep lcw_flags
libcfs/libcfs/watchdog.c:static unsigned long lcw_flags = 0;
libcfs/libcfs/watchdog.c:       if (test_bit(LCW_FLAG_STOP, &amp;amp;lcw_flags))
libcfs/libcfs/watchdog.c:               if (test_bit(LCW_FLAG_STOP, &amp;amp;lcw_flags)) {
libcfs/libcfs/watchdog.c:       set_bit(LCW_FLAG_STOP, &amp;amp;lcw_flags);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So if &lt;b&gt;lcw_refcount&lt;/b&gt; reaches zero and the watchdog thread is stopped by lcw_dispatch_stop(), it will never work again (it exits immediately after start) until the modules are reloaded or the system is restarted.&lt;/p&gt;</comment>
                            <comment id="95413" author="zam" created="Wed, 1 Oct 2014 13:36:45 +0000"  >&lt;p&gt;The fix looks like this:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;diff --git a/libcfs/libcfs/watchdog.c b/libcfs/libcfs/watchdog.c
index ed1acf7..e71b48a 100644
--- a/libcfs/libcfs/watchdog.c
+++ b/libcfs/libcfs/watchdog.c
@@ -330,6 +330,7 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; void lcw_dispatch_stop(void)
        wake_up(&amp;amp;lcw_event_waitq);
 
        wait_for_completion(&amp;amp;lcw_stop_completion);
+       clear_bit(LCW_FLAG_STOP, &amp;amp;lcw_flags);
 
        CDEBUG(D_INFO, &lt;span class=&quot;code-quote&quot;&gt;&quot;watchdog dispatcher has shut down.\n&quot;&lt;/span&gt;);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A proper patch will be submitted soon.&lt;/p&gt;</comment>
                            <comment id="95417" author="zam" created="Wed, 1 Oct 2014 14:20:36 +0000"  >&lt;p&gt;patch &lt;a href=&quot;http://review.whamcloud.com/#/c/12155/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/12155/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="95540" author="cliffw" created="Thu, 2 Oct 2014 16:45:55 +0000"  >&lt;p&gt;I will monitor this issue&lt;/p&gt;</comment>
                            <comment id="99698" author="cliffw" created="Thu, 20 Nov 2014 17:25:43 +0000"  >&lt;p&gt;The patch has failed review. Can you address the issues?&lt;/p&gt;</comment>
                            <comment id="162399" author="cliffw" created="Thu, 18 Aug 2016 17:09:18 +0000"  >&lt;p&gt;Bug out of date, no patch update. Closing&lt;/p&gt;</comment>
                            <comment id="162400" author="simmonsja" created="Thu, 18 Aug 2016 17:20:57 +0000"  >&lt;p&gt;Please reopen. I plan to update the patch but I was waiting until the port to sysfs happens for Lustre 2.10.&lt;/p&gt;</comment>
                            <comment id="162428" author="cliffw" created="Thu, 18 Aug 2016 19:17:18 +0000"  >&lt;p&gt;Still waiting for a patch&lt;/p&gt;</comment>
                            <comment id="162599" author="zam" created="Sat, 20 Aug 2016 12:18:56 +0000"  >&lt;p&gt;The procfs (or sysfs) part of the patch is only for testing; I think at least the actual fix from the patch can be landed.&lt;/p&gt;</comment>
                            <comment id="220598" author="cliffw" created="Fri, 9 Feb 2018 16:56:42 +0000"  >&lt;p&gt;Old issue, already fixed&lt;/p&gt;</comment>
                            <comment id="221180" author="simmonsja" created="Fri, 16 Feb 2018 17:36:36 +0000"  >&lt;p&gt;Is this fixed?&lt;/p&gt;</comment>
                            <comment id="221184" author="simmonsja" created="Fri, 16 Feb 2018 18:15:36 +0000"  >&lt;p&gt;Since Alex is okay with a one-line fix, I refreshed the patch. It is very simple and should be landed soon.&lt;/p&gt;</comment>
                            <comment id="221246" author="zam" created="Sat, 17 Feb 2018 20:12:20 +0000"  >&lt;p&gt;Yes, let&apos;s go with the one-line fix.&lt;/p&gt;</comment>
                            <comment id="221742" author="gerrit" created="Tue, 27 Feb 2018 03:42:22 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/12155/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/12155/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5695&quot; title=&quot;watchdog dispatch thread disappears&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5695&quot;&gt;&lt;del&gt;LU-5695&lt;/del&gt;&lt;/a&gt; libcfs: watchdog dispatch thread fix&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 1947bc08c0709ad80611dc65785ccb8dbf7f7214&lt;/p&gt;</comment>
                            <comment id="221773" author="pjones" created="Tue, 27 Feb 2018 04:28:31 +0000"  >&lt;p&gt;Landed for 2.11&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="36381">LU-8066</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwxi7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15940</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>