<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:40:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11062] Backtrace stack printing is broken in RHEL 7.5</title>
                <link>https://jira.whamcloud.com/browse/LU-11062</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It looks like struct stacktrace_ops is no longer present in RHEL 7.5, so our detection of whether dump_stack would work no longer works.&lt;/p&gt;

&lt;p&gt;We get this now:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[29185.979935] LNet: Service thread pid 9373 was inactive for 40.04s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[29185.981691] LNet: 31237:0:(linux-debug.c:185:libcfs_call_trace()) can&apos;t show stack: kernel doesn&apos;t export show_task
[29185.982826] LustreError: dumping log to /tmp/lustre-log.1527605717.9373
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Given that struct stacktrace_ops is still present in mainline kernels, this seems to be specific to RHEL 7.5, and I guess we need to find another way of detecting this.&lt;/p&gt;</description>
                <environment></environment>
        <key id="52424">LU-11062</key>
            <summary>Backtrace stack printing is broken in RHEL 7.5</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="ys">Yang Sheng</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Tue, 29 May 2018 16:56:21 +0000</created>
                <updated>Thu, 13 Sep 2018 12:40:50 +0000</updated>
                            <resolved>Mon, 13 Aug 2018 13:58:08 +0000</resolved>
                                                    <fixVersion>Lustre 2.12.0</fixVersion>
                    <fixVersion>Lustre 2.10.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="228780" author="ofaaland" created="Tue, 29 May 2018 17:38:17 +0000"  >&lt;p&gt;I may be misreading it, but it looks to me like this only affects cases where one thread (e.g. watchdog) is attempting to dump the stack of another thread, and stacktrace_ops/print_trace_ops are used.&lt;/p&gt;

&lt;p&gt;In the case where a thread is dumping its own stack, dump_stack() is called, and still works, I believe.&lt;/p&gt;</comment>
                            <comment id="231322" author="pjones" created="Thu, 2 Aug 2018 17:10:52 +0000"  >&lt;p&gt;Yang Sheng&lt;/p&gt;

&lt;p&gt;Could you please investigate?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="231326" author="adilger" created="Thu, 2 Aug 2018 18:02:55 +0000"  >&lt;p&gt;Whoever is working on this, please also update the error message to print &lt;tt&gt;kernel doesn&apos;t export dump_trace()&lt;/tt&gt; since &lt;tt&gt;show_stack()&lt;/tt&gt; hasn&apos;t existed for a long time.&lt;/p&gt;

&lt;p&gt;It looks like the relevant kernel commit is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit e18bcccd1a4ecb41e99678e002ef833586185bf1
Author:     Josh Poimboeuf &amp;lt;jpoimboe@redhat.com&amp;gt;
AuthorDate: Fri Sep 16 14:18:16 2016 -0500
Commit:     Ingo Molnar &amp;lt;mingo@kernel.org&amp;gt;
CommitDate: Tue Sep 20 08:29:34 2016 +0200

    x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
    
    Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
    been deprecated.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="231327" author="adilger" created="Thu, 2 Aug 2018 18:05:13 +0000"  >&lt;p&gt;This was disabled in Lustre via:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit 70d70c4f541c84adc88c372d670cd3a7fa8bc91f
Author:     Dmitry Eremin &amp;lt;dmitry.eremin@intel.com&amp;gt;

    LU-9183 libcfs: handle dump_trace() and related callbacks removal
    
    In kernel version 4.8 commit c8fe4609827aedc9c4b45de80e7cdc8ccfa8541b all previous
    users of dump_trace() have been converted to use the new unwind interfaces,
    so the dump_trace() and the related print_context_stack() and
    print_context_stack_bp() callback functions were removed.
    
    Change-Id: Ifa7a112d622b23f733d6daab05f9838afdf31a86
    Reviewed-on: https://review.whamcloud.com/25816
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It seems we need to update to the new kernel API, since this was backported to the RHEL7.5 kernel.&lt;/p&gt;</comment>
                            <comment id="231328" author="simmonsja" created="Thu, 2 Aug 2018 19:06:02 +0000"  >&lt;p&gt;Maybe it&apos;s time to look at using the hang checker available in the kernel?&lt;/p&gt;</comment>
                            <comment id="231349" author="adilger" created="Thu, 2 Aug 2018 22:18:45 +0000"  >&lt;p&gt;James, do you have any pointers to the kernel mechanism?  We already get NMI watchdogs from the kernel, but those are when the thread doesn&apos;t schedule for a long time.  Our current code can dump the stack of another service thread that is not making progress, even though it isn&apos;t totally dead (e.g. in a sleep/check loop).  The thread can &quot;ping&quot; the watchdog to tell it that it is alive, and lack of pings == lack of progress.  If the kernel can do something similar, especially if it&apos;s been around a while, then I&apos;d be happy to see it.&lt;/p&gt;</comment>
                            <comment id="231357" author="green" created="Fri, 3 Aug 2018 02:25:19 +0000"  >&lt;p&gt;The hangcheck timer reports on idle threads that did not schedule in a while or some such (it needs to be compiled in and enabled), but it&apos;s not quite what we need, since our timeouts are different and we have a slightly different model of when to start the countdown and when to stop it.&lt;/p&gt;</comment>
                            <comment id="231362" author="simmonsja" created="Fri, 3 Aug 2018 04:05:17 +0000"  >&lt;p&gt;I was talking to Neil and he recommended we use save_stack_trace_tsk() since it has been around since 2.6.29.&lt;/p&gt;

&lt;p&gt;Also of note, Shadow pointed out that the kernel does have a soft lockup thread (CONFIG_LOCKUP_DETECTOR). It appears to be enabled for RHEL kernels. There is also hung_task detection functionality.&lt;/p&gt;</comment>
                            <comment id="231497" author="scadmin" created="Mon, 6 Aug 2018 13:45:16 +0000"  >&lt;p&gt;FWIW we got hung tasks in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11082&quot; title=&quot;stuck threads on MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11082&quot;&gt;&lt;del&gt;LU-11082&lt;/del&gt;&lt;/a&gt; and got this message and no stack traces. In retrospect I guess I should have tried sysrq to get stack traces before rebooting. I&apos;ve got used to Lustre doing it for me.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="231590" author="gerrit" created="Tue, 7 Aug 2018 16:36:50 +0000"  >&lt;p&gt;Yang Sheng (ys@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32952&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32952&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11062&quot; title=&quot;Backtrace stack printing is broken in RHEL 7.5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11062&quot;&gt;&lt;del&gt;LU-11062&lt;/del&gt;&lt;/a&gt; libcfs: use save_stack_trace for stack dump&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: bc344d10030aa60385f0577a855993733b38c916&lt;/p&gt;</comment>
                            <comment id="231591" author="ys" created="Tue, 7 Aug 2018 16:42:23 +0000"  >&lt;p&gt;I have submitted a patch for this ticket. It just uses save_stack_trace_tsk for the backtrace dump. The obvious problem is that we are unable to judge whether an address is reliable or not, which may make things confusing in some cases.&lt;/p&gt;</comment>
                            <comment id="231765" author="gerrit" created="Thu, 9 Aug 2018 21:46:48 +0000"  >&lt;p&gt;James Nunez (jnunez@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32972&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32972&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11062&quot; title=&quot;Backtrace stack printing is broken in RHEL 7.5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11062&quot;&gt;&lt;del&gt;LU-11062&lt;/del&gt;&lt;/a&gt; libcfs: use save_stack_trace for stack dump&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7232c445fe30f6500f6f731ef8ffad617490eb68&lt;/p&gt;</comment>
                            <comment id="231846" author="gerrit" created="Mon, 13 Aug 2018 01:13:43 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32952/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32952/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11062&quot; title=&quot;Backtrace stack printing is broken in RHEL 7.5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11062&quot;&gt;&lt;del&gt;LU-11062&lt;/del&gt;&lt;/a&gt; libcfs: use save_stack_trace for stack dump&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: afedf9343686504c89f2e28cf6133540166f2347&lt;/p&gt;</comment>
                            <comment id="231859" author="pjones" created="Mon, 13 Aug 2018 13:58:08 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="231934" author="gerrit" created="Tue, 14 Aug 2018 18:29:49 +0000"  >&lt;p&gt;John L. Hammond (jhammond@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32972/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32972/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11062&quot; title=&quot;Backtrace stack printing is broken in RHEL 7.5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11062&quot;&gt;&lt;del&gt;LU-11062&lt;/del&gt;&lt;/a&gt; libcfs: use save_stack_trace for stack dump&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: a2af371bd8a79e293a9ba95b8016de92040101a6&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="44318">LU-9183</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="52957">LU-11241</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzxxj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>