<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
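As a concrete (hypothetical) example, the request could be sketched as below. The 'si/jira.issueviews' XML-view path is an assumption based on standard JIRA deployments and should be verified against this instance; the issue key is the one from this document.

```shell
# Build the XML-view URL for this issue, restricted to key and summary.
# The issueviews path is an assumption; verify it against your JIRA instance.
BASE="https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-4309/LU-4309.xml"
URL="${BASE}?field=key&field=summary"
echo "$URL"
# Fetch it with: curl -s "$URL"
```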
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4309] mds_intent_policy ASSERTION(new_lock != NULL) failed: op 0x8 lockh 0x0</title>
                <link>https://jira.whamcloud.com/browse/LU-4309</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;An MDT thread hit an assertion in mds_intent_policy in what otherwise appeared to be normal operation.&lt;/p&gt;

&lt;p&gt;I&apos;m attaching the kernel log messages after the LBUG. These are from the console. We have a crash dump from the node, but no lustre log files.&lt;/p&gt;

&lt;p&gt;Lustre build:&lt;br/&gt;
Nov 18 12:46:55 widow-mds2 kernel: [  387.597792] Lustre: Build Version: v1_8_9_WC1--CHANGED-2.6.18-348.3.1.el5.widow&lt;/p&gt;</description>
                <environment>RHEL 5.9/distro IB</environment>
        <key id="22233">LU-4309</key>
            <summary>mds_intent_policy ASSERTION(new_lock != NULL) failed: op 0x8 lockh 0x0</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="blakecaldwell">Blake Caldwell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 25 Nov 2013 20:52:02 +0000</created>
                <updated>Mon, 10 Feb 2014 17:28:42 +0000</updated>
                            <resolved>Mon, 10 Feb 2014 17:28:42 +0000</resolved>
                                    <version>Lustre 1.8.9</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                    <comments>
                            <comment id="72265" author="johann" created="Mon, 25 Nov 2013 21:42:10 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Nov 18 11:46:10 widow-mds2 kernel: [1726316.746981] LustreError: dumping log to /tmp/lustre-log.1384793170.9088
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Any chance to access /tmp/lustre-log.1384793170.9088?&lt;/p&gt;

</comment>
                            <comment id="72266" author="pjones" created="Mon, 25 Nov 2013 21:47:29 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please advise on this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="72268" author="blakecaldwell" created="Mon, 25 Nov 2013 22:01:14 +0000"  >&lt;p&gt;Unfortunately, no logs from /tmp/ are left (on ramdisk).&lt;/p&gt;</comment>
                            <comment id="72537" author="pjones" created="Fri, 29 Nov 2013 18:48:43 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;I have realized that Hongchao is on vacation, so could you please handle this one instead?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="72561" author="laisiyao" created="Mon, 2 Dec 2013 07:36:49 +0000"  >&lt;p&gt;Do you know which client&apos;s getattr caused this ASSERT? If so, can you check the backtrace of the process on that client that was doing the getattr?&lt;/p&gt;</comment>
                            <comment id="72627" author="blakecaldwell" created="Mon, 2 Dec 2013 18:16:33 +0000"  >&lt;p&gt;Without the lustre logs in /tmp, I won&apos;t be able to track down the client. Even if the client could be identified from the crash dump, there would still be the problem of identifying what it was doing at the time.&lt;/p&gt;

&lt;p&gt;I see that the dmesg output is not very helpful, but that&apos;s all I have other than a crash dump.&lt;/p&gt;

&lt;p&gt;So that we are better prepared for these cases in the future, what information can be collected on the server side besides /tmp/lustre.*? Collecting client debug logs is very difficult due to the number of clients. Would a ldlm_namespace_dump be helpful? If the LBUG has already occurred, are there any debug flags for /proc/sys/lnet/debug that would provide useful information? Since the offending request has already been made, does capturing +net +dlmtrace +rpctrace do any good?&lt;/p&gt;</comment>
                            <comment id="72675" author="laisiyao" created="Tue, 3 Dec 2013 03:05:38 +0000"  >&lt;p&gt;Hmm, there is not much we can do in this case IMO, since an MDS crash will cause the whole system to hang, and it&apos;s hard to trace back to the client. I&apos;ll do more review of the related code to understand this assert.&lt;/p&gt;</comment>
                            <comment id="72786" author="laisiyao" created="Wed, 4 Dec 2013 07:17:17 +0000"  >&lt;p&gt;I was not able to find the problem in the code, so I composed a debug patch to dump the request before this assert. Could you apply it to get more info the next time this failure occurs?&lt;/p&gt;</comment>
                            <comment id="74327" author="jamesanunez" created="Sat, 4 Jan 2014 00:15:19 +0000"  >&lt;p&gt;Blake, &lt;/p&gt;

&lt;p&gt;Are you still seeing this assertion on your systems? If so, were you able to apply the patch to collect more information?&lt;/p&gt;

&lt;p&gt;Thanks, &lt;br/&gt;
James&lt;/p&gt;</comment>
                            <comment id="74416" author="blakecaldwell" created="Mon, 6 Jan 2014 18:58:18 +0000"  >&lt;p&gt;I haven&apos;t been able to apply this debug patch yet. The system has been stable, and as a result we haven&apos;t had an unscheduled outage in which to apply that patch. So nothing at this time. I will try applying the debug patch to another system that we can take down sooner.&lt;/p&gt;</comment>
                            <comment id="74419" author="jamesanunez" created="Mon, 6 Jan 2014 19:23:51 +0000"  >&lt;p&gt;Blake, Thanks for the update.&lt;/p&gt;</comment>
                            <comment id="76609" author="hilljjornl" created="Mon, 10 Feb 2014 16:46:24 +0000"  >&lt;p&gt;So this filesystem is out of production (in a hold state before decommissioning); my assertion is that we should go ahead and close this issue &amp;#8211; even if we integrated the patch and ran the storage system with it for a while it would never get any client access and likely would not exercise the code path for the patch. Any objections?&lt;/p&gt;

&lt;p&gt;&amp;#8211;&lt;br/&gt;
-Jason&lt;/p&gt;</comment>
                            <comment id="76613" author="blakecaldwell" created="Mon, 10 Feb 2014 16:53:16 +0000"  >&lt;p&gt;Let&apos;s close it.&lt;/p&gt;</comment>
                            <comment id="76618" author="jamesanunez" created="Mon, 10 Feb 2014 17:28:42 +0000"  >&lt;p&gt;Thank you for the update. I will close this ticket.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13884" name="0001-LU-4309-debug-debug-mds_intent_policy-assert.patch" size="956" author="laisiyao" created="Wed, 4 Dec 2013 07:17:17 +0000"/>
                            <attachment id="13874" name="widow-mds2_lbug.log" size="28403" author="blakecaldwell" created="Mon, 25 Nov 2013 20:52:02 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwa27:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11801</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>