<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:04:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6953] LustreError: 50126:0:(mdt_handler.c:3409:mdt_recovery()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-6953</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;grove-mds1 crashed 2015-07-29 with the following LBUG:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2015-07-29 03:05:17 LustreError: 50126:0:(mdt_handler.c:3409:mdt_recovery()) LBUG
2015-07-29 03:05:17 Call Trace:
2015-07-29 03:05:17 [&amp;lt;ffffffffa07b28f5&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2015-07-29 03:05:17 Jul 29 03:05:17 [&amp;lt;ffffffffa07b2ef7&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
2015-07-29 03:05:17 grove-mds1 kerne [&amp;lt;ffffffffa0fcf9d8&amp;gt;] mdt_handle_common+0x13d8/0x1470 [mdt]
2015-07-29 03:05:17 l: LustreError:  [&amp;lt;ffffffffa100b625&amp;gt;] mds_regular_handle+0x15/0x20 [mdt]
2015-07-29 03:05:17 50126:0:(mdt_han [&amp;lt;ffffffffa0b05095&amp;gt;] ptlrpc_server_handle_request+0x305/0xc00 [ptlrpc]
2015-07-29 03:05:17 dler.c:3409:mdt_ [&amp;lt;ffffffffa07b352e&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
2015-07-29 03:05:17 recovery()) LBUG [&amp;lt;ffffffffa07c4845&amp;gt;] ? lc_watchdog_touch+0x65/0x170 [libcfs]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It was preceded by a ptlrpc debug message:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2015-07-29 03:05:17 Lustre:50126:0:(mdt_handler.c:4508:mdt_recovery()) @@@ rq_xid 15027...0684 matches last_xid, expected REPLAY or RESENT flag (0) req@ffff...d1400 x15027...0684/t0(0) o101-&amp;gt;28e0...cc83@172.20.15.14@o2ib500:0/0 lens 4616/0 e 0 to 0 dl 1438165072 ref 1 fl Interpret:/0/ffffffff rc 0/-1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For this system, I cannot extract bulk logs and add them to the ticket.  We do have a crash dump and console logs, so I can obtain specific information that would help.&lt;/p&gt;

&lt;p&gt;The mds was under severe memory pressure at the time of the lbug.  &lt;/p&gt;

&lt;p&gt;The MDS was responding very slowly at the time.  At 3:05:03 it appears to have dropped 84,316 timed out requests (output from one DEBUG_REQ() call from within ptlrpc_server_handle_request() appears in the console log, followed by Skipped 84315 previous similar messages).&lt;/p&gt;</description>
                <environment>lustre-2.5.4-4chaos_2.6.32_504.16.2.1chaos.ch5.3.x86_64.x86_64</environment>
        <key id="31349">LU-6953</key>
            <summary>LustreError: 50126:0:(mdt_handler.c:3409:mdt_recovery()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="ofaaland">Olaf Faaland</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Tue, 4 Aug 2015 16:52:53 +0000</created>
                <updated>Sat, 10 Oct 2015 14:18:29 +0000</updated>
                            <resolved>Mon, 5 Oct 2015 22:09:39 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="123233" author="pjones" created="Tue, 4 Aug 2015 17:42:08 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please advise here?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="123234" author="green" created="Tue, 4 Aug 2015 17:43:50 +0000"  >&lt;p&gt;So this is another case of &quot;client sent us something that we don&apos;t understand, let&apos;s panic&quot;, I guess.&lt;br/&gt;
We need to drop this LBUG at the very least as the first step and avoid this crash.&lt;br/&gt;
Though I am not sure how the condition might arise at all.&lt;/p&gt;

&lt;p&gt;Also, what do you mean you cannot extract bulk logs? Is crash plus the module to extract the logs not working?&lt;/p&gt;</comment>
                            <comment id="123239" author="morrone" created="Tue, 4 Aug 2015 18:01:11 +0000"  >&lt;p&gt;He meant mostly that those things cannot be shared outside of the lab.&lt;/p&gt;</comment>
                            <comment id="123429" author="tappro" created="Thu, 6 Aug 2015 05:58:34 +0000"  >&lt;p&gt;Olaf, is it possible to find more information about the request with that XID in the logs? Especially in the log of the client the request was sent from. We have two possibilities here: either the request flag (RESENT or REPLAY) was dropped somehow, or the XID was assigned improperly. The client log may help determine whether this was a resend or a normal request.&lt;/p&gt;</comment>
                            <comment id="123553" author="ofaaland" created="Thu, 6 Aug 2015 23:56:11 +0000"  >&lt;p&gt;Mikhail, that XID doesn&apos;t appear in the client&apos;s console log.  The client with NID 172.20.15.14@o2ib500 logged nothing but the expected &quot;lost connection&quot; and &quot;connection restored&quot; messages during the hour leading up to the lbug.  Unfortunately we have no additional information from the client.&lt;/p&gt;

&lt;p&gt;I&apos;m extracting the lustre debug logs from the crash dump and I&apos;ll check for that XID and post anything I find.&lt;/p&gt;</comment>
                            <comment id="123557" author="ofaaland" created="Fri, 7 Aug 2015 01:17:19 +0000"  >&lt;p&gt;Debug logs from the mds crash dump had nothing obviously of interest.  The XID in question did not occur in any message other than the one given above.  The rest of the log messages are variations on request timed out, request took too long to process, etc.&lt;/p&gt;</comment>
                            <comment id="129332" author="tappro" created="Mon, 5 Oct 2015 17:12:10 +0000"  >&lt;p&gt;Were there other occurrences of this issue? There is not enough information to solve it. If it is happening regularly, it is possible to add more debugging.&lt;/p&gt;</comment>
                            <comment id="129376" author="ofaaland" created="Mon, 5 Oct 2015 20:24:53 +0000"  >&lt;p&gt;Mikhail,&lt;/p&gt;

&lt;p&gt;This has not occurred again, so go ahead and close it.  If it happens again we can reopen it.&lt;/p&gt;

&lt;p&gt;thanks,&lt;br/&gt;
Olaf&lt;/p&gt;</comment>
                            <comment id="129385" author="pjones" created="Mon, 5 Oct 2015 22:09:39 +0000"  >&lt;p&gt;ok thanks&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxjqv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>