<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5629] osp_sync_interpret() ASSERTION( rc || req-&gt;rq_transno ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-5629</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;One of our MDS nodes crashed today with the following assertion:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;client.c:304:ptlrpc_at_adj_net_latency()) Reported service time 548 &amp;gt; total measured time 165
osp_sync.c:355:osp_sync_interpret())  ASSERTION( rc || req-&amp;gt;rq_transno ) failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the two messages above were printed in the same second (as reported by syslog) and by the same kernel thread.  I don&apos;t know if the ptlrpc_at_adj_net_latency() message is actually related to the assertion or not, but the proximity makes it worth noting. &lt;/p&gt;

&lt;p&gt;A couple of minutes earlier in the log, the MDS lost and reestablished its connection to a few OSTs.&lt;/p&gt;

&lt;p&gt;The backtrace was:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;panic
lbug_with_loc
osp_sync_interpret
ptlrpc_check_set
ptlrpcd_check
ptlrpcd
kernel_thread
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It was running lustre version 2.4.2-14chaos (see github.com/chaos/lustre).&lt;/p&gt;

&lt;p&gt;We cannot provide logs or crash dumps for this machine.&lt;/p&gt;</description>
                <environment>Lustre 2.4.2-14chaos (see github.com/chaos/lustre)</environment>
        <key id="26587">LU-5629</key>
            <summary>osp_sync_interpret() ASSERTION( rc || req-&gt;rq_transno ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="dmiter">Dmitry Eremin</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Tue, 16 Sep 2014 00:47:34 +0000</created>
                <updated>Fri, 12 Jan 2018 13:20:23 +0000</updated>
                            <resolved>Fri, 12 Jan 2018 13:20:23 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                    <version>Lustre 2.4.2</version>
                    <version>Lustre 2.5.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>15</watches>
                                                                            <comments>
                            <comment id="94118" author="morrone" created="Tue, 16 Sep 2014 00:49:43 +0000"  >&lt;p&gt;In &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5193&quot; title=&quot;2.6 DNE stress testing: osp_sync_interpret()) ASSERTION( rc || req-&amp;gt;rq_transno ) failed:&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5193&quot;&gt;&lt;del&gt;LU-5193&lt;/del&gt;&lt;/a&gt; Cray reports hitting the same assertion under lustre 2.6.&lt;/p&gt;</comment>
                            <comment id="94135" author="liang" created="Tue, 16 Sep 2014 10:02:50 +0000"  >&lt;p&gt;FYI, I think one possible cause of the ptlrpc_at_adj_net_latency() warning is a lost early reply: the RPC expires and the client (here, the OSP on the MDS) resends the request. Because the server&apos;s reply to the original request still uses the same rq_xid as its match bits, that reply can land in the reposted reply buffer.&lt;/p&gt;

&lt;p&gt;If this happens, the service time returned by the original reply can be longer than the execution time of the resent RPC. I&apos;m not sure whether this is relevant to the assertion, but we should at least stop printing this as a warning and log it only as debug info.&lt;/p&gt;</comment>
                            <comment id="94140" author="pjones" created="Tue, 16 Sep 2014 12:25:31 +0000"  >&lt;p&gt;Dmitry is looking into this one&lt;/p&gt;</comment>
                            <comment id="95916" author="dmiter" created="Wed, 8 Oct 2014 11:36:18 +0000"  >&lt;p&gt;The fix, commit e12b89a9e7d8409c2b624162760c2e7e3481d7be, landed in 2.4.2.&lt;/p&gt;</comment>
                            <comment id="95921" author="pjones" created="Wed, 8 Oct 2014 13:26:51 +0000"  >&lt;p&gt;Duplicate of LU-3892.&lt;/p&gt;</comment>
                            <comment id="112327" author="morrone" created="Fri, 17 Apr 2015 23:41:42 +0000"  >&lt;p&gt;I am reopening this ticket because it does not appear that the issue was resolved as previously believed.  We are still seeing the same assertion with Lustre 2.5.3, which contains the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3892&quot; title=&quot;osp_sync.c:356:osp_sync_interpret()) ASSERTION( req-&amp;gt;rq_transno == 0 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3892&quot;&gt;&lt;del&gt;LU-3892&lt;/del&gt;&lt;/a&gt; (commit 7f4a635, which landed well before 2.5.0).&lt;/p&gt;</comment>
                            <comment id="134028" author="simmonsja" created="Thu, 19 Nov 2015 23:55:03 +0000"  >&lt;p&gt;Looks like we just hit this on our 2.5.3+ production file system&lt;/p&gt;</comment>
                            <comment id="153262" author="weems2" created="Mon, 23 May 2016 21:04:31 +0000"  >&lt;p&gt;Wanted to report we hit this over the weekend on our 2.5.5 production file system here at LLNL.&lt;/p&gt;</comment>
                            <comment id="157982" author="ruth.klundt@gmail.com" created="Thu, 7 Jul 2016 15:41:23 +0000"  >&lt;p&gt;Add one more at SNL last night, also running 2.5.5.&lt;/p&gt;</comment>
                            <comment id="158105" author="bzzz" created="Fri, 8 Jul 2016 09:37:46 +0000"  >&lt;p&gt;Any logs/dumps?&lt;/p&gt;</comment>
                            <comment id="158119" author="ruth.klundt@gmail.com" created="Fri, 8 Jul 2016 13:59:45 +0000"  >&lt;p&gt;Server syslogs are attached. Unfortunately the stack dumps were not captured, but the location for that collection is mounted now in case it happens again.&lt;/p&gt;

&lt;p&gt;There had been network issues earlier in the day, reportedly resolved by 4pm. &lt;/p&gt;

&lt;p&gt;FYI, the number of clients on the file system is currently 6395. The exact version of the software is:&lt;br/&gt;
lustre: 2.5.5&lt;br/&gt;
kernel: patchless_client&lt;br/&gt;
build:  -6chaos-CHANGED-2.6.32-573.26.1.1chaos.ch5.4.x86_64&lt;/p&gt;
</comment>
                            <comment id="160806" author="charr" created="Thu, 4 Aug 2016 15:49:07 +0000"  >&lt;p&gt;Saw the same crash on 8/03/16. &lt;br/&gt;
lustre-2.5.5-6chaos_2.6.32_573.26.1.1chaos.ch5.4.x86_64.x86_64&lt;/p&gt;

&lt;p&gt;We have a 16GB dump available if need be.&lt;/p&gt;

&lt;p&gt;Not sure how related it is, but an OSS node suffered major hardware problems (MCEs) throughout the 30 minutes before the LBUG on the MDS. The MDS console log messages directly (~2 min) before the assertion were an evict/reconnect to that OST.&lt;/p&gt;</comment>
                            <comment id="170218" author="skirvan" created="Tue, 18 Oct 2016 19:12:32 +0000"  >&lt;p&gt;Exact same issue @ LANL, Lustre 2.5.5.&lt;/p&gt;</comment>
                            <comment id="218084" author="dmiter" created="Fri, 12 Jan 2018 08:44:59 +0000"  >&lt;p&gt;The patch &lt;a href=&quot;https://review.whamcloud.com/30129/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/30129/&lt;/a&gt; should probably resolve this.&lt;/p&gt;</comment>
                            <comment id="218094" author="pjones" created="Fri, 12 Jan 2018 13:20:23 +0000"  >&lt;p&gt;Closing as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9135&quot; title=&quot;sanity test_313: osp_sync.c:571:osp_sync_interpret()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9135&quot;&gt;&lt;del&gt;LU-9135&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="20807">LU-3892</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="33259">LU-7453</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="43915">LU-9135</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="25152">LU-5193</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="22169" name="LU-5629-syslog.bz2" size="178283" author="ruth.klundt@gmail.com" created="Fri, 8 Jul 2016 13:59:45 +0000"/>
                            <attachment id="26607" name="lbugmay2.zip" size="55972449" author="apargal" created="Fri, 5 May 2017 17:30:57 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 23 May 2016 00:47:34 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwwcf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15744</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 16 Sep 2014 00:47:34 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>