<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:02:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6655] MDS LBUG: (ldlm_lib.c:2277:target_queue_recovery_request()) ASSERTION( req-&gt;rq_export-&gt;exp_lock_replay_needed ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-6655</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While attempting a Lustre upgrade from 2.5.3 to 2.7.0 on our preproduction file system, we encountered this LBUG after first mounting the MDT while attempting to mount the OSTs for the first time. The first OST mounted fine and while mounting the second OST, the LBUG happened.&lt;/p&gt;

&lt;p&gt;There are most likely clients out there that haven&apos;t had the file system unmounted and have been trying to reconnect during this time.&lt;/p&gt;

&lt;p&gt;The information below has been extracted from the Red Hat crash log as we didn&apos;t have a serial console attached at the time.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;lt;4&amp;gt;Lustre: 14008:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
&amp;lt;0&amp;gt;LustreError: 31012:0:(ldlm_lib.c:2277:target_queue_recovery_request()) ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed: 
&amp;lt;0&amp;gt;LustreError: 31012:0:(ldlm_lib.c:2277:target_queue_recovery_request()) LBUG
&amp;lt;4&amp;gt;Pid: 31012, comm: mdt00_001
&amp;lt;4&amp;gt;
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa03c8895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa03c8e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0674d10&amp;gt;] target_queue_recovery_request+0xb00/0xc10 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa06b3c4c&amp;gt;] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0714b3d&amp;gt;] tgt_request_handle+0xe8d/0x1000 [ptlrpc]
&amp;lt;3&amp;gt;LustreError: 11-0: play01-OST0000-osc-MDT0000: operation ost_connect to node 172.23.144.18@tcp failed: rc = -16
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa06c45a1&amp;gt;] ptlrpc_main+0xe41/0x1960 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa06c3760&amp;gt;] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e66e&amp;gt;] kthread+0x9e/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c20a&amp;gt;] child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e5d0&amp;gt;] ? kthread+0x0/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c200&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After power cycling the affected MDT and OSS and starting again, we have not seen it again so far, and recovery on the MDT has completed.&lt;/p&gt;

&lt;p&gt;The only other information I could potentially provide is a vmcore that Red Hat collected automatically as well as more lines from the vmcore-dmesg.txt file if required.&lt;/p&gt;</description>
                <environment>RHEL6, during upgrade from 2.5 to 2.7</environment>
        <key id="30398">LU-6655</key>
            <summary>MDS LBUG: (ldlm_lib.c:2277:target_queue_recovery_request()) ASSERTION( req-&gt;rq_export-&gt;exp_lock_replay_needed ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="ferner">Frederik Ferner</reporter>
                        <labels>
                    </labels>
                <created>Wed, 27 May 2015 18:42:22 +0000</created>
                <updated>Sun, 6 May 2018 04:29:20 +0000</updated>
                            <resolved>Sun, 6 May 2018 04:17:01 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="116586" author="pjones" created="Wed, 27 May 2015 19:00:31 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please assist with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="116643" author="bobijam" created="Thu, 28 May 2015 07:16:40 +0000"  >&lt;p&gt;I think it&apos;s a dup of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5651&quot; title=&quot;ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5651&quot;&gt;&lt;del&gt;LU-5651&lt;/del&gt;&lt;/a&gt;. With all nodes upgraded to 2.7, the issue should be gone.&lt;/p&gt;</comment>
                            <comment id="116664" author="ferner" created="Thu, 28 May 2015 13:25:11 +0000"  >&lt;p&gt;I had looked at &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5651&quot; title=&quot;ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5651&quot;&gt;&lt;del&gt;LU-5651&lt;/del&gt;&lt;/a&gt; but initially didn&apos;t think it was the same, as all servers had been upgraded. Reading it again, I&apos;m now suspecting there&apos;s a client-side patch which is not on our clients yet, so you might be right. Could I check that I read this right?&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Frederik&lt;/p&gt;</comment>
                            <comment id="116681" author="bobijam" created="Thu, 28 May 2015 14:33:09 +0000"  >&lt;p&gt;You are right, it&apos;s a client patch. A client without this patch connecting to an upgraded server could LBUG the server.&lt;/p&gt;</comment>
                            <comment id="147492" author="haisong" created="Thu, 31 Mar 2016 19:57:22 +0000"  >
&lt;p&gt;We just hit the same LBUG today on our OSS. While searching Jira I found this ticket. What happened to us was that we shut down the OSS gracefully for maintenance while the filesystem was still running. After we mounted the OSTs back on the OSS, we hit the LBUG within about a minute. It looks like it happened during recovery.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@wombat-oss-20-5 ~&amp;#93;&lt;/span&gt;# &lt;br/&gt;
Message from syslogd@wombat-oss-20-5 at Mar 31 12:18:18 ...&lt;br/&gt;
 kernel:LustreError: 13701:0:(ldlm_lib.c:2277:target_queue_recovery_request()) ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed: &lt;/p&gt;

&lt;p&gt;Message from syslogd@wombat-oss-20-5 at Mar 31 12:18:18 ...&lt;br/&gt;
 kernel:LustreError: 13701:0:(ldlm_lib.c:2277:target_queue_recovery_request()) LBUG&lt;/p&gt;
</comment>
                            <comment id="147496" author="haisong" created="Thu, 31 Mar 2016 20:35:26 +0000"  >&lt;p&gt;By the way, we are running Lustre FE-2.7.1 with ZFS 0.6.4.2, CentOS 6.6.7.&lt;/p&gt;

&lt;p&gt;Haisong &lt;/p&gt;</comment>
                            <comment id="147542" author="bobijam" created="Fri, 1 Apr 2016 03:50:43 +0000"  >&lt;p&gt;Do all clients contain the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5651&quot; title=&quot;ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5651&quot;&gt;&lt;del&gt;LU-5651&lt;/del&gt;&lt;/a&gt; fix? That is a client-side fix which corrects the client&apos;s restore state; otherwise the server could be confused about the client&apos;s recovery state and hit this LBUG.&lt;/p&gt;</comment>
                            <comment id="147645" author="haisong" created="Fri, 1 Apr 2016 20:18:22 +0000"  >&lt;p&gt;Zhenyu,&lt;/p&gt;

&lt;p&gt;Our clients do have the patch:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@comet-02-01 ~&amp;#93;&lt;/span&gt;# rpm -qi lustre-client&lt;br/&gt;
Name        : lustre-client                Relocations: (not relocatable)&lt;br/&gt;
Version     : 2.7.1                             Vendor: (none)&lt;br/&gt;
Release     : 2.6.32_573.12.1.el6.x86_64_g965bd63   Build Date: Tue 02 Feb 2016 02:29:06 PM PST&lt;br/&gt;
Install Date: Thu 11 Feb 2016 05:54:41 PM PST      Build Host: comet-23-11.sdsc.edu&lt;br/&gt;
Group       : Utilities/System              Source RPM: lustre-client-2.7.1-2.6.32_573.12.1.el6.x86_64_g965bd63.src.rpm&lt;br/&gt;
Size        : 2030643                          License: GPL&lt;br/&gt;
Signature   : (none)&lt;br/&gt;
URL         : &lt;a href=&quot;https://wiki.hpdd.intel.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/&lt;/a&gt;&lt;br/&gt;
Summary     : Lustre File System&lt;br/&gt;
Description :&lt;br/&gt;
Userspace tools and files for the Lustre file system.&lt;/p&gt;


&lt;p&gt;&lt;a href=&quot;http://git.whamcloud.com/fs/lustre-release.git/commit/d730750a6311cae8a4427824867410faccc6698f&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://git.whamcloud.com/fs/lustre-release.git/commit/d730750a6311cae8a4427824867410faccc6698f&lt;/a&gt; is contained in the version we&#8217;re using:&lt;/p&gt;

&lt;p&gt;dimm:lustre-release-fe dimm$ git branch --contains d730750a6311cae8a4427824867410faccc6698f&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;(HEAD detached at 965bd63)&lt;br/&gt;
  b2_8_fe&lt;/li&gt;
&lt;/ul&gt;

</comment>
                            <comment id="147685" author="bobijam" created="Sun, 3 Apr 2016 10:55:53 +0000"  >&lt;p&gt;Do all clients have this? A single old client (without it) could cause this OSS LBUG.&lt;/p&gt;</comment>
                            <comment id="148160" author="haisong" created="Thu, 7 Apr 2016 17:28:43 +0000"  >&lt;p&gt;Zhenyu,&lt;/p&gt;

&lt;p&gt;We have carefully walked through our clients and checked the Lustre version on each of them. To our knowledge, all of them have the LU-5651 patch.&lt;/p&gt;


&lt;p&gt;Haisong &lt;/p&gt;</comment>
                            <comment id="148201" author="bobijam" created="Fri, 8 Apr 2016 00:25:08 +0000"  >&lt;p&gt;As a workaround, you can mount the OST with recovery disabled (&quot;-o abort_recov&quot;).&lt;/p&gt;

&lt;p&gt;And could you upload the dump file as well as all the support files (rpms with debuginfo)?&lt;/p&gt;</comment>
                            <comment id="169926" author="gerrit" created="Mon, 17 Oct 2016 14:33:38 +0000"  >&lt;p&gt;Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/23205&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/23205&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6655&quot; title=&quot;MDS LBUG: (ldlm_lib.c:2277:target_queue_recovery_request()) ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6655&quot;&gt;&lt;del&gt;LU-6655&lt;/del&gt;&lt;/a&gt; ptlrpc: skip delayed replay requests&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: c819f1ca2d3c12348d1e7e779500d7f774f923f7&lt;/p&gt;</comment>
                            <comment id="227367" author="gerrit" created="Sun, 6 May 2018 03:40:05 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/23205/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/23205/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6655&quot; title=&quot;MDS LBUG: (ldlm_lib.c:2277:target_queue_recovery_request()) ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6655&quot;&gt;&lt;del&gt;LU-6655&lt;/del&gt;&lt;/a&gt; ptlrpc: skip delayed replay requests&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: c1d465de13ccf0eda8020c88661c3cc4d78538ca&lt;/p&gt;</comment>
                            <comment id="227390" author="pjones" created="Sun, 6 May 2018 04:17:01 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="39122">LU-8544</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 8 Apr 2016 18:42:22 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxecn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 27 May 2015 18:42:22 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>