<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5651] ASSERTION( req-&gt;rq_export-&gt;exp_lock_replay_needed ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-5651</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The client doesn&apos;t restore its import state correctly&lt;br/&gt;
on reconnect during replay. It resends lock replay&lt;br/&gt;
after the server has already queued its final ping.&lt;br/&gt;
Server fails with &quot;target_queue_recovery_request())&lt;br/&gt;
ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot;&lt;/p&gt;

&lt;p&gt;The solution is to add imp_replay_state to store the last replay state.&lt;br/&gt;
During reconnect, imp_state is restored from imp_replay_state.&lt;/p&gt;</description>
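<!--
Editorial sketch, not part of the original ticket: a minimal Python model of the
fix described in the description. Only the field names imp_state and
imp_replay_state come from the ticket. The state names, the Import class, and
the disconnect/reconnect methods are hypothetical simplifications, and the
real patch may record and restore the state differently.

```python
from enum import Enum


class ImpState(Enum):
    # Hypothetical, simplified state names; not Lustre's real enum.
    DISCONNECTED = "disconnected"
    REPLAY_LOCKS = "replay_locks"
    FINAL_PING_SENT = "final_ping_sent"


class Import:
    """Toy client import. Field names follow the ticket; the logic is assumed."""

    def __init__(self, use_fix):
        self.use_fix = use_fix
        self.imp_state = ImpState.REPLAY_LOCKS
        self.imp_replay_state = ImpState.REPLAY_LOCKS

    def advance(self, new_state):
        # Record how far replay has progressed alongside the live state.
        self.imp_state = new_state
        self.imp_replay_state = new_state

    def disconnect(self):
        # Losing the connection overwrites imp_state but not imp_replay_state.
        self.imp_state = ImpState.DISCONNECTED

    def reconnect(self):
        if self.use_fix:
            # Assumed fixed behaviour: resume from the last recorded replay state.
            self.imp_state = self.imp_replay_state
        else:
            # Assumed buggy behaviour: start lock replay over from the beginning.
            self.imp_state = ImpState.REPLAY_LOCKS
        return self.imp_state


def state_after_reconnect(use_fix):
    imp = Import(use_fix)
    imp.advance(ImpState.FINAL_PING_SENT)  # the final ping has already been sent
    imp.disconnect()                       # e.g. the ping times out
    return imp.reconnect()
```

Under these assumptions, the unfixed import falls back to lock replay after a
reconnect, while the fixed one resumes where replay left off.
-->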
                <environment></environment>
        <key id="26687">LU-5651</key>
            <summary>ASSERTION( req-&gt;rq_export-&gt;exp_lock_replay_needed ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="askulysh">Andriy Skulysh</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Tue, 23 Sep 2014 11:16:29 +0000</created>
                <updated>Fri, 29 May 2015 12:32:28 +0000</updated>
                            <resolved>Thu, 15 Jan 2015 07:44:03 +0000</resolved>
                                                    <fixVersion>Lustre 2.7.0</fixVersion>
                    <fixVersion>Lustre 2.5.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>12</watches>
                                                                            <comments>
                            <comment id="94699" author="askulysh" created="Tue, 23 Sep 2014 11:55:52 +0000"  >&lt;p&gt;patch: &lt;a href=&quot;http://review.whamcloud.com/12015&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12015&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="94700" author="niu" created="Tue, 23 Sep 2014 12:20:06 +0000"  >&lt;p&gt;This looks like a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5287&quot; title=&quot;(ldlm_lib.c:2253:target_queue_recovery_request()) ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5287&quot;&gt;&lt;del&gt;LU-5287&lt;/del&gt;&lt;/a&gt;, and I posted a fix at: &lt;a href=&quot;http://review.whamcloud.com/#/c/11871/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/11871/&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The client doesn&apos;t restore its import state correctly&lt;br/&gt;
on reconnect during replay. It resends lock replay&lt;br/&gt;
after the server has already queued its final ping.&lt;br/&gt;
Server fails with &quot;target_queue_recovery_request())&lt;br/&gt;
ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;If the final ping has been queued on the server, recovery on this export should be finished (exp_in_recovery == 0) by the time the resent lock replay reaches the server. I don&apos;t see why that could trigger the assertion. Could you explain it in detail?&lt;/p&gt;</comment>
                            <comment id="94702" author="askulysh" created="Tue, 23 Sep 2014 12:28:30 +0000"  >&lt;p&gt;The recovery ends only on receiving and processing final pings from all clients. It can happen that the server has accepted the final ping from client1 but is still waiting for requests from client2 when client1 reconnects. This situation was simulated in a test by adding a timeout before processing final pings.&lt;/p&gt;</comment>
                            <comment id="94708" author="niu" created="Tue, 23 Sep 2014 12:44:12 +0000"  >&lt;blockquote&gt;
&lt;p&gt;The recovery ends only on receiving and processing final pings from all clients.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I was saying that recovery on this export is done (exp_in_recovery == 0).&lt;/p&gt;</comment>
                            <comment id="94710" author="askulysh" created="Tue, 23 Sep 2014 13:07:57 +0000"  >&lt;p&gt;exp_in_recovery is zeroed during the final recovery stage 3, but client1 reconnects during lock replay stage 2.&lt;/p&gt;</comment>
                            <comment id="94713" author="askulysh" created="Tue, 23 Sep 2014 13:43:56 +0000"  >&lt;p&gt;step by step explanation: &lt;br/&gt;
1) server goes to recovery stage 2 (lock replay) with at least 2 clients.&lt;br/&gt;
2) client1 and client2 send lock replay&lt;br/&gt;
3) client1 sends final ping&lt;br/&gt;
4) server queues final ping from client1 and sets exp_lock_replay_needed to 0&lt;br/&gt;
5) client2 is still in lock replay stage (waits for lock replies from server)&lt;br/&gt;
6) client1 reconnects&lt;br/&gt;
7) client1 replays locks from the beginning &lt;br/&gt;
8) assertion fails &lt;/p&gt;</comment>
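<!--
Editorial sketch, not part of the original ticket: the eight steps above replayed
against a toy Python model of the server-side export. Only the field names
exp_in_recovery and exp_lock_replay_needed come from the ticket. The Export
class, the two queue_* helpers, and the exception type are hypothetical, and
the real target_queue_recovery_request() logic is not shown in this ticket.

```python
class RecoveryAssertion(Exception):
    """Stands in for the server-side ASSERTION failure."""


class Export:
    """Toy per-client export on the server."""

    def __init__(self, name):
        self.name = name
        self.exp_in_recovery = True          # cleared only in recovery stage 3
        self.exp_lock_replay_needed = True   # cleared when the final ping is queued


def queue_lock_replay(exp):
    # Models the check quoted in the ticket title.
    if not exp.exp_lock_replay_needed:
        raise RecoveryAssertion(exp.name)


def queue_final_ping(exp):
    # Step 4: queuing the final ping clears exp_lock_replay_needed.
    exp.exp_lock_replay_needed = False


def run_scenario():
    client1, client2 = Export("client1"), Export("client2")
    queue_lock_replay(client1)   # step 2
    queue_lock_replay(client2)   # step 2
    queue_final_ping(client1)    # steps 3 and 4
    # Step 5: client2 is not finished, so stage 3 has not run yet and
    # client1's export is still marked as in recovery.
    assert client1.exp_in_recovery
    try:
        queue_lock_replay(client1)   # steps 6 and 7: reconnect, replay from start
    except RecoveryAssertion:
        return "assertion fails"     # step 8
    return "no failure"
```

In this model the failure depends only on client1 replaying locks after its
final ping was queued while exp_in_recovery is still set.
-->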
                            <comment id="94716" author="niu" created="Tue, 23 Sep 2014 13:55:10 +0000"  >&lt;blockquote&gt;
&lt;p&gt;step by step explanation: &lt;br/&gt;
1) server goes to recovery stage 2 (lock replay) with at least 2 clients.&lt;br/&gt;
2) client1 and client2 send lock replay&lt;br/&gt;
3) client1 sends final ping&lt;br/&gt;
4) server queues final ping from client1 and sets exp_lock_replay_needed to 0&lt;br/&gt;
5) client2 is still in lock replay stage (waits for lock replies from server)&lt;br/&gt;
6) client1 reconnects&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Is the final ping from client1 still in the queue? If it&apos;s still in the queue, client1 can&apos;t reconnect because there is an inflight RPC; if the final ping has been processed, exp_in_recovery should have been cleared.&lt;/p&gt;</comment>
                            <comment id="94717" author="askulysh" created="Tue, 23 Sep 2014 14:03:37 +0000"  >&lt;p&gt;No. The request times out waiting for a reply, and the client reconnects.&lt;/p&gt;</comment>
                            <comment id="94806" author="niu" created="Wed, 24 Sep 2014 04:16:14 +0000"  >&lt;blockquote&gt;
&lt;p&gt;No. The request times out waiting for a reply, and the client reconnects.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Could you explain how client1 can reconnect while the final ping is in the queue (the server would reject the reconnect because the export has an inflight RPC)? Is there a defect in the reconnect path?&lt;/p&gt;</comment>
                            <comment id="95028" author="askulysh" created="Fri, 26 Sep 2014 09:52:07 +0000"  >&lt;p&gt;I don&apos;t understand why the reconnect should be rejected. The only check for an inflight RPC on an export is&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;no_export:
                OBD_FAIL_TIMEOUT(OBD_FAIL_TGT_DELAY_CONNECT, 2 * obd_timeout);
        } else if (req-&amp;gt;rq_export == NULL &amp;amp;&amp;amp;
		   atomic_read(&amp;amp;export-&amp;gt;exp_rpc_count) &amp;gt; 0) {
                LCONSOLE_WARN(&quot;%s: Client %s (at %s) refused connection, &quot;
                              &quot;still busy with %d references\n&quot;,
                              target-&amp;gt;obd_name, cluuid.uuid,
                              libcfs_nid2str(req-&amp;gt;rq_peer.nid),
			      atomic_read(&amp;amp;export-&amp;gt;exp_refcount));
                GOTO(out, rc = -EBUSY);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;but the reconnect request has a valid export handle.&lt;/p&gt;</comment>
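<!--
Editorial sketch, not part of the original ticket: the condition from the quoted
excerpt restated as a small Python predicate. The function name and its two
parameters are hypothetical. The condition itself (rq_export is NULL and
exp_rpc_count is greater than zero) is taken from the excerpt above.

```python
def connection_refused(request_has_export, exp_rpc_count):
    """Return True when the quoted branch would refuse with EBUSY.

    request_has_export: whether the connect request carries a valid export
    handle (rq_export is not NULL in the excerpt).
    exp_rpc_count: number of RPCs still in flight on the existing export.
    """
    return (not request_has_export) and exp_rpc_count > 0
```

Read this way, a reconnect that carries a valid export handle is never refused
by this branch, even while the final ping is still in flight.
-->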
                            <comment id="95031" author="niu" created="Fri, 26 Sep 2014 11:24:07 +0000"  >&lt;p&gt;Ah, my mistake: the code was changed by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-793&quot; title=&quot;Reconnections should not be refused when there is a request in progress from this client.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-793&quot;&gt;&lt;del&gt;LU-793&lt;/del&gt;&lt;/a&gt;, so the client can reconnect now. I&apos;ll review the patch soon; thanks for your explanation.&lt;/p&gt;</comment>
                            <comment id="95962" author="hilljjornl" created="Wed, 8 Oct 2014 19:33:34 +0000"  >&lt;p&gt;ORNL hit this issue today in production after upgrading to Lustre 2.5.2 on both server and client.&lt;/p&gt;</comment>
                            <comment id="100668" author="jamesanunez" created="Thu, 4 Dec 2014 14:47:34 +0000"  >&lt;p&gt;Patch for b2_5 at &lt;a href=&quot;http://review.whamcloud.com/#/c/12163/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/12163/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The master patch has landed. Is there more work needed to complete this ticket or should it be closed?&lt;/p&gt;</comment>
                            <comment id="100673" author="simmonsja" created="Thu, 4 Dec 2014 15:30:48 +0000"  >&lt;p&gt;I believe the replay-single test needs to be updated to check the Lustre version so that interop testing passes.&lt;/p&gt;</comment>
                            <comment id="100784" author="gerrit" created="Fri, 5 Dec 2014 00:47:13 +0000"  >&lt;p&gt;James Simmons (uja.ornl@gmail.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/12942&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12942&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5651&quot; title=&quot;ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5651&quot;&gt;&lt;del&gt;LU-5651&lt;/del&gt;&lt;/a&gt; test: run replay-single test 93 only when supported.&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7bc6d9b8c2e09027444b31c40dd320e244c2412d&lt;/p&gt;</comment>
                            <comment id="101912" author="gerrit" created="Thu, 18 Dec 2014 03:14:00 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/12942/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12942/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5651&quot; title=&quot;ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5651&quot;&gt;&lt;del&gt;LU-5651&lt;/del&gt;&lt;/a&gt; test: run replay-single test 93 only when supported.&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: afde9f17260650d0cb80d53613fb5afda0a39384&lt;/p&gt;</comment>
                            <comment id="103558" author="gerrit" created="Thu, 15 Jan 2015 04:45:29 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/12163/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12163/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5651&quot; title=&quot;ASSERTION( req-&amp;gt;rq_export-&amp;gt;exp_lock_replay_needed ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5651&quot;&gt;&lt;del&gt;LU-5651&lt;/del&gt;&lt;/a&gt;: ptlrpc: fix import state during replay&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 5748379d28846c672f793ba1f8e143e63531dd05&lt;/p&gt;</comment>
                            <comment id="103560" author="yujian" created="Thu, 15 Jan 2015 07:44:03 +0000"  >&lt;p&gt;Patches landed to master and b2_5 branches.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="26917">LU-5719</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="25409">LU-5287</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwwwf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15837</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>