<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:12:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7822] import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0001_UUID went back in time</title>
                <link>https://jira.whamcloud.com/browse/LU-7822</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Error happens during soak testing of build &apos;20160224&apos; (b2_8 RC2) (see:&lt;br/&gt;
&lt;a href=&quot;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;&lt;/a&gt; spaceKey=Releases#SoakTestingonLola-20150224). DNE is enabled.&lt;br/&gt;
MDSes had been formatted using ldiskfs, OSTs using zfs. MDSes are configured in active-active HA failover configuration.&lt;/p&gt;

&lt;p&gt;Sequence of events:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;2016-02-26 08:46:16,864:fsmgmt.fsmgmt:INFO     triggering fault mds_failover&lt;br/&gt;
  2016-02-26 08:46:16,865:fsmgmt.fsmgmt:INFO     reseting MDS node lola-8&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:01:59,696:fsmgmt.fsmgmt:INFO     ... soaked-MDT0001 mounted successfully on lola-9&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:02:52,797:fsmgmt.fsmgmt:INFO     ... soaked-MDT0000 mounted successfully on lola-9&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:03:13,043:fsmgmt.fsmgmt:INFO     Don&apos;t wait for recovery to complete. Failback MDT&apos;s immediately&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:03:13,044:fsmgmt.fsmgmt:INFO     Failing back soaked-MDT0001 ...&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:04:04,707:fsmgmt.fsmgmt:INFO     ... soaked-MDT0001 mounted successfully on lola-8    (--&amp;gt; failed back)&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:04:04,708:fsmgmt.fsmgmt:INFO     Failing back soaked-MDT0000 ...&lt;/li&gt;
	&lt;li&gt;2016-02-26 09:04:25,446:fsmgmt.fsmgmt:INFO     ... soaked-MDT0000 mounted successfully on lola-8      (--&amp;gt; failed back)&lt;/li&gt;
	&lt;li&gt;2016-02- 26 09:15:57 lola-8 kernel: Lustre: soaked-MDT0001: Recovery over after 11:45, of 16 clients 15 recovered and 1 was evicte&lt;br/&gt;
d.&lt;/li&gt;
	&lt;li&gt;201602-26 09:16:48 lola-8 kernel: Lustre: soaked-MDT0000: Recovery over after 12:16, of 16 clients 16 recovered and 0 were evicted.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;All Lustre clients show the following error message:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-27.log:Feb 26 09:04:37 lola-27 kernel: LustreError: 3684:0:(import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0000_UUID went back in time (transno 120267673197 was previously committed, server now claims 120267671307)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646
lola-27.log:Feb 26 09:04:37 lola-27 kernel: LustreError: 3684:0:(mdc_request.c:554:mdc_replay_open()) @@@ Can&apos;t properly replay without open data.  req@ffff8806b1593980 x1527085335890828/t85900680419(85900680419) o101-&amp;gt;soaked-MDT0000-mdc-ffff88082f749000@192.168.1.108@o2ib10:12/10 lens 896/544 e 0 to 0 dl 1456506319 ref 2 fl Interpret:RP/4/0 rc 301/301
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The corresponding error message on MGS/MDT node &lt;tt&gt;lola-8&lt;/tt&gt; is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-8.log:Feb 26 09:04:37 lola-8 kernel: LustreError: 4561:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff8808231ac680 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:333/0 lens 896/3512 e 0 to 0 dl 1456506318 ref 1 fl Interpret:/4/0 rc 0/0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After recovery completed the errors appear in messages file of &lt;tt&gt;lola-8&lt;/tt&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806786, ql: 7, comp: 9, conn: 16, next: 120260806788, next_update 120260806887 last_committed: 120260806785)
lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806789, ql: 7, comp: 9, conn: 16, next: 120260806887, next_update 120260806887 last_committed: 120260806785)
lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806888, ql: 7, comp: 9, conn: 16, next: 120260806892, next_update 120260806903 last_committed: 120260806785)
lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806893, ql: 7, comp: 9, conn: 16, next: 120260806903, next_update 120260806903 last_committed: 120260806785)
lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806904, ql: 7, comp: 9, conn: 16, next: 120260806946, next_update 120260806960 last_committed: 120260806785)
lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806947, ql: 7, comp: 9, conn: 16, next: 120260806958, next_update 120260806960 last_committed: 120260806785)
...
... etc etc
lola-8.log:Feb 26 09:15:51 lola-8 kernel: LustreError: 4583:0:(ldlm_lockd.c:858:ldlm_server_blocking_ast()) ### BUG 6063: lock collide during recovery ns: mdt-soaked-MDT0000_UUID lock: ffff8803ecf60dc0/0x73ce4df547baebbb lrc: 3/0,0 mode: EX/EX res: [0x200010da0:0x74:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x40000001000020 nid: 0@lo remote: 0x73ce4df547baebb4 expref: 95 pid: 4836 timeout: 0 lvb_type: 0
lola-8.log:Feb 26 09:15:51 lola-8 kernel: LustreError: 4583:0:(ldlm_lockd.c:858:ldlm_server_blocking_ast()) Skipped 3 previous similar messages
...
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Attached console, message files of lola-8,9,27 (27--&amp;gt; as example client)&lt;/p&gt;</description>
                <environment>lola&lt;br/&gt;
build: &lt;a href=&quot;https://build.hpdd.intel.com/job/lustre-b2_8/8/&quot;&gt;https://build.hpdd.intel.com/job/lustre-b2_8/8/&lt;/a&gt;</environment>
        <key id="35007">LU-7822</key>
            <summary>import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0001_UUID went back in time</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="di.wang">Di Wang</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Fri, 26 Feb 2016 17:59:32 +0000</created>
                <updated>Wed, 13 Oct 2021 02:23:50 +0000</updated>
                            <resolved>Wed, 13 Oct 2021 02:23:50 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="143975" author="jgmitter" created="Fri, 26 Feb 2016 18:14:26 +0000"  >&lt;p&gt;Hi Di,&lt;br/&gt;
Can you take a look at this issue?&lt;br/&gt;
Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="143992" author="di.wang" created="Fri, 26 Feb 2016 19:11:54 +0000"  >&lt;p&gt;hmm, it looks like the object is missing for this replay(open_no_create) request.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-27.log:Feb 26 09:04:37 lola-27 kernel: LustreError: 3684:0:(import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0000_UUID went back in time (transno 120267673197 was previously committed, server now claims 120267671307)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is interesting. lola-27 seems keeps the same replay request to lola-8 in every recovery. &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;messages-lola-8.log:Feb 25 16:53:17 lola-8 kernel: LustreError: 4536:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff8804346976c0 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:209/0 lens 896/3512 e 0 to 0 dl 1456448059 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 25 17:15:44 lola-8 kernel: LustreError: 4451:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff88082b319cc0 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:738/0 lens 896/3512 e 0 to 0 dl 1456450098 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 25 23:10:46 lola-8 kernel: LustreError: 4401:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff880816d100c0 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:191/0 lens 896/3512 e 0 to 0 dl 1456470691 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 25 23:54:27 lola-8 kernel: LustreError: 4450:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff8807b679d0c0 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:512/0 lens 896/3512 e 0 to 0 dl 1456473277 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 26 00:30:11 lola-8 kernel: LustreError: 4389:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff880821640080 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:391/0 lens 896/3512 e 0 to 0 dl 1456475421 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 26 00:56:32 lola-8 kernel: LustreError: 4543:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff880824a71980 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:498/0 lens 896/3512 e 0 to 0 dl 1456477038 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 26 02:40:43 lola-8 kernel: LustreError: 4449:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff88081e04dcc0 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:668/0 lens 896/3512 e 0 to 0 dl 1456483248 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 26 08:00:34 lola-8 kernel: LustreError: 4408:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff8808119ea9c0 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:229/0 lens 896/3512 e 0 to 0 dl 1456502439 ref 1 fl Interpret:/4/0 rc 0/0
messages-lola-8.log:Feb 26 09:04:37 lola-8 kernel: LustreError: 4561:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff8808231ac680 x1527085335890828/t0(85900680419) o101-&amp;gt;713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:333/0 lens 896/3512 e 0 to 0 dl 1456506318 ref 1 fl Interpret:/4/0 rc 0/0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I think client should throw such replay request once it is failed, since it is only open, no create.&lt;/p&gt;</comment>
                            <comment id="143995" author="di.wang" created="Fri, 26 Feb 2016 19:34:24 +0000"  >&lt;p&gt;looks like the open replay data is already deleted from the replay request.  So I do not know why this request still exist in the replay list. I wonder if this related with the recent xid change?  Niu, please have a look? thanks.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 3684:0:(mdc_request.c:554:mdc_replay_open()) @@@ Can&apos;t properly replay without open data.  req@ffff8806b1593980 x1527085335890828/t85900680419(85900680419) o101-&amp;gt;soaked-MDT0000-mdc-ffff88082f749000@192.168.1.108@o2ib10:12/10 lens 896/544 e 0 to 0 dl 1456470692 ref 2 fl Interpret:RP/4/0 rc 301/301
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And also this is probably not a critical ticket here, because I did not see this broke the consistency of the namespace. Only some noise message caused by this mysterious replay request? &lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="20568" name="console-lola-27.log.bz2" size="46515" author="heckes" created="Fri, 26 Feb 2016 18:10:59 +0000"/>
                            <attachment id="20571" name="console-lola-8.log.bz2" size="487603" author="heckes" created="Fri, 26 Feb 2016 18:12:07 +0000"/>
                            <attachment id="20573" name="console-lola-9.log.bz2" size="532981" author="heckes" created="Fri, 26 Feb 2016 18:13:47 +0000"/>
                            <attachment id="20570" name="messages-lola-27.log.bz2" size="244475" author="heckes" created="Fri, 26 Feb 2016 18:10:59 +0000"/>
                            <attachment id="20572" name="messages-lola-8.log.bz2" size="190824" author="heckes" created="Fri, 26 Feb 2016 18:12:50 +0000"/>
                            <attachment id="20569" name="messages-lola-9.log.bz2" size="233099" author="heckes" created="Fri, 26 Feb 2016 18:10:59 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzy2y7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>