<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:04:01 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-128] OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&gt;=le64_to_cpu(lcd-&gt;lcd_last_transno)) failed] in recovery</title>
                <link>https://jira.whamcloud.com/browse/LU-128</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;As suggested by Peter Jones, we are opening a Jira ticket for this issue in order to get the fix landed in 2.1.&lt;br/&gt;
I am simply copying here the initial description from bugzilla 24420:&lt;/p&gt;

&lt;p&gt;We hit this bug when we reboot some OSSs. It is raised during the recovery phase and causes a long Lustre service interruption.&lt;/p&gt;


&lt;p&gt;On each crash, the panicking thread&apos;s stack trace looked like the following:&lt;br/&gt;
=========================================================================&lt;br/&gt;
 #0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1238&amp;#93;&lt;/span&gt; machine_kexec at ffffffff8102e66b&lt;br/&gt;
 #1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1298&amp;#93;&lt;/span&gt; crash_kexec at ffffffff810a9ae8&lt;br/&gt;
 #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1368&amp;#93;&lt;/span&gt; panic at ffffffff8145210d&lt;br/&gt;
 #3 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd13e8&amp;#93;&lt;/span&gt; lbug_with_loc at ffffffffa0454eeb&lt;br/&gt;
 #4 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1438&amp;#93;&lt;/span&gt; libcfs_assertion_failed at ffffffffa04607d6&lt;br/&gt;
 #5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1488&amp;#93;&lt;/span&gt; filter_finish_transno at ffffffffa096c825&lt;br/&gt;
 #6 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1548&amp;#93;&lt;/span&gt; filter_do_bio at ffffffffa098e390&lt;br/&gt;
 #7 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd15e8&amp;#93;&lt;/span&gt; filter_commitrw_write at ffffffffa0990a78&lt;br/&gt;
 #8 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd17d8&amp;#93;&lt;/span&gt; filter_commitrw at ffffffffa09833d5&lt;br/&gt;
 #9 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1898&amp;#93;&lt;/span&gt; obd_commitrw at ffffffffa093affa&lt;br/&gt;
#10 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1918&amp;#93;&lt;/span&gt; ost_brw_write at ffffffffa0943644&lt;br/&gt;
#11 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1af8&amp;#93;&lt;/span&gt; ost_handle at ffffffffa094837a&lt;br/&gt;
#12 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1ca8&amp;#93;&lt;/span&gt; ptlrpc_server_handle_request at ffffffffa060eb11&lt;br/&gt;
#13 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1de8&amp;#93;&lt;/span&gt; ptlrpc_main at ffffffffa060feea&lt;br/&gt;
#14 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881021fd1f48&amp;#93;&lt;/span&gt; kernel_thread at ffffffff8100d1aa&lt;br/&gt;
=========================================================================&lt;/p&gt;

&lt;p&gt;In one analysis from our on-site support, we got the following values when the LBUG was raised in the &quot;filter_finish_transno&quot; function:&lt;/p&gt;

&lt;p&gt;lcd_last_transno=0x4ddebb&lt;br/&gt;
oti_transno=last_rcvd=0x4ddeba&lt;br/&gt;
lsd_last_transno=0x4de0ee&lt;/p&gt;

&lt;p&gt;So the client (lcd_last_transno) has a bad transaction number: the actual transaction number is lower than the client&apos;s, which, according to the ASSERT, is bad.&lt;/p&gt;

&lt;p&gt;I can see there is a similar bug (bz23296), but I don&apos;t think it is related to this one: in bz23296 the problem comes from a bad initialization in obdecho/echo_client.c, which is used only for tests, not in production as in our case.&lt;/p&gt;

&lt;p&gt;Does this sound like a known bug to you? To work around it, what would be the consequences of disabling this LBUG? I assume we would lose some data on a client, but I don&apos;t know if there is any other important consequence.&lt;/p&gt;



&lt;p&gt;I also attach here the patch from bugzilla 24420 that has already landed in 1.8.6.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Sebastien.&lt;/p&gt;</description>
                <environment>x86_64, RHEL6</environment>
        <key id="10448">LU-128</key>
            <summary>OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&gt;=le64_to_cpu(lcd-&gt;lcd_last_transno)) failed] in recovery</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="sebastien.buisson">Sebastien Buisson</reporter>
                        <labels>
                    </labels>
                <created>Tue, 15 Mar 2011 01:49:22 +0000</created>
                <updated>Thu, 25 Aug 2011 23:38:45 +0000</updated>
                            <resolved>Tue, 14 Jun 2011 08:06:37 +0000</resolved>
                                    <version>Lustre 2.0.0</version>
                                    <fixVersion>Lustre 2.1.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="11081" author="tappro" created="Tue, 15 Mar 2011 02:52:49 +0000"  >&lt;p&gt;&amp;gt; lcd_last_transno=0x4ddebb&lt;br/&gt;
&amp;gt; oti_transno=last_rcvd=0x4ddeba&lt;br/&gt;
&amp;gt; lsd_last_transno=0x4de0ee&lt;br/&gt;
&amp;gt;&lt;br/&gt;
&amp;gt; So the client (lcd_last_transno) has a bad transaction number: the actual&lt;br/&gt;
&amp;gt; transaction number is lower than the client&apos;s, which, according to the ASSERT, is bad.&lt;/p&gt;

&lt;p&gt;This is not correct: lcd_last_transno is not the client&apos;s transno but the latest transno written on the server; it is stored in the per-export &apos;client_data&apos;. The fact that it is on disk means it was committed, so all lower transactions must be committed as well. But the client tries to replay transaction 0x4ddeba, which is bad and shouldn&apos;t happen.&lt;/p&gt;

&lt;p&gt;Therefore we have the following possible cases:&lt;br/&gt;
1) a bug in the client replay code - it issues a lower transaction after a higher one&lt;br/&gt;
2) a bug on the server side - it should skip such a lower transno if the server&apos;s committed transno is higher&lt;/p&gt;

&lt;p&gt;I doubt we have case 1), as it is generic code shared by MDS/OSS. Also, this always happens with IO replays, so maybe something is wrong in that path. I&apos;d propose tracking how such replays are handled on the server.&lt;/p&gt;</comment>
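The invariant under discussion (the per-export lcd_last_transno is the highest transno committed on the server's disk, so anything replayed below it must be refused) can be sketched as a toy C model; the struct and function here are illustrative stand-ins, not Lustre's actual code:

```c
#include <stdint.h>

/* Toy model (illustrative names, not Lustre's real structures): the
 * per-export client data stores the highest transno that the server
 * has already committed to disk for this client. */
struct client_data {
    uint64_t lcd_last_transno;   /* highest committed transno on disk */
};

/* A replayed transno strictly below lcd_last_transno is already
 * committed and must be dropped: re-executing it would move the server
 * backwards, which is the situation that
 * LASSERT(last_rcvd >= lcd_last_transno) catches too late. The '=='
 * case is allowed (a resent, not-yet-acknowledged request).
 * Returns 1 to drop the replay, 0 to let it execute. */
int replay_should_drop(const struct client_data *lcd, uint64_t transno)
{
    return transno < lcd->lcd_last_transno ? 1 : 0;
}
```

With the values from the ticket (lcd_last_transno = 0x4ddebb, replayed transno = 0x4ddeba) this returns 1, i.e. the replay should have been dropped before ever reaching filter_finish_transno.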
                            <comment id="11082" author="tappro" created="Tue, 15 Mar 2011 03:10:44 +0000"  >&lt;p&gt;The patch is correct, but only to avoid asserting on wire data; the bug itself is not fixed and still exists. It is fundamentally wrong to try to execute the lower-transno request. Do you have a Lustre log of that crash?&lt;/p&gt;</comment>
                            <comment id="11083" author="dmoreno" created="Tue, 15 Mar 2011 03:41:53 +0000"  >&lt;p&gt;Thanks for your feedback Mikhail,&lt;/p&gt;

&lt;p&gt;We asked our client to activate these debug levels:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;On OSS: HA and warning&lt;/li&gt;
	&lt;li&gt;On MDS: HA, inode and warning&lt;/li&gt;
	&lt;li&gt;On clients: nothing in particular&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;They cannot set the full debug level as their fs is in production. Do you think this will be enough to trace the transnos?&lt;/p&gt;</comment>
                            <comment id="11084" author="sebastien.buisson" created="Tue, 15 Mar 2011 03:43:36 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;For us the patch is only a work-around, but it is vital: without it, production is completely blocked at the customer site.&lt;/p&gt;

&lt;p&gt;Unfortunately we do not have the lustre debug logs, all the information we have is in bugzilla 24420.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="11085" author="tappro" created="Tue, 15 Mar 2011 04:16:44 +0000"  >&lt;p&gt;Diego, HA should help a lot.&lt;/p&gt;

&lt;p&gt;Sebastien, am I right that this bug has been seen only with 1.8.x so far?&lt;/p&gt;</comment>
                            <comment id="11087" author="sebastien.buisson" created="Tue, 15 Mar 2011 04:42:34 +0000"  >&lt;p&gt;No, our customer runs Lustre 2.0.0.1; this is where we have seen the issue.&lt;/p&gt;</comment>
                            <comment id="11088" author="pjones" created="Tue, 15 Mar 2011 04:48:00 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please look at this one?&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="11089" author="niu" created="Tue, 15 Mar 2011 05:29:33 +0000"  >&lt;p&gt;Hi, tappro&lt;/p&gt;

&lt;p&gt;Though 0x4ddebb was already committed, we can&apos;t guarantee that a reply with last_committed_transno &amp;gt;= 0x4ddebb reached the client before the server crash, so the replay requests were probably still in the client&apos;s replay list and should be replayed on recovery. Did I miss anything?&lt;/p&gt;

&lt;p&gt;To my understanding, this kind of assert is incorrect (the first replay request might have a lower transno); we can simply remove it.&lt;/p&gt;</comment>
                            <comment id="11091" author="tappro" created="Tue, 15 Mar 2011 06:22:57 +0000"  >&lt;p&gt;Upon connect, the real last_committed is reported back to the client, so it shouldn&apos;t send lower replays. The first question is why that doesn&apos;t work.&lt;/p&gt;

&lt;p&gt;Also, the server can always compare them during replay and must drop them, because processing them breaks consistency. The assertion is in filter_finish_transno, which is the endpoint; the request shouldn&apos;t get that far.&lt;/p&gt;

&lt;p&gt;Note that this assertion has been there for quite a long time, and all the cases involve bulk replay, which was introduced recently. I think this is the root of the problem; we need to review those paths. What last_committed value does the client have after reconnect? What are the server&apos;s values for next_transno, committed, and current at the moment of replay? After that it will be clearer what is wrong.&lt;/p&gt;</comment>
                            <comment id="11092" author="tappro" created="Tue, 15 Mar 2011 06:35:11 +0000"  >&lt;p&gt;In target_queue_recovery_request() we have an exclusion for lower-transno requests:&lt;/p&gt;

&lt;pre&gt;
        if (transno &amp;lt; obd-&amp;gt;obd_next_recovery_transno) {
                /* Processing the queue right now, don&apos;t re-add. */
                LASSERT(cfs_list_empty(&amp;amp;req-&amp;gt;rq_list));
                cfs_spin_unlock(&amp;amp;obd-&amp;gt;obd_recovery_task_lock);
                RETURN(1);
        }
&lt;/pre&gt;

&lt;p&gt;That is done for open replays, which can be older than the committed value, but such requests are specially kept on the client. That makes sense only on the MDS, and I am not sure about bulk replays - maybe they can also be kept even if lower than committed for some reason? Though I can&apos;t imagine any.&lt;/p&gt;

&lt;p&gt;So the only remaining question is why bulk replays with a lower transno are possible; see my previous comment about last_committed, which must be applied after reconnect.&lt;/p&gt;</comment>
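The exclusion in target_queue_recovery_request() quoted in this comment can be modeled as a tiny C sketch; the state struct and function below are simplified illustrations, not the real obd_device code:

```c
#include <stdint.h>

/* Simplified recovery state (illustrative, not the real obd_device). */
struct recovery_state {
    uint64_t next_recovery_transno;  /* next transno ordered replay expects */
};

/* Mirrors the quoted exclusion: a request whose transno is below the
 * next expected recovery transno is not re-added to the queue and is
 * handled right away (return 1, matching the RETURN(1) in the snippet).
 * Otherwise it is queued for ordered replay (return 0). */
int queue_recovery_request(const struct recovery_state *obd, uint64_t transno)
{
    if (transno < obd->next_recovery_transno)
        return 1;   /* processed immediately, bypassing the ordered queue */
    return 0;       /* queued and replayed in transno order */
}
```

This immediate-processing path is how a resent low-transno replay can reach the filter code without waiting its turn in the ordered queue, which is the scenario the later comments discuss.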
                            <comment id="11104" author="niu" created="Tue, 15 Mar 2011 07:57:55 +0000"  >&lt;p&gt;Right, the client should get last_committed upon connect and clean up all the requests with lower transnos; I missed that part. Will rethink it. Thank you, Tappro.&lt;/p&gt;</comment>
                            <comment id="11255" author="tappro" created="Sun, 20 Mar 2011 08:48:05 +0000"  >
&lt;p&gt;1) Such replays will be replayed during the replay phase; there is no mechanism to replay something after the replay phase. The request is still in the replay queue and will either be replayed again first (if not committed on the server) or freed (if committed). The server&apos;s last_committed value must be equal to its transno or less.&lt;/p&gt;

&lt;p&gt;2) Replay errors cause import invalidation; the client is evicted from the server. In any case, we have no assertion for the &apos;==&apos; case; it is allowed.&lt;/p&gt;

&lt;p&gt;That doesn&apos;t look like the source of the problem. I suspect the source is in the client code, which replays requests with a transno lower than the server&apos;s last_committed. I don&apos;t know how yet, but that is a good point to start with.&lt;/p&gt;

&lt;p&gt;The problem is not in the assertion; it is simply fundamentally wrong to replay a lesser transno over a greater one.&lt;/p&gt;</comment>
                            <comment id="11259" author="niu" created="Mon, 21 Mar 2011 02:38:43 +0000"  >&lt;p&gt;Hmm, whenever a replay request gets an error or times out (lost reply), ptlrpc_replay_interpret() calls ptlrpc_connect_import() to restart recovery. If that replay wasn&apos;t committed before the reconnect reply arrived, the client will resend it as the first replay request upon reconnect. The transno of this resent replay might be less than &apos;obd_next_recovery_transno&apos;, so it&apos;ll be processed immediately. For the filter this looks like a defect to me, because the OSS doesn&apos;t check for non-idempotent resent requests as the MDS does; however, it shouldn&apos;t trigger the LBUG, because the transno is at most equal to lcd_last_transno in this case.&lt;/p&gt;

&lt;p&gt;I can&apos;t imagine a case of the client sending a lower transno either; maybe some more logs would be helpful.&lt;/p&gt;</comment>
                            <comment id="11316" author="niu" created="Wed, 23 Mar 2011 22:34:01 +0000"  >&lt;p&gt;Hi, Tappro&lt;/p&gt;

&lt;p&gt;It looks to me like there is a serious race in target_handle_connect() which could cause this LBUG.&lt;/p&gt;

&lt;p&gt;In target_handle_connect(), we check obd_recovering to see if we&apos;re in the recovery state; if it&apos;s true, MSG_CONNECT_RECOVERING is packed in the reply to tell the client to start replay. However, recovery might finish just after this check is done, and in the end this client would be allowed to replay after the recovery window.&lt;/p&gt;

&lt;p&gt;Do you agree with me on this? If you also think it&apos;s a race, I&apos;ll make a patch to fix it soon. Thanks.&lt;/p&gt;</comment>
                            <comment id="11319" author="tappro" created="Thu, 24 Mar 2011 00:24:42 +0000"  >&lt;p&gt;Niu, that shouldn&apos;t happen. Recovery waits for all clients to start and cannot end until every connected client has finished recovery. That said, if a client was connected, then 1) it will be added to recovery, and 2) if recovery stops before that, the client will be evicted for missing the recovery window. I see no way it can do replays after the recovery window.&lt;/p&gt;

&lt;p&gt;Please note that this bug would be seen much more often if that race existed, but we started to see it only recently and only with bulk replays. I think OSS recovery is the right place to investigate.&lt;/p&gt;</comment>
                            <comment id="11520" author="niu" created="Tue, 29 Mar 2011 06:16:09 +0000"  >&lt;p&gt;Hi, Diego &amp;amp; Sebastien&lt;/p&gt;

&lt;p&gt;Has this LBUG been reproduced with the HA and WARNING logs? So far I don&apos;t see how it can be triggered by code inspection alone; any logs indicating which recovery stage it was in and whether any client was evicted would be helpful.&lt;/p&gt;

&lt;p&gt;Thanks&lt;br/&gt;
Niu&lt;/p&gt;</comment>
                            <comment id="11534" author="sebastien.buisson" created="Tue, 29 Mar 2011 08:56:38 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;We have asked our on-site Support Team for the logs, but unfortunately they have not been sent to us yet.&lt;/p&gt;

&lt;p&gt;Sebastien.&lt;/p&gt;</comment>
                            <comment id="11556" author="pjones" created="Tue, 29 Mar 2011 14:17:21 +0000"  >&lt;p&gt;Dropping priority to reflect importance to CEA. Please just attach logs when you are able to gather them and we will see what we can do then.&lt;/p&gt;</comment>
                            <comment id="11789" author="tappro" created="Fri, 1 Apr 2011 09:54:54 +0000"  >&lt;p&gt;Peter, does it make sense to land the 24420 patch on master? It is safe and replaces assertions on wire data. I have an updated version of the patch, so it will require just inspections and testing.&lt;/p&gt;</comment>
                            <comment id="12517" author="pjones" created="Thu, 7 Apr 2011 08:14:14 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Yes, Oleg is ok with this approach. Please upload the patch to gerrit so that we can get the inspections underway.&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="13186" author="pjones" created="Thu, 21 Apr 2011 07:22:39 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Are you able to hand off your patch to Niu so he can complete the testing and landing of this patch?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="13202" author="tappro" created="Thu, 21 Apr 2011 11:33:35 +0000"  >&lt;p&gt;Peter, I&apos;ve just added the patch to gerrit for review. Local tests pass.&lt;/p&gt;</comment>
                            <comment id="13203" author="pjones" created="Thu, 21 Apr 2011 12:33:39 +0000"  >&lt;p&gt;thanks Mike!&lt;/p&gt;</comment>
                            <comment id="14200" author="hudson" created="Wed, 11 May 2011 19:00:30 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://newbuild.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/./arch=x86_64,build_type=client,distro=el5,ib_stack=inkernel/117/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master &#187; x86_64,client,el5,inkernel #117&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; Avoid assertion on wire data in last_rcvd update&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb?p=fs/lustre-release.git;a=shortlog;h=refs/heads/master&amp;amp;a=commit&amp;amp;h=2bb3a7f6b9889af696485267eb254db7980fe193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2bb3a7f6b9889af696485267eb254db7980fe193&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/mdt/mdt_recovery.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_open.c&lt;/li&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="14220" author="hudson" created="Wed, 11 May 2011 19:23:28 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://newbuild.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/./arch=i686,build_type=server,distro=el5,ib_stack=ofa/117/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master &#187; i686,server,el5,ofa #117&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; Avoid assertion on wire data in last_rcvd update&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb?p=fs/lustre-release.git;a=shortlog;h=refs/heads/master&amp;amp;a=commit&amp;amp;h=2bb3a7f6b9889af696485267eb254db7980fe193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2bb3a7f6b9889af696485267eb254db7980fe193&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_recovery.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_open.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="14222" author="hudson" created="Wed, 11 May 2011 19:24:05 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://newbuild.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/./arch=i686,build_type=server,distro=el5,ib_stack=inkernel/117/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master &#187; i686,server,el5,inkernel #117&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; Avoid assertion on wire data in last_rcvd update&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb?p=fs/lustre-release.git;a=shortlog;h=refs/heads/master&amp;amp;a=commit&amp;amp;h=2bb3a7f6b9889af696485267eb254db7980fe193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2bb3a7f6b9889af696485267eb254db7980fe193&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/mdt/mdt_recovery.c&lt;/li&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_open.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="14225" author="hudson" created="Wed, 11 May 2011 19:36:31 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://newbuild.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/./arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/117/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master &#187; x86_64,server,el6,inkernel #117&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; Avoid assertion on wire data in last_rcvd update&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb?p=fs/lustre-release.git;a=shortlog;h=refs/heads/master&amp;amp;a=commit&amp;amp;h=2bb3a7f6b9889af696485267eb254db7980fe193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2bb3a7f6b9889af696485267eb254db7980fe193&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_recovery.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_open.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="14227" author="hudson" created="Wed, 11 May 2011 19:44:52 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://newbuild.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/./arch=i686,build_type=client,distro=el6,ib_stack=inkernel/117/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master &#187; i686,client,el6,inkernel #117&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; Avoid assertion on wire data in last_rcvd update&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb?p=fs/lustre-release.git;a=shortlog;h=refs/heads/master&amp;amp;a=commit&amp;amp;h=2bb3a7f6b9889af696485267eb254db7980fe193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2bb3a7f6b9889af696485267eb254db7980fe193&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_open.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_recovery.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="14229" author="hudson" created="Wed, 11 May 2011 20:08:33 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://newbuild.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/./arch=i686,build_type=server,distro=el6,ib_stack=inkernel/117/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master &#187; i686,server,el6,inkernel #117&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; Avoid assertion on wire data in last_rcvd update&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb?p=fs/lustre-release.git;a=shortlog;h=refs/heads/master&amp;amp;a=commit&amp;amp;h=2bb3a7f6b9889af696485267eb254db7980fe193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2bb3a7f6b9889af696485267eb254db7980fe193&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_open.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_recovery.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="14496" author="pichong" created="Wed, 18 May 2011 08:01:47 +0000"  >&lt;p&gt;The patch has been installed on our client&apos;s cluster, and they started to get MDS crashes&lt;br/&gt;
with LBUG/ASSERTION(req_is_replay(req)) failed.&lt;/p&gt;

&lt;p&gt;The MDS panic thread stack-trace looks like the following:&lt;br/&gt;
=======================================================&lt;br/&gt;
panic()&lt;br/&gt;
lbug_with_loc()&lt;br/&gt;
libcfs_assertion_failed()&lt;br/&gt;
mdt_txn_stop_cb()&lt;br/&gt;
dt_txn_hook_stop()&lt;br/&gt;
osd_trans_stop()&lt;br/&gt;
mdd_trans_stop()&lt;br/&gt;
mdd_create()&lt;br/&gt;
cml_create()&lt;br/&gt;
mdt_reint_open()&lt;br/&gt;
mdt_reint_rec()&lt;br/&gt;
mdt_reint_internal()&lt;br/&gt;
mdt_intent_reint()&lt;br/&gt;
mdt_intent_policy()&lt;br/&gt;
ldlm_lock_enqueue()&lt;br/&gt;
ldlm_handle_enqueue0()&lt;br/&gt;
mdt_enqueue()&lt;br/&gt;
mdt_handle_common()&lt;br/&gt;
mdt_regular_handle()&lt;br/&gt;
ptlrpc_server_handle_request()&lt;br/&gt;
ptlrpc_main()&lt;br/&gt;
kernel_thread()&lt;br/&gt;
=======================================================&lt;/p&gt;

&lt;p&gt;Looking at the stack trace, the failing ASSERTION(req_is_replay(req)) is likely to come from the fix in /lustre/mdt/mdt_recovery.c.&lt;/p&gt;

&lt;p&gt;It appears some scenarios are still not covered by the patch.&lt;/p&gt;

&lt;p&gt;Could you have a look?&lt;br/&gt;
Thanks,&lt;/p&gt;

&lt;p&gt;Gr&#233;goire.&lt;/p&gt;</comment>
                            <comment id="14497" author="tappro" created="Wed, 18 May 2011 08:28:05 +0000"  >&lt;p&gt;Can you provide more information about the setup? What are the versions on the clients and servers? When does the bug occur: immediately after start, or occasionally during normal operation?&lt;/p&gt;</comment>
                            <comment id="14654" author="pichong" created="Thu, 19 May 2011 05:14:00 +0000"  >&lt;p&gt;It is the same cluster as the initial problem (Lustre 2.0.0, x86_64, RHEL6).&lt;/p&gt;

&lt;p&gt;Here is the information from the client:&lt;br/&gt;
The first occurrence was during normal operations, and the next ones during restart+recovery.&lt;br/&gt;
There were no messages about connection problems or client evictions at that time.&lt;/p&gt;

&lt;p&gt;The patch has been removed from the cluster.&lt;/p&gt;</comment>
                            <comment id="14655" author="tappro" created="Thu, 19 May 2011 06:01:45 +0000"  >&lt;p&gt;The reason may be something other than the patch, perhaps the 2.0.0 code itself: some bug which causes this assertion. The assert can be removed from the patch; in that case there will only be client evictions, plus more debug info about why this is happening.&lt;/p&gt;</comment>
                            <comment id="15020" author="pjones" created="Wed, 25 May 2011 06:00:47 +0000"  >&lt;p&gt;Adding as a 2.1 blocker upon the advice of Bull, because this patch has landed for 2.1 and caused issues when deployed in production at CEA.&lt;/p&gt;</comment>
                            <comment id="15021" author="tappro" created="Wed, 25 May 2011 06:27:23 +0000"  >&lt;p&gt;Peter, that is not quite correct: the patch was landed for 2.1, but the problems with it were seen with 2.0.0. I am afraid the differences between 2.1 and 2.0.0 may be the reason for this. So it is not a blocker for 2.1, at least we have seen no issues with it so far, but we need a patch which works correctly with 2.0.0.&lt;/p&gt;

&lt;p&gt;I&apos;d propose to cook a special patch for Bull with the assertions removed, just to make it tolerant of any error, so we will be able to see debug messages if the issues occur again.&lt;/p&gt;</comment>
                            <comment id="15030" author="pjones" created="Wed, 25 May 2011 07:04:17 +0000"  >&lt;p&gt;Ah, thanks for clarifying, Mike. This can certainly remain an important support issue for CEA and a priority for us without being considered a 2.1 blocker. I have adjusted the status accordingly.&lt;/p&gt;</comment>
                            <comment id="16092" author="pjones" created="Mon, 13 Jun 2011 14:16:33 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Are you still expecting to be able to create a patch based on 2.0 for CEA?&lt;/p&gt;

&lt;p&gt;CEA,&lt;/p&gt;

&lt;p&gt;Would you deploy such a patch or is the window until you rebase on 2.1 small enough that it would not be worthwhile?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="16266" author="sebastien.buisson" created="Tue, 14 Jun 2011 03:47:32 +0000"  >&lt;p&gt;Peter,&lt;/p&gt;

&lt;p&gt;As suggested by Mike, we cooked a patch for CEA with the assertions removed, just to make it tolerant of any error. These error messages are deactivated by default, but can be activated via a kernel module option. That way CEA will be able to collect debug messages when the issue reoccurs.&lt;/p&gt;

&lt;p&gt;Sebastien.&lt;/p&gt;</comment>
                            <comment id="16271" author="pjones" created="Tue, 14 Jun 2011 08:06:37 +0000"  >&lt;p&gt;OK, then I think that we can close this ticket and reopen it if it transpires that we need to take any further action before CEA realigns on 2.1.&lt;/p&gt;</comment>
                            <comment id="16272" author="sebastien.buisson" created="Tue, 14 Jun 2011 08:13:14 +0000"  >&lt;p&gt;I agree.&lt;/p&gt;

&lt;p&gt;Thank you,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="19650" author="niu" created="Thu, 25 Aug 2011 23:38:45 +0000"  >&lt;blockquote&gt;
&lt;p&gt;The patch has been installed on our client&apos;s cluster, and they started to get MDS crashes&lt;br/&gt;
with LBUG/ASSERTION(req_is_replay(req)) failed.&lt;/p&gt;

&lt;p&gt;The MDS panic thread stack-trace looks like the following:&lt;br/&gt;
=======================================================&lt;br/&gt;
panic()&lt;br/&gt;
lbug_with_loc()&lt;br/&gt;
libcfs_assertion_failed()&lt;br/&gt;
mdt_txn_stop_cb()&lt;br/&gt;
dt_txn_hook_stop()&lt;br/&gt;
osd_trans_stop()&lt;br/&gt;
mdd_trans_stop()&lt;br/&gt;
mdd_create()&lt;br/&gt;
cml_create()&lt;br/&gt;
mdt_reint_open()&lt;br/&gt;
mdt_reint_rec()&lt;br/&gt;
mdt_reint_internal()&lt;br/&gt;
mdt_intent_reint()&lt;br/&gt;
mdt_intent_policy()&lt;br/&gt;
ldlm_lock_enqueue()&lt;br/&gt;
ldlm_handle_enqueue0()&lt;br/&gt;
mdt_enqueue()&lt;br/&gt;
mdt_handle_common()&lt;br/&gt;
mdt_regular_handle()&lt;br/&gt;
ptlrpc_server_handle_request()&lt;br/&gt;
ptlrpc_main()&lt;br/&gt;
kernel_thread()&lt;br/&gt;
=======================================================&lt;/p&gt;

&lt;p&gt;Looking at the stack trace, the failing ASSERTION(req_is_replay(req)) is likely to come from the fix in /lustre/mdt/mdt_recovery.c.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This should be fixed in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-617&quot; title=&quot;LBUG: (mdt_recovery.c:787:mdt_last_rcvd_update()) ASSERTION(req_is_replay(req)) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-617&quot;&gt;&lt;del&gt;LU-617&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="10145" name="24420-b18.patch" size="2614" author="sebastien.buisson" created="Tue, 15 Mar 2011 01:49:22 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                    <customfield id="customfield_10020" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Bugzilla ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>24420.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv9an:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5040</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>