<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:39:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10958] brw rpc reordering causes data corruption when the writethrough cache is disabled</title>
                <link>https://jira.whamcloud.com/browse/LU-10958</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We ran IOR with LNet router failure simulation and encountered data corruption which seems to be reproducible on master.&lt;/p&gt;

&lt;p&gt;The following scenario happens:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;a client thread writes some data to page N of file X&lt;/li&gt;
	&lt;li&gt;page N is transferred to the OSS, the processing thread sleeps somewhere&lt;/li&gt;
	&lt;li&gt;the original BRW request times out and the client resends page N&lt;/li&gt;
	&lt;li&gt;page N is successfully written to disk, the client receives the reply and clears PG_writeback&lt;/li&gt;
	&lt;li&gt;a client thread writes different data to the same page N of file X&lt;/li&gt;
	&lt;li&gt;page N with the new data is successfully written to disk, the client receives the reply and clears PG_writeback&lt;/li&gt;
	&lt;li&gt;the OSS thread from step 2 wakes up and writes stale data to disk &lt;span class=&quot;error&quot;&gt;&amp;#91;data corruption&amp;#93;&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
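
&lt;p&gt;The steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is plain Python, not Lustre code, and every name in it (ToyOst, receive, wake_stalled, run_scenario) is hypothetical. It assumes the stalled thread already holds a private copy of the page data and that nothing orders it against later writes to the same page.&lt;/p&gt;

```python
"""Toy model of the write reordering described in the steps above.

All class and function names are hypothetical; this is not Lustre code.
"""


class ToyOst:
    """Stands in for an OSS with the writethrough cache disabled."""

    def __init__(self):
        self.disk = {}      # page index -> bytes currently on disk
        self.stalled = []   # writes received but not yet committed

    def receive(self, page, data, stall=False):
        # The bulk data is copied on arrival, so a stalled request keeps
        # whatever the client sent at that time.
        if stall:
            self.stalled.append((page, data))
            return False    # no reply is sent; the client will time out
        self.disk[page] = data
        return True         # reply sent; the client clears PG_writeback

    def wake_stalled(self):
        # The sleeping thread resumes and commits its old private copy.
        for page, data in self.stalled:
            self.disk[page] = data
        self.stalled.clear()


def run_scenario():
    ost = ToyOst()
    page = 0
    old, new = b"old data", b"new data"

    replied = ost.receive(page, old, stall=True)   # steps 1-2: thread sleeps
    assert not replied
    assert ost.receive(page, old)                  # steps 3-4: resend succeeds
    assert ost.receive(page, new)                  # steps 5-6: newer write
    ost.wake_stalled()                             # step 7: stale write lands
    return ost.disk[page], new


if __name__ == "__main__":
    on_disk, expected = run_scenario()
    print("on disk:", on_disk, "expected:", expected)
```

&lt;p&gt;In this model the final on-disk contents are the old data, because the stalled write is applied last.&lt;/p&gt;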


&lt;p&gt;A reproducer will be uploaded shortly.&lt;/p&gt;</description>
                <environment></environment>
        <key id="52015">LU-10958</key>
            <summary>brw rpc reordering causes data corruption when the writethrough cache is disabled</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="panda">Andrew Perepechko</assignee>
                                    <reporter username="panda">Andrew Perepechko</reporter>
                        <labels>
                    </labels>
                <created>Thu, 26 Apr 2018 15:22:59 +0000</created>
                <updated>Thu, 11 Feb 2021 14:38:47 +0000</updated>
                            <resolved>Mon, 8 Feb 2021 21:59:05 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="226804" author="gerrit" created="Thu, 26 Apr 2018 15:32:21 +0000"  >&lt;p&gt;Andrew Perepechko (c17827@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32165&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32165&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10958&quot; title=&quot;brw rpc reordering causes data corruption when the writethrough cache is disabled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10958&quot;&gt;&lt;del&gt;LU-10958&lt;/del&gt;&lt;/a&gt; tests: data corruption due to RPC reordering&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b537966042d03f7d04dd39e114d9e85d21ce1b8c&lt;/p&gt;</comment>
                            <comment id="226805" author="green" created="Thu, 26 Apr 2018 15:33:28 +0000"  >&lt;p&gt;hm... so I imagine this would also be the case in the old 1.x code before we got writeback cache enabled, right?&lt;/p&gt;

&lt;p&gt;So... I imagine while the client has a page in flight to the server, the page on the client is marked writeback, so no further modifications are possible until the write actually completes. How do you avoid this? directio?&lt;/p&gt;</comment>
                            <comment id="226806" author="panda" created="Thu, 26 Apr 2018 15:37:35 +0000"  >&lt;blockquote&gt;&lt;p&gt;hm... so I imagine this would also be the case in the old 1.x code before we got writeback cache enabled, right?&lt;/p&gt;

&lt;p&gt;So... I imagine while the client has a page in flight to the server, the page on the client is marked writeback, so no further modifications are possible until the write actually completes. How do you avoid this? directio?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;We were able to reproduce the bug with Lustre 2.7, but we think earlier Lustre versions can be affected too.&lt;/p&gt;

&lt;p&gt;The successful reply to the resent request will clear PG_writeback on the client. However, the original request can still be in progress on the OSS.&lt;/p&gt;

&lt;p&gt;When caching is enabled, this scenario cannot happen. The stuck OSS thread either holds the page lock, which serializes writes, or has not yet reached the page lock. In the latter case it also hasn&apos;t transferred the stale data to the OSS, so even if it&apos;s reordered it gets the latest data.&lt;/p&gt;</comment>
                            <comment id="226807" author="green" created="Thu, 26 Apr 2018 15:39:16 +0000"  >&lt;p&gt;Ah, I see there&apos;s a resend. Then it&apos;s even more strange! We have logic to catch resends so that the same resend is not executed twice, so both should not be able to run at the same time, right?&lt;/p&gt;

&lt;p&gt;Something does not add up.&lt;/p&gt;</comment>
                            <comment id="226809" author="panda" created="Thu, 26 Apr 2018 15:43:33 +0000"  >&lt;p&gt;Perhaps, such logic is only implemented for the MDS. What do you think?&lt;/p&gt;</comment>
                            <comment id="226811" author="green" created="Thu, 26 Apr 2018 15:48:32 +0000"  >&lt;p&gt;the logic should be generic enough. I have a passing memory about bulk xids being incremented on resend, but that should not affect detection of double request processing. We even have code to plug into existing request processing when this is detected so you don&apos;t wait exponentially longer on long operations (though it is possible this does not happen for bulk requests).&lt;/p&gt;</comment>
                            <comment id="226812" author="panda" created="Thu, 26 Apr 2018 16:03:01 +0000"  >&lt;p&gt;I uploaded the logs from my test box as 134.tar.bz2. Something like the following happens:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;root@panda:/tmp/134# egrep &lt;span class=&quot;code-quote&quot;&gt;&quot;tgt_brw_write|tgt_brw_read&quot;&lt;/span&gt; recovery-small.test_134.debug_log.panda-testbox.1524758128.log
(1) writing a file of 4096 zeros
00000020:00000001:1.0:1524758106.298943:0:28572:0:(tgt_handler.c:2270:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00000040:1.0:1524758106.299164:0:28572:0:(tgt_handler.c:2385:tgt_brw_write()) Client use &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; io &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; data transfer, size = 4096
00000020:00000040:1.0:1524758106.299205:0:28572:0:(tgt_handler.c:2446:tgt_brw_write()) Checksum 1 from 12345-0@lo OK: 10000001
00000020:00000001:1.0:1524758106.341298:0:28572:0:(tgt_handler.c:2514:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
(2) copying &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; file to /tmp &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; later cmp
00000020:00000001:1.0:1524758106.362982:0:28572:0:(tgt_handler.c:1956:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00008000:1.0:1524758106.378496:0:28572:0:(tgt_handler.c:2089:tgt_brw_read()) checksum at read origin: 10000001
00000020:00000001:1.0:1524758106.378764:0:28572:0:(tgt_handler.c:2174:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=4096 : 4096 : 1000)
00000020:00000001:1.0:1524758106.381031:0:28572:0:(tgt_handler.c:1956:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00008000:1.0:1524758106.381160:0:28572:0:(tgt_handler.c:2089:tgt_brw_read()) checksum at read origin: 1
00000020:00000001:1.0:1524758106.381374:0:28572:0:(tgt_handler.c:2174:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
(3) initial brw rpc
00000020:00000001:1.0:1524758106.521557:0:28572:0:(tgt_handler.c:2270:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00000040:1.0:1524758106.521743:0:28572:0:(tgt_handler.c:2385:tgt_brw_write()) Client use &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; io &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; data transfer, size = 4096
00000020:00000040:1.0:1524758106.521785:0:28572:0:(tgt_handler.c:2446:tgt_brw_write()) Checksum 2 from 12345-0@lo OK: 4ebccd1f
(4) here things get stalled and the client resends &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; rpc
00000020:00000001:1.0:1524758126.481399:0:30161:0:(tgt_handler.c:2270:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00000020:1.0:1524758126.481435:0:30161:0:(tgt_handler.c:2361:tgt_brw_write()) @@@ clear resent/replay req grant info req@ffff880125623c50 x1598824716964176/t0(0) o4-&amp;gt;60f13dec-9165-f217-961e-91740a2150d5@0@lo:0/0 lens 4704/448 e 0 to 0 dl 1524758146 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00000040:1.0:1524758126.481634:0:30161:0:(tgt_handler.c:2385:tgt_brw_write()) Client use &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; io &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; data transfer, size = 4096
00000020:00000001:1.0:1524758126.482205:0:30161:0:(tgt_handler.c:2514:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
(5) the client is now able to rewrite old data
00000020:00000001:0.0:1524758126.606629:0:30161:0:(tgt_handler.c:2270:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00000040:0.0:1524758126.606824:0:30161:0:(tgt_handler.c:2385:tgt_brw_write()) Client use &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; io &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; data transfer, size = 4096
00000020:00000040:0.0:1524758126.606865:0:30161:0:(tgt_handler.c:2446:tgt_brw_write()) Checksum 4 from 12345-0@lo OK: 10000001
00000020:00000001:0.0:1524758126.612424:0:30161:0:(tgt_handler.c:2514:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
(6) the data from the initial brw rpc get written to disk (corruption)
00000020:00000001:1.0:1524758128.533347:0:28572:0:(tgt_handler.c:2514:tgt_brw_write()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
(7) the &lt;span class=&quot;code-keyword&quot;&gt;final&lt;/span&gt; data verification
00000020:00000001:1.0:1524758128.768109:0:28572:0:(tgt_handler.c:1956:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00008000:1.0:1524758128.778266:0:28572:0:(tgt_handler.c:2089:tgt_brw_read()) checksum at read origin: 4ebccd1f
00000020:00000001:1.0:1524758128.778548:0:28572:0:(tgt_handler.c:2174:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=4096 : 4096 : 1000)
00000020:00000001:1.0:1524758128.780809:0:28572:0:(tgt_handler.c:1956:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000020:00008000:1.0:1524758128.780932:0:28572:0:(tgt_handler.c:2089:tgt_brw_read()) checksum at read origin: 1
00000020:00000001:1.0:1524758128.781146:0:28572:0:(tgt_handler.c:2174:tgt_brw_read()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="227280" author="gerrit" created="Thu, 3 May 2018 22:16:12 +0000"  >&lt;p&gt;Andrew Perepechko (c17827@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32281&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32281&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10958&quot; title=&quot;brw rpc reordering causes data corruption when the writethrough cache is disabled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10958&quot;&gt;&lt;del&gt;LU-10958&lt;/del&gt;&lt;/a&gt; ofd: data corruption due to RPC reordering&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f0f14e60a61c227734c8cebfe2cb1c0cd2397f20&lt;/p&gt;</comment>
                            <comment id="227433" author="panda" created="Mon, 7 May 2018 16:30:22 +0000"  >&lt;blockquote&gt;&lt;p&gt;the logic should be generic enough. I have a passing memory about bulk xids being incremented on resend, but that should not affect detection of double request processing.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This topic was raised again during review. It seems like ptlrpc_server_check_resend_in_progress() was written in such a way as not to drop brw resends in progress since bulk transfer may fail after match bits get changed. In our scenario the real delay happens in the dm/mdraid layer after the bulk transfer succeeded.&lt;/p&gt;

&lt;p&gt;No -EBUSY from ptlrpc_server_request_add() calls in the attached log:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;root@panda:/tmp/134# egrep &lt;span class=&quot;code-quote&quot;&gt;&quot;server_request_add.*leaving&quot;&lt;/span&gt; recovery-small.test_134.debug_log.panda-testbox.1524758128.log
00000100:00000001:1.0:1524758084.215027:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758084.296625:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.372998:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.374387:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.375747:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.377223:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.378622:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.379762:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.381124:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758084.618182:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
...
00000100:00000001:2.0:1524758109.428916:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758109.429419:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758109.429583:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758109.429676:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758109.429710:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758109.429816:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758109.429839:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758109.429948:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758109.430084:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758109.430352:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758109.430510:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758109.491963:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758111.940049:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758111.940090:0:28570:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758114.429713:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758114.429848:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758114.429912:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758114.430457:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758114.430567:0:28567:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758114.430693:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758114.430713:0:28567:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758114.431758:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758114.431789:0:28567:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758114.432571:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758114.433376:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758114.433438:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758114.499932:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758116.948122:0:29382:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758116.948779:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758119.430075:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758119.430712:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758119.430857:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758119.430972:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758119.431110:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758119.431420:0:28566:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758119.432337:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758119.432462:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758119.432919:0:28588:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758119.433354:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758119.433531:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758119.434360:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758119.508004:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758121.956131:0:29382:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758121.956785:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758124.431106:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758124.431686:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758124.431826:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758124.432540:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758124.432774:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758124.432916:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758124.433065:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758124.433192:0:28957:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758124.433610:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758124.433857:0:28958:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758124.435074:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758124.435867:0:28839:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758124.515970:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.464702:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.464840:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.465510:0:28256:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.465654:0:28256:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758126.465754:0:28255:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.465784:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.465793:0:28256:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.465928:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.466064:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.466192:0:28568:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.468126:0:28588:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.468374:0:28252:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.481228:0:28573:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.483509:0:28588:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.521548:0:28567:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758126.591581:0:28259:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758126.599733:0:28588:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758126.603148:0:29095:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758126.606446:0:30161:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758126.613821:0:28588:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.629660:0:28567:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758126.651194:0:28259:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758126.964226:0:29382:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1524758126.964243:0:30160:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758128.709591:0:29095:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758128.759534:0:28588:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758128.762873:0:29095:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:3.0:1524758128.763899:0:28567:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758128.767953:0:28572:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1524758128.780653:0:28572:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1524758128.785932:0:28258:0:(service.c:1732:ptlrpc_server_request_add()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="230022" author="cfaber" created="Fri, 6 Jul 2018 20:07:56 +0000"  >&lt;p&gt;Hi Peter,&lt;/p&gt;

&lt;p&gt;Can we get some attention on this bug? It&apos;s low-hanging fruit.&lt;/p&gt;

&lt;p&gt;-cf&lt;/p&gt;

</comment>
                            <comment id="231723" author="adilger" created="Thu, 9 Aug 2018 17:05:35 +0000"  >&lt;blockquote&gt;
&lt;p&gt;In our scenario the real delay happens in the dm/mdraid layer after the bulk transfer succeeded.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Isn&apos;t that a problem of the DM/mdraid layer, in that it is reordering writes incorrectly? If the OST thread submits writes to disk as A, A&apos;, B, but the disk writes A&apos;, B, A because A was blocked in the IO stack, then there isn&apos;t much we can do about it.&lt;/p&gt;</comment>
                            <comment id="231725" author="panda" created="Thu, 9 Aug 2018 17:09:29 +0000"  >&lt;p&gt;No, I don&apos;t think so. There&apos;s nothing wrong in the md layer except the delay itself which makes it possible for the resent RPC and the RPC after it to complete before the initial delayed RPC. This delay is the analogue of OBD_FAIL_OST_BRW_PAUSE_BULK2 from &lt;a href=&quot;https://review.whamcloud.com/#/c/32165/6/lustre/tests/recovery-small.sh&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/32165/6/lustre/tests/recovery-small.sh&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A delay can happen anywhere on a non-RTOS system.&lt;/p&gt;</comment>
                            <comment id="291278" author="spitzcor" created="Thu, 4 Feb 2021 22:18:35 +0000"  >&lt;p&gt;Proposed for 2.14.0.  With -RC1 already available, I realize that its candidacy might not hold.&lt;/p&gt;</comment>
                            <comment id="291458" author="gerrit" created="Mon, 8 Feb 2021 21:54:46 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32281/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32281/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10958&quot; title=&quot;brw rpc reordering causes data corruption when the writethrough cache is disabled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10958&quot;&gt;&lt;del&gt;LU-10958&lt;/del&gt;&lt;/a&gt; ofd: data corruption due to RPC reordering&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 35679a730bf0b7a8d4ce84cadc3ecc7c289ef491&lt;/p&gt;</comment>
                            <comment id="291468" author="pjones" created="Mon, 8 Feb 2021 21:59:05 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                            <comment id="291729" author="eaujames" created="Thu, 11 Feb 2021 14:38:47 +0000"  >&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;Is a backport planned for the b2_12 branch for this issue?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="30088" name="134.tar.bz2" size="1698129" author="panda" created="Thu, 26 Apr 2018 15:58:46 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzwfj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>