<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:16:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1442] File corrupt with 1MiB-aligned 4k regions of zeros</title>
                <link>https://jira.whamcloud.com/browse/LU-1442</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A data integrity test run periodically by our storage group found two occurrences of corrupt files written to Lustre.  The original files contain 300 MB of random data.  The corrupt copies contain several 4096B regions of zeros aligned on 1MiB boundaries.   The two corrupt files were written to the same filesystem from two different login nodes on the same cluster within five minutes of each other.  The stripe count is 100.&lt;/p&gt;

&lt;p&gt;The client application is a parallel ftp client reading data out of our storage archive into Lustre.  The test checks for differences between the restored files and the original copies.  For a 300MB file it uses 4 threads which issue 4 64MB pwrite()&apos;s and 1 44MB pwrite().  It is possible that the pwrite() gets restarted due to SIGUSR2 from a master process, though we don&apos;t know if this occurred in the corrupting case.  This test has seen years of widespread use on all of our clusters, and this is the first reported incidence of this type of corruption, so we can characterize the frequency as rare.&lt;/p&gt;

&lt;p&gt;When I examine an OST object containing a corrupt region, I see there is no block allocated for the corrupt region (in this case, logical block 256 is missing). &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# pigs58 /root &amp;gt; debugfs -c -R &quot;dump_extents /O/0/d$((30205348 % 32))/30205348&quot; /dev/sdb
debugfs 1.41.12 (17-May-2010)
/dev/sdb: catastrophic mode - not reading inode or group bitmaps
Level Entries       Logical              Physical Length Flags
 0/ 0   1/  3     0 -   255 813140480 - 813140735    256
 0/ 0   2/  3   257 -   511 813142528 - 813142782    255
 0/ 0   3/  3   512 -   767 813143040 - 813143295    256
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, the following server-side console messages appeared at the same time one of the corrupted files was written, and mention the NID of the implicated client.  The consoles of the OSTs containing the corrupt objects were quiet at the time.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;May 17 01:06:08 pigs-mds1 kernel: LustreError: 20418:0:(mdt_recovery.c:1011:mdt_steal_ack_locks()) Resent req xid 1402165306385077 has mismatched opc: new 101 old 0
May 17 01:06:08 pigs-mds1 kernel: Lustre: 20418:0:(mdt_recovery.c:1022:mdt_steal_ack_locks()) Stealing 1 locks from rs ffff880410f62000 x1402165306385077.t125822723745 o0 NID 192.168.114.155@o2ib5
May 17 01:06:08 pigs-mds1 kernel: Lustre: All locks stolen from rs ffff880410f62000 x1402165306385077.t125822723745 o0 NID 192.168.114.155@o2ib5
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
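The corruption signature described in this ticket (4096-byte runs of zeros aligned on 1MiB boundaries) can be scanned for with a short sketch. This is illustrative only; `find_zero_regions` is a hypothetical helper, not part of the original test program:

```python
# Sketch: locate 4096-byte all-zero pages that start on 1 MiB boundaries,
# the corruption pattern reported in this ticket. Illustrative helper only.
MIB = 1 << 20
PAGE = 4096

def find_zero_regions(data: bytes):
    """Return the MiB offsets whose first 4 KiB page is entirely zero."""
    hits = []
    for off in range(0, len(data), MIB):
        page = data[off:off + PAGE]
        if len(page) == PAGE and page == b"\x00" * PAGE:
            hits.append(off // MIB)
    return hits

# Example: a 4 MiB buffer of 0xFF with one corrupt page at the 2 MiB boundary.
buf = bytearray(b"\xff" * (4 * MIB))
buf[2 * MIB:2 * MIB + PAGE] = b"\x00" * PAGE
print(find_zero_regions(bytes(buf)))  # -> [2]
```

Running this over a real file would require reading it in 1 MiB chunks rather than holding it in memory, but the detection logic is the same.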
                <environment>&lt;a href=&quot;https://github.com/chaos/lustre/commits/2.1.1-llnl&quot;&gt;https://github.com/chaos/lustre/commits/2.1.1-llnl&lt;/a&gt;</environment>
        <key id="14581">LU-1442</key>
            <summary>File corrupt with 1MiB-aligned 4k regions of zeros</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Fri, 25 May 2012 18:46:41 +0000</created>
                <updated>Tue, 16 Aug 2016 16:36:46 +0000</updated>
                            <resolved>Tue, 16 Aug 2016 16:36:46 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                    <version>Lustre 2.1.1</version>
                                    <fixVersion>Lustre 2.3.0</fixVersion>
                    <fixVersion>Lustre 2.1.3</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>17</watches>
                                                                            <comments>
                            <comment id="39430" author="nedbass" created="Fri, 25 May 2012 19:08:30 +0000"  >&lt;p&gt;The client logs were also quiet at the time of the corruption.&lt;/p&gt;

&lt;p&gt;I should also mention that the corruption offsets are not at 64MiB boundaries, so it is less likely that the application accidentally wrote a sparse file by miscalculating the offset argument to pwrite().  The first file had corruption at MiB offsets 159, 162, 173, 189.  In the second file the MiB offsets were 158 and 190.&lt;/p&gt;</comment>
                            <comment id="39432" author="pjones" created="Fri, 25 May 2012 19:22:45 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="39443" author="bobijam" created="Sun, 27 May 2012 21:50:19 +0000"  >&lt;p&gt;Would you please upload as much of the MDS and client debug logs as possible?&lt;/p&gt;

&lt;p&gt;May 17 01:06:08 pigs-mds1 kernel: LustreError: 20418:0:(mdt_recovery.c:1011:mdt_steal_ack_locks()) Resent req xid 1402165306385077 has mismatched opc: new 101 old 0&lt;/p&gt;

&lt;p&gt;The operation request from the client is abnormal (its opc was 0); the client opc saved on the MDS is recorded in target_send_reply():&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;target_send_reply()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-comment&quot;&gt;/* disable reply scheduling &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; I&apos;m setting up */&lt;/span&gt;
        rs-&amp;gt;rs_scheduled = 1;
        rs-&amp;gt;rs_on_net    = 1;
        rs-&amp;gt;rs_xid       = req-&amp;gt;rq_xid;
        rs-&amp;gt;rs_transno   = req-&amp;gt;rq_transno;
        rs-&amp;gt;rs_export    = exp;
        rs-&amp;gt;rs_opc       = lustre_msg_get_opc(rs-&amp;gt;rs_msg);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;lustre_msg_get_opc()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;__u32 lustre_msg_get_opc(struct lustre_msg *msg)
{
        &lt;span class=&quot;code-keyword&quot;&gt;switch&lt;/span&gt; (msg-&amp;gt;lm_magic) {
        &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; LUSTRE_MSG_MAGIC_V2: {
                struct ptlrpc_body *pb = lustre_msg_ptlrpc_body(msg);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!pb) {
                        CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;invalid msg %p: no ptlrpc body!\n&quot;&lt;/span&gt;, msg);
                        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;
                }
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; pb-&amp;gt;pb_opc;
        }
        &lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;:
                CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;incorrect message magic: %08x(msg:%p)\n&quot;&lt;/span&gt;, msg-&amp;gt;lm_magic, msg);
                LBUG();
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;
        }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="39446" author="nedbass" created="Mon, 28 May 2012 01:36:17 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Would you please upload debug log of MDS and client as much as possible?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Unfortunately the MDS was rebooted since this event and the debug log was lost.  I dumped the debug logs for both clients involved, but I don&apos;t think you&apos;ll find anything useful in them.  I&apos;ll attach them just in case.&lt;/p&gt;</comment>
                            <comment id="39525" author="adilger" created="Tue, 29 May 2012 16:39:36 +0000"  >&lt;p&gt;Ned,&lt;br/&gt;
can you please check the client NID that reported this problem and check the Lustre version.  If the client is a bit older, it is possible that it hit &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-721&quot; title=&quot;Parallel writes to same file results in a file of zeroes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-721&quot;&gt;&lt;del&gt;LU-721&lt;/del&gt;&lt;/a&gt;, which was only fixed in Lustre 1.8.7, 2.1.1, and 2.2+.  AFAIK, you have 1.8.5 clients still in use.&lt;/p&gt;

&lt;p&gt;Failing that, is there any chance that the client was evicted from the OST, but the console messages related to client eviction have been turned off?  There were a lot of patches to quiet console errors, so I&apos;m wondering if a client eviction could happen without any visible console error messages.&lt;/p&gt;

&lt;p&gt;The write was never sent to the filesystem (evidenced by the unallocated block), so it is unlikely to be corruption at the underlying block layer or in ldiskfs.  It is much more likely that the RPC was not handled at all, which could be caused by eviction, or something like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-721&quot; title=&quot;Parallel writes to same file results in a file of zeroes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-721&quot;&gt;&lt;del&gt;LU-721&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;
</comment>
                            <comment id="39544" author="morrone" created="Tue, 29 May 2012 18:51:14 +0000"  >&lt;p&gt;Andreas, the clients that are hitting this are running &lt;a href=&quot;https://github.com/chaos/lustre/tree/2.1.1-11chaos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2.1.1-11chaos&lt;/a&gt;, and the servers are all running &lt;a href=&quot;https://github.com/chaos/lustre/tree/2.1.1-4chaos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2.1.1-4chaos&lt;/a&gt; or newer.  So we definitely already have the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-721&quot; title=&quot;Parallel writes to same file results in a file of zeroes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-721&quot;&gt;&lt;del&gt;LU-721&lt;/del&gt;&lt;/a&gt; patch that is landed in 2.1.1 on those clients.&lt;/p&gt;

&lt;p&gt;We have 1.8.5-?chaos on some x86_64 clients (again, all server are 2.1.1+) still, but none of them exhibit this problem.  These clients will be upgraded to 2.1.1-*chaos in time.  But again, only the 2.1.1+ clients are seeing this bug.&lt;/p&gt;</comment>
                            <comment id="39660" author="morrone" created="Wed, 30 May 2012 18:52:36 +0000"  >&lt;p&gt;We are NOT silencing the eviction messages.  We absolutely need those.&lt;/p&gt;

&lt;p&gt;But there are no messages about eviction or anything else on the clients in question for hours on either side of the problem.&lt;/p&gt;
</comment>
                            <comment id="39751" author="morrone" created="Thu, 31 May 2012 14:40:19 +0000"  >&lt;p&gt;We have confirmation from the pftp developers that the parent process will send SIGUSR2 to all four of the children (probably when the first child completes).  Here is exactly what we&apos;ve been told:&lt;/p&gt;

&lt;p&gt;&quot;Near the end of the transfer, the parent process send SIGUSR2 to all four children. Anywhere from 0 to 3 of the 4 children exit pwrite with a return value of -1 and errno of 4 (interrupted system call).  All interrupted pwrites are reissued and then succeed.&quot;&lt;/p&gt;</comment>
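The reissue-on-interrupt behavior the pftp developers describe can be sketched generically. Note this is illustrative: the real client is C code calling pwrite(2) directly, and `retry_on_eintr` and the stub writer below are hypothetical names, not part of pftp:

```python
import errno

def retry_on_eintr(call, *args):
    """Reissue a syscall-like callable while it fails with EINTR,
    mirroring how the pftp children reissue an interrupted pwrite()."""
    while True:
        try:
            return call(*args)
        except InterruptedError:  # errno == EINTR ("interrupted system call")
            continue

# Stub standing in for pwrite(): fails once with EINTR, then succeeds.
class FlakyWriter:
    def __init__(self):
        self.calls = 0
    def pwrite(self, data, offset):
        self.calls += 1
        if self.calls == 1:
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return len(data)

w = FlakyWriter()
print(retry_on_eintr(w.pwrite, b"x" * 8, 0))  # -> 8, after one retry
```

The bug report hinges on exactly this window: an interrupted-and-reissued write should be indistinguishable from an uninterrupted one, so a retried pwrite() leaving a hole points at client-side state, not application logic.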
                            <comment id="39814" author="nedbass" created="Fri, 1 Jun 2012 12:00:33 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The write was never sent to the filesystem (evidenced by the unallocated block), so it is unlikely to be corruption at the underlying block layer or in ldiskfs. It is much more likely that the RPC was not handled at all, which could be caused by eviction, or something like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-721&quot; title=&quot;Parallel writes to same file results in a file of zeroes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-721&quot;&gt;&lt;del&gt;LU-721&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;One thing that puzzles me about this explanation is that the size of the hole is only 4k.  I would expect the client to have been sending 1MB write RPCs since the pwrite() size was a multiple of 1MB.  If the write was never sent or handled, shouldn&apos;t the hole be 1MB?&lt;/p&gt;</comment>
                            <comment id="39985" author="jay" created="Tue, 5 Jun 2012 00:51:00 +0000"  >&lt;p&gt;This bug may be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1299&quot; title=&quot;running truncated executable causes spewing of lock debug messages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1299&quot;&gt;&lt;del&gt;LU-1299&lt;/del&gt;&lt;/a&gt;, where sleeping on a cl_lock is interrupted by a signal, so the lock is wrongly marked as an error. As a result, the lock canceling will abort, which causes some dirty pages not to be flushed.&lt;/p&gt;

&lt;p&gt;There is a patch at &lt;a href=&quot;http://review.whamcloud.com/2574&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/2574&lt;/a&gt;; can you please try it and see if it fixes your issue?&lt;/p&gt;</comment>
                            <comment id="40045" author="nedbass" created="Tue, 5 Jun 2012 17:53:43 +0000"  >&lt;p&gt;Hi Jinshan,&lt;/p&gt;

&lt;p&gt;We have patchset 7 of that patch in our current tag.  We will update to the latest.  However we don&apos;t have a reliable reproducer for this bug so it will be difficult to say if it fixes the issue.&lt;/p&gt;</comment>
                            <comment id="40052" author="morrone" created="Tue, 5 Jun 2012 18:29:23 +0000"  >&lt;p&gt;I&apos;ve updated the 2.1.1-llnl branch to have patch set 10 from &lt;a href=&quot;http://review.whamcloud.com/2574&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/2574&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="40341" author="pjones" created="Mon, 11 Jun 2012 09:19:31 +0000"  >&lt;p&gt;ok I will close this as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1299&quot; title=&quot;running truncated executable causes spewing of lock debug messages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1299&quot;&gt;&lt;del&gt;LU-1299&lt;/del&gt;&lt;/a&gt; and we can reopen if that is proven to not be the case&lt;/p&gt;</comment>
                            <comment id="40666" author="nedbass" created="Fri, 15 Jun 2012 13:07:52 +0000"  >&lt;p&gt;We had a new occurrence of this bug this morning on a client running patchset 7 of &lt;a href=&quot;http://review.whamcloud.com/2574&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/2574&lt;/a&gt;.  Does this change your view that this is related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1299&quot; title=&quot;running truncated executable causes spewing of lock debug messages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1299&quot;&gt;&lt;del&gt;LU-1299&lt;/del&gt;&lt;/a&gt;, or do we need patchset 10 to prevent the corruption?&lt;/p&gt;</comment>
                            <comment id="40671" author="nedbass" created="Fri, 15 Jun 2012 14:04:18 +0000"  >&lt;p&gt;The new occurrence again coincided with messages on the MDS like these:&lt;/p&gt;

&lt;p&gt;Resent req xid ... has mismatched opc: new 101 old 0&lt;br/&gt;
Stealing 1 locks from rs ...&lt;br/&gt;
All locks stolen from rs ...&lt;/p&gt;</comment>
                            <comment id="40820" author="jay" created="Mon, 18 Jun 2012 23:15:47 +0000"  >&lt;p&gt;Hi Ned, it seems this is not related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1299&quot; title=&quot;running truncated executable causes spewing of lock debug messages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1299&quot;&gt;&lt;del&gt;LU-1299&lt;/del&gt;&lt;/a&gt;. Did the processes write their portions of the file exclusively, i.e. with no extent lock conflicts?&lt;/p&gt;</comment>
                            <comment id="40872" author="nedbass" created="Tue, 19 Jun 2012 14:58:58 +0000"  >&lt;blockquote&gt;&lt;p&gt;Did the processes write their portions of the file exclusively, i.e. with no extent lock conflicts?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I&apos;m not sure how to determine the answer to that.  The pwrite() calls from one strace of the four involved processes looked like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;6621:pwrite(8, &quot;...&quot;, 67108864, 0) = 67108864                    
6621:--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
6621:pwrite(8, &quot;...&quot;, 46137344, 268435456) = 46137344                

6622:pwrite(4, &quot;...&quot;, 67108864, 67108864) = 67108864
6622:--- SIGUSR2 (User defined signal 2) @ 0 (0) ---

6623:pwrite(4, &quot;...&quot;, 67108864, 134217728) = ? ERESTARTSYS (To be restarted)
6623:--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
6623:pwrite(4, &quot;...&quot;, 67108864, 134217728) = 67108864                

6624:pwrite(4, &quot;...&quot;, 67108864, 201326592) = ? ERESTARTSYS (To be restarted)
6624:--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
6624:pwrite(4, &quot;...&quot;, 67108864, 201326592) = 67108864   
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(This was not a corrupting case, just a sample run of the test program.)&lt;/p&gt;

&lt;p&gt;It may be worth noting that in each occurrence we&apos;ve had so far, the corrupted regions have all been at offsets between 100-200 MB. Since the test uses a stripe count of 100 that&apos;s always in the second stripe of the file.&lt;/p&gt;</comment>
                            <comment id="40999" author="morrone" created="Thu, 21 Jun 2012 13:16:18 +0000"  >&lt;blockquote&gt;&lt;p&gt;Did the processes write their portions of the file exclusively, i.e. with no extent lock conflicts?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The processes write to their own ranges in the file, and do not overlap with any of the other writers.  Furthermore, the writers are all on one node, so there is no ldlm lock contention with any other Lustre clients.&lt;/p&gt;</comment>
                            <comment id="41083" author="pjones" created="Mon, 25 Jun 2012 09:41:37 +0000"  >&lt;p&gt;It seems like this ticket should be reopened.&lt;/p&gt;</comment>
                            <comment id="41109" author="adilger" created="Mon, 25 Jun 2012 17:30:57 +0000"  >&lt;p&gt;Jinshan,&lt;br/&gt;
I would investigate the RPC opcode issue.  It would be possible to add debugging on the client to verify the opcode of the RPC is non-zero when it is put into the replay list, and print a message at that point if the opcode is bad, and also verify the opcode on the client before the RPC is resent.  If only the second message triggers, it means some kind of in-memory corruption is happening on the client.&lt;/p&gt;

&lt;p&gt;What is strange is that there are messages on the MDS, instead of on the OSS where one would expect any errors related to file IO to happen.  The only thing I can think of where an MDS error would cause bad data is that some of the client threads are somehow opening the incorrect file and the data is going to the wrong location?&lt;/p&gt;</comment>
                            <comment id="41114" author="jay" created="Mon, 25 Jun 2012 21:49:13 +0000"  >&lt;p&gt;I&apos;m working on this issue.&lt;/p&gt;

&lt;p&gt;Verifying the RPC opcode would be the place to start. The strange thing is that only one page in the middle was missing; I can&apos;t connect this with the MDS replay error.&lt;/p&gt;
</comment>
                            <comment id="41117" author="jay" created="Tue, 26 Jun 2012 00:03:52 +0000"  >&lt;p&gt;Hi Ned, did you notice that when this issue happened, pwrite() kept returning -1 with errno ERESTARTSYS, or that short writes sometimes occurred?&lt;/p&gt;</comment>
                            <comment id="41134" author="jay" created="Tue, 26 Jun 2012 09:30:18 +0000"  >&lt;p&gt;just in case, did you use DIRECT_IO to write the file?&lt;/p&gt;</comment>
                            <comment id="41148" author="nedbass" created="Tue, 26 Jun 2012 12:30:51 +0000"  >&lt;blockquote&gt;&lt;p&gt;did you notice that when this issue happened, pwrite() kept returning -1 with errno ERESTARTSYS, or that short writes sometimes occurred?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I don&apos;t have a trace of the process when the issue happened.  All I know is that the parent process will signal its children with SIGUSR2 which will cause pwrite() to return -1 with errno ERESTARTSYS.  I&apos;ve never seen this happen more than once in my tests, but it&apos;s possible this pattern will repeat if pwrite() is slow.  I don&apos;t have the pftp source code to check, but I can ask our HPSS group to look into it.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;just in case, did you use DIRECT_IO to write the file?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;No, strace shows pftp just uses O_WRONLY|O_CREAT.&lt;/p&gt;</comment>
                            <comment id="41174" author="jay" created="Tue, 26 Jun 2012 23:19:42 +0000"  >&lt;p&gt;Hi Ned, will you please try this patch: &lt;a href=&quot;http://review.whamcloud.com/3194&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3194&lt;/a&gt;? This patch may fix the corruption issue.&lt;/p&gt;

&lt;p&gt;After the corruption issue is fixed, I&apos;ll start to work on the wrong-opc issue if it&apos;s bothering you guys.&lt;/p&gt;</comment>
                            <comment id="41203" author="nedbass" created="Wed, 27 Jun 2012 12:53:57 +0000"  >&lt;p&gt;Great, we&apos;ll give the patch a try.  We&apos;ve had three corruptions in about two months, and we haven&apos;t found a way to easily reproduce it. So it may take a few months with no new corruptions to gain some confidence in the fix.&lt;/p&gt;</comment>
                            <comment id="41255" author="jay" created="Thu, 28 Jun 2012 09:29:07 +0000"  >&lt;p&gt;When I was trying to reproduce this with fail_loc today I found something new. Actually, though a dirty page will not be added into osc&apos;s cache if osc_page_cache_add() is interrupted by a signal, the page will still be written back by the kernel flush daemon. That said, the IO pattern can be exactly what you have seen (a one-page gap in block allocation), but data corruption is unexpected. I still need to investigate this problem, but I will focus on whether there exists a code path causing a dirty page to be discarded.&lt;/p&gt;

&lt;p&gt;I&apos;m pretty sure the issue you have seen is related to the problem I found in the patch, so please apply this patch and test it intensively. Maybe we can find new clues. Thanks,&lt;/p&gt;</comment>
                            <comment id="41256" author="jay" created="Thu, 28 Jun 2012 09:34:57 +0000"  >&lt;p&gt;Hi Ned, do you know how the application detected the data corruption issue? Did it just read the data back on the same client, or were some operations, for example flushing cached pages, done between the write and the verification?&lt;/p&gt;</comment>
                            <comment id="41263" author="nedbass" created="Thu, 28 Jun 2012 12:31:57 +0000"  >&lt;p&gt;Attached test script.  No cache flush or other operations are done between write and verification.  The entire test is run on the same client.  The process is basically:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;write random patterns to a file&lt;/li&gt;
	&lt;li&gt;ftp file to archival storage&lt;/li&gt;
	&lt;li&gt;retrieve copy from archival storage&lt;/li&gt;
	&lt;li&gt;compare copy to original with &apos;cmp&apos; to check for corruption&lt;/li&gt;
&lt;/ol&gt;
</comment>
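Step 4 of the test loop above is a cmp(1)-style comparison. A minimal sketch of that step follows; `first_difference` is a hypothetical helper, not the attached test script:

```python
# Sketch of the verification step: report the first differing byte offset
# between two files (like cmp), or None if they match. Illustrative only.
import os
import tempfile

def first_difference(path_a, path_b, bufsize=1 << 20):
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        pos = 0
        while True:
            a, b = fa.read(bufsize), fb.read(bufsize)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return pos + i
                return pos + min(len(a), len(b))  # one file is shorter
            if not a:          # both at EOF with no difference found
                return None
            pos += len(a)

# Example: two small copies differing at byte 5.
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "orig"), os.path.join(d, "copy")
    open(p1, "wb").write(b"abcdefgh")
    open(p2, "wb").write(b"abcdeXgh")
    print(first_difference(p1, p2))  # -> 5
```

For the corruption in this ticket, the reported offset would then be checked against 1 MiB alignment to confirm the zero-page signature.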
                            <comment id="41576" author="morrone" created="Fri, 6 Jul 2012 19:54:09 +0000"  >&lt;p&gt;Jinshan, I&apos;ve added &lt;a href=&quot;http://review.whamcloud.com/3194&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3194&lt;/a&gt; to our branch to include in the next round of testing.&lt;/p&gt;</comment>
                            <comment id="42524" author="ian" created="Tue, 31 Jul 2012 18:57:06 +0000"  >&lt;p&gt;Master version of patch has been merged - &lt;a href=&quot;http://review.whamcloud.com/#change,3447&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3447&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="42531" author="jay" created="Tue, 31 Jul 2012 19:51:45 +0000"  >&lt;p&gt;Hi Chris, how long have you been running the test with this patch?&lt;/p&gt;</comment>
                            <comment id="42599" author="chris" created="Thu, 2 Aug 2012 10:33:00 +0000"  >&lt;p&gt;I don&apos;t know; any pushes to gerrit that have been rebased onto this patch will have been run with it. I cannot know who has rebased and pushed.&lt;/p&gt;</comment>
                            <comment id="42606" author="jay" created="Thu, 2 Aug 2012 12:53:02 +0000"  >&lt;p&gt;Hi Chris Gearing, sorry, I meant Christopher Morrone, because LLNL is verifying whether the patch can fix the data corruption problem. This problem occurs rarely, so it may take months to verify.&lt;/p&gt;</comment>
                            <comment id="42625" author="morrone" created="Thu, 2 Aug 2012 19:26:23 +0000"  >&lt;p&gt;It&apos;s been on our test systems and hasn&apos;t caused any problems that I am aware of.  It is not installed in production yet.  It might make it into a production release in a couple of weeks.&lt;/p&gt;

&lt;p&gt;We&apos;ve seen the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1680&quot; title=&quot;LBUG cl_lock.c:1949:discard_cb()) (ORI-726)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1680&quot;&gt;&lt;del&gt;LU-1680&lt;/del&gt;&lt;/a&gt; failures on the orion branch, and just recently pulled this &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1442&quot; title=&quot;File corrupt with 1MiB-aligned 4k regions of zeros&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1442&quot;&gt;&lt;del&gt;LU-1442&lt;/del&gt;&lt;/a&gt; patch into there.  We&apos;ll keep an eye out for failures on orion when we upgrade to that version.&lt;/p&gt;</comment>
                            <comment id="42626" author="morrone" created="Thu, 2 Aug 2012 19:27:54 +0000"  >&lt;p&gt;Oh, in other words, I don&apos;t know if it fixes the problem because we don&apos;t know how to reproduce it in our test environment.  But it hasn&apos;t caused any new problems that I know of.&lt;/p&gt;</comment>
                            <comment id="42628" author="jaylan" created="Thu, 2 Aug 2012 21:01:45 +0000"  >&lt;p&gt;Are the messages below evidence of data corruption? We have quite a number of these on our 2.1.2 mds:&lt;/p&gt;

&lt;p&gt;LustreError: 4869:0:(mdt_recovery.c:1011:mdt_steal_ack_locks()) Resent req xid 1407363869661366 has mismatched opc: new 101 old 0^M&lt;br/&gt;
Lustre: 4869:0:(mdt_recovery.c:1022:mdt_steal_ack_locks()) Stealing 1 locks from rs ffff8802a0930000 x1407363869661366.t210874408121 o0 NID 10.151.26.25@o2ib^M&lt;br/&gt;
Lustre: 4265:0:(service.c:1865:ptlrpc_handle_rs()) All locks stolen from rs ffff8802a0930000 x1407363869661366.t210874408121 o0 NID 10.151.26.25@o2ib^M&lt;/p&gt;</comment>
                            <comment id="42757" author="jay" created="Mon, 6 Aug 2012 15:16:28 +0000"  >&lt;p&gt;Hi Jay Lan,&lt;/p&gt;

&lt;p&gt;These error messages are not related, as the data corruption happened on an OST while the messages show something wrong with the MDT.&lt;/p&gt;</comment>
                            <comment id="42758" author="jay" created="Mon, 6 Aug 2012 15:17:35 +0000"  >&lt;p&gt;I&apos;m going to set this issue as fixed otherwise we can&apos;t release 2.3.&lt;/p&gt;

&lt;p&gt;Please reopen this issue if it occurs again.&lt;/p&gt;</comment>
                            <comment id="42775" author="jaylan" created="Mon, 6 Aug 2012 17:35:48 +0000"  >&lt;p&gt;Hi Jinshan, but our console log messages look the same as the server console log messages shown in the Description section of the ticket. Are you suggesting that other problems can produce the same log messages, and thus that the messages alone are not sufficient evidence? Please advise. Thanks!&lt;/p&gt;</comment>
                            <comment id="42782" author="jay" created="Mon, 6 Aug 2012 20:11:52 +0000"  >&lt;p&gt;These messages may not indicate any real problem; they are probably just overly aggressive console output. I think they can be ignored if you didn&apos;t see any real problem.&lt;/p&gt;</comment>
                            <comment id="42819" author="jay" created="Tue, 7 Aug 2012 13:23:38 +0000"  >&lt;p&gt;Hi Jay Lan, LU-1717 was just created to address/understand this MDS console error message.&lt;/p&gt;</comment>
                            <comment id="42820" author="jaylan" created="Tue, 7 Aug 2012 13:34:18 +0000"  >&lt;p&gt;Thanks, Jinshan!&lt;/p&gt;</comment>
                            <comment id="44711" author="cyberchip" created="Wed, 12 Sep 2012 15:14:20 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;We occasionally hit a similar issue (the written file has a big hole with a lot of zeros), which caused wrong simulation results.&lt;/p&gt;

&lt;p&gt;As we have Lustre 1.8.8, I wonder if our file system could be affected by this bug.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Yusong &lt;/p&gt;</comment>
                            <comment id="48700" author="nedbass" created="Mon, 3 Dec 2012 15:57:15 +0000"  >&lt;p&gt;Our test program detected this type of corruption again, so we should re-open this issue.  Client was running 2.1.2-4chaos, which includes the patch &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1442&quot; title=&quot;File corrupt with 1MiB-aligned 4k regions of zeros&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1442&quot;&gt;&lt;del&gt;LU-1442&lt;/del&gt;&lt;/a&gt; llite: cleanup if a page failed to add into cache&quot;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/chaos/lustre/commits/2.1.2-4chaos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/chaos/lustre/commits/2.1.2-4chaos&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this test runs regularly on our production systems, so this bug is very hard to reproduce.&lt;/p&gt;</comment>
                            <comment id="48705" author="nedbass" created="Mon, 3 Dec 2012 17:16:18 +0000"  >&lt;p&gt;Looking at the on-disk data, it is again logical block 256 that is unallocated in the OST object.&lt;/p&gt;</comment>
                            <comment id="48840" author="bobijam" created="Wed, 5 Dec 2012 21:58:57 +0000"  >&lt;p&gt;Ned Bass,&lt;/p&gt;

&lt;p&gt;Would you mind checking whether the stripe patterns are always the same for the error cases? Also, please collect and upload the clients&apos; rpc_stats (e.g. /proc/fs/lustre/osc/lustre-OST0000-osc-ffff88003de72800/rpc_stats data).&lt;/p&gt;

&lt;p&gt;We&apos;d like to know whether the clients always wrote 1M RPCs to the OST without partial writes, so that we&apos;d know whether the issue is on the client side or the OST side.&lt;/p&gt;</comment>
                            <comment id="48844" author="nedbass" created="Thu, 6 Dec 2012 01:56:47 +0000"  >&lt;p&gt;The stripe patterns are not always the same.  The first case involved a 300 MB file with a stripe count of 100.  This most recent case involved a 10 GB file with a stripe count of 2.  The only consistent pattern so far is that it always seems to be logical block 256 missing from an OST object.&lt;/p&gt;

&lt;p&gt;I&apos;ll check the rpc_stats tomorrow.&lt;/p&gt;</comment>
                            <comment id="48871" author="jay" created="Thu, 6 Dec 2012 11:53:19 +0000"  >&lt;p&gt;I can&apos;t think of what kind of problem could cause this symptom; there isn&apos;t even any lock contention. One more question: is the client ppc64?&lt;/p&gt;</comment>
                            <comment id="48872" author="morrone" created="Thu, 6 Dec 2012 11:55:02 +0000"  >&lt;p&gt;No, both client and server are x86_64.&lt;/p&gt;</comment>
                            <comment id="48873" author="nedbass" created="Thu, 6 Dec 2012 13:00:24 +0000"  >&lt;p&gt;The rpc_stats for the involved OST looked like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                            read                    write
pages per rpc           rpcs   %   cum % |   rpcs  %  cum %
1:                        0    0     0   |      3  0    0   
2:                        0    0     0   |      0  0    0   
4:                        2    0     0   |      0  0    0   
8:                        0    0     0   |      0  0    0   
16:                       0    0     0   |      0  0    0   
32:                       0    0     0   |      1  0    0   
64:                       0    0     0   |      3  0    0   
128:                      0    0     0   |      3  0    0   
256:                   4771   99   100   |   4768 99  100 

                            read                    write
offset                  rpcs   %   cum % |   rpcs  %  cum %
1:                     4772   99    99   |   4760 99   99  
2:                        0    0    99   |      4  0   99  
4:                        2    0    99   |      0  0   99  
8:                        0    0    99   |      0  0   99  
16:                       0    0    99   |      0  0   99  
32:                       0    0    99   |      1  0   99  
64:                       1    0   100   |      6  0   99  
128:                      0    0   100   |      7  0  100 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="48916" author="jay" created="Fri, 7 Dec 2012 13:23:34 +0000"  >&lt;p&gt;Have you run other jobs on this cluster? I ask because I saw some non-1M RPCs in the rpc_stats.&lt;/p&gt;

&lt;p&gt;From the symptom, I guess somehow the 1st page of a stripe was not included in the RPC. If that is true, then that RPC must have had only 255 pages, and we should see a tally for it in the stats. Of course, this approach only works if that job is the only one running on the cluster.&lt;/p&gt;

&lt;p&gt;We&apos;ll make a debug patch for this purpose.&lt;/p&gt;</comment>
                            <comment id="48917" author="nedbass" created="Fri, 7 Dec 2012 13:37:58 +0000"  >&lt;p&gt;Hi Jinshan,&lt;/p&gt;

&lt;p&gt;This node has been up for about 70 days and has run many other jobs, so we can&apos;t say for sure whether the missing page is reflected in the stats.&lt;/p&gt;

&lt;p&gt;I like the idea of a debug patch to catch this case.  We discussed that approach during our last meeting but I couldn&apos;t easily figure out where to put the debugging.&lt;/p&gt;</comment>
                            <comment id="53560" author="jay" created="Thu, 7 Mar 2013 19:50:01 +0000"  >&lt;p&gt;I&apos;m lowering the priority to major and will continue working on this after getting new clues.&lt;/p&gt;</comment>
                            <comment id="54912" author="artem_blagodarenko" created="Wed, 27 Mar 2013 13:16:39 +0000"  >&lt;p&gt;Moved unrelated comments to new bug &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3219&quot; title=&quot;FIEMAP does not sync data or return cached pages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3219&quot;&gt;&lt;del&gt;LU-3219&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="56880" author="artem_blagodarenko" created="Tue, 23 Apr 2013 20:53:05 +0000"  >&lt;p&gt;.&lt;/p&gt;</comment>
                            <comment id="56960" author="jlevi" created="Wed, 24 Apr 2013 18:03:57 +0000"  >&lt;p&gt;Reclosing this ticket. The new issue mentioned is tracked in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3219&quot; title=&quot;FIEMAP does not sync data or return cached pages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3219&quot;&gt;&lt;del&gt;LU-3219&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="56963" author="nedbass" created="Wed, 24 Apr 2013 18:12:33 +0000"  >&lt;p&gt;Jodi, afaik this issue is not resolved, just rare.  Are there still any plans to make a debug patch?&lt;/p&gt;</comment>
                            <comment id="56964" author="pjones" created="Wed, 24 Apr 2013 18:18:32 +0000"  >&lt;p&gt;Sorry for the confusion Ned - things just got a little muddled due to two somewhat similar issues becoming intertwined. Yes, this is still on Jinshan&apos;s plate to dig into further at some point.&lt;/p&gt;</comment>
                            <comment id="162055" author="simmonsja" created="Tue, 16 Aug 2016 16:36:46 +0000"  >&lt;p&gt;Old ticket for unsupported version&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="15328">LU-1680</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="15399">LU-1703</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="14657">LU-1458</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="11471" name="LU-1442.lustre.log.sierra654.gz" size="2678070" author="nedbass" created="Mon, 28 May 2012 01:38:10 +0000"/>
                            <attachment id="11472" name="LU-1442.lustre.log.sierra972.gz" size="2544202" author="nedbass" created="Mon, 28 May 2012 01:38:10 +0000"/>
                            <attachment id="11664" name="qualify.ftp" size="914" author="nedbass" created="Thu, 28 Jun 2012 12:31:57 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 27 Jun 2014 18:46:41 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv633:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4520</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 25 May 2012 18:46:41 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>