<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:59:47 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6389] read()/write() returning less than available bytes intermittently</title>
                <link>https://jira.whamcloud.com/browse/LU-6389</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Since March 10, 2015, we have been tracking an increasing number of user reports of intermittent I/O problems with our largest Lustre filesystem on Stampede (SCRATCH). This is affecting dozens of users, with multiple jobs per user. The problem was first detected in Fortran programs and reduced to a 10-line reproducer (test_break.f); we have now also written a C reproducer (rwb.c) that does not depend on a specific Fortran runtime library. The C case was designed to mimic the underlying libc calls that the Fortran case makes, without interference from the runtime library. The attached case fails with either icc or gcc on our system.&lt;/p&gt;

&lt;p&gt;The basic case involves a long sequence of ~4MB read() or write() calls that together should read or write all of a large file. Intermittently, but reproducibly, one of these calls comes back short before reaching the last block of the file; for example, a 4MB read may return only 2.5MB somewhere in the middle of the file. The number of bytes returned by the short call, and its position in the sequence, appear to be random. The issue does not occur if the file has only 1 stripe, but occurs consistently with 2 or more stripes. It does not occur on either of our other Lustre filesystems on Stampede, and nothing appears to have changed that correlates in time with the start of the problems.&lt;/p&gt;

&lt;p&gt;The short read/write does not report an error when running the C code, and subsequent reads continue as normal. Writing behaves identically. Some codes, including the Intel Fortran runtime, do not tolerate short reads (though they potentially could), and those codes abort (including the attached one). In general, no codes that I know of are designed to tolerate writes shorter than requested. We can find no client or server error messages associated with these short read/write events.&lt;/p&gt;

&lt;p&gt;We would be happy to provide access to Stampede for testing and verification.&lt;/p&gt;</description>
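<!--
The access pattern in the description (a long run of ~4MB read() calls, each of which should return a full chunk until the last one) can be checked with a short loop. This is a minimal sketch, not the attached rwb.c: the function name find_short_read, the 4 MiB chunk size, and the use of fstat() to learn the file size are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* ~4 MiB per read(), matching the access pattern in the report. */
#define CHUNK_SIZE ((size_t)4 * 1024 * 1024)

/*
 * Hypothetical checker, not the attached rwb.c. Reads an open regular
 * file from offset 0 to EOF with read() calls of `chunk` bytes; every
 * call except the last one should return the full request.
 *
 * Returns the offset at which the first short read() happened, -1 if
 * every read() was full, or -2 on a stat/seek/read/allocation error.
 */
off_t find_short_read(int fd, size_t chunk)
{
    struct stat st;
    off_t pos = 0;
    char *buf;

    assert(chunk > 0);
    if (fstat(fd, &st) != 0 || lseek(fd, 0, SEEK_SET) != 0)
        return -2;
    buf = malloc(chunk);
    if (buf == NULL)
        return -2;

    while (pos < st.st_size) {
        off_t left = st.st_size - pos;
        size_t want = left < (off_t)chunk ? (size_t)left : chunk;
        ssize_t got = read(fd, buf, want);

        if (got < 0 || (size_t)got != want) {
            free(buf);
            /* -2: read() failed; pos: short read before EOF (the bug) */
            return got < 0 ? -2 : pos;
        }
        pos += got;
    }
    free(buf);
    return -1; /* every read() returned the full request */
}
```

On Lustre, one would run it against a file created with a stripe count of 2 or more (for example with `lfs setstripe -c 2 FILE`); per the report, an affected filesystem would occasionally return a non-negative offset there, while single-stripe files were not affected.
-->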
                <environment>CentOS 6.5 2.6.32-431.17.1.el6.x86_64. 2.5.2 client. 2.5.3 server.</environment>
        <key id="29175">LU-6389</key>
            <summary>read()/write() returning less than available bytes intermittently</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="bbarth">Bill Barth</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Thu, 19 Mar 2015 23:22:33 +0000</created>
                <updated>Fri, 19 Jun 2020 14:52:25 +0000</updated>
                            <resolved>Mon, 18 May 2015 14:23:01 +0000</resolved>
                                    <version>Lustre 2.5.2</version>
                    <version>Lustre 2.5.3</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>30</watches>
                                                                            <comments>
                            <comment id="110180" author="pjones" created="Thu, 19 Mar 2015 23:58:27 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please advise on this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="110194" author="gerrit" created="Fri, 20 Mar 2015 10:03:14 +0000"  >&lt;p&gt;Bobi Jam (bobijam@hotmail.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14123&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14123&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; llite: restart short read/write IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 44fd6ceb9810d1ce6ce15256732e42aea11ddf1c&lt;/p&gt;</comment>
                            <comment id="110195" author="bobijam" created="Fri, 20 Mar 2015 10:05:56 +0000"  >&lt;p&gt;I cannot reproduce it for reads, and could only reproduce the short write when one OST was full. This patch will report the write error.&lt;/p&gt;

&lt;p&gt;Can you give it a try?&lt;/p&gt;</comment>
                            <comment id="110196" author="adegremont" created="Fri, 20 Mar 2015 10:20:18 +0000"  >&lt;p&gt;Hello&lt;/p&gt;

&lt;p&gt;At CEA we are facing the same issue and have been working on it for a few days. &lt;br/&gt;
We can reproduce short read or write simply on various Lustre filesystems with IOR as a reproducer.&lt;/p&gt;

&lt;p&gt;So far, I&apos;ve tracked this down to &lt;tt&gt;can_populate_pages()&lt;/tt&gt; and a layout lock cancel. We can easily see in the Lustre debug logs that the layout lock is being canceled during I/O, which results in a short write. The file layout has not changed, and when the lock is taken back, layout_gen is still &apos;0&apos;. The file is accessed by only 1 node and 1 IOR task, and has just been created.&lt;br/&gt;
In the log, the layout lock has no more readers or writers and so could be canceled. I&apos;m trying to understand why the lock is being canceled locally so early.&lt;/p&gt;

&lt;p&gt;Aur&#233;lien&lt;/p&gt;</comment>
                            <comment id="110197" author="bruno.travouillon" created="Fri, 20 Mar 2015 10:30:21 +0000"  >&lt;p&gt;Hi bobijam,&lt;/p&gt;

&lt;p&gt;We are currently working on the same kind of issue (short read/write). I am trying to grab relevant information to open a JIRA.&lt;/p&gt;

&lt;p&gt;Looking at your patch in lustre/llite/file.c, I don&apos;t understand why you changed the behaviour. Is it for testing purposes? Obviously, this patch will solve our issue, but it is a U-turn from &quot;return short read/write&quot; to &quot;Restart io for short read/write&quot;.&lt;/p&gt;</comment>
                            <comment id="110200" author="bbarth" created="Fri, 20 Mar 2015 11:52:28 +0000"  >&lt;p&gt;Thanks, Bobijam. We can&apos;t try this patch against a filesystem that&apos;s in production operations. We will try to reproduce this on our test system, but I have my doubts. At best we might be able to patch on 3/31/15 when we have our next maintenance. Meanwhile, I&apos;m interested in anyone&apos;s opinions about how this could have started with no changes in the last several months. &lt;/p&gt;

&lt;p&gt;Bill.&lt;/p&gt;</comment>
                            <comment id="110201" author="bobijam" created="Fri, 20 Mar 2015 11:55:34 +0000"  >&lt;p&gt;Bill,&lt;/p&gt;

&lt;p&gt;Can you grab -1 logs when you reproduce it on the test system?&lt;/p&gt;</comment>
                            <comment id="110205" author="bbarth" created="Fri, 20 Mar 2015 13:59:07 +0000"  >&lt;p&gt;debug log&lt;/p&gt;</comment>
                            <comment id="110206" author="bbarth" created="Fri, 20 Mar 2015 14:00:19 +0000"  >&lt;p&gt;I have a -1 log from a production client node during a run that exhibits the problem. The file, bzip2ed, is attached.&lt;/p&gt;</comment>
                            <comment id="110228" author="adilger" created="Fri, 20 Mar 2015 17:30:00 +0000"  >&lt;p&gt;Bobijam, I agree with Bruno that it would be better to track down the root cause why the IO is being interrupted rather than just retrying at the top level.&lt;/p&gt;</comment>
                            <comment id="110229" author="bbarth" created="Fri, 20 Mar 2015 17:34:41 +0000"  >&lt;p&gt;FYI, I ran this on our smaller, unloaded test rig running identical client and server versions and could not reproduce it.&lt;/p&gt;</comment>
                            <comment id="110236" author="bobijam" created="Fri, 20 Mar 2015 18:16:05 +0000"  >&lt;p&gt;Bill, I couldn&apos;t find the error in the log; I think it does not cover the section with the short read.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ grep ll_file_io_generic lustre_bbarth.log.txt
00000080:00000001:15.0:1426859590.895686:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859590.895760:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859591.062183:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859591.062260:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859591.229229:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859591.229306:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859591.396594:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859591.396667:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859591.563333:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859591.563417:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859591.730218:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859591.730295:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859591.896834:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859591.896904:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859592.063883:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859592.063961:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859592.233223:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859592.233301:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859592.402279:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859592.402355:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859592.571321:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859592.571398:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859592.741125:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
00000080:00000001:15.0:1426859592.741202:0:16140:0:(file.c:1131:ll_file_io_generic()) Process entered
00000080:00000001:15.0:1426859598.853529:0:16140:0:(file.c:1187:ll_file_io_generic()) Process leaving via out (rc=4194308 : 4194308 : 0x400004)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All of them return the expected length (4194308).&lt;/p&gt;</comment>
                            <comment id="110237" author="jay" created="Fri, 20 Mar 2015 18:16:44 +0000"  >&lt;p&gt;Hi Aurelien,&lt;/p&gt;

&lt;p&gt;This is because the layout lock can share one DLM lock with other inodebits locks at open. The client may lose the layout lock due to this false sharing.&lt;/p&gt;

&lt;p&gt;Jinshan&lt;/p&gt;</comment>
                            <comment id="110251" author="bbarth" created="Fri, 20 Mar 2015 20:12:04 +0000"  >&lt;p&gt;I have uploaded a second log file. I don&apos;t know how to produce text from the binary dump, but grepping the binary for that message only shows 4MB prints. Is it possible that the debug message is printing the size it wanted instead of the size it actually gave to the read()? I can guarantee that I got the short read on this run.&lt;/p&gt;

&lt;p&gt;Bill.&lt;/p&gt;</comment>
                            <comment id="110252" author="bbarth" created="Fri, 20 Mar 2015 20:12:38 +0000"  >&lt;p&gt;second attempt at a log file&lt;/p&gt;</comment>
                            <comment id="110253" author="bobijam" created="Fri, 20 Mar 2015 20:13:20 +0000"  >&lt;p&gt;Don&apos;t try this patch yet; it still has issues.&lt;/p&gt;</comment>
                            <comment id="110254" author="bbarth" created="Fri, 20 Mar 2015 20:19:14 +0000"  >&lt;p&gt;Don&apos;t worry, we didn&apos;t try it yet. I can&apos;t reproduce this in our test environment yet. Only in production. &lt;/p&gt;</comment>
                            <comment id="110278" author="adegremont" created="Sat, 21 Mar 2015 10:24:15 +0000"  >&lt;p&gt;For your information, I think this issue is really important because some libraries, like netCDF (which is widely used), do stupid things like:&lt;br/&gt;
&lt;a href=&quot;https://github.com/Unidata/netcdf-c/blob/master/libsrc/posixio.c#L318&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/Unidata/netcdf-c/blob/master/libsrc/posixio.c#L318&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If it faces a short read, it zeroes the unread part of the buffer instead of doing another read, and continues without treating this as an error.&lt;/p&gt;

&lt;p&gt;Some of our netCDF users hit this bug with short read.&lt;br/&gt;
We can easily reproduce short writes on various clusters and different Lustre filesystems with a simple &lt;tt&gt;IOR&lt;/tt&gt; (one thread, POSIX I/O) and even &lt;tt&gt;dd&lt;/tt&gt;! The I/O does not need to be unaligned (IOR, like dd, does aligned writes), but IOSIZE should be greater than STRIPESIZE, so that each write touches at least 2 stripes.&lt;/p&gt;

&lt;p&gt;We do not understand why some filesystems are impacted and others are not.&lt;br/&gt;
@Bill: This problem appeared suddenly for us too. It was first detected on our cluster recently upgraded to Lustre 2.5.3, but in fact this bug impacts all of our clusters, and they are all now on 2.5.3. We downgraded some clients to 2.4 and the issue is still there (the layout lock had already landed in 2.4). The servers are still on 2.5.3, though.&lt;/p&gt;

&lt;p&gt;So far, the smallest filesystem where we can reproduce this has 120 OSTs and a fair amount of load. However, some bigger filesystems are not impacted, for unknown reasons.&lt;/p&gt;</comment>
                            <comment id="110279" author="adegremont" created="Sat, 21 Mar 2015 10:32:13 +0000"  >&lt;p&gt;Hi Jinshan!&lt;/p&gt;

&lt;p&gt;Reproducing this bug is mostly a matter of timing, AFAIK. Layout lock should be canceled by ldlm_bl thread during I/O, exactly between 2 calls to vvp_io_write_start(). If LL is dropped between 2 writes, it will be enqueued again before doing the 2nd write and it will be OK. If your I/O does not cover several stripes, it is also fine. &lt;/p&gt;</comment>
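<!--
The timing window described here, together with the earlier note that a write must touch at least 2 stripes, can be illustrated with a toy model. This is a hypothetical simulation, not Lustre code: it assumes one write() is split into per-stripe pieces, that the layout lock is re-checked only before each piece after the first, and that losing it makes the call return early, which is the pre-fix behaviour reported in this ticket.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only: none of these names exist in Lustre, and real llite I/O
 * is far more involved. One write() of `len` bytes at `offset` is split at
 * stripe boundaries into pieces. Before each piece after the first, the
 * model checks the (simulated) layout lock; `lost_before_piece` is the index
 * of the piece before which the lock was found cancelled (0 means never).
 * On loss it returns the bytes done so far, i.e. a short write.
 */
size_t toy_striped_write(size_t offset, size_t len, size_t stripe_size,
                         size_t lost_before_piece)
{
    size_t done = 0;
    size_t piece = 0;

    assert(stripe_size > 0);
    while (done < len) {
        size_t to_boundary = stripe_size - (offset + done) % stripe_size;
        size_t n = len - done < to_boundary ? len - done : to_boundary;

        if (piece > 0 && piece == lost_before_piece)
            return done; /* lock lost between pieces: short write */
        done += n;       /* pretend this piece reached its OST */
        piece++;
    }
    return done;
}
```

Under this model only a write that crosses at least one stripe boundary can come back short, aligned or not; it says nothing about why the lock is cancelled in the first place.
-->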
                            <comment id="110281" author="bbarth" created="Sat, 21 Mar 2015 13:56:43 +0000"  >&lt;p&gt;We have also experienced problems with NetCDF-based codes, but I don&apos;t have a compact reproducer yet. In our cases, we&apos;re seeing large writes from WRF hitting asserts in posixio.c:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;assert&lt;/span&gt;(*posp == OFF_NONE || *posp == lseek(nciop-&amp;gt;fd, 0, SEEK_CUR));
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I haven&apos;t been able to prove that this follows a short read or write, but I have had complaints of files coming out shorter than expected.&lt;/p&gt;

&lt;p&gt;All of this also followed an update to 2.5.3 on our servers, but there were at least 2 months between our update and the first known case. I would say that activity on the filesystems has been much higher this month, but not in a way that substantially increases the load reported by Linux, etc.&lt;/p&gt;
</comment>
                            <comment id="110284" author="bbarth" created="Sat, 21 Mar 2015 15:09:43 +0000"  >&lt;p&gt;Also, for the record, we have not been able to reproduce this problem with the attached Fortran code using gfortran. strace&apos;ing the gfortran version shows, basically, one giant read() call for the whole Fortran record rather than a long sequence of 4MB reads. This may or may not point in the right direction, but block-based reads of sufficient size seem to be required. Also, in my experience, changing read() to fread() in the attached C program makes the problem go away as well.&lt;/p&gt;</comment>
                            <comment id="110286" author="jay" created="Sat, 21 Mar 2015 16:48:47 +0000"  >&lt;p&gt;The root cause of this problem is clear and Bobijam is working on a fix, please stay tuned.&lt;/p&gt;</comment>
                            <comment id="110288" author="adilger" created="Sun, 22 Mar 2015 00:26:59 +0000"  >&lt;p&gt;Aurelien, have you considered filing a bug with the upstream NetCDF to handle short read/write?  I don&apos;t think it is a replacement for a solution to this bug, but it does seem like a bug in that code that should be fixed.&lt;/p&gt;</comment>
                            <comment id="110332" author="bobijam" created="Mon, 23 Mar 2015 09:00:29 +0000"  >&lt;p&gt;As Jinshan pointed out, the layout lock could be lost, and that would cause a short read/write. According to &quot;man 2 read&quot;: &quot;The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.&quot; So read() should not return less than the available bytes, and I think we need to retry short reads/writes.&lt;/p&gt;</comment>
                            <comment id="110333" author="adegremont" created="Mon, 23 Mar 2015 09:57:56 +0000"  >&lt;p&gt;It seems this note comes from the OS X man page. There is nothing like that in the Linux man page.&lt;br/&gt;
I&apos;ve had a quick look at POSIX.1 and there is nothing like this there either.&lt;/p&gt;

&lt;p&gt;I think Lustre should do its best to avoid short reads, but user codes should not assume that short reads never happen. Both should be fixed.&lt;/p&gt;

&lt;p&gt;There is nothing like this for write, even in the OS X man page &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="110334" author="bobijam" created="Mon, 23 Mar 2015 10:28:46 +0000"  >&lt;p&gt;I&apos;ve updated #14123 to retry short reads/writes, since user codes may not cope with them.&lt;/p&gt;</comment>
                            <comment id="110336" author="bbarth" created="Mon, 23 Mar 2015 12:02:20 +0000"  >&lt;p&gt;Aurelien, I take the following from &lt;a href=&quot;http://pubs.opengroup.org/stage7tc1/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;POSIX&lt;/a&gt; to be the exhaustive list of reasons a short read may occur: &lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Since this is a regular file, only the first two may occur. Do you know of another interpretation, or am I maybe looking in the wrong place?&lt;/p&gt;</comment>
                            <comment id="110342" author="adegremont" created="Mon, 23 Mar 2015 13:58:02 +0000"  >&lt;p&gt;I think this is correct. That means:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Users can face short reads, mostly due to signal interrupts, if I/O had already started. In this case, they should do another read. If they support this, it also handles the current Lustre behaviour nicely.&lt;/li&gt;
	&lt;li&gt;In theory, Lustre should do a short read only at EOF or on signal. So the current behavior should be improved.&lt;/li&gt;
&lt;/ul&gt;

</comment>
                            <comment id="110370" author="jay" created="Mon, 23 Mar 2015 16:56:24 +0000"  >&lt;p&gt;Hi Aurelien,&lt;/p&gt;

&lt;p&gt;In any case, when a read or write encounters an error, such as ENOMEM or EIO, it should return the number of bytes it has already transferred to the application. A robust program must handle short reads and writes. This is just FYI; we&apos;ll make a fix for Lustre.&lt;/p&gt;</comment>
                            <comment id="110385" author="jcl" created="Mon, 23 Mar 2015 17:08:08 +0000"  >&lt;p&gt;Hi Jinshan&lt;/p&gt;

&lt;p&gt;On error, read/write return -1, not the size moved. The only case of partial read/write that developers handle is network file descriptors over protocols like UDP. &lt;br/&gt;
Even if it is allowed by POSIX, it is not usual for storage, so maybe we could add a mount option to allow partial read/write (if anyone finds an interest or use case for it). &lt;/p&gt;</comment>
                            <comment id="110396" author="jay" created="Mon, 23 Mar 2015 17:33:06 +0000"  >&lt;p&gt;It returns -1 only if no bytes have been read or written yet.&lt;/p&gt;</comment>
                            <comment id="110399" author="paf" created="Mon, 23 Mar 2015 17:48:01 +0000"  >&lt;p&gt;jcl,&lt;/p&gt;

&lt;p&gt;Should Lustre then also reset the file pointer to the point it was at before the partial read or write?  What about the contents of the file or the buffer?  I feel like returning -1 in this case is misrepresenting what has happened: Some data has already been read or written, so the contents of your buffer or disk has changed.  Lustre cannot undo that, and it&apos;s an important difference from simply failing to read or write any data.&lt;/p&gt;

&lt;p&gt;I think Jinshan is right and we should follow POSIX semantics as he described.&lt;/p&gt;</comment>
                            <comment id="110431" author="thiells" created="Mon, 23 Mar 2015 21:56:38 +0000"  >&lt;p&gt;Some literature, like Robert Love&#8217;s Linux System Programming [1], says that &#8220;for regular files, write() is guaranteed to perform the entire requested write, unless an error occurs&#8221;.&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://books.google.fr/books?id=K1vXEb1SgawC&amp;amp;lpg=PA37&amp;amp;ots=fdB0D4uUUC&amp;amp;pg=PA37#v=onepage&amp;amp;q&amp;amp;f=false&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://books.google.fr/books?id=K1vXEb1SgawC&amp;amp;lpg=PA37&amp;amp;ots=fdB0D4uUUC&amp;amp;pg=PA37#v=onepage&amp;amp;q&amp;amp;f=false&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="110441" author="paf" created="Tue, 24 Mar 2015 02:03:09 +0000"  >&lt;p&gt;Yes, that&apos;s correct - The point is that an error sometimes occurs, and in that case, the number of bytes successfully read or written is returned.&lt;/p&gt;

&lt;p&gt;Note that the patch proposed currently will stop Lustre from returning short reads or writes except in case of interruption/error.&lt;/p&gt;</comment>
                            <comment id="110452" author="jcl" created="Tue, 24 Mar 2015 11:01:23 +0000"  >&lt;p&gt;I agree we always have to follow POSIX (I misunderstood Jinshan&apos;s use case). My suggestion is to add a mount option to allow partial R/W in case we find some interest in such behavior (e.g. if the patch introduces some performance decrease). The idea is to make it tunable, as is already done for POSIX locking support.&lt;/p&gt;</comment>
                            <comment id="110492" author="bbarth" created="Tue, 24 Mar 2015 16:32:32 +0000"  >&lt;p&gt;Just FYI, we tried to apply this patch to our 2.5.2 client source in order to test, and it didn&apos;t take. Once y&apos;all are happy that it is correct, we&apos;re going to need a 2.5.2 applicable version as well.&lt;/p&gt;</comment>
                            <comment id="110496" author="morrone" created="Tue, 24 Mar 2015 16:56:58 +0000"  >&lt;p&gt;Livermore has hit this problem in production as well.  The HPSS movers are hitting this.&lt;/p&gt;

&lt;p&gt;This problem crops up every few years, and each time we repeat the same debate about whether or not posix allows short reads.  Yes, it may technically allow it, but for filesystems no applications expect it.&lt;/p&gt;

&lt;p&gt;Our design choice in Lustre has been (for probably well over a decade), that Lustre must not return short reads or writes, except in the cases of a fatal error.  For fatal errors, all further IO to the file will fail as well, so it should be fairly obvious to the application that something has gone wrong.&lt;/p&gt;

&lt;p&gt;If that design choice is not recorded anywhere, it would be very good for someone to write it down this time.&lt;/p&gt;

&lt;p&gt;We should be treating the issue in this ticket as a regression.&lt;/p&gt;</comment>
                            <comment id="110497" author="gerrit" created="Tue, 24 Mar 2015 17:00:32 +0000"  >&lt;p&gt;Bobi Jam (bobijam@hotmail.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14160&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14160&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; llite: restart short read/write for normal IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 08085c551a2202594762ad999d511511be2f1c70&lt;/p&gt;</comment>
                            <comment id="110521" author="jay" created="Tue, 24 Mar 2015 18:12:50 +0000"  >&lt;p&gt;Hi Bill,&lt;/p&gt;

&lt;p&gt;Just in case you didn&apos;t notice that, Bobijam has backported the patch to b2_5 at &lt;a href=&quot;http://review.whamcloud.com/14160&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14160&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="110530" author="bbarth" created="Tue, 24 Mar 2015 18:59:01 +0000"  >&lt;p&gt;Thanks. We&apos;re looking at it.&lt;/p&gt;</comment>
                            <comment id="110574" author="bbarth" created="Tue, 24 Mar 2015 23:29:24 +0000"  >&lt;p&gt;Testing with the 2.5 patch seems to be going fine. I&apos;ll let it run all night, but no failures so far. &lt;/p&gt;

&lt;p&gt;Two questions: How do y&apos;all feel about this patch (i.e. are we safe to push it to production on 3/31)? And, does anyone have a clue why this might have become a problem for us recently instead of immediately after a client or server change?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Bill.&lt;/p&gt;</comment>
                            <comment id="110632" author="jay" created="Wed, 25 Mar 2015 17:48:52 +0000"  >&lt;p&gt;Hi Bill,&lt;/p&gt;

&lt;p&gt;It should be pretty safe to apply, especially since you will have run it on the test system for several days by 3/31.&lt;/p&gt;

&lt;p&gt;Have you upgraded the MDT node or started a new job recently?&lt;/p&gt;</comment>
                            <comment id="110650" author="bbarth" created="Wed, 25 Mar 2015 18:53:38 +0000"  >&lt;p&gt;We have not updated the MDT recently. &lt;/p&gt;

&lt;p&gt;We have thousands of new jobs start every day, so I&apos;m not sure what you mean by a new job. &lt;/p&gt;</comment>
                            <comment id="110676" author="jaylan" created="Wed, 25 Mar 2015 22:42:01 +0000"  >&lt;p&gt;NASA Ames hit the same problem in production.&lt;/p&gt;

&lt;p&gt;Can I take it that the patch would do what Christopher Morrone said: &quot;Our design choice in Lustre has been (for probably well over a decade), that Lustre must not return short reads or writes, except in the cases of a fatal error&quot;?&lt;/p&gt;</comment>
                            <comment id="110718" author="gerrit" created="Thu, 26 Mar 2015 06:31:00 +0000"  >&lt;p&gt;Bobi Jam (bobijam@hotmail.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14190&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14190&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; llite: restart short read/write for normal IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_4&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7f12166c9fedd6d8aba3e59042142935d285d70e&lt;/p&gt;</comment>
                            <comment id="110719" author="bobijam" created="Thu, 26 Mar 2015 06:31:36 +0000"  >&lt;p&gt;Yes, the patch restarts the IO from the point it has already reached and continues until it hits EOF or encounters an error.&lt;/p&gt;</comment>
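<!--
The fix as described here (resume the I/O from the point already reached, until EOF or an error) amounts to a restart loop around a step that may stop early. The sketch below is a hypothetical user-space model built on that assumption. It is not the code merged for http://review.whamcloud.com/14123, and io_step_fn, IO_RESTART, restart_io and the mock_step demo are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome of one pass of an I/O step. */
enum io_status { IO_DONE, IO_RESTART, IO_ERROR };

/*
 * Hypothetical step: transfer up to `len` bytes at `pos`, store the count
 * actually moved in *moved, and report whether the pass completed (possibly
 * short because of EOF), stopped early for a restartable reason such as a
 * lost layout lock, or failed.
 */
typedef enum io_status (*io_step_fn)(void *ctx, size_t pos, size_t len,
                                     size_t *moved);

/*
 * Restart loop: resume from where the previous pass stopped until all `len`
 * bytes are done, the step completes (EOF), or the step fails. After partial
 * progress a failure reports the bytes already moved; -1 only if none moved.
 * A real implementation would also need a guard against a step that keeps
 * asking for a restart without making progress.
 */
ssize_t restart_io(io_step_fn step, void *ctx, size_t pos, size_t len)
{
    size_t done = 0;

    assert(step != NULL);
    while (done < len) {
        size_t moved = 0;
        enum io_status st = step(ctx, pos + done, len - done, &moved);

        done += moved;
        if (st == IO_ERROR)
            return done > 0 ? (ssize_t)done : -1;
        if (st == IO_DONE)
            break;
        /* IO_RESTART: go round again, starting at pos + done */
    }
    return (ssize_t)done;
}

/* Demo step: at most `max_per_pass` bytes per pass (as if the layout lock
 * were lost after that), EOF at `eof`, hard failure at or after `fail_at`. */
struct mock_file {
    size_t eof, max_per_pass, fail_at;
    int passes;
};

static enum io_status mock_step(void *ctx, size_t pos, size_t len,
                                size_t *moved)
{
    struct mock_file *f = ctx;
    size_t n;

    f->passes++;
    *moved = 0;
    if (pos >= f->fail_at)
        return IO_ERROR;
    if (pos >= f->eof)
        return IO_DONE;
    n = len < f->eof - pos ? len : f->eof - pos;
    if (n > f->max_per_pass) {
        *moved = f->max_per_pass;
        return IO_RESTART;
    }
    *moved = n;
    return IO_DONE;
}
```

With the demo step limited to 3 bytes per pass, EOF at 100 and no failure point, a 10-byte request takes four passes but still returns 10: the restarts stay invisible to the caller, and only EOF or a real error shortens the result.
-->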
                            <comment id="110794" author="bbarth" created="Fri, 27 Mar 2015 00:56:42 +0000"  >&lt;p&gt;I was slightly wrong. We applied patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5062&quot; title=&quot;LBUG: osc_req_attr_set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5062&quot;&gt;&lt;del&gt;LU-5062&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5726&quot; title=&quot;MDS buffer not freed when deleting files&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5726&quot;&gt;&lt;del&gt;LU-5726&lt;/del&gt;&lt;/a&gt; on Feb 10, 2015. Those are the only recent changes to our setup. Do you think they&apos;re related to this issue?&lt;/p&gt;

&lt;p&gt;Also, we have been testing patch set 4 of the most recent 2.5 patch with good results. We currently plan to deploy it to production on 3/31/15.&lt;/p&gt;</comment>
                            <comment id="111176" author="paf" created="Tue, 31 Mar 2015 21:38:29 +0000"  >&lt;p&gt;Stepping back, is there any plan to address the underlying layout lock revocation? Is that even addressable, or is it a permanent part of the design? I&apos;d like to understand the underlying cause better.&lt;/p&gt;

&lt;p&gt;Aurelien - I don&apos;t think I understand this comment:&lt;br/&gt;
&quot;Layout lock should be canceled by ldlm_bl thread during I/O, exactly between 2 calls to vvp_io_write_start(). If LL is dropped between 2 writes, it will be enqueued again before doing the 2nd write and it will be OK. If your I/O does not cover several stripes, it is also fine. &quot;&lt;/p&gt;

&lt;p&gt;What&apos;s the actual race condition here? You say that if it is dropped between the two writes, it is re-enqueued and all is well. So when exactly does it have to be dropped for this to be a problem? (And shouldn&apos;t the code that needs the lock make sure it is held, rather than returning?)&lt;/p&gt;</comment>
                            <comment id="111976" author="jay" created="Mon, 13 Apr 2015 03:42:29 +0000"  >&lt;p&gt;The layout lock can be lost in any of the following situations: false sharing, LRU cancellation, or memory pressure. The tricky part is that, after the client re-enqueues the lock and gets the new layout, it cannot tell whether the layout is still the same.&lt;/p&gt;</comment>
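<![CDATA[
To make the failure mode concrete, here is a toy model in C. It is not Lustre code, and every name, size and trigger in it is an assumption for illustration. `io_pass()` copies one stripe-sized chunk at a time and stops as soon as a hypothetical `layout_gen` counter changes, which stands in for the layout lock being lost and re-enqueued and produces a short count mid-file. `io_restart()` then resumes from where the pass stopped, mirroring the behaviour named in the patch subject ("restart short read/write for normal IO").

```c
#include <stdio.h>
#include <string.h>

#define STRIPE_SIZE 4                   /* hypothetical, tiny for the demo */

static unsigned layout_gen = 1;         /* changes on simulated lock loss */
static int chunks_until_lock_loss = 3;  /* hypothetical trigger */

/* One pass over [off, off+len): copy stripe-sized chunks while the layout
 * generation seen at the start is unchanged. May return a short count. */
static size_t io_pass(char *dst, const char *src, size_t off, size_t len)
{
    unsigned gen = layout_gen;
    size_t done = 0;

    while (done < len) {
        if (layout_gen != gen)
            break;                      /* layout may differ: stop early */
        size_t pos = off + done;
        size_t in_stripe = STRIPE_SIZE - pos % STRIPE_SIZE;
        size_t n = len - done < in_stripe ? len - done : in_stripe;
        memcpy(dst + pos, src + pos, n);
        done += n;
        if (--chunks_until_lock_loss == 0)
            layout_gen++;               /* simulated lock loss + re-enqueue */
    }
    return done;
}

/* Restart wrapper: resume from where the previous pass stopped until the
 * whole range is done or a pass makes no progress. */
static size_t io_restart(char *dst, const char *src, size_t off, size_t len)
{
    size_t done = 0;
    int passes = 0;

    while (done < len) {
        size_t n = io_pass(dst, src, off + done, len - done);
        passes++;
        if (n == 0)
            break;                      /* defensive: no progress */
        done += n;
    }
    printf("%zu bytes in %d pass(es)\n", done, passes);
    return done;
}
```

With the trigger firing after the third chunk, copying 20 bytes from offset 0 takes two passes (12 bytes, then 8) and prints `20 bytes in 2 pass(es)`. Without the wrapper, the caller would have seen the 12-byte short count, which is the symptom reported in this issue.
]]>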
                            <comment id="115613" author="gerrit" created="Sun, 17 May 2015 22:50:07 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/14123/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14123/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; llite: restart short read/write for normal IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 8badb39913b5e1c614d2fe410ef7200391099855&lt;/p&gt;</comment>
                            <comment id="115664" author="pjones" created="Mon, 18 May 2015 14:23:01 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                            <comment id="140960" author="gerrit" created="Wed, 3 Feb 2016 11:07:06 +0000"  >&lt;p&gt;Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/18275&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/18275&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; utils: fix lustre_rsync read retry&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b0c49e274f75afc4a4e2fcacfc7df9bbf88d5487&lt;/p&gt;</comment>
                            <comment id="143100" author="gerrit" created="Sat, 20 Feb 2016 05:40:11 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/18275/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/18275/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; utils: fix lustre_rsync read retry&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 7d165f5fe357010c3b41abf1163aacb09a88816f&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="29186">LU-6392</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="29186">LU-6392</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="29785">LU-6545</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="17339" name="lustre_bbarth.log.bz2" size="239" author="bbarth" created="Fri, 20 Mar 2015 20:12:38 +0000"/>
                            <attachment id="17338" name="lustre_bbarth.log.bz2" size="239" author="bbarth" created="Fri, 20 Mar 2015 13:59:07 +0000"/>
                            <attachment id="17331" name="short_io_bug.tar.gz" size="1697" author="bbarth" created="Thu, 19 Mar 2015 23:22:33 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx8zb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>