<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:44:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4597] inconsistent file size</title>
                <link>https://jira.whamcloud.com/browse/LU-4597</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have received reports of Lustre clients incorrectly reporting files as 0 length; on a second attempt the correct non-zero length is reported.  This is reminiscent of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-274&quot; title=&quot;Client delayed file status (cache meta-data) causing job failures&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-274&quot;&gt;&lt;del&gt;LU-274&lt;/del&gt;&lt;/a&gt;.  Before digging too far into this, I notice that the &lt;a href=&quot;http://review.whamcloud.com/#/c/507/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;fix&lt;/a&gt; for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-274&quot; title=&quot;Client delayed file status (cache meta-data) causing job failures&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-274&quot;&gt;&lt;del&gt;LU-274&lt;/del&gt;&lt;/a&gt; did not seem to survive the conversion from obdfilter to ofd.  Do we need a similar fix in &lt;tt&gt;ofd_intent_policy()&lt;/tt&gt;?&lt;/p&gt;

&lt;p&gt;Here is the reproducer reported by our user:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;On cslic3 using lscratch3, I created 20 files of size 1024 bytes&lt;/li&gt;
	&lt;li&gt;Waited 27 minutes (I tried around 15 minutes and didn&apos;t see incorrect zero sizes)&lt;/li&gt;
	&lt;li&gt;On cslic8, listed the lscratch3 directory from step 1 and saw 3 files with size 0&lt;/li&gt;
	&lt;li&gt;On cslic8, listed the lscratch3 directory again (immediately following step 3) and all files were listed as 1024 bytes&lt;/li&gt;
&lt;/ol&gt;
</description>
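<!--
A minimal local sketch of the reproducer steps above (the temp directory and file names are illustrative; on a real system the files live on a Lustre mount and the listing in steps 3 and 4 runs on a second client such as cslic8):

```shell
# Step 1: create 20 files of 1024 bytes each (here in a local temp dir;
# on a real system this would be a directory on the Lustre mount).
DIR=$(mktemp -d)
for i in $(seq 0 19); do
    dd if=/dev/urandom of="$DIR/file$i" bs=1024 count=1 2>/dev/null
done

# Step 2: wait before listing (27 minutes in the original report).
# sleep 1620

# Steps 3 and 4: list the directory twice; on an affected Lustre client,
# listing from a second node is where some files may first show size 0.
ls -l "$DIR"
stat -c %s "$DIR/file0"
```

On Lustre itself, the second listing from another client (step 4) is what reveals the correct 1024-byte sizes after the first listing returned zeros.
-->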
                <environment>2.4.0-19chaos clients and servers</environment>
        <key id="23045">LU-4597</key>
            <summary>inconsistent file size</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                            <label>mn4</label>
                    </labels>
                <created>Thu, 6 Feb 2014 23:01:23 +0000</created>
                <updated>Mon, 14 Apr 2014 20:44:45 +0000</updated>
                            <resolved>Wed, 26 Feb 2014 23:18:09 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                    <comments>
                            <comment id="76408" author="nedbass" created="Thu, 6 Feb 2014 23:19:06 +0000"  >&lt;p&gt;Oh, I see &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-274&quot; title=&quot;Client delayed file status (cache meta-data) causing job failures&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-274&quot;&gt;&lt;del&gt;LU-274&lt;/del&gt;&lt;/a&gt; was fixed for b_2.x in &lt;a href=&quot;http://review.whamcloud.com/583&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this patch&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="76417" author="pjones" created="Thu, 6 Feb 2014 23:51:02 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please help Prakash with any questions he has about this work?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="76528" author="niu" created="Sat, 8 Feb 2014 03:04:23 +0000"  >&lt;p&gt;The fix for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-274&quot; title=&quot;Client delayed file status (cache meta-data) causing job failures&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-274&quot;&gt;&lt;del&gt;LU-274&lt;/del&gt;&lt;/a&gt; is in ldlm_cb_interpret() in 2.4, so it looks like this is a different problem than &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-274&quot; title=&quot;Client delayed file status (cache meta-data) causing job failures&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-274&quot;&gt;&lt;del&gt;LU-274&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ned, is it only seen on ZFS, or is it a common issue? Can it be easily reproduced by the procedure you provided? Thank you.&lt;/p&gt;</comment>
                            <comment id="76537" author="nedbass" created="Sat, 8 Feb 2014 04:21:42 +0000"  >&lt;p&gt;I believe we&apos;ve seen it on both ZFS and ldiskfs, but I&apos;ll verify that next week.  It&apos;s not easily reproducible.  I&apos;ve been running the procedure in a loop for over 24 hours and haven&apos;t reproduced it yet.&lt;/p&gt;</comment>
                            <comment id="76758" author="nedbass" created="Tue, 11 Feb 2014 18:50:46 +0000"  >&lt;p&gt;Niu, our archival storage servers log an error message during file transfer if a file size changes after an initial scan, so we can use that as evidence of this bug.  The logs show a sharp increase in file sizes changing from 0 after we updated our Lustre servers from 2.1 to 2.4.0-19chaos.  This has affected all of our filesystems running both ZFS and ldiskfs.  It has been observed from both 2.1 and 2.4 clients.&lt;/p&gt;</comment>
                            <comment id="76794" author="nedbass" created="Wed, 12 Feb 2014 00:35:35 +0000"  >&lt;p&gt;We found a pretty reliable reproducer for this bug.  Unfortunately it is only working on one of our classified filesystems, so I can&apos;t send debug logs. The server is running 2.4.0-24chaos (see &lt;a href=&quot;https://github.com/chaos/lustre&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/chaos/lustre&lt;/a&gt;) with ZFS and the clients are 2.4.0-19chaos.  The reproducer is pretty simple. &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# Create files locally then list them on remote node e8.
for ((i=0;i&amp;lt;20;i++)) ; do dd if=/dev/urandom of=file$i bs=1k count=1 ; done ; rsh e8 ls -l `pwd`
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create several 1k files then immediately list them on another node.  Some of the files are listed with 0 length, then show the correct lengths if listed again.  Usually between 0 and 3 files are affected, but which and how many files varies between attempts.&lt;/p&gt;

&lt;p&gt;I captured debug logs from the client and MDS for one successful attempt.  The client logs had &lt;tt&gt;+dlmtrace +rpctrace&lt;/tt&gt; enabled and the MDS log had -1.  The bug wouldn&apos;t reproduce with -1 debugging on the clients.  However, I haven&apos;t been able to find the cause in the logs yet.  Please let me know if you have any tips on how to debug this.  Meanwhile I&apos;ll keep trying to reproduce this on an unclassified system so we can send debug logs.&lt;/p&gt;</comment>
                            <comment id="76796" author="nedbass" created="Wed, 12 Feb 2014 01:50:13 +0000"  >&lt;p&gt;I managed to reproduce the bug on an unclassified system and get debug logs from the clients, MDS, and OSS.  I uploaded them in a tarball to ftp.whamcloud.com.  Email me privately if you need the file name.  There is a README in the tarball with a few notes of relevance.&lt;/p&gt;</comment>
                            <comment id="76797" author="nedbass" created="Wed, 12 Feb 2014 02:04:58 +0000"  >&lt;p&gt;In case it helps interpret the debug logs, here are the NIDs of the nodes involved.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sierra654: 192.168.114.155@o2ib5     # created files
sierra330: 192.168.113.81@o2ib5      # got zero length for file4
porter44: 172.19.1.213@o2ib100       # OSS owning object for file4
porter-mds1: 172.19.1.165@o2ib100    # MDS
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="76798" author="nedbass" created="Wed, 12 Feb 2014 02:09:40 +0000"  >&lt;p&gt;FWIW, having full debugging enabled on the servers seems to make this bug much easier to reproduce.&lt;/p&gt;</comment>
                            <comment id="76836" author="niu" created="Wed, 12 Feb 2014 15:33:11 +0000"  >&lt;p&gt;I can reproduce it with two mounts, and it looks like a race between AGL and normal getattr (it can&apos;t be reproduced anymore when AGL is turned off); I will look into it further.&lt;/p&gt;</comment>
                            <comment id="76870" author="nedbass" created="Wed, 12 Feb 2014 18:47:46 +0000"  >&lt;p&gt;I confirmed that disabling statahead_agl seems to prevent the bug here as well.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl set_param llite.*.statahead_agl=0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="76941" author="niu" created="Thu, 13 Feb 2014 07:13:06 +0000"  >&lt;p&gt;patch for master: &lt;a href=&quot;http://review.whamcloud.com/9249&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9249&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="77398" author="nedbass" created="Wed, 19 Feb 2014 18:32:37 +0000"  >&lt;p&gt;We set &lt;tt&gt;statahead_agl=0&lt;/tt&gt; on all our clients to work around this issue until the patch can be deployed.  This seemed to work; however, I just learned that the &apos;size changed from 0&apos; error was reported for 9 files during a run of the &quot;htar&quot; archival storage utility.  So there may be another (less frequent) bug that can cause this behavior.&lt;/p&gt;</comment>
                            <comment id="77472" author="jamesanunez" created="Thu, 20 Feb 2014 14:53:10 +0000"  >&lt;p&gt;Patch for b2_5 at &lt;a href=&quot;http://review.whamcloud.com/#/c/9328/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9328/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="77967" author="pjones" created="Wed, 26 Feb 2014 23:18:09 +0000"  >&lt;p&gt;Landed for 2.5.1 and 2.6&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwehj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12564</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>