<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:05:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13899] IOR data corruption detected during automated lnet fofb testing</title>
                <link>https://jira.whamcloud.com/browse/LU-13899</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Data corruption was seen on two IOR jobs while running the LNet failover test with &quot;Regression write/verify DNE2 PFL&quot;. Both failures show data from the immediately preceding run, so this is likely a cache-related issue. fsync-related warnings are also logged.&lt;/p&gt;

&lt;p&gt;CL_IOR_all_wr_20iter_1666K_rand&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt; access             = file-per-process
 	pattern            = strided (33 segments)
 	ordering in a file = random offsets
 	ordering inter file=constant task offsets = 1
 	clients            = 48 (4 per node)
 	repetitions        = 20
 	xfersize           = 1.63 MiB
 	blocksize          = 27.66 MiB
 	aggregate filesize = 42.78 GiB

 read      1316.54    28322      1666.00    0.001729   33.27      0.010429   33.28      16   XXCEL
 Using Time Stamp 1594461626 (0x5f098dba) &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; Data Signature
 Commencing write performance test.
 Sat Jul 11 05:00:26 2020
 read      1852.87    28322      1666.00    0.001450   23.64      0.004490   23.64      17   XXCEL
 Using Time Stamp 1594461701 (0x5f098e05) &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; Data Signature
 Commencing write performance test.
 Sat Jul 11 05:01:41 2020
 WARNING: cannot perform fsync on file.
 WARNING: cannot perform fsync on file.
 write     197.50     28322      1666.00    0.001447   221.82     0.421875   221.82     18   XXCEL
 Verifying contents of the file(s) just written.
 Sat Jul 11 05:05:23 2020 [6] At transfer buffer #3, index #0 (file &lt;span class=&quot;code-object&quot;&gt;byte&lt;/span&gt; offset 235425792):
 [6] Expected: 0x0000000e5f098e05
 [6] Actual:   0x0000000e5f098dba
 [6] At transfer buffer #3, index #2 (file &lt;span class=&quot;code-object&quot;&gt;byte&lt;/span&gt; offset 235425808):
 [6] Expected: 0x0000000e5f098e05
 [6] Actual:   0x0000000e5f098dba

 IOR job: -a POSIX -i 20 -w -r -W -t 1666K -b 28322K -C -e -k -vv -E -F -q -s 33 -x -z


File in question: /lus/snx11205/ostest.vers/alsorun.20200711032404.2543.pollux-p4/CL_IOR_all_wr_20iter_1666K_rand.2.9t3jzC.1594459731/CL_IOR_all_wr_20iter_1666K_rand/IORfile.00000004
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="60360">LU-13899</key>
            <summary>IOR data corruption detected during automated lnet fofb testing</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="zam">Alexander Zarochentsev</assignee>
                                    <reporter username="zam">Alexander Zarochentsev</reporter>
                        <labels>
                    </labels>
                <created>Mon, 10 Aug 2020 17:49:48 +0000</created>
                <updated>Tue, 1 Sep 2020 04:55:45 +0000</updated>
                            <resolved>Tue, 1 Sep 2020 04:55:45 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="277116" author="zam" created="Mon, 10 Aug 2020 18:07:29 +0000"  >&lt;p&gt;Further debugging showed the following:&lt;br/&gt;
 1. The failed fsync() was due to the AS_EIO flag being set for the address space, along with 492 pages that have the PG_error flag set:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash-7.2.8&amp;gt; kmem -p | grep ffff8803f2c10c50|awk &apos;{print $7}&apos;|sort|uniq -c
    491 error,referenced,uptodate,lru,private
      1 error,uptodate,lru,active,private
 232875 referenced,uptodate,lru,private
     11 referenced,uptodate,private
    279 uptodate,lru,active,private
crash-7.2.8&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2. A debug patch gave us error code -ESTALE: &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;18282:0:3564:0:(vvp_page.c:304:vvp_page_completion_write()) page@ffff88017556f200[2 ffff8801f86630e8 0 1           (null)]
00000020:00000001:28.0:1595864091.118282:0:3566:0:(cl_page.c:840:cl_req_type_state()) Process leaving (rc=2 : 2 : 2)
00000080:00008000:25.0:1595864091.118283:0:3564:0:(vvp_page.c:304:vvp_page_completion_write()) completing WRITE with -116
00000020:00000001:28.0:1595864091.118283:0:3566:0:(cl_page.c:925:cl_page_completion()) Process entered
00000080:00040000:25.0:1595864091.118284:0:3564:0:(vvp_page.c:246:vvp_vmpage_error()) LBUG
00000020:00000001:28.0:1595864091.118285:0:3566:0:(cl_page.c:926:cl_page_completion()) page@ffff8801ff17fc00[2 ffff8801f86630e8 2 1 
          (null)]
00000020:00000001:28.0:1595864091.118286:0:3566:0:(cl_page.c:926:cl_page_completion()) 1 0
00000020:00000001:28.0:1595864091.118287:0:3566:0:(cl_page.c:341:cl_page_state_set0()) Process entered
00000020:00000001:28.0:1595864091.118288:0:3566:0:(cl_page.c:344:cl_page_state_set0()) page@ffff8801ff17fc00[2 ffff8801f86630e8 2 1 
          (null)]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;3. Finally, the -ESTALE comes from process_req_last_xid():&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:00000200:15.0:1596013191.408113:0:11072:0:(service.c:2233:ptlrpc_server_handle_request()) got req 1673498761985920
00000020:00000001:15.0:1596013191.408114:0:11072:0:(tgt_handler.c:706:tgt_request_handle()) Process entered
00000020:00000001:15.0:1596013191.408114:0:11072:0:(tgt_handler.c:637:process_req_last_xid()) Process entered
00000020:00000001:15.0:1596013191.408115:0:11072:0:(tgt_handler.c:679:process_req_last_xid()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c)
00000020:00000001:15.0:1596013191.408116:0:11072:0:(tgt_handler.c:691:process_req_last_xid()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
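
&lt;p&gt;As a quick cross-check of the trace above, the unsigned rc value 18446744073709551500 can be reinterpreted as a signed 64-bit integer to confirm that it is -116, which is ESTALE on Linux. This is a minimal sketch: the helper name decode_rc is hypothetical and not part of Lustre, and the errno lookup assumes Linux errno numbering.&lt;/p&gt;

```python
import errno


def decode_rc(raw):
    # Reinterpret an unsigned 64-bit value as a signed 64-bit integer,
    # the same way the debug log prints "rc=unsigned : signed : hex".
    return int.from_bytes(raw.to_bytes(8, "little"), "little", signed=True)


# Value copied from the process_req_last_xid() lines in the trace above.
rc = decode_rc(18446744073709551500)

# errno.errorcode maps errno numbers to names on the current platform;
# the .get() fallback covers platforms where 116 is not defined.
print(rc, errno.errorcode.get(-rc, "unknown"))
```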


&lt;p&gt;It appears that the -ESTALE is not handled properly on the client.&lt;/p&gt;</comment>
                            <comment id="277121" author="gerrit" created="Mon, 10 Aug 2020 18:26:30 +0000"  >&lt;p&gt;Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39612&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39612&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13899&quot; title=&quot;IOR data corruption detected during automated lnet fofb testing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13899&quot;&gt;&lt;del&gt;LU-13899&lt;/del&gt;&lt;/a&gt; tgt: drop old epoch request&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 2af903484430def99448b0e0ba8685aec720dda6&lt;/p&gt;</comment>
                            <comment id="277255" author="spitzcor" created="Wed, 12 Aug 2020 00:59:15 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=zam&quot; class=&quot;user-hover&quot; rel=&quot;zam&quot;&gt;zam&lt;/a&gt;, shouldn&apos;t this data integrity issue carry a higher priority?  I think we ought to target landing for 2.14.0.&lt;/p&gt;</comment>
                            <comment id="277791" author="eaujames" created="Thu, 20 Aug 2020 08:33:53 +0000"  >&lt;p&gt;Hello,&lt;br/&gt;
 Is a backport planned for b2_12?&lt;/p&gt;</comment>
                            <comment id="277794" author="zam" created="Thu, 20 Aug 2020 12:06:01 +0000"  >&lt;p&gt;&amp;gt;  shouldn&apos;t this data integrity issue carry a higher priority? I think we ought to target landing for 2.14.0.&lt;br/&gt;
agreed&lt;/p&gt;</comment>
                            <comment id="278441" author="gerrit" created="Tue, 1 Sep 2020 03:42:49 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39612/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39612/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13899&quot; title=&quot;IOR data corruption detected during automated lnet fofb testing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13899&quot;&gt;&lt;del&gt;LU-13899&lt;/del&gt;&lt;/a&gt; tgt: drop old epoch request&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4f192768293364c65411015de7531f62fdfb754c&lt;/p&gt;</comment>
                            <comment id="278467" author="pjones" created="Tue, 1 Sep 2020 04:55:45 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i017kv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>