<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:07:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14205] RHEL8.3: sanity test 398c fails with &apos;fio write error&apos;</title>
                <link>https://jira.whamcloud.com/browse/LU-14205</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;sanity test_398c fails with &apos;fio write error&apos;. This failure is different from LU-13897.&lt;br/&gt;
We&#8217;ve only seen this failure once for RHEL8.3 server/client testing at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/fe2f2ee3-067d-431b-b69c-01858086821c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/fe2f2ee3-067d-431b-b69c-01858086821c&lt;/a&gt;. Looking at the suite_log output, we see a couple of fio calls with &#8216;err=71&#8217;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity test 398c: run fio to test AIO ============================================================= 21:25:37 (1607462737)
/usr/bin/fio
debug=0
40+0 records in
40+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.0712628 s, 589 MB/s
osc.lustre-OST0000-osc-ffff8a2144fe3000.rpc_stats=clear
writing 40M to OST0 by fio with 4 jobs...
rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.19
Starting 4 processes
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2654208, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2682880, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2686976, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2695168, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3534848, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3543040, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3547136, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3567616, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3575808, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3588096, buflen=4096
fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3596288, buflen=4096
fio: pid=892877, err=71/file:io_u.c:1803, func=io_u error, error=Protocol error

rand-write: (groupid=0, jobs=1): err= 0: pid=892876: Tue Dec  8 21:28:37 2020
  write: IOPS=14, BW=58.4KiB/s (59.8kB/s)(10.0MiB/175370msec); 0 zone resets
    slat (usec): min=14, max=321, avg=37.10, stdev=35.86
    clat (msec): min=116, max=1944, avg=1095.64, stdev=174.44
     lat (msec): min=116, max=1944, avg=1095.68, stdev=174.44
    clat percentiles (msec):
     |  1.00th=[  617],  5.00th=[  877], 10.00th=[  927], 20.00th=[  978],
     | 30.00th=[ 1011], 40.00th=[ 1053], 50.00th=[ 1083], 60.00th=[ 1116],
     | 70.00th=[ 1150], 80.00th=[ 1217], 90.00th=[ 1284], 95.00th=[ 1385],
     | 99.00th=[ 1737], 99.50th=[ 1854], 99.90th=[ 1905], 99.95th=[ 1921],
     | 99.99th=[ 1938]
   bw (  KiB/s): min=    8, max=  128, per=25.59%, avg=59.36, stdev=25.99, samples=343
   iops        : min=    2, max=   32, avg=14.84, stdev= 6.50, samples=343
  lat (msec)   : 250=0.04%, 500=0.23%, 750=2.30%, 1000=24.30%, 2000=73.12%
  cpu          : usr=0.01%, sys=0.05%, ctx=756, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, &amp;gt;=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, &amp;gt;=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, &amp;gt;=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-write: (groupid=0, jobs=1): err=71 (file:io_u.c:1803, func=io_u error, error=Protocol error): pid=892877: Tue Dec  8 21:28:37 2020
&#8230;
Run status group 0 (all jobs):
  WRITE: bw=233KiB/s (239kB/s), 58.2KiB/s-58.4KiB/s (59.6kB/s-59.8kB/s), io=39.0MiB (41.9MB), run=175199-175612msec
 sanity test_398c: @@@@@@ FAIL: fio write error 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6257:error()
  = /usr/lib64/lustre/tests/sanity.sh:22338:test_398c()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Looking at the console logs, client1 (vm1) has an error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[19342.934999] Lustre: DEBUG MARKER: == sanity test 398c: run fio to test AIO ============================================================= 21:25:37 (1607462737)
[19520.867822] LustreError: 558753:0:(osc_request.c:1947:osc_brw_fini_request()) lustre-OST0005-osc-ffff8a2144fe3000: unexpected positive size 1
[19523.015054] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_398c: @@@@@@ FAIL: fio write error 
[19523.582640] Lustre: DEBUG MARKER: sanity test_398c: @@@@@@ FAIL: fio write error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>RHEL8.3 servers/clients</environment>
        <key id="61931">LU-14205</key>
            <summary>RHEL8.3: sanity test 398c fails with &apos;fio write error&apos;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wshilong">Wang Shilong</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                            <label>rhel8.3</label>
                    </labels>
                <created>Wed, 9 Dec 2020 18:43:16 +0000</created>
                <updated>Tue, 22 Dec 2020 06:19:51 +0000</updated>
                            <resolved>Tue, 22 Dec 2020 06:19:51 +0000</resolved>
                                    <version>Lustre 2.14.0</version>
                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="287141" author="pjones" created="Wed, 9 Dec 2020 18:59:56 +0000"  >&lt;p&gt;Shilong&lt;/p&gt;

&lt;p&gt;Could you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="287261" author="jamesanunez" created="Thu, 10 Dec 2020 23:08:44 +0000"  >&lt;p&gt;I think we are seeing the same issue in sanity-pfl test_6 at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/cee68701-50f3-4729-b843-34935b3129d2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/cee68701-50f3-4729-b843-34935b3129d2&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity-pfl test 6: Migrate composite file ========================================================= 20:02:44 (1607371364)
striped dir -i2 -c2 -H crush /mnt/lustre/d6.sanity-pfl
5+0 records in
5+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.02798 s, 1.7 MB/s
error: lfs migrate: /mnt/lustre/d6.sanity-pfl/f6.sanity-pfl: data copy failed: Protocol error
 sanity-pfl test_6: @@@@@@ FAIL: Migrate(compsoite -&amp;gt; composite) /mnt/lustre/d6.sanity-pfl/f6.sanity-pfl failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6257:error()
  = /usr/lib64/lustre/tests/sanity-pfl.sh:364:test_6()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and test 18&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity-pfl test 18: check component distribution ================================================== 20:18:46 (1607372326)
dd: error writing &apos;/mnt/lustre/f18.sanity-pfl-1&apos;: Protocol error
3+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0616448 s, 34.0 MB/s
 sanity-pfl test_18: @@@@@@ FAIL: dd failed for /mnt/lustre/f18.sanity-pfl-1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6257:error()
  = /usr/lib64/lustre/tests/sanity-pfl.sh:982:test_18()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="287289" author="wshilong" created="Fri, 11 Dec 2020 06:57:04 +0000"  >&lt;p&gt;This does not look specific to AIO/DIO: test 18 of sanity-pfl is a generic buffered write. Maybe the write on the server side is somehow returning an error.&lt;/p&gt;</comment>
                            <comment id="287291" author="gerrit" created="Fri, 11 Dec 2020 07:37:01 +0000"  >&lt;p&gt;Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/40944&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40944&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14205&quot; title=&quot;RHEL8.3: sanity test 398c fails with &amp;#39;fio write error&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14205&quot;&gt;&lt;del&gt;LU-14205&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: return correct error after end io&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 296545189f9ebeccf0bcff8b068f2d87ce78b1ed&lt;/p&gt;</comment>
                            <comment id="287370" author="jhammond" created="Fri, 11 Dec 2020 20:51:36 +0000"  >&lt;p&gt;I can see this in a very vanilla setup on RHEL 8.3. Using /bin/cp to copy a 1G file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@trevis-60vm1 lustre]# ls -lh f0
-rw-r--r-- 1 root root 1.0G Dec 11 19:38 f0
[root@trevis-60vm1 lustre]# rm -f f1; echo 3 &amp;gt; /proc/sys/vm/drop_caches
[root@trevis-60vm1 lustre]# /bin/cp f0 f1
/bin/cp: failed to close &apos;f1&apos;: Input/output error
[root@trevis-60vm1 lustre]# dmesg | tail -1
[ 6672.574315] LustreError: 12513:0:(osc_request.c:1947:osc_brw_fini_request()) lustre-OST0003-osc-ffff9d2eaeb92800: unexpected positive size 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="287658" author="wshilong" created="Wed, 16 Dec 2020 01:36:41 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=jhammond&quot; class=&quot;user-hover&quot; rel=&quot;jhammond&quot;&gt;jhammond&lt;/a&gt; I tried your steps but could not reproduce the problem. Could you try the patch above and see whether it fixes your problem?&lt;/p&gt;</comment>
                            <comment id="287664" author="gerrit" created="Wed, 16 Dec 2020 02:59:53 +0000"  >&lt;p&gt;Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/40989&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40989&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14205&quot; title=&quot;RHEL8.3: sanity test 398c fails with &amp;#39;fio write error&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14205&quot;&gt;&lt;del&gt;LU-14205&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: fix return of osd_extend_restart_trans()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: df32229ddafdd4696c33e8790b93a3db57997de7&lt;/p&gt;</comment>
                            <comment id="287665" author="wshilong" created="Wed, 16 Dec 2020 03:00:29 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=jhammond&quot; class=&quot;user-hover&quot; rel=&quot;jhammond&quot;&gt;jhammond&lt;/a&gt; Never mind, I reproduced the problem and finally figured it out.&lt;/p&gt;</comment>
                            <comment id="287759" author="jhammond" created="Wed, 16 Dec 2020 18:13:51 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=wshilong&quot; class=&quot;user-hover&quot; rel=&quot;wshilong&quot;&gt;wshilong&lt;/a&gt; are both changes needed?&lt;/p&gt;</comment>
                            <comment id="287826" author="wshilong" created="Thu, 17 Dec 2020 00:58:15 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=jhammond&quot; class=&quot;user-hover&quot; rel=&quot;jhammond&quot;&gt;jhammond&lt;/a&gt; Actually, the second patch addresses the root cause of your problem.&lt;/p&gt;</comment>
                            <comment id="287866" author="jhammond" created="Thu, 17 Dec 2020 14:14:11 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=wshilong&quot; class=&quot;user-hover&quot; rel=&quot;wshilong&quot;&gt;wshilong&lt;/a&gt; Right. So should the first patch be landed or abandoned?&lt;/p&gt;</comment>
                            <comment id="287868" author="wshilong" created="Thu, 17 Dec 2020 14:26:25 +0000"  >&lt;p&gt;The first patch does fix a bug, so it should be landed as well.&lt;/p&gt;</comment>
                            <comment id="287882" author="gerrit" created="Thu, 17 Dec 2020 17:00:36 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/40944/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40944/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14205&quot; title=&quot;RHEL8.3: sanity test 398c fails with &amp;#39;fio write error&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14205&quot;&gt;&lt;del&gt;LU-14205&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: return correct error after end io&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 31a1afade08f5bc36b3ebb1d1bace103f902c155&lt;/p&gt;</comment>
                            <comment id="288164" author="gerrit" created="Tue, 22 Dec 2020 05:28:02 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/40989/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40989/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14205&quot; title=&quot;RHEL8.3: sanity test 398c fails with &amp;#39;fio write error&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14205&quot;&gt;&lt;del&gt;LU-14205&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: fix return of osd_extend_restart_trans()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4aa17923f08bea425a20961cb6eaa72ad9af38c1&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="61915">LU-14202</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="61910">LU-14200</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01gsf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>