<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:45:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4758] parallel-scale test_metabench: metabench failed with 1</title>
                <link>https://jira.whamcloud.com/browse/LU-4758</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/2254277a-a030-11e3-947c-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/2254277a-a030-11e3-947c-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;test log shows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[02/27/2014 12:30:13] Entering par_create_multidir to create 4343 files in 1 dirs
Removed 30400 files in    205.132 seconds
[02/27/2014 12:34:02] Leaving par_create_multidir
Parallel file creation by 7 processes in seperate directories
Process       Files       Time       Rate
-------- ---------- ---------- ----------
001/008        4343     23.758    182.805
002/008        4343     18.392    236.135
003/008        4343     23.868    181.957
004/008        4343     18.325    236.996
005/008        4343     23.885    181.828
006/008        4343     18.388    236.186
007/008        4343     23.927    181.510
-------- ---------- ---------- ----------
Total         30401     23.927   1270.568
Elapsed       30401     23.928   1270.507
-------- ---------- ---------- ----------
Average        4343     21.506    205.345
Std Dev                  2.718     26.932 (  12.64%) (  13.12%)

[02/27/2014 12:36:47] FATAL error on process 0
Proc 0: Unable to stat file [/mnt/lustre/d0.metabench/CREATE_MD_008.000/nSUFhBISq]: No such file or directory
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 29305 on
node client-32vm5 exiting improperly. There are two reasons this could occur:

1. this process did not call &quot;init&quot; before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call &quot;init&quot;. By rule, if one process calls &quot;init&quot;,
then ALL processes must call &quot;init&quot; prior to termination.

2. this process called &quot;init&quot;, but exited without calling &quot;finalize&quot;.
By rule, all processes that call &quot;init&quot; MUST call &quot;finalize&quot; prior to
exiting or it will be considered an &quot;abnormal termination&quot;

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
 parallel-scale test_metabench: @@@@@@ FAIL: metabench failed! 1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>client and server: lustre-master build # 1911 RHEL6 ldiskfs</environment>
        <key id="23608">LU-4758</key>
            <summary>parallel-scale test_metabench: metabench failed with 1</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="sarah">Sarah Liu</reporter>
                        <labels>
                    </labels>
                <created>Wed, 12 Mar 2014 19:55:12 +0000</created>
                <updated>Wed, 14 May 2014 17:05:16 +0000</updated>
                            <resolved>Wed, 14 May 2014 17:05:16 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="79988" author="pjones" created="Fri, 21 Mar 2014 13:34:00 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="80303" author="hongchao.zhang" created="Wed, 26 Mar 2014 16:16:44 +0000"  >&lt;p&gt;status update:&lt;/p&gt;

&lt;p&gt;checked some results of the recent failed tests in Maloo, and found no -ENOENT(-2) in the logs of either the client or the server.&lt;/p&gt;</comment>
                            <comment id="80537" author="hongchao.zhang" created="Sat, 29 Mar 2014 00:33:36 +0000"  >&lt;p&gt;a debug patch to collect more specific logs is tracked at &lt;a href=&quot;http://review.whamcloud.com/#/c/9813/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9813/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="80581" author="hongchao.zhang" created="Mon, 31 Mar 2014 03:57:50 +0000"  >&lt;p&gt;in test report &lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/2d11e162-b87b-11e3-ac5f-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/2d11e162-b87b-11e3-ac5f-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;the missing file (/mnt/lustre/d0.metabench/STAT_MD_008.000/3SoD) was deleted just before the &quot;stat&quot; call,&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1396208726.881017:0:5933:0:(file.c:3517:ll_inode_permission()) VFS Op:inode=[0x200005221:0x98be:0x0](ffff88000a887b38), inode mode 41e8 mask 1
1396208726.881019:0:5933:0:(file.c:3517:ll_inode_permission()) VFS Op:inode=[0x200005221:0x98be:0x0](ffff88000a887b38), inode mode 41e8 mask 1
1396208726.881020:0:5933:0:(dcache.c:421:ll_revalidate_nd()) VFS Op:name=3SoD, flags=0
1396208726.881021:0:5933:0:(file.c:3517:ll_inode_permission()) VFS Op:inode=[0x200005221:0x98be:0x0](ffff88000a887b38), inode mode 41e8 mask 3
1396208726.881022:0:5933:0:(namei.c:1297:ll_unlink_generic()) VFS Op:name=3SoD, dir=[0x200005221:0x98be:0x0](ffff88000a887b38)

...

1396208726.955788:0:5933:0:(file.c:3517:ll_inode_permission()) VFS Op:inode=[0x200005221:0x65d5:0x0](ffff88005c63abb8), inode mode 41e8 mask 1
1396208726.955790:0:5933:0:(namei.c:568:ll_lookup_it()) VFS Op:name=3SoD, dir=[0x200005221:0x65d5:0x0](ffff88005c63abb8), intent=getattr
1396208726.955793:0:5933:0:(mdc_locks.c:1173:mdc_intent_lock()) (name: 3SoD,[0x0:0x0:0x0]) in obj [0x200005221:0x65d5:0x0], intent: getattr flags 00

...

1396208727.630480:0:6049:0:(debug.c:345:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale test_metabench: @@@@@@ FAIL: metabench failed! 1 
1396208727.744342:0:6057:0:(debug.c:345:libcfs_debug_mark_buffer()) DEBUG MARKER: parallel-scale test_metabench: @@@@@@ FAIL: metabench failed! 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;this issue could be related to the test script or metabench itself.&lt;/p&gt;</comment>
                            <comment id="80591" author="hongchao.zhang" created="Mon, 31 Mar 2014 09:02:14 +0000"  >&lt;p&gt;there is another failure mode under this ticket,&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Proc 0: Unable to stat file [/mnt/lustre/d0.metabench/STAT_MD_008.000/STAT_007_008/Q4oDt.1EWkTu]: Input/output error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;it&apos;s caused by the client eviction from OST for the ptlrpc_request of LDLM_BL_CALLBACK(104) was timed out.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1394542614.447124:0:21137:0:(client.c:2121:ptlrpc_set_wait()) set ffff88001b35c880 going to sleep for 6 seconds
1394542614.447131:0:24355:0:(client.c:1901:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1394542606/real 1394542606]  req@ffff88000afc3000 x1462244279669228/t0(0) o104-&amp;gt;lustre-OST0003@10.10.4.120@tcp:15/16 lens 296/224 e 0 to 1 dl 1394542613 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
1394542614.447152:0:24355:0:(client.c:1901:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1394542606/real 1394542613]  req@ffff88001ef28400 x1462244279669232/t0(0) o104-&amp;gt;lustre-OST0003@10.10.4.121@tcp:15/16 lens 296/224 e 0 to 1 dl 1394542613 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
1394542614.447157:0:24355:0:(client.c:2121:ptlrpc_set_wait()) set ffff880040e87ec0 going to sleep for 0 seconds
1394542614.447161:0:24355:0:(ldlm_lockd.c:523:ldlm_del_waiting_lock()) ### removed ns: filter-lustre-OST0003_UUID lock: ffff880072554340/0xd73a284e105aae19 lrc: 3/0,0 mode: PR/PR res: [0x543b1:0x0:0x0].0 rrc: 3 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x60000000010020 nid: 10.10.4.120@tcp remote: 0xfd059f9b5a2f235d expref: 1918 pid: 21114 timeout: 4304418095 lvb_type: 1
1394542614.447166:0:24355:0:(ldlm_lockd.c:594:ldlm_failed_ast()) 138-a: lustre-OST0003: A client on nid 10.10.4.120@tcp was evicted due to a lock blocking callback time out: rc -107
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the request with rq_xid=&quot;x1462244279669228&quot; was indeed received by the client,&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1394542606.112063:0:19904:0:(service.c:1866:ptlrpc_server_handle_req_in()) got req x1462244279669228
1394542606.112066:0:19904:0:(service.c:1079:ptlrpc_update_export_timer()) updating export LOV_OSC_UUID at 1394542606 exp ffff880064797000
1394542606.112080:0:19904:0:(nrs_fifo.c:182:nrs_fifo_req_get()) NRS start fifo request from 12345-10.10.4.123@tcp, seq: 88055
1394542606.112086:0:19904:0:(service.c:2011:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb00_001:LOV_OSC_UUID+4:24355:x1462244279669228:12345-10.10.4.123@tcp:104
1394542606.112112:0:19904:0:(service.c:2055:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb00_001:LOV_OSC_UUID+4:24355:x1462244279669228:12345-10.10.4.123@tcp:104 Request procesed in 32us (61us total) trans 0 rc 0/0
1394542606.112118:0:19904:0:(nrs_fifo.c:244:nrs_fifo_req_stop()) NRS stop fifo request from 12345-10.10.4.123@tcp, seq: 88055
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and the corresponding lock was cancelled subsequently,&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1394542606.112122:0:19905:0:(ldlm_lockd.c:1655:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: lustre-OST0003-osc-ffff88007013b000 lock: ffff88004d2ae180/0xfd059f9b5a2f235d lrc: 3/0,0 mode: PR/PR res: [0x543b1:0x0:0x0].0 rrc: 1 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x420000000000 nid: local remote: 0xd73a284e105aae19 expref: -99 pid: 24556 timeout: 0 lvb_type: 1
1394542606.112127:0:19905:0:(ldlm_lockd.c:1668:ldlm_handle_bl_callback()) Lock ffff88004d2ae180 already unused, calling callback (ffffffffa0a55a30)
1394542606.112133:0:19905:0:(cl_lock.c:151:cl_lock_trace0()) cancel lock: ffff880028d504b0@(1 ffff8800654780c0 1 5 0 0 0 0)(ffff88007a495b18/1/1) at cl_lock_cancel():1834
1394542606.112137:0:19905:0:(vvp_io.c:1165:vvp_io_init()) [0x2000013aa:0x4476:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
1394542606.112143:0:19905:0:(vvp_io.c:153:vvp_io_fini()) [0x2000013aa:0x4476:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
1394542606.112146:0:19905:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0003-osc-ffff88007013b000 lock: ffff88004d2ae180/0xfd059f9b5a2f235d lrc: 4/0,0 mode: PR/PR res: [0x543b1:0x0:0x0].0 rrc: 1 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x428400000000 nid: local remote: 0xd73a284e105aae19 expref: -99 pid: 24556 timeout: 0 lvb_type: 1
1394542606.112154:0:19905:0:(cl_lock.c:151:cl_lock_trace0()) cancel lock: ffff880028d504b0@(2 ffff8800654780c0 2 5 0 0 0 1)(ffff88007a495920/1/1) at cl_lock_cancel():1834
1394542606.112157:0:19905:0:(cl_lock.c:151:cl_lock_trace0()) delete lock: ffff880028d504b0@(2 ffff8800654780c0 2 5 0 0 0 1)(ffff88007a495920/1/1) at cl_lock_delete():1781
1394542606.112171:0:19905:0:(ldlm_request.c:1186:ldlm_cancel_pack()) ### packing ns: lustre-OST0003-osc-ffff88007013b000 lock: ffff88004d2ae180/0xfd059f9b5a2f235d lrc: 2/0,0 mode: --/PR res: [0x543b1:0x0:0x0].0 rrc: 1 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x4c69400000000 nid: local remote: 0xd73a284e105aae19 expref: -99 pid: 24556 timeout: 0 lvb_type: 1
1394542606.112175:0:19905:0:(ldlm_request.c:1190:ldlm_cancel_pack()) 1 locks packed
1394542606.112180:0:19905:0:(cl_lock.c:151:cl_lock_trace0()) delete lock: ffff880028d504b0@(1 ffff8800654780c0 1 6 0 0 0 1)(ffff88007a495b18/1/1) at cl_lock_delete():1781
1394542606.112182:0:19905:0:(cl_lock.c:151:cl_lock_trace0()) free lock: ffff880028d504b0@(0           (null) 0 6 0 0 0 1)(ffff88007a495b18/1/0) at cl_lock_free():270
1394542606.112186:0:19905:0:(ldlm_lockd.c:1677:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: lustre-OST0003-osc-ffff88007013b000 lock: ffff88004d2ae180/0xfd059f9b5a2f235d lrc: 1/0,0 mode: --/PR res: [0x543b1:0x0:0x0].0 rrc: 1 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x4c69400000000 nid: local remote: 0xd73a284e105aae19 expref: -99 pid: 24556 timeout: 0 lvb_type: 1
1394542606.112190:0:19905:0:(ldlm_lock.c:219:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-OST0003-osc-ffff88007013b000 lock: ffff88004d2ae180/0xfd059f9b5a2f235d lrc: 0/0,0 mode: --/PR res: [0x543b1:0x0:0x0].0 rrc: 1 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x4c69400000000 nid: local remote: 0xd73a284e105aae19 expref: -99 pid: 24556 timeout: 0 lvb_type: 1
1394542606.112200:0:2899:0:(client.c:1473:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_0:4581b497-cf77-5ea1-5ebc-c8dd032f07ff:2899:1462244323541000:10.10.4.123@tcp:103
1394542607.731980:0:2899:0:(ptlrpcd.c:387:ptlrpcd_check()) transfer 1 async RPCs [1-&amp;gt;0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;but the lock cancellation request (rq_xid=1462244323541000) was only received after the above &quot;LDLM_BL_CALLBACK&quot; request had already timed out.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1394542614.454055:0:17823:0:(service.c:1866:ptlrpc_server_handle_req_in()) got req x1462244323540996
1394542614.454067:0:25143:0:(service.c:1866:ptlrpc_server_handle_req_in()) got req x1462244323541000
1394542614.454071:0:25143:0:(service.c:1866:ptlrpc_server_handle_req_in()) got req x1462244323541008
1394542614.454074:0:25143:0:(service.c:1866:ptlrpc_server_handle_req_in()) got req x1462244323541044
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so this kind of failure is likely caused by heavy network traffic or a network problem.&lt;/p&gt;</comment>
                            <comment id="80730" author="hongchao.zhang" created="Tue, 1 Apr 2014 18:00:27 +0000"  >&lt;p&gt;according to the source code of metabench, this ticket could be caused by reading stale directory data in an &quot;MDS_READPAGE&quot; request&lt;/p&gt;

&lt;p&gt;code snippet in metabench/util.c&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;void clear_dir(&lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt;* path, &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; recurse)
{
    ...
    struct dirent *entry;
    ...
    &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; ( (entry = readdir(dir)) != NULL ) {
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (lstat(entry-&amp;gt;d_name,&amp;amp;st)) {
            fatal(&lt;span class=&quot;code-quote&quot;&gt;&quot;Unable to stat file [%s/%s]&quot;&lt;/span&gt;,path,entry-&amp;gt;d_name);
        }
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (S_ISDIR(st.st_mode) ) {
            &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (! recurse)
                &lt;span class=&quot;code-keyword&quot;&gt;continue&lt;/span&gt;;
            &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (strcmp(entry-&amp;gt;d_name,&lt;span class=&quot;code-quote&quot;&gt;&quot;.&quot;&lt;/span&gt;) == 0)
                &lt;span class=&quot;code-keyword&quot;&gt;continue&lt;/span&gt;;
            &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (strcmp(entry-&amp;gt;d_name,&lt;span class=&quot;code-quote&quot;&gt;&quot;..&quot;&lt;/span&gt;) == 0)
                &lt;span class=&quot;code-keyword&quot;&gt;continue&lt;/span&gt;;
            sprintf(newpath,&lt;span class=&quot;code-quote&quot;&gt;&quot;%s/%s&quot;&lt;/span&gt;,path,entry-&amp;gt;d_name);
            level++;
            clear_dir(newpath,recurse);
            level--;
            &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (testing) {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;Would remove dir  %s\n&quot;&lt;/span&gt;,newpath);
            } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rmdir(newpath)) {
                    fatal(&lt;span class=&quot;code-quote&quot;&gt;&quot;Cant remove directory [%s]&quot;&lt;/span&gt;,newpath);
            }
            dirs++;
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
            &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (testing) {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;Would remove file %s/%s\n&quot;&lt;/span&gt;,path,entry-&amp;gt;d_name);
            } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (unlink(entry-&amp;gt;d_name)) {
                fatal(&lt;span class=&quot;code-quote&quot;&gt;&quot;Unable to remove file [%s]&quot;&lt;/span&gt;,entry-&amp;gt;d_name);
            }
            files++;
        }
        num_files++;
    }
    ...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
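&lt;p&gt;the hazard in this loop can be sketched outside of metabench (a hypothetical Python illustration, not the actual C test): when a stale readdir page hands back an entry whose file is already gone, the lstat() fails with ENOENT, which is exactly where clear_dir() calls fatal(),&lt;/p&gt;

```python
# Hypothetical sketch of the clear_dir() hazard (not metabench itself):
# stat'ing a directory entry that was already removed fails with ENOENT,
# the same condition on which clear_dir() aborts via fatal().
import errno
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "stale-entry")

# Create a file, then remove it: this stands in for a stale readdir
# page returning an entry whose file has already been unlinked.
open(path, "w").close()
os.unlink(path)

try:
    os.lstat(path)
    raise SystemExit("unexpected: stale entry still stat-able")
except OSError as e:
    # A robust cleanup loop would treat this as "already removed"
    # and continue, instead of aborting the whole run.
    assert e.errno == errno.ENOENT
    print("stale entry skipped")

os.rmdir(d)
```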

&lt;p&gt;in test report &lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/2d11e162-b87b-11e3-ac5f-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/2d11e162-b87b-11e3-ac5f-52540035b04c&lt;/a&gt;&lt;br/&gt;
the entry &quot;3SoD&quot; had been deleted, but it was read back from the MDT again, which caused the -ENOENT error.&lt;/p&gt;</comment>
                            <comment id="80794" author="green" created="Tue, 1 Apr 2014 23:49:47 +0000"  >&lt;p&gt;Well, if metabench already deleted the file, then a subsequent readdir should not have been able to return it. Lustre drops the readdir cache, and while glibc can cache this readdir result, is there any evidence that metabench works with the same readdir stream by doing a rewind and readdir again?&lt;/p&gt;</comment>
                            <comment id="80807" author="hongchao.zhang" created="Wed, 2 Apr 2014 04:47:00 +0000"  >&lt;p&gt;I have extracted the directory cleanup code in &quot;metabench&quot; into a new test that only deletes a directory, and found that this issue can be reproduced locally;&lt;br/&gt;
the problem is caused by stale data returned by ll_readdir&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 20820:0:(dir.c:259:ll_dir_read()) [0x200000400:0x776:0x0] filldir: K75udev-post
LustreError: 20820:0:(dir.c:259:ll_dir_read()) [0x200000400:0x776:0x0] filldir: K01certmonger
LustreError: 20820:0:(dir.c:307:ll_readdir()) VFS Op:inode=[0x200000400:0x776:0x0](ffff88003cba4678) pos/size9174905734640535999/4096 32bit_api 0
LustreError: 20820:0:(dir.c:307:ll_readdir()) VFS Op:inode=[0x200000400:0x5f3:0x0](ffff8800652bd138) pos/size8883493503157300815/4096 32bit_api 0
LustreError: 20820:0:(dir.c:259:ll_dir_read()) [0x200000400:0x5f3:0x0] filldir: K75udev-post
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt; 

&lt;p&gt;the file &quot;K75udev-post&quot; is not in &quot;&lt;span class=&quot;error&quot;&gt;&amp;#91;0x200000400:0x5f3:0x0&amp;#93;&lt;/span&gt;&quot;, but in &quot;&lt;span class=&quot;error&quot;&gt;&amp;#91;0x200000400:0x776:0x0&amp;#93;&lt;/span&gt;&quot;,&lt;/p&gt;

&lt;p&gt;the steps to reproduce the issue:&lt;/p&gt;

&lt;p&gt;1. copy /etc/ to /mnt/lustre/test/&lt;/p&gt;

&lt;p&gt;2. use the new test to clean up /mnt/lustre/test; it fails just as in this ticket (-ENOENT)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@zhanghc metabench]# ./test 
[03/28/2014 02:55:43] FATAL error on process 0
Proc 0: Unable to stat file [/mnt/lustre/test/xinetd.d/environment]: No such file or directory &amp;lt;-- Note: there is no &quot;environment&quot; in original &quot;/etc/xinetd.d&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;btw, the command &quot;rm -fr /mnt/lustre/test&quot; succeeds in cleaning up the directory.&lt;/p&gt;

&lt;p&gt;will update the status soon.&lt;/p&gt;</comment>
                            <comment id="80879" author="hongchao.zhang" created="Wed, 2 Apr 2014 17:37:14 +0000"  >&lt;p&gt;status update:&lt;/p&gt;

&lt;p&gt;this issue is caused by the patch &lt;a href=&quot;http://review.whamcloud.com/#/c/7196/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7196/&lt;/a&gt; in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3529&quot; title=&quot;create striped directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3529&quot;&gt;&lt;del&gt;LU-3529&lt;/del&gt;&lt;/a&gt;,&lt;br/&gt;
and it occurs only after that commit (5f3e926ac9ff8ad134ad920d0e8545e16395ef3b) was cherry-picked on 2014-02-22&lt;/p&gt;

</comment>
                            <comment id="80913" author="hongchao.zhang" created="Thu, 3 Apr 2014 02:58:46 +0000"  >&lt;p&gt;there is a problem in the patch &lt;a href=&quot;http://review.whamcloud.com/#/c/7196/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7196/&lt;/a&gt; in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3529&quot; title=&quot;create striped directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3529&quot;&gt;&lt;del&gt;LU-3529&lt;/del&gt;&lt;/a&gt;,&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.c
@@ -4831,14 +4836,16 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osd_ldiskfs_it_fill(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env,
         &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                up_read(&amp;amp;obj-&amp;gt;oo_ext_idx_sem);
 
-        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (it-&amp;gt;oie_rd_dirent == 0) {
-                result = -EIO;
-        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
-                it-&amp;gt;oie_dirent = it-&amp;gt;oie_buf;
-                it-&amp;gt;oie_it_dirent = 1;
-        }
+       &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (it-&amp;gt;oie_rd_dirent == 0) {
+               /*If it does not get any dirent, it means it has been reached
+                *to the end of the dir */
+               it-&amp;gt;oie_file.f_pos = ldiskfs_get_htree_eof(&amp;amp;it-&amp;gt;oie_file);  &amp;lt;-- here, we should &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; an error atm, otherwise the caller may still use the &lt;span class=&quot;code-quote&quot;&gt;&quot;it&quot;&lt;/span&gt;
+       } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
+               it-&amp;gt;oie_dirent = it-&amp;gt;oie_buf;
+               it-&amp;gt;oie_it_dirent = 1;
+       }
 
-        RETURN(result);
+       RETURN(result);
 }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;if there is no more entry found in the directory, we should return some kind of error to tell the caller not to use the &quot;it&quot;.&lt;/p&gt;

&lt;p&gt;btw, this only fixes the -ENOENT problem (No such file or directory); the other failure mode of this ticket (Input/output error) is likely caused&lt;br/&gt;
by heavy network traffic or a network problem, as mentioned above.&lt;/p&gt;</comment>
                            <comment id="80926" author="hongchao.zhang" created="Thu, 3 Apr 2014 11:04:09 +0000"  >&lt;p&gt;the patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#/c/9880/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9880/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="84097" author="hongchao.zhang" created="Wed, 14 May 2014 15:53:58 +0000"  >&lt;p&gt;the patch (&lt;a href=&quot;http://review.whamcloud.com/#/c/9880/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9880/&lt;/a&gt;) of this issue has been included in &lt;a href=&quot;http://review.whamcloud.com/#/c/9511/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9511/&lt;/a&gt;, which has been landed on master.&lt;/p&gt;</comment>
                            <comment id="84102" author="pjones" created="Wed, 14 May 2014 17:05:16 +0000"  >&lt;p&gt;It sounds like this issue is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4690&quot; title=&quot;sanity test_4: Expect error removing in-use dir /mnt/lustre/remote_dir&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4690&quot;&gt;&lt;del&gt;LU-4690&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="12469">LU-855</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="12469">LU-855</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwhin:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13091</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>