<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:03:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6758] racer test_1: test failed to respond and timed out</title>
                <link>https://jira.whamcloud.com/browse/LU-6758</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for sarah_lw &amp;lt;wei3.liu@intel.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/8303c8ec-13d8-11e5-b4b0-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/8303c8ec-13d8-11e5-b4b0-5254006e85c2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The sub-test test_1 failed with the following error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;test failed to respond and timed out
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are no MDS logs.&lt;br/&gt;
Client console:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;16:40:41:Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-40vm5,onyx-40vm6.onyx.hpdd.intel.com DURATION=900 == 09:30:44 (1434299444)
16:40:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
16:40:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
16:40:41:LustreError: 21155:0:(llite_lib.c:1497:ll_md_setattr()) md_setattr fails: rc = -30
16:40:42:Lustre: 21248:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299452/real 1434299452]  req@ffff88006d4753c0 x1503970942065616/t0(0) o36-&amp;gt;lustre-MDT0000-mdc-ffff88007c74cc00@10.2.4.185@tcp:12/10 lens 496/568 e 0 to 1 dl 1434299459 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
16:40:42:Lustre: lustre-MDT0000-mdc-ffff88007c74cc00: Connection to lustre-MDT0000 (at 10.2.4.185@tcp) was lost; in progress operations using this service will wait for recovery to complete
16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299452/real 1434299452]  req@ffff880070c73c80 x1503970942069540/t0(0) o400-&amp;gt;MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 224/224 e 0 to 1 dl 1434299459 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
16:40:43:LustreError: 166-1: MGC10.2.4.185@tcp: Connection to MGS (at 10.2.4.185@tcp) was lost; in progress operations using this service will fail
16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299457/real 1434299457]  req@ffff880070cd1080 x1503970942081192/t0(0) o400-&amp;gt;lustre-MDT0000-mdc-ffff880079777c00@10.2.4.185@tcp:12/10 lens 224/224 e 0 to 1 dl 1434299464 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 1 previous similar message
16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299470/real 1434299470]  req@ffff880070167380 x1503970942081360/t0(0) o250-&amp;gt;MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 520/544 e 0 to 1 dl 1434299481 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1434299485/real 0]  req@ffff880070a2c3c0 x1503970942081548/t0(0) o250-&amp;gt;MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 520/544 e 0 to 1 dl 1434299501 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299515/real 1434299515]  req@ffff88007010b080 x1503970942081892/t0(0) o38-&amp;gt;lustre-MDT0000-mdc-ffff880079777c00@10.2.4.185@tcp:12/10 lens 520/544 e 0 to 1 dl 1434299531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299535/real 1434299536]  req@ffff88006d7f20c0 x1503970942082156/t0(0) o38-&amp;gt;lustre-MDT0000-mdc-ffff88007c74cc00@10.2.4.185@tcp:12/10 lens 520/544 e 0 to 1 dl 1434299556 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
16:40:44:INFO: task dir_create.sh:5103 blocked for more than 120 seconds.
16:40:44:      Not tainted 2.6.32-504.16.2.el6.x86_64 #1
16:40:44:&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
16:40:45:dir_create.sh D 0000000000000000     0  5103   5096 0x00000080
16:40:45: ffff88007c2e1bc8 0000000000000082 0000000000000000 ffff880000000000
16:40:45: ffff880000000065 ffff88007a19e104 ffffffffa1b012c0 0000000000000000
16:40:46: 000200000a0204b9 ffffffffa1dba813 ffff880079059068 ffff88007c2e1fd8
16:40:46:Call Trace:
16:40:46: [&amp;lt;ffffffffa1d5f5f5&amp;gt;] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
16:40:46: [&amp;lt;ffffffffa1d86a22&amp;gt;] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
16:40:46: [&amp;lt;ffffffff8152b7a6&amp;gt;] __mutex_lock_slowpath+0x96/0x210
16:40:46: [&amp;lt;ffffffff8152b2cb&amp;gt;] mutex_lock+0x2b/0x50
16:40:46: [&amp;lt;ffffffffa1f40c7a&amp;gt;] mdc_close+0x19a/0xa60 [mdc]
16:40:46: [&amp;lt;ffffffffa2017c09&amp;gt;] ? ll_i2suppgid+0x19/0x30 [lustre]
16:40:46: [&amp;lt;ffffffffa1efefe7&amp;gt;] lmv_close+0x2c7/0x580 [lmv]
16:40:46: [&amp;lt;ffffffffa1feac4f&amp;gt;] ll_close_inode_openhandle+0x2ef/0xe90 [lustre]
16:40:47: [&amp;lt;ffffffffa1febb97&amp;gt;] ll_md_real_close+0x197/0x210 [lustre]
16:40:47: [&amp;lt;ffffffffa1fed331&amp;gt;] ll_file_release+0x641/0xad0 [lustre]
16:40:47: [&amp;lt;ffffffff8118fb85&amp;gt;] __fput+0xf5/0x210
16:40:47: [&amp;lt;ffffffff8118fcc5&amp;gt;] fput+0x25/0x30
16:40:47: [&amp;lt;ffffffff8118af1d&amp;gt;] filp_close+0x5d/0x90
16:40:47: [&amp;lt;ffffffff811a3760&amp;gt;] sys_dup3+0x130/0x190
16:40:47: [&amp;lt;ffffffff811a37d4&amp;gt;] sys_dup2+0x14/0x50
16:40:47: [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
16:40:47:INFO: task cat:5274 blocked for more than 120 seconds.
16:40:47:      Not tainted 2.6.32-504.16.2.el6.x86_64 #1
16:40:47:&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
16:40:47:cat           D 0000000000000000     0  5274   5133 0x00000080
16:40:47: ffff880071bfbbf8 0000000000000086 0000000000000000 ffff880000000000
16:40:47: ffff880000000065 ffff88007a19e104 ffffffffa1b01f80 0000000000000000
16:40:47: 000200000a0204b9 ffffffffa1dba813 ffff880071bf9ad8 ffff880071bfbfd8
16:40:48:Call Trace:
16:40:48: [&amp;lt;ffffffffa1d5f5f5&amp;gt;] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
16:40:48: [&amp;lt;ffffffffa1d86a22&amp;gt;] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
16:40:48: [&amp;lt;ffffffff8152b7a6&amp;gt;] __mutex_lock_slowpath+0x96/0x210
16:40:48: [&amp;lt;ffffffff8152b2cb&amp;gt;] mutex_lock+0x2b/0x50
16:40:48: [&amp;lt;ffffffffa1f40c7a&amp;gt;] mdc_close+0x19a/0xa60 [mdc]
16:40:49: [&amp;lt;ffffffffa2017c09&amp;gt;] ? ll_i2suppgid+0x19/0x30 [lustre]
16:40:49: [&amp;lt;ffffffffa1efefe7&amp;gt;] lmv_close+0x2c7/0x580 [lmv]
16:40:49: [&amp;lt;ffffffffa1feac4f&amp;gt;] ll_close_inode_openhandle+0x2ef/0xe90 [lustre]
16:40:49: [&amp;lt;ffffffffa1febb97&amp;gt;] ll_md_real_close+0x197/0x210 [lustre]
16:40:49: [&amp;lt;ffffffffa1fed331&amp;gt;] ll_file_release+0x641/0xad0 [lustre]
16:40:49: [&amp;lt;ffffffff8118fb85&amp;gt;] __fput+0xf5/0x210
16:40:49: [&amp;lt;ffffffff8118fcc5&amp;gt;] fput+0x25/0x30
16:40:50: [&amp;lt;ffffffff8118af1d&amp;gt;] filp_close+0x5d/0x90
16:40:50: [&amp;lt;ffffffff8118aff5&amp;gt;] sys_close+0xa5/0x100
16:40:50: [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
16:40:50:INFO: task dd:6983 blocked for more than 120 seconds.
16:40:50:      Not tainted 2.6.32-504.16.2.el6.x86_64 #1
16:40:50:&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
16:40:50:dd            D 0000000000000000     0  6983   5154 0x00000080
16:40:50: ffff880071917bf8 0000000000000086 0000000000000000 ffff880000000000
16:40:50: ffff880000000065 ffff88007a56c244 ffffffffa1b03a00 0000000000000000
16:40:51: 000200000a0204b9 ffffffffa1dba813 ffff880070fafad8 ffff880071917fd8
16:40:51:Call Trace:
16:40:51: [&amp;lt;ffffffffa1d5f5f5&amp;gt;] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
16:40:51: [&amp;lt;ffffffffa1d86a22&amp;gt;] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
16:40:51: [&amp;lt;ffffffff8152b7a6&amp;gt;] __mutex_lock_slowpath+0x96/0x210
16:40:51: [&amp;lt;ffffffff8152b2cb&amp;gt;] mutex_lock+0x2b/0x50
16:40:51: [&amp;lt;ffffffffa1f40c7a&amp;gt;] mdc_close+0x19a/0xa60 [mdc]
16:40:51: [&amp;lt;ffffffffa2017c09&amp;gt;] ? ll_i2suppgid+0x19/0x30 [lustre]
16:40:52: [&amp;lt;ffffffffa1efefe7&amp;gt;] lmv_close+0x2c7/0x580 [lmv]
16:40:52: [&amp;lt;ffffffffa1feac4f&amp;gt;] ll_close_inode_openhandle+0x2ef/0xe90 [lustre]
16:40:52: [&amp;lt;ffffffffa1febb97&amp;gt;] ll_md_real_close+0x197/0x210 [lustre]
16:40:52: [&amp;lt;ffffffffa1fed331&amp;gt;] ll_file_release+0x641/0xad0 [lustre]
16:40:53: [&amp;lt;ffffffff8118fb85&amp;gt;] __fput+0xf5/0x210
16:40:53: [&amp;lt;ffffffff8118fcc5&amp;gt;] fput+0x25/0x30
16:40:53: [&amp;lt;ffffffff8118af1d&amp;gt;] filp_close+0x5d/0x90
16:40:53: [&amp;lt;ffffffff8118aff5&amp;gt;] sys_close+0xa5/0x100
16:40:53: [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>server: lustre-master build # 3071 EL7 ldiskfs&lt;br/&gt;
client: lustre-master build # 3071 EL6 ldiskfs</environment>
        <key id="30791">LU-6758</key>
            <summary>racer test_1: test failed to respond and timed out</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Tue, 23 Jun 2015 22:26:30 +0000</created>
                <updated>Tue, 11 Sep 2018 23:55:16 +0000</updated>
                            <resolved>Mon, 10 Sep 2018 23:45:56 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                    <version>Lustre 2.10.0</version>
                    <version>Lustre 2.11.0</version>
                    <version>Lustre 2.12.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="120305" author="green" created="Fri, 3 Jul 2015 18:49:13 +0000"  >&lt;p&gt;No MDS logs in the report, and we need those to understand why something was remounted read-only.&lt;/p&gt;</comment>
                            <comment id="120306" author="adilger" created="Fri, 3 Jul 2015 18:51:32 +0000"  >&lt;p&gt;Looks like the client got a -30 error (-EROFS) and the MDS crashed. No way to know what is happening without the MDS logs. &lt;/p&gt;

&lt;p&gt;Sarah, is there a TEI ticket open to fix the console logs for RHEL7? If not, can you please open one?&lt;/p&gt;</comment>
                            <comment id="120484" author="sarah" created="Mon, 6 Jul 2015 20:49:46 +0000"  >&lt;p&gt;There is TEI-3392 for a similar issue; I will open another one for the partially missing logs.&lt;/p&gt;</comment>
                            <comment id="120854" author="sarah" created="Thu, 9 Jul 2015 17:34:27 +0000"  >&lt;p&gt;I created TEI-3677 to track the missing-log issue.&lt;/p&gt;</comment>
                            <comment id="187671" author="casperjx" created="Thu, 9 Mar 2017 17:57:17 +0000"  >&lt;p&gt;Similar &quot;process D&quot; messages seen with parallel-scale-nfsv3 test_racer_on_nfs.&lt;/p&gt;</comment>
                            <comment id="187681" author="casperjx" created="Thu, 9 Mar 2017 18:28:57 +0000"  >&lt;p&gt;Link for parallel-scale-nfsv3 test_racer_on_nfs:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/d5c0f53e-f881-11e6-887f-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/d5c0f53e-f881-11e6-887f-5254006e85c2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="187694" author="sarah" created="Thu, 9 Mar 2017 19:16:09 +0000"  >&lt;p&gt;Hi Jim,&lt;br/&gt;
I think the failure you saw is &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8584&quot; title=&quot;parallel-scale-nfsv3 test_racer_on_nfs: BUG: unable to handle kernel paging request&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8584&quot;&gt;LU-8584&lt;/a&gt;, caused by the &quot;unable to handle kernel paging request&quot; error.&lt;/p&gt;</comment>
                            <comment id="204101" author="jamesanunez" created="Tue, 1 Aug 2017 16:52:40 +0000"  >&lt;p&gt;We still see what looks like this hang with parallel_scale_nfsv3 test_racer_on_nfs. Logs for the most recent hang are at&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/59c82126-7286-11e7-bb95-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/59c82126-7286-11e7-bb95-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The logs are interesting. On client 1, in the console log, we see a segfault and then hangs in several different processes, including dd, truncate, file_concat.sh, setfattr, ln, and more:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;03:08:04:[11566.277896] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 03:04:22 (1501124662)
03:08:04:[11566.741676] 8[31347]: segfault at 8 ip 00007fc99889b3b8 sp 00007fff9753dd40 error 4 in ld-2.17.so[7fc998890000+20000]
03:08:04:[11760.098045] INFO: task dd:3382 blocked for more than 120 seconds.
03:08:04:[11760.100574] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
03:08:04:[11760.103192] dd              D ffff88007ae64278     0  3382  30972 0x00000080
03:08:04:[11760.105770]  ffff880036c9fd10 0000000000000082 ffff880078760fb0 ffff880036c9ffd8
03:08:04:[11760.108465]  ffff880036c9ffd8 ffff880036c9ffd8 ffff880078760fb0 ffff88007ae64270
03:08:04:[11760.111045]  ffff88007ae64274 ffff880078760fb0 00000000ffffffff ffff88007ae64278
03:08:04:[11760.113570] Call Trace:
03:08:04:[11760.115753]  [&amp;lt;ffffffff8168d6c9&amp;gt;] schedule_preempt_disabled+0x29/0x70
03:08:04:[11760.118168]  [&amp;lt;ffffffff8168b315&amp;gt;] __mutex_lock_slowpath+0xc5/0x1d0
03:08:04:[11760.120533]  [&amp;lt;ffffffff8168a76f&amp;gt;] mutex_lock+0x1f/0x2f
03:08:04:[11760.122898]  [&amp;lt;ffffffff811833f6&amp;gt;] generic_file_aio_write+0x46/0xa0
03:08:04:[11760.125271]  [&amp;lt;ffffffffa057911b&amp;gt;] nfs_file_write+0xbb/0x1e0 [nfs]
03:08:04:[11760.127568]  [&amp;lt;ffffffff811fdfbd&amp;gt;] do_sync_write+0x8d/0xd0
03:08:04:[11760.129784]  [&amp;lt;ffffffff811fe82d&amp;gt;] vfs_write+0xbd/0x1e0
03:08:04:[11760.131928]  [&amp;lt;ffffffff811fe6f7&amp;gt;] ? vfs_read+0xf7/0x170
03:08:04:[11760.134049]  [&amp;lt;ffffffff811ff34f&amp;gt;] SyS_write+0x7f/0xe0
03:08:04:[11760.136168]  [&amp;lt;ffffffff816975c9&amp;gt;] system_call_fastpath+0x16/0x1b
03:08:04:[11760.138325] INFO: task truncate:3454 blocked for more than 120 seconds.
03:08:04:[11760.140499] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
03:08:04:[11760.142775] truncate        D ffffffff8168a630     0  3454  30984 0x00000080
03:08:04:[11760.145011]  ffff88006e3c3af0 0000000000000082 ffff880079121f60 ffff88006e3c3fd8
03:08:04:[11760.147321]  ffff88006e3c3fd8 ffff88006e3c3fd8 ffff880079121f60 ffff88007fc16c40
03:08:04:[11760.149568]  0000000000000000 7fffffffffffffff ffff88007ff616e8 ffffffff8168a630
03:08:04:[11760.151888] Call Trace:
03:08:04:[11760.153683]  [&amp;lt;ffffffff8168a630&amp;gt;] ? bit_wait+0x50/0x50
03:08:04:[11760.155721]  [&amp;lt;ffffffff8168c5d9&amp;gt;] schedule+0x29/0x70
03:08:04:[11760.157686]  [&amp;lt;ffffffff8168a019&amp;gt;] schedule_timeout+0x239/0x2c0
03:08:04:[11760.159733]  [&amp;lt;ffffffff8113d22e&amp;gt;] ? delayacct_end+0x9e/0xb0
03:08:04:[11760.161691]  [&amp;lt;ffffffff81060c1f&amp;gt;] ? kvm_clock_get_cycles+0x1f/0x30
03:08:04:[11760.163663]  [&amp;lt;ffffffff8168a630&amp;gt;] ? bit_wait+0x50/0x50
03:08:04:[11760.165551]  [&amp;lt;ffffffff8168bb7e&amp;gt;] io_schedule_timeout+0xae/0x130
03:08:04:[11760.167469]  [&amp;lt;ffffffff8168bc18&amp;gt;] io_schedule+0x18/0x20
03:08:04:[11760.169321]  [&amp;lt;ffffffff8168a641&amp;gt;] bit_wait_io+0x11/0x50
03:08:04:[11760.171825]  [&amp;lt;ffffffff8168a165&amp;gt;] __wait_on_bit+0x65/0x90
03:08:04:[11760.174073]  [&amp;lt;ffffffff81180211&amp;gt;] wait_on_page_bit+0x81/0xa0
03:08:04:[11760.175867]  [&amp;lt;ffffffff810b1be0&amp;gt;] ? wake_bit_function+0x40/0x40
03:08:04:[11760.177632]  [&amp;lt;ffffffff81180341&amp;gt;] __filemap_fdatawait_range+0x111/0x190
03:08:04:[11760.179428]  [&amp;lt;ffffffff811803d4&amp;gt;] filemap_fdatawait_range+0x14/0x30
03:08:04:[11760.181160]  [&amp;lt;ffffffff81180417&amp;gt;] filemap_fdatawait+0x27/0x30
03:08:04:[11760.182841]  [&amp;lt;ffffffff8118258c&amp;gt;] filemap_write_and_wait+0x4c/0x80
03:08:04:[11760.184525]  [&amp;lt;ffffffffa058a3b0&amp;gt;] nfs_wb_all+0x20/0x100 [nfs]
03:08:04:[11760.186135]  [&amp;lt;ffffffffa057ca08&amp;gt;] nfs_setattr+0x1d8/0x200 [nfs]
03:08:04:[11760.187756]  [&amp;lt;ffffffff8121be19&amp;gt;] notify_change+0x279/0x3d0
03:08:04:[11760.189312]  [&amp;lt;ffffffff811fc995&amp;gt;] do_truncate+0x75/0xc0
03:08:04:[11760.190830]  [&amp;lt;ffffffffa0578a29&amp;gt;] ? nfs_permission+0x199/0x1e0 [nfs]
03:08:04:[11760.192390]  [&amp;lt;ffffffff811fcb44&amp;gt;] vfs_truncate+0x164/0x190
03:08:04:[11760.193849]  [&amp;lt;ffffffff811fcbfc&amp;gt;] do_sys_truncate+0x8c/0xb0
03:08:04:[11760.195276]  [&amp;lt;ffffffff811fcdae&amp;gt;] SyS_truncate+0xe/0x10
03:08:04:[11760.196650]  [&amp;lt;ffffffff816975c9&amp;gt;] system_call_fastpath+0x16/0x1b
03:08:04:[11760.198062] INFO: task file_concat.sh:3471 blocked for more than 120 seconds.
03:08:04:[11760.199552] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
03:08:04:[11760.201110] file_concat.sh  D ffff88007ae64278     0  3471  31015 0x00000080
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the same client, in the stack trace log, we see dd and mkdir hung and an NFS server problem:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;04:06:47:[11760.356684]  [&amp;lt;ffffffff816975c9&amp;gt;] system_call_fastpath+0x16/0x1b
04:06:47:[11760.358129] INFO: task mkdir:3524 blocked for more than 120 seconds.
04:06:47:[11760.359580] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
04:06:47:[11760.361164] mkdir           D ffff880006f53bc8     0  3524  31009 0x00000080
04:06:47:[11760.362709]  ffff88006a7d7d90 0000000000000082 ffff88005b1dce70 ffff88006a7d7fd8
04:06:47:[11760.364352]  ffff88006a7d7fd8 ffff88006a7d7fd8 ffff88005b1dce70 ffff880006f53bc0
04:06:47:[11760.365957]  ffff880006f53bc4 ffff88005b1dce70 00000000ffffffff ffff880006f53bc8
04:06:47:[11760.367612] Call Trace:
04:06:47:[11760.368814]  [&amp;lt;ffffffff8168d6c9&amp;gt;] schedule_preempt_disabled+0x29/0x70
04:06:47:[11760.370391]  [&amp;lt;ffffffff8168b315&amp;gt;] __mutex_lock_slowpath+0xc5/0x1d0
04:06:47:[11760.371915]  [&amp;lt;ffffffff8168a76f&amp;gt;] mutex_lock+0x1f/0x2f
04:06:47:[11760.373363]  [&amp;lt;ffffffff8120cae5&amp;gt;] filename_create+0x85/0x180
04:06:47:[11760.374860]  [&amp;lt;ffffffffa057870e&amp;gt;] ? nfs_do_access+0x23e/0x390 [nfs]
04:06:47:[11760.376396]  [&amp;lt;ffffffff811de665&amp;gt;] ? kmem_cache_alloc+0x35/0x1e0
04:06:47:[11760.377893]  [&amp;lt;ffffffff8120f2bf&amp;gt;] ? getname_flags+0x4f/0x1a0
04:06:47:[11760.379357]  [&amp;lt;ffffffff8120f334&amp;gt;] ? getname_flags+0xc4/0x1a0
04:06:47:[11760.380793]  [&amp;lt;ffffffff8121eede&amp;gt;] ? mntput_no_expire+0x3e/0x120
04:06:47:[11760.382236]  [&amp;lt;ffffffff8120f5e1&amp;gt;] user_path_create+0x41/0x60
04:06:47:[11760.383667]  [&amp;lt;ffffffff812108f6&amp;gt;] SyS_mkdirat+0x46/0xe0
04:06:47:[11760.385066]  [&amp;lt;ffffffff812109a9&amp;gt;] SyS_mkdir+0x19/0x20
04:06:47:[11760.386419]  [&amp;lt;ffffffff816975c9&amp;gt;] system_call_fastpath+0x16/0x1b
04:06:47:[11760.387839] INFO: task dd:3529 blocked for more than 120 seconds.
04:06:47:[11760.389275] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
04:06:47:[11760.390867] dd              D ffffffff8168a630     0  3529  31008 0x00000080
04:06:47:[11760.392426]  ffff880077fffbc0 0000000000000086 ffff880074fb0000 ffff880077ffffd8
04:06:48:[11760.394065]  ffff880077ffffd8 ffff880077ffffd8 ffff880074fb0000 ffff88007fd16c40
04:06:48:[11760.395686]  0000000000000000 7fffffffffffffff ffff88007ff5cee8 ffffffff8168a630
04:06:48:[11760.397353] Call Trace:
04:06:48:[11760.398564]  [&amp;lt;ffffffff8168a630&amp;gt;] ? bit_wait+0x50/0x50
04:06:48:[11760.400060]  [&amp;lt;ffffffff8168c5d9&amp;gt;] schedule+0x29/0x70
04:06:48:[11760.401498]  [&amp;lt;ffffffff8168a019&amp;gt;] schedule_timeout+0x239/0x2c0
04:06:48:[11760.403027]  [&amp;lt;ffffffffa0267e10&amp;gt;] ? rpc_put_task+0x10/0x20 [sunrpc]
04:06:48:[11760.404538]  [&amp;lt;ffffffffa0584174&amp;gt;] ? nfs_initiate_pgio+0xd4/0x160 [nfs]
04:06:48:[11760.406022]  [&amp;lt;ffffffff81060c1f&amp;gt;] ? kvm_clock_get_cycles+0x1f/0x30
04:06:48:[11760.407526]  [&amp;lt;ffffffff8168a630&amp;gt;] ? bit_wait+0x50/0x50
04:06:48:[11760.408925]  [&amp;lt;ffffffff8168bb7e&amp;gt;] io_schedule_timeout+0xae/0x130
04:06:48:[11760.410387]  [&amp;lt;ffffffff8168bc18&amp;gt;] io_schedule+0x18/0x20
04:06:48:[11760.411777]  [&amp;lt;ffffffff8168a641&amp;gt;] bit_wait_io+0x11/0x50
04:06:48:[11760.413167]  [&amp;lt;ffffffff8168a165&amp;gt;] __wait_on_bit+0x65/0x90
04:06:48:[11760.414551]  [&amp;lt;ffffffff81180211&amp;gt;] wait_on_page_bit+0x81/0xa0
04:06:48:[11760.415953]  [&amp;lt;ffffffff810b1be0&amp;gt;] ? wake_bit_function+0x40/0x40
04:06:48:[11760.417375]  [&amp;lt;ffffffff81180341&amp;gt;] __filemap_fdatawait_range+0x111/0x190
04:06:48:[11760.418862]  [&amp;lt;ffffffff811803d4&amp;gt;] filemap_fdatawait_range+0x14/0x30
04:06:48:[11760.420312]  [&amp;lt;ffffffff81182656&amp;gt;] filemap_write_and_wait_range+0x56/0x90
04:06:48:[11760.421798]  [&amp;lt;ffffffffa0578e46&amp;gt;] nfs_file_fsync+0x86/0x110 [nfs]
04:06:48:[11760.423232]  [&amp;lt;ffffffff812302cb&amp;gt;] vfs_fsync+0x2b/0x40
04:06:48:[11760.424589]  [&amp;lt;ffffffffa0579286&amp;gt;] nfs_file_flush+0x46/0x60 [nfs]
04:06:48:[11760.426028]  [&amp;lt;ffffffff811fbfd4&amp;gt;] filp_close+0x34/0x80
04:06:48:[11760.427400]  [&amp;lt;ffffffff8121d338&amp;gt;] __close_fd+0x78/0xa0
04:06:48:[11760.428755]  [&amp;lt;ffffffff811fdb73&amp;gt;] SyS_close+0x23/0x50
04:06:48:[11760.430084]  [&amp;lt;ffffffff816975c9&amp;gt;] system_call_fastpath+0x16/0x1b
04:06:48:[11826.143103] nfs: server trevis-6vm4 not responding, still trying
04:06:48:[12018.655111] nfs: server trevis-6vm4 not responding, still trying
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="232749" author="jamesanunez" created="Wed, 29 Aug 2018 17:25:50 +0000"  >&lt;p&gt;We&apos;re seeing this issue, or something very close to it, at&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/df96b5ee-aae1-11e8-80f7-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/df96b5ee-aae1-11e8-80f7-52540065bddc&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/e801c64e-a918-11e8-80f7-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/e801c64e-a918-11e8-80f7-52540065bddc&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/7fcd4432-ab38-11e8-80f7-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/7fcd4432-ab38-11e8-80f7-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In these, we see ls or dir_create hang in the client console logs. In the MDS console log, we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 8094.197467] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-41vm10,onyx-41vm9.onyx.whamcloud.com DURATION=300 ============ 01:57:42 (1535507862)
[ 8106.463255] LNetError: 14167:0:(lib-msg.c:779:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (-125, 0)
[ 8113.461582] Lustre: lustre-MDT0000: Client 27389196-d811-d02a-8653-acd954b84381 (at 10.2.8.140@tcp) reconnecting
[ 8260.743287] LNet: Service thread pid 27085 was inactive for 62.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 8260.745290] Pid: 27085, comm: mdt00_003 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
[ 8260.746424] Call Trace:
[ 8260.746727]  [&amp;lt;ffffffffc0d55f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[ 8260.747759]  [&amp;lt;ffffffffc0d57e93&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[ 8260.748642]  [&amp;lt;ffffffffc11b1bee&amp;gt;] mdt_dom_discard_data+0xfe/0x130 [mdt]
[ 8260.749734]  [&amp;lt;ffffffffc118b603&amp;gt;] mdt_reint_rename_internal.isra.40+0x1923/0x28a0 [mdt]
[ 8260.750712]  [&amp;lt;ffffffffc118dfcb&amp;gt;] mdt_reint_rename_or_migrate.isra.43+0x19b/0x860 [mdt]
[ 8260.751671]  [&amp;lt;ffffffffc118e6c3&amp;gt;] mdt_reint_rename+0x13/0x20 [mdt]
[ 8260.752421]  [&amp;lt;ffffffffc1193033&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
[ 8260.753455]  [&amp;lt;ffffffffc11721d2&amp;gt;] mdt_reint_internal+0x6b2/0xa80 [mdt]
[ 8260.754249]  [&amp;lt;ffffffffc117d1e7&amp;gt;] mdt_reint+0x67/0x140 [mdt]
[ 8260.754927]  [&amp;lt;ffffffffc0df431a&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[ 8260.755931]  [&amp;lt;ffffffffc0d974ab&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 8260.756855]  [&amp;lt;ffffffffc0d9ace4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[ 8260.757643]  [&amp;lt;ffffffffa3cbb621&amp;gt;] kthread+0xd1/0xe0
[ 8260.758294]  [&amp;lt;ffffffffa43205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[ 8260.759073]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[ 8260.759863] LustreError: dumping log to /tmp/lustre-log.1535508029.27085
[ 8261.761641] Pid: 3455, comm: mdt00_028 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
[ 8261.762905] Call Trace:
[ 8261.763216]  [&amp;lt;ffffffffc0d55f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[ 8261.764129]  [&amp;lt;ffffffffc0d57e93&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[ 8261.764980]  [&amp;lt;ffffffffc11763c7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[ 8261.765776]  [&amp;lt;ffffffffc1176a70&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[ 8261.766546]  [&amp;lt;ffffffffc1177ada&amp;gt;] mdt_getattr_name_lock+0x83a/0x1c00 [mdt]
[ 8261.767348]  [&amp;lt;ffffffffc117f885&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[ 8261.768089]  [&amp;lt;ffffffffc117c768&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[ 8261.768912]  [&amp;lt;ffffffffc0d3ce9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[ 8261.769714]  [&amp;lt;ffffffffc0d65523&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[ 8261.770557]  [&amp;lt;ffffffffc0deb9d2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 8261.771307]  [&amp;lt;ffffffffc0df431a&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[ 8261.772124]  [&amp;lt;ffffffffc0d974ab&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 8261.773076]  [&amp;lt;ffffffffc0d9ace4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[ 8261.773824]  [&amp;lt;ffffffffa3cbb621&amp;gt;] kthread+0xd1/0xe0
[ 8261.774388]  [&amp;lt;ffffffffa43205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[ 8261.775109]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="233308" author="jamesanunez" created="Mon, 10 Sep 2018 23:45:56 +0000"  >&lt;p&gt;Created &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11359&quot; title=&quot;racer test 1 times out with client hung in dir_create.sh, ls, &#8230; and MDS in ldlm_completion_ast()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11359&quot;&gt;&lt;del&gt;LU-11359&lt;/del&gt;&lt;/a&gt; to track this issue, because this ticket probably covers several distinct issues.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="53269">LU-11359</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="50269">LU-10525</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxgfz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>