<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:43:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11358] racer test 1 hangs in locking with DNE</title>
                <link>https://jira.whamcloud.com/browse/LU-11358</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;For &lt;a href=&quot;https://testing.whamcloud.com/test_sets/122c1c1e-b529-11e8-8c12-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/122c1c1e-b529-11e8-8c12-52540065bddc&lt;/a&gt;, we see that racer test_1 hangs while running in a DNE environment. &lt;br/&gt;
On the first client (vm1), we see that dir_create.sh is hung. From the client console log, we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  641.878967] LustreError: 29399:0:(lov_object.c:1208:lov_layout_change()) lustre-clilov-ffff8c19f9a62000: cannot apply new layout on [0x240000404:0x23d:0x0] : rc = -22
[  641.881511] LustreError: 29399:0:(lcommon_cl.c:187:cl_file_inode_init()) Failure to initialize cl object [0x240000404:0x23d:0x0]: -22
[  649.716504] LustreError: 5568:0:(lov_object.c:1208:lov_layout_change()) lustre-clilov-ffff8c19f9a62000: cannot apply new layout on [0x240000404:0x23d:0x0] : rc = -22
[  649.719224] LustreError: 5568:0:(lcommon_cl.c:187:cl_file_inode_init()) Failure to initialize cl object [0x240000404:0x23d:0x0]: -22
[  715.441686] Lustre: lustre-OST0003-osc-ffff8c19f9a62000: Connection to lustre-OST0003 (at 10.2.4.95@tcp) was lost; in progress operations using this service will wait for recovery to complete
[  715.444729] LustreError: 167-0: lustre-OST0003-osc-ffff8c19f9a62000: This client was evicted by lustre-OST0003; in progress operations using this service will fail.
[  715.446755] LustreError: 18480:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-OST0003-osc-ffff8c19f9a62000: namespace resource [0x57:0x0:0x0].0x0 (ffff8c19dec760c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
[  719.586481] Lustre: 4306:0:(client.c:2126:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1536596966/real 1536596966]  req@ffff8c19e35b1800 x1611238269322960/t0(0) o36-&amp;gt;lustre-MDT0000-mdc-ffff8c19fbdd9000@10.2.4.96@tcp:12/10 lens 608/4768 e 0 to 1 dl 1536597121 ref 2 fl Rpc:IX/0/ffffffff rc 0/-1
[  719.589415] LustreError: 4306:0:(llite_lib.c:1559:ll_md_setattr()) md_setattr fails: rc = -4
[  719.593107] Lustre: lustre-MDT0000-mdc-ffff8c19fbdd9000: Connection restored to 10.2.4.96@tcp (at 10.2.4.96@tcp)
[  840.233545] INFO: task dir_create.sh:28323 blocked for more than 120 seconds.
[  840.234422] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  840.235280] dir_create.sh   D ffff8c19e2766eb0     0 28323      1 0x00000084
[  840.236116] Call Trace:
[  840.236610]  [&amp;lt;ffffffffc080985c&amp;gt;] ? jobid_interpret_string+0x12c/0x1c0 [obdclass]
[  840.237611]  [&amp;lt;ffffffff9f514029&amp;gt;] schedule+0x29/0x70
[  840.238134]  [&amp;lt;ffffffff9f511999&amp;gt;] schedule_timeout+0x239/0x2c0
[  840.238757]  [&amp;lt;ffffffff9f1516dd&amp;gt;] ? radix_tree_gang_lookup+0xcd/0x150
[  840.239748]  [&amp;lt;ffffffffc0ba9ab7&amp;gt;] ? osc_page_gang_lookup+0x307/0x370 [osc]
[  840.240501]  [&amp;lt;ffffffff9f5143dd&amp;gt;] wait_for_completion+0xfd/0x140
[  840.241191]  [&amp;lt;ffffffff9eecf670&amp;gt;] ? wake_up_state+0x20/0x20
[  840.241775]  [&amp;lt;ffffffffc0ba3688&amp;gt;] osc_io_setattr_end+0xe8/0x200 [osc]
[  840.242554]  [&amp;lt;ffffffffc0bf4fa0&amp;gt;] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
[  840.243333]  [&amp;lt;ffffffffc0803f70&amp;gt;] cl_io_end+0x60/0x150 [obdclass]
[  840.243993]  [&amp;lt;ffffffffc0bf507b&amp;gt;] lov_io_end_wrapper+0xdb/0xe0 [lov]
[  840.244789]  [&amp;lt;ffffffffc0bf5507&amp;gt;] lov_io_call.isra.5+0x87/0x140 [lov]
[  840.245626]  [&amp;lt;ffffffffc0bf55f6&amp;gt;] lov_io_end+0x36/0xb0 [lov]
[  840.246235]  [&amp;lt;ffffffffc0803f70&amp;gt;] cl_io_end+0x60/0x150 [obdclass]
[  840.246867]  [&amp;lt;ffffffffc080661f&amp;gt;] cl_io_loop+0x13f/0xc70 [obdclass]
[  840.247719]  [&amp;lt;ffffffffc0c959c0&amp;gt;] cl_setattr_ost+0x250/0x3c0 [lustre]
[  840.248424]  [&amp;lt;ffffffffc0c71b6f&amp;gt;] ll_setattr_raw+0xe7f/0xee0 [lustre]
[  840.249146]  [&amp;lt;ffffffffc0c71c3c&amp;gt;] ll_setattr+0x6c/0xd0 [lustre]
[  840.249881]  [&amp;lt;ffffffff9f039f14&amp;gt;] notify_change+0x2c4/0x420
[  840.250539]  [&amp;lt;ffffffff9f0f3a5e&amp;gt;] ? process_measurement+0x8e/0x250
[  840.251186]  [&amp;lt;ffffffff9f019185&amp;gt;] do_truncate+0x75/0xc0
[  840.251733]  [&amp;lt;ffffffff9f02b717&amp;gt;] do_last+0x627/0x12c0
[  840.252361]  [&amp;lt;ffffffff9f02c487&amp;gt;] path_openat+0xd7/0x640
[  840.252995]  [&amp;lt;ffffffff9f02e01d&amp;gt;] do_filp_open+0x4d/0xb0
[  840.253633]  [&amp;lt;ffffffff9f03b4c4&amp;gt;] ? __alloc_fd+0xc4/0x170
[  840.254238]  [&amp;lt;ffffffff9f01a327&amp;gt;] do_sys_open+0x137/0x240
[  840.254805]  [&amp;lt;ffffffff9f5206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  840.255637]  [&amp;lt;ffffffff9f01a44e&amp;gt;] SyS_open+0x1e/0x20
[  840.256167]  [&amp;lt;ffffffff9f520795&amp;gt;] system_call_fastpath+0x1c/0x21
[  840.256788]  [&amp;lt;ffffffff9f5206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the second client (vm2), we see tasks stuck in setfattr, mv, ln, touch, lfs, cat, chmod, &#8230;, finally ending with &#8220;md_setattr fails: rc = -4&#8221;. From the client console log, we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  960.234201] INFO: task setfattr:4379 blocked for more than 120 seconds.
[  960.235152] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.236088] setfattr        D ffff88319d295ee0     0  4379  10275 0x00000080
[  960.236836] Call Trace:
[  960.237118]  [&amp;lt;ffffffff89e27040&amp;gt;] ? generic_permission+0x110/0x1d0
[  960.237930]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.238864]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.239574]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.240178]  [&amp;lt;ffffffff89e432af&amp;gt;] vfs_removexattr+0x5f/0x130
[  960.241016]  [&amp;lt;ffffffff89e433d5&amp;gt;] removexattr+0x55/0x80
[  960.241569]  [&amp;lt;ffffffff89e2ccdd&amp;gt;] ? putname+0x3d/0x60
[  960.242271]  [&amp;lt;ffffffff89e2def2&amp;gt;] ? user_path_at_empty+0x72/0xc0
[  960.242971]  [&amp;lt;ffffffff89e1e418&amp;gt;] ? __sb_start_write+0x58/0x110
[  960.243748]  [&amp;lt;ffffffff8a31b57c&amp;gt;] ? __do_page_fault+0x1bc/0x4f0
[  960.244509]  [&amp;lt;ffffffff89e3e03c&amp;gt;] ? mnt_want_write+0x2c/0x50
[  960.245274]  [&amp;lt;ffffffff89e441d4&amp;gt;] SyS_removexattr+0x94/0xd0
[  960.245851]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.246428]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.247093] INFO: task setfattr:4708 blocked for more than 120 seconds.
[  960.247889] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.248730] setfattr        D ffff88319d285ee0     0  4708  10256 0x00000080
[  960.249474] Call Trace:
[  960.249731]  [&amp;lt;ffffffff89e27040&amp;gt;] ? generic_permission+0x110/0x1d0
[  960.250601]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.251286]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.251901]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.252397]  [&amp;lt;ffffffff89e432af&amp;gt;] vfs_removexattr+0x5f/0x130
[  960.253026]  [&amp;lt;ffffffff89e433d5&amp;gt;] removexattr+0x55/0x80
[  960.253722]  [&amp;lt;ffffffff89e2ccdd&amp;gt;] ? putname+0x3d/0x60
[  960.254310]  [&amp;lt;ffffffff89e2def2&amp;gt;] ? user_path_at_empty+0x72/0xc0
[  960.255005]  [&amp;lt;ffffffff89e1e418&amp;gt;] ? __sb_start_write+0x58/0x110
[  960.255580]  [&amp;lt;ffffffff8a31b57c&amp;gt;] ? __do_page_fault+0x1bc/0x4f0
[  960.256302]  [&amp;lt;ffffffff89e3e03c&amp;gt;] ? mnt_want_write+0x2c/0x50
[  960.256943]  [&amp;lt;ffffffff89e441d4&amp;gt;] SyS_removexattr+0x94/0xd0
[  960.257558]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.258238]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.258922] INFO: task mv:4894 blocked for more than 120 seconds.
[  960.259514] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.260406] mv              D ffff8831abf83f40     0  4894  10227 0x00000080
[  960.261156] Call Trace:
[  960.261461]  [&amp;lt;ffffffff89ed1072&amp;gt;] ? security_inode_permission+0x22/0x30
[  960.262321]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.263071]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.263708]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.264213]  [&amp;lt;ffffffff8a30bcde&amp;gt;] lookup_slow+0x33/0xa7
[  960.264920]  [&amp;lt;ffffffff89e2aa08&amp;gt;] path_lookupat+0x838/0x8b0
[  960.265479]  [&amp;lt;ffffffff89df7a35&amp;gt;] ? kmem_cache_alloc+0x35/0x1f0
[  960.266156]  [&amp;lt;ffffffff89e2cd4f&amp;gt;] ? getname_flags+0x4f/0x1a0
[  960.266720]  [&amp;lt;ffffffff89e2aaab&amp;gt;] filename_lookup+0x2b/0xc0
[  960.267252]  [&amp;lt;ffffffff89e2dee7&amp;gt;] user_path_at_empty+0x67/0xc0
[  960.267959]  [&amp;lt;ffffffff89d25092&amp;gt;] ? from_kgid_munged+0x12/0x20
[  960.268570]  [&amp;lt;ffffffff89e2142f&amp;gt;] ? cp_new_stat+0x14f/0x180
[  960.269147]  [&amp;lt;ffffffff89e2df51&amp;gt;] user_path_at+0x11/0x20
[  960.269667]  [&amp;lt;ffffffff89e20f23&amp;gt;] vfs_fstatat+0x63/0xc0
[  960.270189]  [&amp;lt;ffffffff89e214f1&amp;gt;] SYSC_newlstat+0x31/0x60
[  960.270976]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.271851]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.272595]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.273350]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.274040]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.274856]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.275794]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.276704]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.277453]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.278135]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.278810]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.279628]  [&amp;lt;ffffffff89e2177e&amp;gt;] SyS_newlstat+0xe/0x10
[  960.280204]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.280914]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.281654] INFO: task ln:5093 blocked for more than 120 seconds.
[  960.282238] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.283146] ln              D ffff88319d24af70     0  5093  10229 0x00000080
[  960.283887] Call Trace:
[  960.284172]  [&amp;lt;ffffffff89e2a2f2&amp;gt;] ? path_lookupat+0x122/0x8b0
[  960.284908]  [&amp;lt;ffffffff89df5a59&amp;gt;] ? ___slab_alloc+0x209/0x4f0
[  960.285587]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.286261]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.287045]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.287796]  [&amp;lt;ffffffff89e2abc5&amp;gt;] filename_create+0x85/0x180
[  960.288559]  [&amp;lt;ffffffff89df7b9b&amp;gt;] ? kmem_cache_alloc+0x19b/0x1f0
[  960.289344]  [&amp;lt;ffffffff89e2cd4f&amp;gt;] ? getname_flags+0x4f/0x1a0
[  960.290171]  [&amp;lt;ffffffff89e2cdc4&amp;gt;] ? getname_flags+0xc4/0x1a0
[  960.290901]  [&amp;lt;ffffffff89e2d071&amp;gt;] user_path_create+0x41/0x60
[  960.291687]  [&amp;lt;ffffffff89e2e5e8&amp;gt;] SyS_symlinkat+0x58/0x110
[  960.292335]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.293033]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.293880]  [&amp;lt;ffffffff89e2e6b6&amp;gt;] SyS_symlink+0x16/0x20
[  960.294530]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.295233]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.295935] INFO: task touch:5326 blocked for more than 120 seconds.
[  960.296628] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.297687] touch           D ffff8831aad8eeb0     0  5326  10280 0x00000080
[  960.298689] Call Trace:
[  960.298949]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.299603]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.300357]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.300970]  [&amp;lt;ffffffff89e2b37f&amp;gt;] do_last+0x28f/0x12c0
[  960.301611]  [&amp;lt;ffffffff89e2c487&amp;gt;] path_openat+0xd7/0x640
[  960.302331]  [&amp;lt;ffffffff89e2e01d&amp;gt;] do_filp_open+0x4d/0xb0
[  960.303145]  [&amp;lt;ffffffff89e3b4c4&amp;gt;] ? __alloc_fd+0xc4/0x170
[  960.303935]  [&amp;lt;ffffffff89e1a327&amp;gt;] do_sys_open+0x137/0x240
[  960.304594]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.305239]  [&amp;lt;ffffffff89e1a44e&amp;gt;] SyS_open+0x1e/0x20
[  960.305818]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.306572]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.307450] INFO: task setfattr:5397 blocked for more than 120 seconds.
[  960.308286] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.309232] setfattr        D ffff8831a6433f40     0  5397  10258 0x00000080
[  960.310116] Call Trace:
[  960.310390]  [&amp;lt;ffffffff89e27040&amp;gt;] ? generic_permission+0x110/0x1d0
[  960.311113]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.312040]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.312870]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.313448]  [&amp;lt;ffffffff89e43612&amp;gt;] vfs_setxattr+0x62/0xc0
[  960.314029]  [&amp;lt;ffffffff89e4379c&amp;gt;] setxattr+0x12c/0x1c0
[  960.314591]  [&amp;lt;ffffffff89df6ec2&amp;gt;] ? kmem_cache_free+0x1e2/0x200
[  960.315167]  [&amp;lt;ffffffff89e2ccdd&amp;gt;] ? putname+0x3d/0x60
[  960.315661]  [&amp;lt;ffffffff89e2def2&amp;gt;] ? user_path_at_empty+0x72/0xc0
[  960.316354]  [&amp;lt;ffffffff89e1e418&amp;gt;] ? __sb_start_write+0x58/0x110
[  960.316947]  [&amp;lt;ffffffff8a31b57c&amp;gt;] ? __do_page_fault+0x1bc/0x4f0
[  960.317592]  [&amp;lt;ffffffff89e43a97&amp;gt;] SyS_setxattr+0xb7/0x100
[  960.318115]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.318753]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.319582] INFO: task ln:5444 blocked for more than 120 seconds.
[  960.320167] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.320996] ln              D ffff8831abf86eb0     0  5444  10267 0x00000080
[  960.321733] Call Trace:
[  960.322040]  [&amp;lt;ffffffff89e2a2f2&amp;gt;] ? path_lookupat+0x122/0x8b0
[  960.322603]  [&amp;lt;ffffffff89df5a59&amp;gt;] ? ___slab_alloc+0x209/0x4f0
[  960.323223]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.323886]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.324486]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.325130]  [&amp;lt;ffffffff89e2abc5&amp;gt;] filename_create+0x85/0x180
[  960.325688]  [&amp;lt;ffffffff89df7b9b&amp;gt;] ? kmem_cache_alloc+0x19b/0x1f0
[  960.326313]  [&amp;lt;ffffffff89e2cd4f&amp;gt;] ? getname_flags+0x4f/0x1a0
[  960.326988]  [&amp;lt;ffffffff89e2cdc4&amp;gt;] ? getname_flags+0xc4/0x1a0
[  960.327537]  [&amp;lt;ffffffff89e2d071&amp;gt;] user_path_create+0x41/0x60
[  960.328165]  [&amp;lt;ffffffff89e2e5e8&amp;gt;] SyS_symlinkat+0x58/0x110
[  960.328705]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.329367]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.330116]  [&amp;lt;ffffffff89e2e6b6&amp;gt;] SyS_symlink+0x16/0x20
[  960.330631]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.331304]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.332054] INFO: task lfs:10586 blocked for more than 120 seconds.
[  960.332725] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.333471] lfs             D ffff8831abe38fd0     0 10586  10241 0x00000080
[  960.334225] Call Trace:
[  960.334539]  [&amp;lt;ffffffffc0d167e2&amp;gt;] ? ll_dcompare+0x72/0x2e0 [lustre]
[  960.335222]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.335853]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.336513]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.337026]  [&amp;lt;ffffffffc0d32a9f&amp;gt;] ll_migrate+0x16f/0xcc0 [lustre]
[  960.337669]  [&amp;lt;ffffffffc0d21a35&amp;gt;] ll_dir_ioctl+0x59c5/0x6560 [lustre]
[  960.338587]  [&amp;lt;ffffffff89dc5a9c&amp;gt;] ? handle_pte_fault+0x2dc/0xc30
[  960.339201]  [&amp;lt;ffffffff89e3dde4&amp;gt;] ? mntput+0x24/0x40
[  960.339693]  [&amp;lt;ffffffff89dc7c3d&amp;gt;] ? handle_mm_fault+0x39d/0x9b0
[  960.340288]  [&amp;lt;ffffffff89e30350&amp;gt;] do_vfs_ioctl+0x350/0x560
[  960.340855]  [&amp;lt;ffffffff8a31b57c&amp;gt;] ? __do_page_fault+0x1bc/0x4f0
[  960.341908]  [&amp;lt;ffffffff89e30601&amp;gt;] SyS_ioctl+0xa1/0xc0
[  960.342852]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.343595]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.344181]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.345175] INFO: task cat:10688 blocked for more than 120 seconds.
[  960.346073] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.346939] cat             D ffff8831a95acf10     0 10688  10231 0x00000080
[  960.347930] Call Trace:
[  960.348208]  [&amp;lt;ffffffff89ed0a1e&amp;gt;] ? security_capable+0x1e/0x20
[  960.348956]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.350080]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.350886]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.351527]  [&amp;lt;ffffffff8a30bcde&amp;gt;] lookup_slow+0x33/0xa7
[  960.352142]  [&amp;lt;ffffffff89e2927f&amp;gt;] link_path_walk+0x80f/0x8b0
[  960.352945]  [&amp;lt;ffffffff89e2c465&amp;gt;] path_openat+0xb5/0x640
[  960.353620]  [&amp;lt;ffffffff89cf9172&amp;gt;] ? ktime_get+0x52/0xe0
[  960.354387]  [&amp;lt;ffffffffc0d285ef&amp;gt;] ? ll_getattr+0x22f/0x580 [lustre]
[  960.355148]  [&amp;lt;ffffffff89e2e01d&amp;gt;] do_filp_open+0x4d/0xb0
[  960.355707]  [&amp;lt;ffffffff89e3b4c4&amp;gt;] ? __alloc_fd+0xc4/0x170
[  960.356563]  [&amp;lt;ffffffff89e1a327&amp;gt;] do_sys_open+0x137/0x240
[  960.357315]  [&amp;lt;ffffffff8a3206d5&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[  960.358497]  [&amp;lt;ffffffff89e1a44e&amp;gt;] SyS_open+0x1e/0x20
[  960.359026]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.359646]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  960.360691] INFO: task chmod:11121 blocked for more than 120 seconds.
[  960.361650] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[  960.362816] chmod           D ffff8831a32f8fd0     0 11121  10234 0x00000080
[  960.363567] Call Trace:
[  960.363824]  [&amp;lt;ffffffff89e2cd4f&amp;gt;] ? getname_flags+0x4f/0x1a0
[  960.364484]  [&amp;lt;ffffffff8a314f49&amp;gt;] schedule_preempt_disabled+0x29/0x70
[  960.365104]  [&amp;lt;ffffffff8a312d07&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[  960.365787]  [&amp;lt;ffffffff8a3120ef&amp;gt;] mutex_lock+0x1f/0x2f
[  960.366323]  [&amp;lt;ffffffff89e189c8&amp;gt;] chmod_common+0x78/0x160
[  960.366924]  [&amp;lt;ffffffff8a31b57c&amp;gt;] ? __do_page_fault+0x1bc/0x4f0
[  960.367607]  [&amp;lt;ffffffff89e19d17&amp;gt;] SyS_fchmodat+0x57/0xc0
[  960.368125]  [&amp;lt;ffffffff8a320795&amp;gt;] system_call_fastpath+0x1c/0x21
[  960.368777]  [&amp;lt;ffffffff8a3206e1&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[  990.121506] LustreError: 4369:0:(llite_lib.c:1559:ll_md_setattr()) md_setattr fails: rc = -4
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On MDS2 and MDS4 (vm5), we see the following three distinct call traces in the console log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  259.348108] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-30vm1.onyx.whamcloud.com,onyx-30vm2 DURATION=300 ============= 16:25:55 (1536596755)
[  351.363812] LNet: Service thread pid 22468 was inactive for 62.20s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[  351.367067] Pid: 22468, comm: mdt00_009 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  351.368822] Call Trace:
[  351.369374]  [&amp;lt;ffffffffc0d3df61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  351.370937]  [&amp;lt;ffffffffc0d3ed83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  351.372390]  [&amp;lt;ffffffffc10e63d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  351.373833]  [&amp;lt;ffffffffc10e6a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  351.375390]  [&amp;lt;ffffffffc10e6d60&amp;gt;] mdt_object_lock+0x20/0x30 [mdt]
[  351.376680]  [&amp;lt;ffffffffc112186b&amp;gt;] mdt_brw_enqueue+0x44b/0x760 [mdt]
[  351.377955]  [&amp;lt;ffffffffc10d44bf&amp;gt;] mdt_intent_brw+0x1f/0x30 [mdt]
[  351.379141]  [&amp;lt;ffffffffc10ec9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  351.380500]  [&amp;lt;ffffffffc0d24e9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  351.381929]  [&amp;lt;ffffffffc0d4d503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  351.383341]  [&amp;lt;ffffffffc0dd39b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  351.384680]  [&amp;lt;ffffffffc0ddc2fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  351.386091]  [&amp;lt;ffffffffc0d7f48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  351.387592]  [&amp;lt;ffffffffc0d82cc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  351.388691]  [&amp;lt;ffffffffb9abb621&amp;gt;] kthread+0xd1/0xe0
[  351.389324]  [&amp;lt;ffffffffba1205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  351.390034]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  351.390691] LustreError: dumping log to /tmp/lustre-log.1536596849.22468
[  351.992686] Pid: 22484, comm: mdt00_019 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  351.994620] Call Trace:
[  351.995150]  [&amp;lt;ffffffffc0d3df61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  351.996680]  [&amp;lt;ffffffffc0d3ed83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  351.998519]  [&amp;lt;ffffffffc10e63d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  352.000268]  [&amp;lt;ffffffffc10e6a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  352.001606]  [&amp;lt;ffffffffc10e8f1a&amp;gt;] mdt_object_find_lock+0x6a/0x1a0 [mdt]
[  352.002977]  [&amp;lt;ffffffffc1104b4e&amp;gt;] mdt_reint_setxattr+0x1ce/0xf20 [mdt]
[  352.004283]  [&amp;lt;ffffffffc1103293&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
[  352.005561]  [&amp;lt;ffffffffc10e21e2&amp;gt;] mdt_reint_internal+0x6b2/0xa80 [mdt]
[  352.006896]  [&amp;lt;ffffffffc10ed447&amp;gt;] mdt_reint+0x67/0x140 [mdt]
[  352.007959]  [&amp;lt;ffffffffc0ddc2fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  352.009256]  [&amp;lt;ffffffffc0d7f48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  352.010650]  [&amp;lt;ffffffffc0d82cc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  352.011847]  [&amp;lt;ffffffffb9abb621&amp;gt;] kthread+0xd1/0xe0
[  352.012767]  [&amp;lt;ffffffffba1205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  352.013960]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  352.899805] Pid: 22464, comm: mdt00_005 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  352.901685] Call Trace:
[  352.902186]  [&amp;lt;ffffffffc0d3df61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  352.903489]  [&amp;lt;ffffffffc0d3ed83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  352.905019]  [&amp;lt;ffffffffc10e63d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  352.906256]  [&amp;lt;ffffffffc10e6a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  352.907593]  [&amp;lt;ffffffffc10e6df7&amp;gt;] mdt_object_lock_try+0x27/0xb0 [mdt]
[  352.908889]  [&amp;lt;ffffffffc10e8547&amp;gt;] mdt_getattr_name_lock+0x1297/0x1c00 [mdt]
[  352.910163]  [&amp;lt;ffffffffc10efae5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[  352.911631]  [&amp;lt;ffffffffc10ec9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  352.912921]  [&amp;lt;ffffffffc0d24e9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  352.914176]  [&amp;lt;ffffffffc0d4d503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  352.915530]  [&amp;lt;ffffffffc0dd39b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  352.916805]  [&amp;lt;ffffffffc0ddc2fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  352.918088]  [&amp;lt;ffffffffc0d7f48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  352.919538]  [&amp;lt;ffffffffc0d82cc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  352.920825]  [&amp;lt;ffffffffb9abb621&amp;gt;] kthread+0xd1/0xe0
[  352.921765]  [&amp;lt;ffffffffba1205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  352.922967]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  352.924148] LustreError: dumping log to /tmp/lustre-log.1536596850.22464
[  353.155770] Pid: 14392, comm: mdt00_002 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  353.157590] Call Trace:
[  353.158104]  [&amp;lt;ffffffffc0d3df61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  353.159532]  [&amp;lt;ffffffffc0d3ed83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  353.160957]  [&amp;lt;ffffffffc10e63d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  353.162198]  [&amp;lt;ffffffffc10e6a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  353.163514]  [&amp;lt;ffffffffc10e6df7&amp;gt;] mdt_object_lock_try+0x27/0xb0 [mdt]
[  353.164788]  [&amp;lt;ffffffffc10e8547&amp;gt;] mdt_getattr_name_lock+0x1297/0x1c00 [mdt]
[  353.166069]  [&amp;lt;ffffffffc10efae5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[  353.167514]  [&amp;lt;ffffffffc10ec9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  353.168828]  [&amp;lt;ffffffffc0d24e9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  353.170114]  [&amp;lt;ffffffffc0d4d503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  353.171477]  [&amp;lt;ffffffffc0dd39b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  353.172752]  [&amp;lt;ffffffffc0ddc2fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  353.174046]  [&amp;lt;ffffffffc0d7f48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  353.175514]  [&amp;lt;ffffffffc0d82cc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  353.176790]  [&amp;lt;ffffffffb9abb621&amp;gt;] kthread+0xd1/0xe0
[  353.177702]  [&amp;lt;ffffffffba1205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  353.179138]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  354.435795] Pid: 22466, comm: mdt00_007 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  354.437623] Call Trace:
[  354.438121]  [&amp;lt;ffffffffc0d3df61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  354.439416]  [&amp;lt;ffffffffc0d3ed83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  354.440744]  [&amp;lt;ffffffffc10e63d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  354.442189]  [&amp;lt;ffffffffc10e6a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  354.443492]  [&amp;lt;ffffffffc10e6df7&amp;gt;] mdt_object_lock_try+0x27/0xb0 [mdt]
[  354.444784]  [&amp;lt;ffffffffc10e8547&amp;gt;] mdt_getattr_name_lock+0x1297/0x1c00 [mdt]
[  354.446055]  [&amp;lt;ffffffffc10efae5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[  354.447309]  [&amp;lt;ffffffffc10ec9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  354.448497]  [&amp;lt;ffffffffc0d24e9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  354.450007]  [&amp;lt;ffffffffc0d4d503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  354.451294]  [&amp;lt;ffffffffc0dd39b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  354.452545]  [&amp;lt;ffffffffc0ddc2fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  354.453934]  [&amp;lt;ffffffffc0d7f48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  354.455351]  [&amp;lt;ffffffffc0d82cc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  354.456626]  [&amp;lt;ffffffffb9abb621&amp;gt;] kthread+0xd1/0xe0
[  354.457598]  [&amp;lt;ffffffffba1205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  354.458800]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  354.459764] LustreError: dumping log to /tmp/lustre-log.1536596852.22466
[  355.715812] LNet: Service thread pid 22486 was inactive for 62.01s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[  355.718240] LustreError: dumping log to /tmp/lustre-log.1536596853.22486
[  355.843781] LNet: Service thread pid 22494 was inactive for 62.11s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[  390.164855] LustreError: 14386:0:(ldlm_lockd.c:251:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.2.4.94@tcp  ns: mdt-lustre-MDT0001_UUID lock: ffff92c1dc98ed80/0x1d42b9ca91f94e31 lrc: 3/0,0 mode: PW/PW res: [0x240000402:0x5d:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.2.4.94@tcp remote: 0xef3380579c7186d1 expref: 88 pid: 22490 timeout: 390 lvb_type: 0
[  390.173809] LustreError: 22468:0:(ldlm_lockd.c:1342:ldlm_handle_enqueue0()) ### lock on destroyed export ffff92c1dd765400 ns: mdt-lustre-MDT0001_UUID lock: ffff92c1dc98f680/0x1d42b9ca91f94ed2 lrc: 3/0,0 mode: PW/PW res: [0x240000402:0x5d:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x50200000000000 nid: 10.2.4.94@tcp remote: 0xef3380579c718717 expref: 18 pid: 22468 timeout: 0 lvb_type: 0
[  390.179772] LNet: Service thread pid 22468 completed after 101.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  390.180082] LNet: Service thread pid 22484 completed after 100.40s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  390.180474] LNet: Service thread pid 22466 completed after 97.81s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  390.180626] LustreError: 14392:0:(ldlm_lockd.c:1342:ldlm_handle_enqueue0()) ### lock on destroyed export ffff92c1dd765400 ns: mdt-lustre-MDT0001_UUID lock: ffff92c1dd03f8c0/0x1d42b9ca91f96aa8 lrc: 3/0,0 mode: PR/PR res: [0x240000402:0x5d:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x50200000000000 nid: 10.2.4.94@tcp remote: 0xef3380579c7192d9 expref: 4 pid: 14392 timeout: 0 lvb_type: 0
[  390.180862] LNet: Service thread pid 14392 completed after 99.10s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  390.181233] LNet: Service thread pid 22464 completed after 99.30s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  390.181528] LNet: Service thread pid 22486 completed after 96.47s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  390.181813] LNet: Service thread pid 22494 completed after 96.45s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  468.993496] LNetError: 14367:0:(lib-msg.c:794:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (-125, 0)
[  468.994385] LNetError: 14366:0:(lib-msg.c:794:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (-125, 0)
[  555.134662] LustreError: 14422:0:(llog_cat.c:801:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
[  555.136979] LustreError: 14422:0:(llog_cat.c:801:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
[  555.139399] LustreError: 14422:0:(llog_cat.c:801:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
[  555.141809] LustreError: 14422:0:(llog_cat.c:801:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
[  555.144689] LustreError: 14422:0:(llog_cat.c:801:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Description continued in the first comment due to JIRA description line limits.&lt;/p&gt;</description>
                <environment></environment>
        <key id="53268">LU-11358</key>
            <summary>racer test 1 hangs in locking with DNE</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                            <label>DNE</label>
                    </labels>
                <created>Mon, 10 Sep 2018 23:29:06 +0000</created>
                <updated>Mon, 12 Jul 2021 16:44:58 +0000</updated>
                                            <version>Lustre 2.12.0</version>
                    <version>Lustre 2.12.1</version>
                    <version>Lustre 2.12.3</version>
                    <version>Lustre 2.12.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="233306" author="jamesanunez" created="Mon, 10 Sep 2018 23:37:01 +0000"  >&lt;p&gt;JIRA won&apos;t allow me to add more lines to the description. So, here is the rest of the description:&lt;/p&gt;

&lt;p&gt;On MDS1, 3 (vm4), we see similar call traces in the MDS console log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  259.315768] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-30vm1.onyx.whamcloud.com,onyx-30vm2 DURATION=300 ============= 16:25:55 (1536596755)
[  408.065620] LNet: Service thread pid 23632 was inactive for 62.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[  408.067505] Pid: 23632, comm: mdt00_012 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  408.068612] Call Trace:
[  408.068963]  [&amp;lt;ffffffffc0bb8f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  408.070119]  [&amp;lt;ffffffffc0bb9d83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  408.071034]  [&amp;lt;ffffffffc0fd93d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  408.071999]  [&amp;lt;ffffffffc0fd9a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  408.073061]  [&amp;lt;ffffffffc0fd9df7&amp;gt;] mdt_object_lock_try+0x27/0xb0 [mdt]
[  408.073925]  [&amp;lt;ffffffffc0ffd098&amp;gt;] mdt_object_open_lock+0x688/0xac0 [mdt]
[  408.074920]  [&amp;lt;ffffffffc10017b3&amp;gt;] mdt_reint_open+0x14e3/0x3160 [mdt]
[  408.075798]  [&amp;lt;ffffffffc0ff6293&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
[  408.076651]  [&amp;lt;ffffffffc0fd51e2&amp;gt;] mdt_reint_internal+0x6b2/0xa80 [mdt]
[  408.077560]  [&amp;lt;ffffffffc0fe1922&amp;gt;] mdt_intent_open+0x82/0x350 [mdt]
[  408.078413]  [&amp;lt;ffffffffc0fdf9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  408.079311]  [&amp;lt;ffffffffc0b9fe9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  408.080371]  [&amp;lt;ffffffffc0bc8503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  408.081507]  [&amp;lt;ffffffffc0c4e9b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  408.082415]  [&amp;lt;ffffffffc0c572fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  408.083497]  [&amp;lt;ffffffffc0bfa48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  408.084776]  [&amp;lt;ffffffffc0bfdcc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  408.085796]  [&amp;lt;ffffffff964bb621&amp;gt;] kthread+0xd1/0xe0
[  408.086716]  [&amp;lt;ffffffff96b205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  408.087793]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  408.088619] LustreError: dumping log to /tmp/lustre-log.1536596905.23632
[  408.671328] Pid: 23651, comm: mdt00_026 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  408.672721] Call Trace:
[  408.673309]  [&amp;lt;ffffffffc0bb8f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  408.674856]  [&amp;lt;ffffffffc0bb9d83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  408.676257]  [&amp;lt;ffffffffc0fd93d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  408.677598]  [&amp;lt;ffffffffc0fd9a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  408.678935]  [&amp;lt;ffffffffc0fdb1c5&amp;gt;] mdt_getattr_name_lock+0xf15/0x1c00 [mdt]
[  408.680486]  [&amp;lt;ffffffffc0fe2ae5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[  408.681934]  [&amp;lt;ffffffffc0fdf9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  408.683203]  [&amp;lt;ffffffffc0b9fe9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  408.684711]  [&amp;lt;ffffffffc0bc8503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  408.686102]  [&amp;lt;ffffffffc0c4e9b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  408.687452]  [&amp;lt;ffffffffc0c572fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  408.688845]  [&amp;lt;ffffffffc0bfa48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  408.689943]  [&amp;lt;ffffffffc0bfdcc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  408.690671]  [&amp;lt;ffffffff964bb621&amp;gt;] kthread+0xd1/0xe0
[  408.691215]  [&amp;lt;ffffffff96b205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  408.691910]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  408.692489] Pid: 23638, comm: mdt00_018 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  408.693465] Call Trace:
[  408.693756]  [&amp;lt;ffffffffc0bb8f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  408.694549]  [&amp;lt;ffffffffc0bb9d83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  408.695315]  [&amp;lt;ffffffffc0fd93d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  408.696111]  [&amp;lt;ffffffffc0fd9a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  408.696874]  [&amp;lt;ffffffffc0fd9d9c&amp;gt;] mdt_reint_object_lock+0x2c/0x60 [mdt]
[  408.697605]  [&amp;lt;ffffffffc0feb977&amp;gt;] mdt_reint_striped_lock+0x87/0x4d0 [mdt]
[  408.698334]  [&amp;lt;ffffffffc0ff2036&amp;gt;] mdt_reint_unlink+0x706/0x1490 [mdt]
[  408.699045]  [&amp;lt;ffffffffc0ff6293&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
[  408.699719]  [&amp;lt;ffffffffc0fd51e2&amp;gt;] mdt_reint_internal+0x6b2/0xa80 [mdt]
[  408.700429]  [&amp;lt;ffffffffc0fe0447&amp;gt;] mdt_reint+0x67/0x140 [mdt]
[  408.701056]  [&amp;lt;ffffffffc0c572fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  408.701892]  [&amp;lt;ffffffffc0bfa48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  408.702758]  [&amp;lt;ffffffffc0bfdcc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  408.703467]  [&amp;lt;ffffffff964bb621&amp;gt;] kthread+0xd1/0xe0
[  408.704024]  [&amp;lt;ffffffff96b205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  408.704732]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  408.705624] Pid: 23625, comm: mdt00_007 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  408.706776] Call Trace:
[  408.707061]  [&amp;lt;ffffffffc0bb8f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  408.707872]  [&amp;lt;ffffffffc0bb9d83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  408.708669]  [&amp;lt;ffffffffc0fd93d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  408.709414]  [&amp;lt;ffffffffc0fd9a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  408.710173]  [&amp;lt;ffffffffc0fdaaea&amp;gt;] mdt_getattr_name_lock+0x83a/0x1c00 [mdt]
[  408.710928]  [&amp;lt;ffffffffc0fe2ae5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[  408.711653]  [&amp;lt;ffffffffc0fdf9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  408.712341]  [&amp;lt;ffffffffc0b9fe9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  408.713115]  [&amp;lt;ffffffffc0bc8503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  408.713906]  [&amp;lt;ffffffffc0c4e9b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  408.714665]  [&amp;lt;ffffffffc0c572fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  408.715429]  [&amp;lt;ffffffffc0bfa48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  408.716277]  [&amp;lt;ffffffffc0bfdcc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  408.717000]  [&amp;lt;ffffffff964bb621&amp;gt;] kthread+0xd1/0xe0
[  408.717564]  [&amp;lt;ffffffff96b205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  408.718255]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  408.718856] LustreError: dumping log to /tmp/lustre-log.1536596906.23625
[  409.275793] Pid: 23641, comm: mdt00_021 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Aug 30 12:08:10 UTC 2018
[  409.277732] Call Trace:
[  409.278221]  [&amp;lt;ffffffffc0bb8f61&amp;gt;] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[  409.280058]  [&amp;lt;ffffffffc0bb9d83&amp;gt;] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[  409.281479]  [&amp;lt;ffffffffc0fd93d7&amp;gt;] mdt_object_local_lock+0x4e7/0xb20 [mdt]
[  409.282821]  [&amp;lt;ffffffffc0fd9a80&amp;gt;] mdt_object_lock_internal+0x70/0x330 [mdt]
[  409.284152]  [&amp;lt;ffffffffc0fdaaea&amp;gt;] mdt_getattr_name_lock+0x83a/0x1c00 [mdt]
[  409.285482]  [&amp;lt;ffffffffc0fe2ae5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[  409.286819]  [&amp;lt;ffffffffc0fdf9c8&amp;gt;] mdt_intent_policy+0x2f8/0xd10 [mdt]
[  409.288208]  [&amp;lt;ffffffffc0b9fe9e&amp;gt;] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[  409.289500]  [&amp;lt;ffffffffc0bc8503&amp;gt;] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
[  409.290818]  [&amp;lt;ffffffffc0c4e9b2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[  409.293953]  [&amp;lt;ffffffffc0c572fa&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  409.295296]  [&amp;lt;ffffffffc0bfa48b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[  409.296786]  [&amp;lt;ffffffffc0bfdcc4&amp;gt;] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[  409.298362]  [&amp;lt;ffffffff964bb621&amp;gt;] kthread+0xd1/0xe0
[  409.299510]  [&amp;lt;ffffffff96b205f7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[  409.300537]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[  409.301465] LNet: Service thread pid 23636 was inactive for 62.65s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[  409.302975] LNet: Service thread pid 23633 was inactive for 62.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[  445.161620] LustreError: 14404:0:(ldlm_lockd.c:251:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.2.4.94@tcp  ns: mdt-lustre-MDT0000_UUID lock: ffff9c691e927d40/0x6cd7fdda56e1ffdf lrc: 3/0,0 mode: EX/EX res: [0x200000405:0x1f6:0x0].0x0 bits 0x4/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.2.4.94@tcp remote: 0xef3380579c724faf expref: 137 pid: 23632 timeout: 445 lvb_type: 0
[  445.169506] LNet: Service thread pid 23632 completed after 99.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  445.169656] LNet: Service thread pid 23638 completed after 99.12s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  445.170016] LNet: Service thread pid 23651 completed after 98.81s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  445.172273] LNet: Service thread pid 23633 completed after 98.51s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  445.179865] LNet: Service thread pid 23625 completed after 98.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  445.180073] LNet: Service thread pid 23641 completed after 98.62s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  445.190969] LNet: Service thread pid 23636 completed after 98.54s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[  469.495444] LNetError: 14384:0:(lib-msg.c:794:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (-125, 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The call traces presented here have similarities to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5886&quot; title=&quot;DNE II Testing: Hung threads on MDS; evictions&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5886&quot;&gt;&lt;del&gt;LU-5886&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10525&quot; title=&quot;racer test_1: Failure to initialize cl object&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10525&quot;&gt;LU-10525&lt;/a&gt;. Although this looks similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10852&quot; title=&quot;racer test 1 hangs in locking&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10852&quot;&gt;LU-10852&lt;/a&gt;, the call traces for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10852&quot; title=&quot;racer test 1 hangs in locking&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10852&quot;&gt;LU-10852&lt;/a&gt; all have &#8216;osp_md_object_lock()&#8217; in the call stack, and the hangs described here do not. &lt;/p&gt;

&lt;p&gt;We&#8217;ve seen these call traces in console logs in other test sessions:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/1237377a-b529-11e8-8c12-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/1237377a-b529-11e8-8c12-52540065bddc&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="233342" author="pjones" created="Tue, 11 Sep 2018 17:23:57 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please comment?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="245529" author="sarah" created="Wed, 10 Apr 2019 16:55:00 +0000"  >&lt;p&gt;another one on b2_12&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/5f8412e0-5b42-11e9-a256-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/5f8412e0-5b42-11e9-a256-52540065bddc&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="246417" author="sthiell" created="Fri, 26 Apr 2019 20:06:39 +0000"  >&lt;p&gt;We are running 2.12.0 + patches including &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11359&quot; title=&quot;racer test 1 times out with client hung in dir_create.sh, ls, &#8230; and MDS in ldlm_completion_ast()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11359&quot;&gt;&lt;del&gt;LU-11359&lt;/del&gt;&lt;/a&gt; and just had an event with a lot of these call traces. For a long time, one or two &lt;tt&gt;ldlm_bl&lt;/tt&gt; threads per MDS were taking 100% cpu. The system recovered by itself after we killed the robinhood server (that reads changelogs):&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;MDT0000 and MDT0002 in &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/32489/32489_fir-md1-s1_compl_ast.log&quot; title=&quot;fir-md1-s1_compl_ast.log attached to LU-11358&quot;&gt;fir-md1-s1_compl_ast.log&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;
	&lt;li&gt;MDT0001 and MDT0003 in &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/32490/32490_fir-md1-s2_compl_ast.log&quot; title=&quot;fir-md1-s2_compl_ast.log attached to LU-11358&quot;&gt;fir-md1-s2_compl_ast.log&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="255439" author="jamesanunez" created="Thu, 26 Sep 2019 16:23:16 +0000"  >&lt;p&gt;We are seeing this issue or something very close to it running racer with DNE/ZFS; &lt;a href=&quot;https://testing.whamcloud.com/test_sets/d49634dc-df31-11e9-b62b-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/d49634dc-df31-11e9-b62b-52540065bddc&lt;/a&gt; .&lt;/p&gt;

&lt;p&gt;On the MDS2 console, we see the additional LNet health error&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; 
[75858.646568] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-48vm11.trevis.whamcloud.com,trevis-48vm12 DURATION=900 ===== 07:13:24 (1569309204)
[75928.264277] LNetError: 1597:0:(lib-msg.c:816:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (-125, 0)
[75971.956667] Lustre: lustre-MDT0001: Client 826f72cb-2390-4ab9-e63f-b2fb6334f6ca (at 10.9.3.135@tcp) reconnecting
[76114.156474] LNet: Service thread pid 22105 was inactive for 62.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[76114.159530] Pid: 22105, comm: mdt00_014 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Wed Sep 18 04:48:28 UTC 2019
[76114.161351] Call Trace:
[76114.161955]  [&amp;lt;ffffffffc117fae0&amp;gt;] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[76114.163733]  [&amp;lt;ffffffffc1180601&amp;gt;] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
[76114.165034]  [&amp;lt;ffffffffc153252b&amp;gt;] mdt_object_local_lock+0x50b/0xb20 [mdt]
[76114.166470]  [&amp;lt;ffffffffc1532bb0&amp;gt;] mdt_object_lock_internal+0x70/0x360 [mdt]
[76114.167741]  [&amp;lt;ffffffffc1532efc&amp;gt;] mdt_reint_object_lock+0x2c/0x60 [mdt]
[76114.168988]  [&amp;lt;ffffffffc154a81c&amp;gt;] mdt_reint_striped_lock+0x8c/0x510 [mdt]
[76114.170334]  [&amp;lt;ffffffffc154b754&amp;gt;] mdt_reint_unlink+0x764/0x14b0 [mdt]
[76114.171605]  [&amp;lt;ffffffffc1550283&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
[76114.172706]  [&amp;lt;ffffffffc152e1b3&amp;gt;] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[76114.173982]  [&amp;lt;ffffffffc1539587&amp;gt;] mdt_reint+0x67/0x140 [mdt]
[76114.175070]  [&amp;lt;ffffffffc121deca&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[76114.176550]  [&amp;lt;ffffffffc11c330b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[76114.177947]  [&amp;lt;ffffffffc11c6c6c&amp;gt;] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[76114.179173]  [&amp;lt;ffffffffb6cc50d1&amp;gt;] kthread+0xd1/0xe0
[76114.180206]  [&amp;lt;ffffffffb738cd37&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[76114.181483]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt; </comment>
                            <comment id="257742" author="jamesanunez" created="Tue, 5 Nov 2019 15:37:09 +0000"  >&lt;p&gt;We&apos;re seeing a very similar hang in a non-DNE environment; &lt;a href=&quot;https://testing.whamcloud.com/test_sets/dcea0894-fd71-11e9-bbc3-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/dcea0894-fd71-11e9-bbc3-52540065bddc&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;If this is not the same hang, I can open a new ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="63772">LU-14611</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="54819">LU-11943</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="32489" name="fir-md1-s1_compl_ast.log" size="163088" author="sthiell" created="Fri, 26 Apr 2019 20:03:06 +0000"/>
                            <attachment id="32490" name="fir-md1-s2_compl_ast.log" size="86502" author="sthiell" created="Fri, 26 Apr 2019 20:03:10 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0025j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>