<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:47 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4335] MDS hangs due to mdt thread hung/inactive</title>
                <link>https://jira.whamcloud.com/browse/LU-4335</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;mdt threads reported as inactive &amp;gt;200s. The MDS backed up and required a reboot, but after the reboot the MDS hung again and required another reboot and mounting with abort recovery.&lt;/p&gt;

&lt;p&gt;Uploading these files to the ftp site:&lt;br/&gt;
lustre-log.1385580907.6742.gz &amp;lt;- initial hang&lt;br/&gt;
lustre-log.1385589491.8362.gz &amp;lt;- recovery hang&lt;/p&gt;

&lt;p&gt;We have a crash dump for the recovery hang.&lt;/p&gt;



&lt;p&gt;&amp;#8212; initial hang &amp;#8212;&lt;br/&gt;
Nov 27 10:30:14 nbp8-mds1 kernel: Lustre: 5687:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 377ef706-73ec-d593-7c7e-ac55fd582ec2@10.151.41.73@o2ib t0 exp (null) cur 1385577014 last 0&lt;br/&gt;
Nov 27 10:30:14 nbp8-mds1 kernel: Lustre: 6643:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from c2dc3e1e-9ec6-88a0-ddcb-182e74734295@10.151.41.73@o2ib t0 exp (null) cur 1385577014 last 0&lt;br/&gt;
Nov 27 10:30:53 nbp8-mds1 kernel: Lustre: nbp8-MDT0000: haven&apos;t heard from client fd1a318e-556c-0397-a95e-9d2ed1998bc0 (at 10.151.41.73@o2ib) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff883f83c7fc00, cur 1385577053 expire 1385576903 last 1385576826&lt;br/&gt;
Nov 27 10:30:53 nbp8-mds1 kernel: Lustre: MGS: haven&apos;t heard from client 581c894b-1381-7f25-567b-57c83bbae311 (at 10.151.41.73@o2ib) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff883fa1722c00, cur 1385577053 expire 1385576903 last 1385576826&lt;br/&gt;
Nov 27 10:34:25 nbp8-mds1 kernel: LustreError: 5515:0:(o2iblnd_cb.c:2992:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds&lt;br/&gt;
Nov 27 10:34:25 nbp8-mds1 kernel: LustreError: 5515:0:(o2iblnd_cb.c:3055:kiblnd_check_conns()) Timed out RDMA with 10.151.32.5@o2ib (152): c: 6, oc: 0, rc: 8&lt;br/&gt;
Nov 27 10:34:52 nbp8-mds1 kernel: Lustre: 5687:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 243f629c-ac60-5881-7a9e-e96a02c21f7d@10.151.32.5@o2ib t0 exp (null) cur 1385577292 last 0&lt;br/&gt;
Nov 27 10:35:00 nbp8-mds1 kernel: LustreError: 5996:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
Nov 27 10:35:00 nbp8-mds1 kernel: LustreError: 5996:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 311 previous similar messages&lt;br/&gt;
Nov 27 10:35:39 nbp8-mds1 kernel: Lustre: nbp8-MDT0000: haven&apos;t heard from client 3b493920-c724-792f-5966-cf83ffa67f75 (at 10.151.32.5@o2ib) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff881e3176fc00, cur 1385577339 expire 1385577189 last 1385577112&lt;br/&gt;
Nov 27 11:01:27 nbp8-mds1 kernel: LustreError: 5515:0:(o2iblnd_cb.c:2992:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds&lt;br/&gt;
Nov 27 11:01:27 nbp8-mds1 kernel: LustreError: 5515:0:(o2iblnd_cb.c:3055:kiblnd_check_conns()) Timed out RDMA with 10.151.27.18@o2ib (162): c: 7, oc: 0, rc: 8&lt;br/&gt;
Nov 27 11:03:43 nbp8-mds1 kernel: Lustre: 5686:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 236f42d4-1a48-7ef0-79e8-c65ae40bc796@10.151.27.18@o2ib t0 exp (null) cur 1385579023 last 0&lt;br/&gt;
Nov 27 11:03:43 nbp8-mds1 kernel: Lustre: 5686:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous similar message&lt;br/&gt;
Nov 27 11:03:43 nbp8-mds1 kernel: Lustre: 7068:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from 5b0af5d5-d93c-d433-ec5c-7b17cd82c746@10.151.27.18@o2ib t0 exp (null) cur 1385579023 last 0&lt;br/&gt;
Nov 27 11:35:07 nbp8-mds1 kernel: Lustre: Service thread pid 6742 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:&lt;br/&gt;
Nov 27 11:35:07 nbp8-mds1 kernel: Pid: 6742, comm: mdt_137&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel:&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: Call Trace:&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa04f819b&amp;gt;&amp;#93;&lt;/span&gt; ? cfs_set_ptldebug_header+0x2b/0xc0 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa04f960e&amp;gt;&amp;#93;&lt;/span&gt; cfs_waitq_wait+0xe/0x10 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a9f6de&amp;gt;&amp;#93;&lt;/span&gt; qos_statfs_update+0x7fe/0xa70 &lt;span class=&quot;error&quot;&gt;&amp;#91;lov&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8110e42e&amp;gt;&amp;#93;&lt;/span&gt; ? find_get_page+0x1e/0xa0&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8105fab0&amp;gt;&amp;#93;&lt;/span&gt; ? default_wake_function+0x0/0x20&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0aa00fd&amp;gt;&amp;#93;&lt;/span&gt; alloc_qos+0x1ad/0x21a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lov&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0aa5fdf&amp;gt;&amp;#93;&lt;/span&gt; ? lsm_alloc_plain+0xff/0x930 &lt;span class=&quot;error&quot;&gt;&amp;#91;lov&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:16 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0aa306c&amp;gt;&amp;#93;&lt;/span&gt; qos_prep_create+0x1ec/0x2380 &lt;span class=&quot;error&quot;&gt;&amp;#91;lov&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a9c63a&amp;gt;&amp;#93;&lt;/span&gt; lov_prep_create_set+0xea/0x390 &lt;span class=&quot;error&quot;&gt;&amp;#91;lov&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a84b0c&amp;gt;&amp;#93;&lt;/span&gt; lov_create+0x1ac/0x1400 &lt;span class=&quot;error&quot;&gt;&amp;#91;lov&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0d8b0d6&amp;gt;&amp;#93;&lt;/span&gt; ? mdd_get_md+0x96/0x2f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdd&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0ea2f13&amp;gt;&amp;#93;&lt;/span&gt; ? osd_object_read_unlock+0x53/0xa0 &lt;span class=&quot;error&quot;&gt;&amp;#91;osd_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0dab916&amp;gt;&amp;#93;&lt;/span&gt; ? mdd_read_unlock+0x26/0x30 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdd&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0d8f90e&amp;gt;&amp;#93;&lt;/span&gt; mdd_lov_create+0x9ee/0x1ba0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdd&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0da1871&amp;gt;&amp;#93;&lt;/span&gt; mdd_create+0xf81/0x1a90 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdd&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0ea9df3&amp;gt;&amp;#93;&lt;/span&gt; ? osd_oi_lookup+0x83/0x110 &lt;span class=&quot;error&quot;&gt;&amp;#91;osd_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0ea456c&amp;gt;&amp;#93;&lt;/span&gt; ? osd_object_init+0xdc/0x3e0 &lt;span class=&quot;error&quot;&gt;&amp;#91;osd_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:22 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eda3f7&amp;gt;&amp;#93;&lt;/span&gt; cml_create+0x97/0x250 &lt;span class=&quot;error&quot;&gt;&amp;#91;cmm&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e165e1&amp;gt;&amp;#93;&lt;/span&gt; ? mdt_version_get_save+0x91/0xd0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e2c06e&amp;gt;&amp;#93;&lt;/span&gt; mdt_reint_open+0x1aae/0x28a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa078f724&amp;gt;&amp;#93;&lt;/span&gt; ? lustre_msg_add_version+0x74/0xd0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0da456e&amp;gt;&amp;#93;&lt;/span&gt; ? md_ucred+0x1e/0x60 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdd&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e14c81&amp;gt;&amp;#93;&lt;/span&gt; mdt_reint_rec+0x41/0xe0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0bed4&amp;gt;&amp;#93;&lt;/span&gt; mdt_reint_internal+0x544/0x8e0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0c53d&amp;gt;&amp;#93;&lt;/span&gt; mdt_intent_reint+0x1ed/0x530 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0ac09&amp;gt;&amp;#93;&lt;/span&gt; mdt_intent_policy+0x379/0x690 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa074b351&amp;gt;&amp;#93;&lt;/span&gt; ldlm_lock_enqueue+0x361/0x8f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa07711ad&amp;gt;&amp;#93;&lt;/span&gt; ldlm_handle_enqueue0+0x48d/0xf50 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0b586&amp;gt;&amp;#93;&lt;/span&gt; mdt_enqueue+0x46/0x130 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e00772&amp;gt;&amp;#93;&lt;/span&gt; mdt_handle_common+0x932/0x1750 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e01665&amp;gt;&amp;#93;&lt;/span&gt; mdt_regular_handle+0x15/0x20 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079fb4e&amp;gt;&amp;#93;&lt;/span&gt; ptlrpc_main+0xc4e/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c0ca&amp;gt;&amp;#93;&lt;/span&gt; child_rip+0xa/0x20&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c0c0&amp;gt;&amp;#93;&lt;/span&gt; ? child_rip+0x0/0x20&lt;br/&gt;
Nov 27 11:35:23 nbp8-mds1 kernel:&lt;br/&gt;
Nov 27 11:35:28 nbp8-mds1 kernel: LustreError: dumping log to /tmp/lustre-log.1385580907.6742&lt;br/&gt;
Nov 27 11:35:28 nbp8-mds1 kernel: Lustre: Service thread pid 6645 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later.&lt;/p&gt;


&lt;p&gt;&amp;#8212; after reboot hang &amp;#8212;&lt;br/&gt;
Nov 27 13:57:46 nbp8-mds1 kernel: LustreError: 6771:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 14 previous similar messages&lt;br/&gt;
Nov 27 13:58:11 nbp8-mds1 kernel: Lustre: Service thread pid 8362 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:&lt;br/&gt;
Nov 27 13:58:11 nbp8-mds1 kernel: Pid: 8362, comm: mdt_454&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel:&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: Call Trace:&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8151d552&amp;gt;&amp;#93;&lt;/span&gt; schedule_timeout+0x192/0x2e0&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8107bf80&amp;gt;&amp;#93;&lt;/span&gt; ? process_timeout+0x0/0x10&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0764c60&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_expired_completion_wait+0x0/0x260 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa04f95e1&amp;gt;&amp;#93;&lt;/span&gt; cfs_waitq_timedwait+0x11/0x20 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768d0d&amp;gt;&amp;#93;&lt;/span&gt; ldlm_completion_ast+0x48d/0x720 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8105fab0&amp;gt;&amp;#93;&lt;/span&gt; ? default_wake_function+0x0/0x20&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768506&amp;gt;&amp;#93;&lt;/span&gt; ldlm_cli_enqueue_local+0x1e6/0x560 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768880&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_completion_ast+0x0/0x720 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0df9e60&amp;gt;&amp;#93;&lt;/span&gt; ? mdt_blocking_ast+0x0/0x2a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0dfd2a0&amp;gt;&amp;#93;&lt;/span&gt; mdt_object_lock+0x320/0xb70 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0df9e60&amp;gt;&amp;#93;&lt;/span&gt; ? mdt_blocking_ast+0x0/0x2a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768880&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_completion_ast+0x0/0x720 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0dc62&amp;gt;&amp;#93;&lt;/span&gt; mdt_getattr_name_lock+0xe22/0x1880 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa078eb1d&amp;gt;&amp;#93;&lt;/span&gt; ? lustre_msg_buf+0x5d/0x60 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa07b8486&amp;gt;&amp;#93;&lt;/span&gt; ? __req_capsule_get+0x176/0x750 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0790da4&amp;gt;&amp;#93;&lt;/span&gt; ? lustre_msg_get_flags+0x34/0xb0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0ec1d&amp;gt;&amp;#93;&lt;/span&gt; mdt_intent_getattr+0x2cd/0x4a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0ac09&amp;gt;&amp;#93;&lt;/span&gt; mdt_intent_policy+0x379/0x690 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa074b351&amp;gt;&amp;#93;&lt;/span&gt; ldlm_lock_enqueue+0x361/0x8f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa07711ad&amp;gt;&amp;#93;&lt;/span&gt; ldlm_handle_enqueue0+0x48d/0xf50 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0b586&amp;gt;&amp;#93;&lt;/span&gt; mdt_enqueue+0x46/0x130 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e00772&amp;gt;&amp;#93;&lt;/span&gt; mdt_handle_common+0x932/0x1750 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e01665&amp;gt;&amp;#93;&lt;/span&gt; mdt_regular_handle+0x15/0x20 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079fb4e&amp;gt;&amp;#93;&lt;/span&gt; ptlrpc_main+0xc4e/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c0ca&amp;gt;&amp;#93;&lt;/span&gt; child_rip+0xa/0x20&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:14 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
Nov 27 13:58:15 nbp8-mds1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c0c0&amp;gt;&amp;#93;&lt;/span&gt; ? child_rip+0x0/0x20&lt;br/&gt;
Nov 27 13:58:15 nbp8-mds1 kernel:&lt;br/&gt;
Nov 27 13:58:15 nbp8-mds1 kernel: LustreError: dumping log to /tmp/lustre-log.1385589491.8362&lt;/p&gt;
</description>
                <environment></environment>
        <key id="22294">LU-4335</key>
            <summary>MDS hangs due to mdt thread hung/inactive</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="mhanafi">Mahmoud Hanafi</reporter>
                        <labels>
                    </labels>
                <created>Tue, 3 Dec 2013 00:30:55 +0000</created>
                <updated>Wed, 29 Oct 2014 16:29:45 +0000</updated>
                            <resolved>Wed, 29 Oct 2014 16:29:45 +0000</resolved>
                                    <version>Lustre 2.1.5</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="72664" author="mhanafi" created="Tue, 3 Dec 2013 00:35:01 +0000"  >&lt;p&gt;Our source is at &lt;a href=&quot;https://github.com/jlan/lustre-nas&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/jlan/lustre-nas&lt;/a&gt;. Running version was 2.1.5-2nasS&lt;/p&gt;</comment>
                            <comment id="72691" author="pjones" created="Tue, 3 Dec 2013 12:36:35 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please comment on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="72759" author="mhanafi" created="Tue, 3 Dec 2013 22:53:20 +0000"  >&lt;p&gt;Here is the console log of the same system hanging after recovery. One thing to note: we have turned quotas off on this filesystem. &lt;/p&gt;

&lt;p&gt;# lfs quota -u mhanafi /nobackupp8&lt;br/&gt;
user quotas are not enabled.&lt;/p&gt;

&lt;p&gt;# lfs quota -g css /nobackupp8&lt;br/&gt;
group quotas are not enabled.&lt;/p&gt;



&lt;p&gt;Lustre: 6851:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from 87e9617b-6e16-e700-13c0-c4c0d2508a27@10.151.32.180@o2ib recovering/t181073412670 exp (null) cur 1385588905 last 0&lt;br/&gt;
Lustre: 6851:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1518 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: Denying connection for new client 10.151.46.187@o2ib (at 24819010-1767-a13e-301e-056fd24f419a), waiting for 209 clients in recovery for 10:10&lt;br/&gt;
Lustre: Skipped 1516 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: disconnecting 1643 stale clients&lt;br/&gt;
Lustre: nbp8-MDT0000: sending delayed replies to recovered clients&lt;br/&gt;
Lustre: MDS mdd_obd-nbp8-MDT0000: nbp8-OST003c_UUID now active, resetting orphans&lt;br/&gt;
Lustre: MDS mdd_obd-nbp8-MDT0000: nbp8-OST0043_UUID now active, resetting orphans&lt;br/&gt;
Lustre: Skipped 19 previous similar messages&lt;br/&gt;
LustreError: 7607:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7613:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7607:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 1217 previous similar messages&lt;br/&gt;
LustreError: 7836:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7836:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 19290 previous similar messages&lt;br/&gt;
LustreError: 7622:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 30242 rc:-3)&lt;br/&gt;
LustreError: 7614:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 4127 rc:-3)&lt;br/&gt;
LustreError: 7673:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 30193 rc:-3)&lt;br/&gt;
LustreError: 7673:0:(quota_master.c:1727:qmaster_recovery_main()) Skipped 22 previous similar messages&lt;br/&gt;
LustreError: 7818:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7818:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 43032 previous similar messages&lt;br/&gt;
LustreError: 7902:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 11816 rc:-3)&lt;br/&gt;
LustreError: 7902:0:(quota_master.c:1727:qmaster_recovery_main()) Skipped 143 previous similar messages&lt;br/&gt;
LustreError: 6794:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589017, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f5fec5900/0x8702b298b8f52a74 lrc: 3/1,0 mode: --/PR res: 8885893120/2 bits 0x3 rrc: 11 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6794 timeout: 0&lt;br/&gt;
LustreError: 6826:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589017, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f8ae696c0/0x8702b298b8f52cb9 lrc: 3/1,0 mode: --/PR res: 8885893120/2 bits 0x3 rrc: 11 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6826 timeout: 0&lt;br/&gt;
LustreError: 6826:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 1 previous similar message&lt;br/&gt;
LustreError: 6794:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 62 previous similar messages&lt;br/&gt;
LustreError: dumping log to /tmp/lustre-log.1385589318.6876&lt;br/&gt;
LustreError: 6982:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589018, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f77321000/0x8702b298b8f6ba77 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6982 timeout: 0&lt;br/&gt;
LustreError: 6982:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 7 previous similar messages&lt;br/&gt;
LustreError: 7028:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589019, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f76160d80/0x8702b298b8f9b156 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 7028 timeout: 0&lt;br/&gt;
LustreError: 7028:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 13 previous similar messages&lt;br/&gt;
LustreError: 6989:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589021, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f610786c0/0x8702b298b8ff4cbf lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6989 timeout: 0&lt;br/&gt;
LustreError: 6989:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 20 previous similar messages&lt;br/&gt;
LustreError: 7098:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589025, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883fead1c480/0x8702b298b90a8a36 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 7098 timeout: 0&lt;br/&gt;
LustreError: 7098:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 12 previous similar messages&lt;br/&gt;
LustreError: 6235:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589036, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f76a1c900/0x8702b298b929da01 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6235 timeout: 0&lt;br/&gt;
LustreError: 6235:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 9 previous similar messages&lt;br/&gt;
LustreError: 6884:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589052, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff881e3cc1cb40/0x8702b298b94d4e22 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 915 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6884 timeout: 0&lt;br/&gt;
LustreError: 6884:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 13 previous similar messages&lt;br/&gt;
LustreError: 6961:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589094, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f81faf000/0x8702b298b94e7b6f lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 931 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6961 timeout: 0&lt;br/&gt;
LustreError: 6961:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 167 previous similar messages&lt;br/&gt;
Lustre: 8467:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from 29c09c23-d6c1-7349-2728-df8c815d4c8a@10.151.34.98@o2ib t181060241980 exp (null) cur 1385589462 last 0&lt;br/&gt;
Lustre: 8467:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1490 previous similar messages&lt;br/&gt;
LustreError: 6771:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589166, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff881e4ac88d80/0x8702b298b94ec86f lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 939 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6771 timeout: 0&lt;br/&gt;
LustreError: 6771:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 14 previous similar messages&lt;br/&gt;
Lustre: Service thread pid 8362 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:&lt;br/&gt;
Pid: 8362, comm: mdt_454&lt;/p&gt;

&lt;p&gt;LustreError: 6400:0:(mdt_open.c:1314:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay.  req@ffff881e44969400 x1449477118027704/t0(180696561882) o101-&amp;gt;78e59f5c-45f5-0184-142c-ed51e4a48000@10.151.50.23@o2ib:0/0 lens 712/4936 e 0 to 0 dl 1385589463 ref 1 fl Interpret:/4/0 rc 0/0&lt;br/&gt;
LustreError: 6400:0:(mdt_open.c:1314:mdt_reint_open()) Skipped 8446 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: Denying connection for new client 10.151.40.177@o2ib (at bdedff9d-8f09-576d-7f58-9d0094abc313), waiting for 965 clients in recovery for 14:12&lt;br/&gt;
Lustre: Skipped 137 previous similar messages&lt;br/&gt;
Lustre: 6342:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from beb87751-b41b-41c0-bfcb-75b4366a7947@10.153.1.48@o2ib233 recovering/t0 exp ffff883fb3381000 cur 1385588713 last 1385588655&lt;br/&gt;
Lustre: 6342:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 14289 previous similar messages&lt;br/&gt;
LustreError: 6367:0:(mdt_open.c:1314:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay.  req@ffff881e414c9c00 x1449981538622728/t0(180476772260) o101-&amp;gt;816e39dc-d0f1-19d4-c94f-fb65f7237bef@10.151.44.134@o2ib:0/0 lens 712/4936 e 0 to 0 dl 1385589479 ref 1 fl Interpret:/4/0 rc 0/0&lt;br/&gt;
LustreError: 6367:0:(mdt_open.c:1314:mdt_reint_open()) Skipped 15806 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: Denying connection for new client 10.151.47.216@o2ib (at 101285a6-9cd9-2721-9df7-40da5c311017), waiting for 649 clients in recovery for 13:56&lt;br/&gt;
Lustre: Skipped 159 previous similar messages&lt;br/&gt;
LustreError: 6371:0:(mdt_open.c:1314:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay.  req@ffff883f60a91800 x1452353705630036/t0(180790013583) o101-&amp;gt;89a6c2ec-6b29-6fd2-59c8-8f63a0841cdf@10.151.18.205@o2ib:0/0 lens 744/4936 e 0 to 0 dl 1385589511 ref 1 fl Interpret:/4/0 rc 0/0&lt;br/&gt;
LustreError: 6371:0:(mdt_open.c:1314:mdt_reint_open()) Skipped 22830 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: Denying connection for new client 10.151.49.135@o2ib (at 36f753ed-36ea-320f-74ee-115d64eccd14), waiting for 1197 clients in recovery for 13:23&lt;br/&gt;
Lustre: Skipped 332 previous similar messages&lt;br/&gt;
Lustre: 7000:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from 101285a6-9cd9-2721-9df7-40da5c311017@10.151.47.216@o2ib recovering/t180859536925 exp (null) cur 1385588777 last 0&lt;br/&gt;
Lustre: 7000:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 10109 previous similar messages&lt;br/&gt;
LustreError: 6316:0:(mdt_open.c:1314:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay.  req@ffff881e3abeb400 x1452345264923640/t0(180596528255) o101-&amp;gt;d30b2de1-2db4-65f9-e7db-5607bcfd9cd0@10.151.19.38@o2ib:0/0 lens 744/4936 e 0 to 0 dl 1385589575 ref 1 fl Interpret:/4/0 rc 0/0&lt;br/&gt;
LustreError: 6316:0:(mdt_open.c:1314:mdt_reint_open()) Skipped 138328 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: Denying connection for new client 10.151.29.245@o2ib (at 8e68d8e7-cd33-7f68-d453-41814d65f55a), waiting for 784 clients in recovery for 12:18&lt;br/&gt;
Lustre: Skipped 867 previous similar messages&lt;br/&gt;
Lustre: 6851:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from 87e9617b-6e16-e700-13c0-c4c0d2508a27@10.151.32.180@o2ib recovering/t181073412670 exp (null) cur 1385588905 last 0&lt;br/&gt;
Lustre: 6851:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1518 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: Denying connection for new client 10.151.46.187@o2ib (at 24819010-1767-a13e-301e-056fd24f419a), waiting for 209 clients in recovery for 10:10&lt;br/&gt;
Lustre: Skipped 1516 previous similar messages&lt;br/&gt;
Lustre: nbp8-MDT0000: disconnecting 1643 stale clients&lt;br/&gt;
Lustre: nbp8-MDT0000: sending delayed replies to recovered clients&lt;br/&gt;
Lustre: MDS mdd_obd-nbp8-MDT0000: nbp8-OST003c_UUID now active, resetting orphans&lt;br/&gt;
Lustre: MDS mdd_obd-nbp8-MDT0000: nbp8-OST0043_UUID now active, resetting orphans&lt;br/&gt;
Lustre: Skipped 19 previous similar messages&lt;br/&gt;
LustreError: 7607:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7613:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7607:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 1217 previous similar messages&lt;br/&gt;
LustreError: 7836:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7836:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 19290 previous similar messages&lt;br/&gt;
LustreError: 7622:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 30242 rc:-3)&lt;br/&gt;
LustreError: 7614:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 4127 rc:-3)&lt;br/&gt;
LustreError: 7673:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 30193 rc:-3)&lt;br/&gt;
LustreError: 7673:0:(quota_master.c:1727:qmaster_recovery_main()) Skipped 22 previous similar messages&lt;br/&gt;
LustreError: 7818:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3&lt;br/&gt;
LustreError: 7818:0:(quota_ctl.c:330:client_quota_ctl()) Skipped 43032 previous similar messages&lt;br/&gt;
LustreError: 7902:0:(quota_master.c:1727:qmaster_recovery_main()) mdd_obd-nbp8-MDT0000: qmaster recovery failed for uid 11816 rc:-3)&lt;br/&gt;
LustreError: 7902:0:(quota_master.c:1727:qmaster_recovery_main()) Skipped 143 previous similar messages&lt;br/&gt;
LustreError: 6794:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589017, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f5fec5900/0x8702b298b8f52a74 lrc: 3/1,0 mode: --/PR res: 8885893120/2 bits 0x3 rrc: 11 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6794 timeout: 0&lt;br/&gt;
LustreError: 6826:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589017, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f8ae696c0/0x8702b298b8f52cb9 lrc: 3/1,0 mode: --/PR res: 8885893120/2 bits 0x3 rrc: 11 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6826 timeout: 0&lt;br/&gt;
LustreError: 6826:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 1 previous similar message&lt;br/&gt;
LustreError: 6794:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 62 previous similar messages&lt;br/&gt;
LustreError: dumping log to /tmp/lustre-log.1385589318.6876&lt;br/&gt;
LustreError: 6982:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589018, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f77321000/0x8702b298b8f6ba77 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6982 timeout: 0&lt;br/&gt;
LustreError: 6982:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 7 previous similar messages&lt;br/&gt;
LustreError: 7028:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589019, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f76160d80/0x8702b298b8f9b156 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 7028 timeout: 0&lt;br/&gt;
LustreError: 7028:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 13 previous similar messages&lt;br/&gt;
LustreError: 6989:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589021, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f610786c0/0x8702b298b8ff4cbf lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6989 timeout: 0&lt;br/&gt;
LustreError: 6989:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 20 previous similar messages&lt;br/&gt;
LustreError: 7098:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589025, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883fead1c480/0x8702b298b90a8a36 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 7098 timeout: 0&lt;br/&gt;
LustreError: 7098:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 12 previous similar messages&lt;br/&gt;
LustreError: 6235:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589036, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f76a1c900/0x8702b298b929da01 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 913 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6235 timeout: 0&lt;br/&gt;
LustreError: 6235:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 9 previous similar messages&lt;br/&gt;
LustreError: 6884:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589052, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff881e3cc1cb40/0x8702b298b94d4e22 lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 915 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6884 timeout: 0&lt;br/&gt;
LustreError: 6884:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 13 previous similar messages&lt;br/&gt;
LustreError: 6961:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589094, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff883f81faf000/0x8702b298b94e7b6f lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 931 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6961 timeout: 0&lt;br/&gt;
LustreError: 6961:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 167 previous similar messages&lt;br/&gt;
Lustre: 8467:0:(ldlm_lib.c:952:target_handle_connect()) nbp8-MDT0000: connection from 29c09c23-d6c1-7349-2728-df8c815d4c8a@10.151.34.98@o2ib t181060241980 exp (null) cur 1385589462 last 0&lt;br/&gt;
Lustre: 8467:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1490 previous similar messages&lt;br/&gt;
LustreError: 6771:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1385589166, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-ffff883fcadeb000 lock: ffff881e4ac88d80/0x8702b298b94ec86f lrc: 3/1,0 mode: --/PR res: 77032705/3805586675 bits 0x3 rrc: 939 type: IBT flags: 0x4004000 remote: 0x0 expref: -99 pid: 6771 timeout: 0&lt;br/&gt;
LustreError: 6771:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) Skipped 14 previous similar messages&lt;br/&gt;
Lustre: Service thread pid 8362 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:&lt;br/&gt;
Pid: 8362, comm: mdt_454&lt;/p&gt;

&lt;p&gt;Call Trace:&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8151d552&amp;gt;&amp;#93;&lt;/span&gt; schedule_timeout+0x192/0x2e0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8107bf80&amp;gt;&amp;#93;&lt;/span&gt; ? process_timeout+0x0/0x10&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0764c60&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_expired_completion_wait+0x0/0x260 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa04f95e1&amp;gt;&amp;#93;&lt;/span&gt; cfs_waitq_timedwait+0x11/0x20 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768d0d&amp;gt;&amp;#93;&lt;/span&gt; ldlm_completion_ast+0x48d/0x720 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8105fab0&amp;gt;&amp;#93;&lt;/span&gt; ? default_wake_function+0x0/0x20&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768506&amp;gt;&amp;#93;&lt;/span&gt; ldlm_cli_enqueue_local+0x1e6/0x560 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768880&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_completion_ast+0x0/0x720 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0df9e60&amp;gt;&amp;#93;&lt;/span&gt; ? mdt_blocking_ast+0x0/0x2a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0dfd2a0&amp;gt;&amp;#93;&lt;/span&gt; mdt_object_lock+0x320/0xb70 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0df9e60&amp;gt;&amp;#93;&lt;/span&gt; ? mdt_blocking_ast+0x0/0x2a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0768880&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_completion_ast+0x0/0x720 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0dc62&amp;gt;&amp;#93;&lt;/span&gt; mdt_getattr_name_lock+0xe22/0x1880 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa078eb1d&amp;gt;&amp;#93;&lt;/span&gt; ? lustre_msg_buf+0x5d/0x60 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa07b8486&amp;gt;&amp;#93;&lt;/span&gt; ? __req_capsule_get+0x176/0x750 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0790da4&amp;gt;&amp;#93;&lt;/span&gt; ? lustre_msg_get_flags+0x34/0xb0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0ec1d&amp;gt;&amp;#93;&lt;/span&gt; mdt_intent_getattr+0x2cd/0x4a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0ac09&amp;gt;&amp;#93;&lt;/span&gt; mdt_intent_policy+0x379/0x690 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa074b351&amp;gt;&amp;#93;&lt;/span&gt; ldlm_lock_enqueue+0x361/0x8f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa07711ad&amp;gt;&amp;#93;&lt;/span&gt; ldlm_handle_enqueue0+0x48d/0xf50 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e0b586&amp;gt;&amp;#93;&lt;/span&gt; mdt_enqueue+0x46/0x130 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e00772&amp;gt;&amp;#93;&lt;/span&gt; mdt_handle_common+0x932/0x1750 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e01665&amp;gt;&amp;#93;&lt;/span&gt; mdt_regular_handle+0x15/0x20 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079fb4e&amp;gt;&amp;#93;&lt;/span&gt; ptlrpc_main+0xc4e/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c0ca&amp;gt;&amp;#93;&lt;/span&gt; child_rip+0xa/0x20&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa079ef00&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1a40 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c0c0&amp;gt;&amp;#93;&lt;/span&gt; ? child_rip+0x0/0x20&lt;/p&gt;

&lt;p&gt;LustreError: dumping log to /tmp/lustre-log.1385589491.8362&lt;br/&gt;
Lustre: Service thread pid 8370 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:&lt;br/&gt;
Pid: 8370, comm: mdt_459&lt;/p&gt;</comment>
                            <comment id="72770" author="niu" created="Wed, 4 Dec 2013 01:28:33 +0000"  >&lt;p&gt;I didn&apos;t see anything wrong in the log files, but the messages in the crash dump are suspicious:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Nov 27 11:01:27 nbp8-mds1 kernel: LustreError: 5515:0:(o2iblnd_cb.c:2992:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds
Nov 27 11:01:27 nbp8-mds1 kernel: LustreError: 5515:0:(o2iblnd_cb.c:3055:kiblnd_check_conns()) Timed out RDMA with 10.151.27.18@o2ib (162): c: 7, oc: 0, rc: 8
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It looks like there is a network problem in your system; could it be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4195&quot; title=&quot;MDT Slow with ptlrpcd using 100% cpu.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4195&quot;&gt;&lt;del&gt;LU-4195&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;</comment>
                            <comment id="72995" author="mhanafi" created="Fri, 6 Dec 2013 16:46:03 +0000"  >&lt;p&gt;We don&apos;t have a full backtrace (bt) for all the tasks at the initial hang, but I am attaching a full bt for the hang after recovery.&lt;/p&gt;

&lt;p&gt;attach: bta_afterrecover.gz&lt;/p&gt;
</comment>
                            <comment id="73028" author="jaylan" created="Sat, 7 Dec 2013 00:46:50 +0000"  >&lt;p&gt;twelve processes in spin_lock in lnet code.&lt;/p&gt;</comment>
                            <comment id="73052" author="niu" created="Mon, 9 Dec 2013 04:56:17 +0000"  >&lt;blockquote&gt;
&lt;p&gt;twelve processes in spin_lock in lnet code.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Right, it looks quite similar to the trace in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4195&quot; title=&quot;MDT Slow with ptlrpcd using 100% cpu.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4195&quot;&gt;&lt;del&gt;LU-4195&lt;/del&gt;&lt;/a&gt;. Did you try checking your network as Amir suggested?&lt;/p&gt;</comment>
                            <comment id="78721" author="liang" created="Fri, 7 Mar 2014 16:11:06 +0000"  >&lt;p&gt;FYI, please check my comment on &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4733&quot; title=&quot;All mdt thread stuck in cfs_waitq_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4733&quot;&gt;&lt;del&gt;LU-4733&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="78737" author="jaylan" created="Fri, 7 Mar 2014 18:08:59 +0000"  >&lt;p&gt;If   cfs_atomic_inc(&amp;amp;set-&amp;gt;set_completes) &lt;br/&gt;
is the fix for this problem, as Liang Zhen commented in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4733&quot; title=&quot;All mdt thread stuck in cfs_waitq_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4733&quot;&gt;&lt;del&gt;LU-4733&lt;/del&gt;&lt;/a&gt;,&lt;br/&gt;
then that fix is also in 2.4.0.&lt;/p&gt;

&lt;p&gt;Have we seen this problem in nbp7, which runs 2.4.1 server, Mahmoud?&lt;/p&gt;</comment>
                            <comment id="78738" author="mhanafi" created="Fri, 7 Mar 2014 18:20:11 +0000"  >&lt;p&gt;We have seen this at least once on nbp7.&lt;/p&gt;</comment>
                            <comment id="97848" author="mhanafi" created="Wed, 29 Oct 2014 16:26:19 +0000"  >&lt;p&gt;Please close. We are running 2.4.3 and have not seen the issue.&lt;/p&gt;</comment>
                            <comment id="97851" author="pjones" created="Wed, 29 Oct 2014 16:29:45 +0000"  >&lt;p&gt;ok thanks Mahmoud!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="22158">LU-4271</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13890" name="bta_afterrecover.gz" size="5429" author="mhanafi" created="Fri, 6 Dec 2013 16:46:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwaev:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11860</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>