<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4286] OST - still busy with 1 active RPCs</title>
                <link>https://jira.whamcloud.com/browse/LU-4286</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have hit this at least once a week for the last month: a seemingly random OST gets stuck refusing reconnection with a busy RPC after an ll_ost stack trace, for example:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Nov 15 07:23:21 boss4 kernel: LustreError: 28716:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel
Nov 15 07:23:36 boss4 kernel: LustreError: 28276:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel
Nov 15 07:23:36 boss4 kernel: LustreError: 28276:0:(ost_handler.c:1764:ost_blocking_ast()) Skipped 1 previous similar message
Nov 15 07:43:31 boss4 kernel: LustreError: 16252:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel
Nov 15 07:43:31 boss4 kernel: LustreError: 16252:0:(ost_handler.c:1764:ost_blocking_ast()) Skipped 6 previous similar messages
Nov 15 07:44:24 boss4 kernel: LNet: Service thread pid 5365 was inactive &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; debugging purposes:
Nov 15 07:44:24 boss4 kernel: Pid: 5365, comm: ll_ost02_083
Nov 15 07:44:24 boss4 kernel:
Nov 15 07:44:24 boss4 kernel: Call Trace:
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa042b6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa05a6523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa05a67e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0e015c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa05a73ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0e11549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0df9fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0dc7c1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0dcb746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0739beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa07423c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa042b5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa043cd9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0739729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffff81055ad3&amp;gt;] ? __wake_up+0x53/0x70
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa074375e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0742c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0742c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffffa0742c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 15 07:44:24 boss4 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Nov 15 07:44:24 boss4 kernel:
Nov 15 07:44:24 boss4 kernel: LustreError: dumping log to /tmp/lustre-log.1384501464.5365
Nov 15 07:57:11 boss4 kernel: Lustre: 5360:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-367), not sending early reply
Nov 15 07:57:11 boss4 kernel:  req@ffff88324df9f400 x1450789664028660/t0(0) o2-&amp;gt;bravo-MDT0000-mdtlov_UUID@10.21.22.50@tcp:0/0 lens 560/432 e 5 to 0 dl 1384502236 ref 2 fl Interpret:/0/0 rc 0/0
Nov 15 07:59:32 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov 15 07:59:32 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov 15 07:59:57 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov 15 07:59:57 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov 15 08:00:22 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov 15 08:00:22 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov 15 08:00:47 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov 15 08:00:47 boss4 kernel: Lustre: bravo-OST0040: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filesystem grinds to a halt, and after rebooting the OSS, recovery brings the OST back to the same place: 1 active RPC. We have to mount with abort_recov to bring things back to life. Another example (different OSS/OST):&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Nov  9 01:16:37 boss1 kernel: LNet: Service thread pid 16551 was inactive &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; debugging purposes:
Nov  9 01:16:37 boss1 kernel: Pid: 16551, comm: ll_ost02_038
Nov  9 01:16:37 boss1 kernel:
Nov  9 01:16:37 boss1 kernel: Call Trace:
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa03ca6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa0545523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa05457e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa0da05c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa05463ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa0db0549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa0d98fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa0d66c1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa0d6a746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06d8beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06e13c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa03ca5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa03dbd9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06d8729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffff81055ad3&amp;gt;] ? __wake_up+0x53/0x70
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06e275e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06e1c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06e1c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffffa06e1c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov  9 01:16:37 boss1 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Nov  9 01:16:37 boss1 kernel:
Nov  9 01:16:37 boss1 kernel: LustreError: dumping log to /tmp/lustre-log.1383959797.16551
Nov  9 01:26:19 boss1 kernel: INFO: task ll_ost02_038:16551 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
Nov  9 01:26:19 boss1 kernel: &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
Nov  9 01:26:19 boss1 kernel: ll_ost02_038  D 0000000000000007     0 16551      2 0x00000080
Nov  9 01:26:19 boss1 kernel: ffff883f44a259b0 0000000000000046 0000000000000000 ffff883f44a25974
Nov  9 01:26:19 boss1 kernel: ffff883f44a25960 ffffc901077e7030 0000000000000246 0000000000000246
Nov  9 01:26:19 boss1 kernel: ffff883f44491ab8 ffff883f44a25fd8 000000000000fb88 ffff883f44491ab8
Nov  9 01:26:19 boss1 kernel: Call Trace:
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa03ca6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa0545523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa05457e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa0da05c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa05463ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa0db0549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa0d98fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa0d66c1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa0d6a746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06d8beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06e13c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa03ca5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa03dbd9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06d8729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffff81055ad3&amp;gt;] ? __wake_up+0x53/0x70
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06e275e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06e1c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06e1c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffffa06e1c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov  9 01:26:19 boss1 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Nov  9 01:26:44 boss1 kernel: Lustre: 16528:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-207), not sending early reply
Nov  9 01:26:44 boss1 kernel:  req@ffff883e8281f400 x1450784570036048/t0(0) o2-&amp;gt;bravo-MDT0000-mdtlov_UUID@10.21.22.50@tcp:0/0 lens 560/432 e 5 to 0 dl 1383960409 ref 2 fl Interpret:/0/0 rc 0/0
Nov  9 01:28:49 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:28:49 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:28:49 boss1 kernel: Lustre: Skipped 568 previous similar messages
Nov  9 01:29:14 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:29:14 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:29:39 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:29:39 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:30:04 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:30:04 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:30:29 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:30:29 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:30:54 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:30:54 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:31:19 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:31:19 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:31:44 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Nov  9 01:31:44 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Nov  9 01:32:34 boss1 kernel: Lustre: bravo-OST000d: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let me know if I can provide any more information the next time this happens.&lt;/p&gt;</description>
                <environment>EL6.4</environment>
        <key id="22194">LU-4286</key>
            <summary>OST - still busy with 1 active RPCs</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="daire">Daire Byrne</reporter>
                        <labels>
                    </labels>
                <created>Thu, 21 Nov 2013 12:32:14 +0000</created>
                <updated>Wed, 2 Apr 2014 02:51:12 +0000</updated>
                            <resolved>Wed, 2 Apr 2014 02:51:12 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                                        <due></due>
                            <votes>2</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="74567" author="daire" created="Wed, 8 Jan 2014 16:34:51 +0000">&lt;p&gt;We are still seeing this once or twice a week, and we have to reboot the OSS server whenever it happens. Is there anything else we can do to help you root-cause the issue? It seems to occur more often when we run a large deletion across the filesystem. Our workload is extremely metadata-heavy, with millions of links/unlinks per day.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Jan  8 15:24:43 boss2 kernel: LustreError: 26897:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel
Jan  8 15:31:48 boss2 kernel: LustreError: 18797:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel
Jan  8 15:46:17 boss2 kernel: LNet: Service thread pid 22361 was inactive &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; debugging purposes:
Jan  8 15:46:17 boss2 kernel: Pid: 22361, comm: ll_ost02_039
Jan  8 15:46:17 boss2 kernel: 
Jan  8 15:46:17 boss2 kernel: Call Trace:
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa04af6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa060e523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa060e7e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa0db85c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa060f3ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa0dc8549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa0db0fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa0d7ec1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa0d82746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07a1beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07aa3c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa04af5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa04c0d9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07a1729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07ab75e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:46:17 boss2 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Jan  8 15:46:17 boss2 kernel: 
Jan  8 15:46:17 boss2 kernel: LustreError: dumping log to /tmp/lustre-log.1389195977.22361
Jan  8 15:51:58 boss2 kernel: INFO: task ll_ost02_039:22361 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
Jan  8 15:51:58 boss2 kernel: &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
Jan  8 15:51:58 boss2 kernel: ll_ost02_039  D 0000000000000015     0 22361      2 0x00000000
Jan  8 15:51:58 boss2 kernel: ffff882f016bf9b0 0000000000000046 0000000000000000 ffff882f016bf974
Jan  8 15:51:58 boss2 kernel: ffff882f016bf960 ffffc900cd15a030 0000000000000246 0000000000000246
Jan  8 15:51:58 boss2 kernel: ffff8837d33bd058 ffff882f016bffd8 000000000000fb88 ffff8837d33bd058
Jan  8 15:51:58 boss2 kernel: Call Trace:
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa04af6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa060e523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa060e7e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa0db85c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa060f3ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa0dc8549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa0db0fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa0d7ec1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa0d82746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07a1beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07aa3c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa04af5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa04c0d9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07a1729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07ab75e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:51:58 boss2 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Jan  8 15:53:58 boss2 kernel: INFO: task ll_ost02_039:22361 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
Jan  8 15:53:58 boss2 kernel: &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
Jan  8 15:53:58 boss2 kernel: ll_ost02_039  D 0000000000000015     0 22361      2 0x00000000
Jan  8 15:53:58 boss2 kernel: ffff882f016bf9b0 0000000000000046 0000000000000000 ffff882f016bf974
Jan  8 15:53:58 boss2 kernel: ffff882f016bf960 ffffc900cd15a030 0000000000000246 0000000000000246
Jan  8 15:53:58 boss2 kernel: ffff8837d33bd058 ffff882f016bffd8 000000000000fb88 ffff8837d33bd058
Jan  8 15:53:58 boss2 kernel: Call Trace:
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa04af6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa060e523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa060e7e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa0db85c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa060f3ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa0dc8549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa0db0fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa0d7ec1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa0d82746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07a1beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07aa3c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa04af5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa04c0d9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07a1729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07ab75e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:53:58 boss2 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Jan  8 15:55:58 boss2 kernel: INFO: task ll_ost02_039:22361 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
Jan  8 15:55:58 boss2 kernel: &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
Jan  8 15:55:58 boss2 kernel: ll_ost02_039  D 0000000000000015     0 22361      2 0x00000000
Jan  8 15:55:58 boss2 kernel: ffff882f016bf9b0 0000000000000046 0000000000000000 ffff882f016bf974
Jan  8 15:55:58 boss2 kernel: ffff882f016bf960 ffffc900cd15a030 0000000000000246 0000000000000246
Jan  8 15:55:58 boss2 kernel: ffff8837d33bd058 ffff882f016bffd8 000000000000fb88 ffff8837d33bd058
Jan  8 15:55:58 boss2 kernel: Call Trace:
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa04af6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa060e523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa060e7e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa0db85c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa060f3ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa0dc8549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa0db0fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa0d7ec1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa0d82746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07a1beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07aa3c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa04af5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa04c0d9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07a1729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07ab75e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:55:58 boss2 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Jan  8 15:56:24 boss2 kernel: Lustre: 22383:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-207), not sending early reply
Jan  8 15:56:24 boss2 kernel:  req@ffff883243230c00 x1456592052377504/t0(0) o2-&amp;gt;bravo-MDT0000-mdtlov_UUID@10.21.22.50@tcp:0/0 lens 560/432 e 5 to 0 dl 1389196589 ref 2 fl Interpret:/0/0 rc 0/0
Jan  8 15:57:58 boss2 kernel: INFO: task ll_ost02_039:22361 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
Jan  8 15:57:58 boss2 kernel: &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
Jan  8 15:57:58 boss2 kernel: ll_ost02_039  D 0000000000000015     0 22361      2 0x00000000
Jan  8 15:57:58 boss2 kernel: ffff882f016bf9b0 0000000000000046 0000000000000000 ffff882f016bf974
Jan  8 15:57:58 boss2 kernel: ffff882f016bf960 ffffc900cd15a030 0000000000000246 0000000000000246
Jan  8 15:57:58 boss2 kernel: ffff8837d33bd058 ffff882f016bffd8 000000000000fb88 ffff8837d33bd058
Jan  8 15:57:58 boss2 kernel: Call Trace:
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa04af6fe&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa060e523&amp;gt;] lu_object_find_at+0xb3/0x360 [obdclass]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa060e7e6&amp;gt;] lu_object_find+0x16/0x20 [obdclass]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa0db85c5&amp;gt;] ofd_object_find+0x35/0xf0 [ofd]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa060f3ee&amp;gt;] ? lu_env_init+0x1e/0x30 [obdclass]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa0dc8549&amp;gt;] ofd_lvbo_update+0x6d9/0xea8 [ofd]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa0db0fe3&amp;gt;] ofd_setattr+0x7f3/0xbd0 [ofd]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa0d7ec1c&amp;gt;] ost_setattr+0x31c/0x990 [ost]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa0d82746&amp;gt;] ost_handle+0x21e6/0x48e0 [ost]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07a1beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07aa3c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa04af5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa04c0d9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07a1729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07ab75e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffffa07aac90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jan  8 15:57:58 boss2 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Jan  8 15:58:21 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Jan  8 15:58:21 boss2 kernel: Lustre: Skipped 1 previous similar message
Jan  8 15:58:21 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Jan  8 15:58:46 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Jan  8 15:58:46 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Jan  8 15:59:12 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Jan  8 15:59:12 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
Jan  8 15:59:37 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) reconnecting
Jan  8 15:59:37 boss2 kernel: Lustre: bravo-OST001a: Client bravo-MDT0000-mdtlov_UUID (at 10.21.22.50@tcp) refused reconnection, still busy with 1 active RPCs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="78335" author="pjones" created="Tue, 4 Mar 2014 14:33:59 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="78340" author="green" created="Tue, 4 Mar 2014 16:24:39 +0000"  >&lt;p&gt;I think this is almost certainly a dup of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4019&quot; title=&quot;today&amp;#39;s master stick on shutdown on test == sanity test 132: on lu_object_find_at&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4019&quot;&gt;&lt;del&gt;LU-4019&lt;/del&gt;&lt;/a&gt;; there&apos;s a patch available there.&lt;/p&gt;</comment>
                            <comment id="78447" author="hongchao.zhang" created="Wed, 5 Mar 2014 10:38:52 +0000"  >&lt;p&gt;Yes, this issue should be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4019&quot; title=&quot;today&amp;#39;s master stick on shutdown on test == sanity test 132: on lu_object_find_at&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4019&quot;&gt;&lt;del&gt;LU-4019&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Hi Daire,&lt;/p&gt;

&lt;p&gt;Could you please test with the patch in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4019&quot; title=&quot;today&amp;#39;s master stick on shutdown on test == sanity test 132: on lu_object_find_at&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4019&quot;&gt;&lt;del&gt;LU-4019&lt;/del&gt;&lt;/a&gt; (&lt;a href=&quot;http://review.whamcloud.com/#/c/7795/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7795/&lt;/a&gt;)?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="78511" author="daire" created="Wed, 5 Mar 2014 19:23:29 +0000"  >&lt;p&gt;Okay. I will wait for us to hit this again (to make sure something else hasn&apos;t changed) and then apply the patch. Cheers.&lt;/p&gt;</comment>
                            <comment id="78692" author="daire" created="Fri, 7 Mar 2014 13:25:05 +0000"  >&lt;p&gt;Hit the bug a couple of times overnight. I&apos;ve patched in the fix. I will report back if we see this again.&lt;/p&gt;</comment>
                            <comment id="78954" author="hongchao.zhang" created="Tue, 11 Mar 2014 00:40:52 +0000"  >&lt;p&gt;Hi Daire,&lt;br/&gt;
How is the test going? Has the issue occurred again since applying the fix?&lt;br/&gt;
Thanks&lt;/p&gt;</comment>
                            <comment id="78983" author="daire" created="Tue, 11 Mar 2014 11:24:11 +0000"  >&lt;p&gt;Well, there have been no further occurrences yet, but it took ~3 weeks between instances last time, so we will have to sit on this for a while and see what happens.&lt;/p&gt;

&lt;p&gt;Cheers.&lt;/p&gt;</comment>
                            <comment id="79313" author="hilljjornl" created="Fri, 14 Mar 2014 03:54:49 +0000"  >&lt;p&gt;We are not seeing the ll_ost call traces, but we are seeing frequent hits of this in the syslogs on our OSS nodes:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@lustre-mgmt1 apps]# grep &quot;Mar  9&quot; lustrekernel | grep &quot;Error -2 syncing&quot; | wc -l
508
[root@lustre-mgmt1 apps]# grep &quot;Mar 10&quot; lustrekernel | grep &quot;Error -2 syncing&quot; | wc -l
3420
[root@lustre-mgmt1 apps]# grep &quot;Mar 11&quot; lustrekernel | grep &quot;Error -2 syncing&quot; | wc -l
1076
[root@lustre-mgmt1 apps]# grep &quot;Mar 12&quot; lustrekernel | grep &quot;Error -2 syncing&quot; | wc -l
154
[root@lustre-mgmt1 apps]# grep &quot;Mar 13&quot; lustrekernel | grep &quot;Error -2 syncing&quot; | wc -l
2645&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;What further debug information can we provide? This is on the production filesystems here at ORNL.&lt;/p&gt;</comment>
                            <comment id="79318" author="green" created="Fri, 14 Mar 2014 05:50:43 +0000"  >&lt;p&gt;The &quot;Error -2 syncing&quot; message itself is a somewhat valid race, I would think, though I am not exactly sure how it would happen.&lt;br/&gt;
You might want to apply the LU-4019 patch just in case, since it does fix a real bug.&lt;/p&gt;</comment>
                            <comment id="79336" author="hongchao.zhang" created="Fri, 14 Mar 2014 15:52:46 +0000"  >&lt;p&gt;This syslog message comes from &quot;ost_blocking_ast&quot;, and it is not an error case:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ost_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
                     void *data, &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; flag)
{
        ...

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc == 0 &amp;amp;&amp;amp; flag == LDLM_CB_CANCELING &amp;amp;&amp;amp;
            (lock-&amp;gt;l_granted_mode &amp;amp; (LCK_PW|LCK_GROUP)) &amp;amp;&amp;amp;
            (sync_lock_cancel == ALWAYS_SYNC_ON_CANCEL ||
             (sync_lock_cancel == BLOCKING_SYNC_ON_CANCEL &amp;amp;&amp;amp;
              lock-&amp;gt;l_flags &amp;amp; LDLM_FL_CBPENDING))) {
 
                ...

                rc = obd_sync(&amp;amp;env, lock-&amp;gt;l_export, oinfo,
                              lock-&amp;gt;l_policy_data.l_extent.start,
                              lock-&amp;gt;l_policy_data.l_extent.end, NULL);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc)
                        CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;Error %d syncing data on lock cancel\n&quot;&lt;/span&gt;, rc);

                OBDO_FREE(oa);
                OBD_FREE_PTR(oinfo);
        }

        ...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When the object is being deleted, the &quot;LU_OBJECT_HEARD_BANSHEE&quot; flag will be set, so ofd_sync will return -ENOENT (-2):&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ofd_sync(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, struct obd_export *exp,
                    struct obd_info *oinfo, obd_size start, obd_size end,
                    struct ptlrpc_request_set *set)
{                         
        ...
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!ofd_object_exists(fo))
                GOTO(put, rc = -ENOENT);
        ...
} 

&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; inline &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ofd_object_exists(struct ofd_object *obj)
{               
        LASSERT(obj != NULL);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (lu_object_is_dying(obj-&amp;gt;ofo_obj.do_lu.lo_header))
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;          
        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; lu_object_exists(&amp;amp;obj-&amp;gt;ofo_obj.do_lu);
} 

&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; inline &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; lu_object_is_dying(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_object_header *h)
{               
        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; test_bit(LU_OBJECT_HEARD_BANSHEE, &amp;amp;h-&amp;gt;loh_flags);
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This issue should be marked as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4019&quot; title=&quot;today&amp;#39;s master stick on shutdown on test == sanity test 132: on lu_object_find_at&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4019&quot;&gt;&lt;del&gt;LU-4019&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="80042" author="pjones" created="Fri, 21 Mar 2014 22:11:27 +0000"  >&lt;p&gt;Daire&lt;/p&gt;

&lt;p&gt;Have you had a chance to try out the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4019&quot; title=&quot;today&amp;#39;s master stick on shutdown on test == sanity test 132: on lu_object_find_at&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4019&quot;&gt;&lt;del&gt;LU-4019&lt;/del&gt;&lt;/a&gt; patch?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="80046" author="daire" created="Sat, 22 Mar 2014 03:48:31 +0000"  >&lt;p&gt;Peter,&lt;/p&gt;

&lt;p&gt;All our servers have been running with the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4019&quot; title=&quot;today&amp;#39;s master stick on shutdown on test == sanity test 132: on lu_object_find_at&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4019&quot;&gt;&lt;del&gt;LU-4019&lt;/del&gt;&lt;/a&gt; for over a week now. Sometimes it took a couple of weeks to see this bug so we are waiting a bit longer before we resolve this one.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;</comment>
                            <comment id="80048" author="pjones" created="Sat, 22 Mar 2014 04:47:19 +0000"  >&lt;p&gt;Fair enough - keep us posted!&lt;/p&gt;</comment>
                            <comment id="80800" author="daire" created="Wed, 2 Apr 2014 02:06:07 +0000"  >&lt;p&gt;Well, we normally would have seen an instance of this by now (~3 weeks since we patched). Feel free to close this ticket and I&apos;ll re-open if required.&lt;/p&gt;

&lt;p&gt;Cheers.&lt;/p&gt;</comment>
                            <comment id="80803" author="pjones" created="Wed, 2 Apr 2014 02:51:12 +0000"  >&lt;p&gt;ok - thanks Daire!&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw9vb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11764</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>