<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:05:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6991] LBUG in ptlrpc_connection_put(): ASSERTION( atomic_read(&amp;conn-&gt;c_refcount) &gt; 1 )</title>
                <link>https://jira.whamcloud.com/browse/LU-6991</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We hit the following LBUG on one of our MDS nodes.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&amp;lt;3&amp;gt;LustreError: 12255:0:(sec.c:379:import_sec_validate_get()) &lt;span class=&quot;code-keyword&quot;&gt;import&lt;/span&gt; ffff880903ac1000 (FULL) with no sec
&amp;lt;0&amp;gt;LustreError: 19842:0:(connection.c:104:ptlrpc_connection_put()) ASSERTION( atomic_read(&amp;amp;conn-&amp;gt;c_refcount) &amp;gt; 1 ) failed:
&amp;lt;0&amp;gt;LustreError: 19842:0:(connection.c:104:ptlrpc_connection_put()) LBUG
&amp;lt;4&amp;gt;Pid: 19842, comm: obd_zombid
&amp;lt;4&amp;gt;
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04f2895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04f2e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa082938b&amp;gt;] ptlrpc_connection_put+0x1db/0x1e0 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa066c70d&amp;gt;] class_import_destroy+0x5d/0x420 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa067067b&amp;gt;] obd_zombie_impexp_cull+0xcb/0x5d0 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0670be5&amp;gt;] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81064bc0&amp;gt;] ? default_wake_function+0x0/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0670b80&amp;gt;] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e71e&amp;gt;] kthread+0x9e/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c20a&amp;gt;] child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e680&amp;gt;] ? kthread+0x0/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c200&amp;gt;] ? child_rip+0x0/0x20
&amp;lt;4&amp;gt;
&amp;lt;0&amp;gt;Kernel panic - not syncing: LBUG
&amp;lt;4&amp;gt;Pid: 19842, comm: obd_zombid Not tainted 2.6.32-504.16.2.el6.Bull.74.x86_64 #1
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8152a2bd&amp;gt;] ? panic+0xa7/0x16f
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04f2eeb&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa082938b&amp;gt;] ? ptlrpc_connection_put+0x1db/0x1e0 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa066c70d&amp;gt;] ? class_import_destroy+0x5d/0x420 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa067067b&amp;gt;] ? obd_zombie_impexp_cull+0xcb/0x5d0 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0670be5&amp;gt;] ? obd_zombie_impexp_thread+0x65/0x190 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81064bc0&amp;gt;] ? default_wake_function+0x0/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0670b80&amp;gt;] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e71e&amp;gt;] ? kthread+0x9e/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c20a&amp;gt;] ? child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e680&amp;gt;] ? kthread+0x0/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c200&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
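
&lt;p&gt;For context, the failing assertion sits at the top of ptlrpc_connection_put() (connection.c:104 in this build). The sketch below is a simplified reconstruction from the Lustre 2.5 source tree, not copied from the crashed build, so details may differ: the connection hash table keeps one base reference on every cached connection, so a put is only legal while c_refcount is still above 1. The LBUG therefore means put was called once more often than get for this connection.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* simplified reconstruction of lustre/ptlrpc/connection.c,
 * for illustration only */
int ptlrpc_connection_put(struct ptlrpc_connection *conn)
{
        int rc = 0;

        if (conn == NULL)
                return rc;

        /* the connection hash always holds one base reference, so a
         * caller may only drop a reference while at least one caller
         * reference (i.e. c_refcount &amp;gt; 1) remains */
        LASSERT(atomic_read(&amp;amp;conn-&amp;gt;c_refcount) &amp;gt; 1);

        /* the entry stays cached in the hash even when the last caller
         * reference goes away; rc == 1 tells the caller it dropped the
         * last non-hash reference */
        if (atomic_dec_return(&amp;amp;conn-&amp;gt;c_refcount) == 1)
                rc = 1;

        return rc;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;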

&lt;p&gt;It appears that the MDS was overloaded at that time.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;crash&amp;gt; sys
      KERNEL: /usr/lib/debug/lib/modules/2.6.32-504.16.2.el6.Bull.74.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Sun Jun  7 06:41:03 2015
      UPTIME: 4 days, 19:14:07
LOAD AVERAGE: 646.72, 556.57, 328.17
       TASKS: 2556
    NODENAME: mds111
     RELEASE: 2.6.32-504.16.2.el6.Bull.74.x86_64
     VERSION: #1 SMP Tue Apr 28 01:43:42 CEST 2015
     MACHINE: x86_64  (2266 Mhz)
      MEMORY: 64 GB
       PANIC: &lt;span class=&quot;code-quote&quot;&gt;&quot;Kernel panic - not syncing: LBUG&quot;&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You will find attached some traces from my analysis of the vmcore. The customer is a black site, so I can&apos;t provide the vmcore itself.&lt;/p&gt;

&lt;p&gt;It appears that the import involved in the LBUG is ffff880903ac1000. In the console, right before the LBUG, we can observe a LustreError involving the same import. It is also reported in the debug log:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;02000000:00020000:28.0F:1433652063.227347:0:12255:0:(sec.c:379:import_sec_validate_get()) &lt;span class=&quot;code-keyword&quot;&gt;import&lt;/span&gt; ffff880903ac1000 (FULL) with no sec
00000100:00040000:28.0:1433652063.248671:0:19842:0:(connection.c:104:ptlrpc_connection_put()) ASSERTION( atomic_read(&amp;amp;conn-&amp;gt;c_refcount) &amp;gt; 1 ) failed:
00000100:00040000:28.0:1433652063.259452:0:19842:0:(connection.c:104:ptlrpc_connection_put()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This import corresponds to Lustre client compute2823. Here is the console output of the compute node:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;00000001:02000400:11.0:1433651401.837787:0:30301:0:(debug.c:339:libcfs_debug_mark_buffer()) DEBUG MARKER: Sun Jun  7 06:30:01 2015

00000001:02000400:11.0:1433651701.267541:0:31155:0:(debug.c:339:libcfs_debug_mark_buffer()) DEBUG MARKER: Sun Jun  7 06:35:01 2015

00000001:02000400:10.0:1433652001.829909:0:31973:0:(debug.c:339:libcfs_debug_mark_buffer()) DEBUG MARKER: Sun Jun  7 06:40:01 2015

00000800:00020000:31.0:1433652118.496756:0:15908:0:(o2iblnd_cb.c:3018:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 4 seconds
00000800:00020000:31.0:1433652118.505625:0:15908:0:(o2iblnd_cb.c:3081:kiblnd_check_conns()) Timed out RDMA with X.Y.Z.42@o2ib11 (54): c: 0, oc: 0, rc: 8
00000100:00000400:19.0:1433652118.517268:0:15965:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1433652063/real 1433652118]  req@ffff880c0cdf1400 x1502885054441032/t0(0) o103-&amp;gt;ptmp2-MDT0000-mdc-ffff88047c753800@X.Y.Z.42@o2ib11:17/18 lens 328/224 e 0 to 1 dl 1433652672 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
00000100:00000400:7.0:1433652118.518199:0:15961:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1433652063/real 1433652118]  req@ffff880a81a1b800 x1502885054432068/t0(0) o103-&amp;gt;ptmp2-MDT0000-mdc-ffff88047c753800@X.Y.Z.42@o2ib11:17/18 lens 328/224 e 0 to 1 dl 1433652672 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
00000100:00000400:3.0:1433652118.518222:0:15945:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1433652063/real 1433652118]  req@ffff8805270aa800 x1502885054436340/t0(0) o103-&amp;gt;ptmp2-MDT0000-mdc-ffff88047c753800@X.Y.Z.42@o2ib11:17/18 lens 328/224 e 0 to 1 dl 1433652672 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
00000100:00000400:10.0:1433652118.518231:0:15960:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1433652063/real 1433652118]  req@ffff880c02ee5c00 x1502885054439124/t0(0) o103-&amp;gt;ptmp2-MDT0000-mdc-ffff88047c753800@X.Y.Z.42@o2ib11:17/18 lens 328/224 e 0 to 1 dl 1433652672 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
00000100:00000400:27.0:1433652118.518233:0:15949:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1433652063/real 1433652118]  req@ffff880a85e69800 x1502885054432296/t0(0) o103-&amp;gt;ptmp2-MDT0000-mdc-ffff88047c753800@X.Y.Z.42@o2ib11:17/18 lens 328/224 e 0 to 1 dl 1433652672 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[...]
00000100:02020000:14.0:1433653897.305027:0:15941:0:(&lt;span class=&quot;code-keyword&quot;&gt;import&lt;/span&gt;.c:1359:ptlrpc_import_recovery_state_machine()) 167-0: ptmp2-MDT0000-mdc-ffff88047c753800: This client was evicted by ptmp2-MDT0000; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will fail.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This client was evicted by the failover MDS; it is the only client we found evicted. We also have a vmcore for this node.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;06/08/2015 06:46 AM
compute2823: /proc/fs/lustre/mdc/ptmp2-MDT0000-mdc-ffff88047c753800/state:current_state: EVICTED
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I have not been able to understand the root cause of the LBUG.&lt;/p&gt;

&lt;p&gt;Please let me know if you need further details.&lt;/p&gt;</description>
                <environment>RHEL6.6 + Lustre 2.5.3.90 w/ bull patches</environment>
        <key id="31446">LU-6991</key>
            <summary>LBUG in ptlrpc_connection_put(): ASSERTION( atomic_read(&amp;conn-&gt;c_refcount) &gt; 1 )</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="bruno.travouillon">Bruno Travouillon</reporter>
                        <labels>
                    </labels>
                <created>Wed, 12 Aug 2015 15:00:27 +0000</created>
                <updated>Thu, 11 Feb 2016 14:26:01 +0000</updated>
                            <resolved>Thu, 11 Feb 2016 14:26:01 +0000</resolved>
                                    <version>Lustre 2.5.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="123979" author="pjones" created="Wed, 12 Aug 2015 18:30:07 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please help with this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="124058" author="hongchao.zhang" created="Thu, 13 Aug 2015 15:37:59 +0000"  >&lt;p&gt;Thanks for your detailed analysis! &lt;/p&gt;

&lt;p&gt;I looked at the related code: this obd_import could be the &quot;obd_export-&amp;gt;exp_imp_reverse&quot;, which was used by &quot;ldlm_server_blocking_ast&quot;&lt;br/&gt;
to send an LDLM request (such as LDLM_BL_CALLBACK) just before it was replaced by the new obd_import in &quot;target_handle_connect&quot;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;int target_handle_connect(struct ptlrpc_request *req)
{
        ...
        spin_lock(&amp;amp;export-&amp;gt;exp_lock);
        if (export-&amp;gt;exp_imp_reverse != NULL)
                /* destroyed import can be still referenced in ctxt */
                tmp_imp = export-&amp;gt;exp_imp_reverse;
        export-&amp;gt;exp_imp_reverse = revimp;
        spin_unlock(&amp;amp;export-&amp;gt;exp_lock);
        ...
       if (tmp_imp != NULL)
                client_destroy_import(tmp_imp);
        ...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &quot;imp_sec&quot; was cleaned up in &quot;client_destroy_import&quot;, which is why the &quot;import ffff880903ac1000 (FULL) with no sec&quot; error is shown.&lt;br/&gt;
After the obd_import is put in &quot;ldlm_server_blocking_ast&quot;, it is queued to &quot;obd_zombie_imports&quot; to be freed.&lt;/p&gt;
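
&lt;p&gt;Putting this together, a minimal sketch of the suspected interleaving (the thread roles and ordering below are assumptions for illustration, not taken from the vmcore):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* hypothetical interleaving, for illustration only */

/* thread A: ldlm_server_blocking_ast() sends an LDLM_BL_CALLBACK
 *           through export-&amp;gt;exp_imp_reverse, taking references on
 *           the import and its imp_connection */

/* thread B: target_handle_connect() swaps exp_imp_reverse for the
 *           new reverse import and calls client_destroy_import()
 *           on the old one; its imp_sec is torn down there, hence
 *           &quot;import ... (FULL) with no sec&quot; */

/* thread A: the RPC finishes and puts the old import, which gets
 *           queued to obd_zombie_imports */

/* obd_zombid: class_import_destroy() puts imp_connection; if the
 *             connection refcount was already down to its base hash
 *             reference, the LASSERT in ptlrpc_connection_put()
 *             fires */
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;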

&lt;p&gt;But it is still unknown why this LASSERT is triggered, since the &quot;c_refcount&quot; in the vmcore is NOT zero (it is 3)!&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; struct ptlrpc_connection 0xffff880b157a7540
struct ptlrpc_connection {
  c_hash = {
    next = 0xffff880496d62b40,
    pprev = 0xffff8803242fdd00
  },
  c_self = 1407422302593066,
  c_peer = {
    nid = 1407422302596706,
    pid = 12345
  },
  c_remote_uuid = {
    uuid = &quot;NET_0x5000OBFUSCATE_UUID\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000&quot;
  },
  c_refcount = {
    counter = 3
  }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="124691" author="hongchao.zhang" created="Thu, 20 Aug 2015 15:07:06 +0000"  >&lt;p&gt;Is the address of MDS is different from that of the evicted client?&lt;br/&gt;
as per the dumped &quot;ptlrpc_connection&quot;, the &quot;c_refcount&quot; is &quot;3&quot; and should not cause the LBUG.&lt;/p&gt;

&lt;p&gt;when did the LBUG occurs? in normal operations or during failing over to another MDS?&lt;/p&gt;</comment>
                            <comment id="124771" author="bruno.travouillon" created="Fri, 21 Aug 2015 09:33:18 +0000"  >&lt;p&gt;LBUG occurs during normal operation. However, we can notice the increasing &lt;tt&gt;LOAD AVERAGE: 646.72, 556.57, 328.17&lt;/tt&gt; in the crash. I will try to get the debug log from the crash if it can help.&lt;/p&gt;

&lt;p&gt;Once mds111 panic, the MDT has been failover to mds110. compute2823 has been evicted by mds110 (failover).&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;00000100:02020000:14.0:1433653897.305027:0:15941:0:(&lt;span class=&quot;code-keyword&quot;&gt;import&lt;/span&gt;.c:1359:ptlrpc_import_recovery_state_machine()) 167-0: ptmp2-MDT0000-mdc-ffff88047c753800: This client was evicted by ptmp2-MDT0000; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will fail.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The nid of mds111 is X.Y.Z.42@o2ib11, as seen in the compute2823 console:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;00000800:00020000:31.0:1433652118.505625:0:15908:0:(o2iblnd_cb.c:3081:kiblnd_check_conns()) Timed out RDMA with X.Y.Z.42@o2ib11 (54): c: 0, oc: 0, rc: 8&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The nid of the evicted client is X.Y.46.98@o2ib11.&lt;/p&gt;</comment>
                            <comment id="125170" author="pjones" created="Wed, 26 Aug 2015 12:59:05 +0000"  >&lt;p&gt;Alex&lt;/p&gt;

&lt;p&gt;Could you please assist with this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="125994" author="bruno.travouillon" created="Wed, 2 Sep 2015 08:20:55 +0000"  >&lt;p&gt;Alex,&lt;/p&gt;

&lt;p&gt;Do you need more info on this one ?&lt;/p&gt;</comment>
                            <comment id="126898" author="bzzz" created="Thu, 10 Sep 2015 12:51:42 +0000"  >&lt;p&gt;is it possible to run with a debugging patch?&lt;/p&gt;</comment>
                            <comment id="126900" author="bruno.travouillon" created="Thu, 10 Sep 2015 12:55:22 +0000"  >&lt;p&gt;This is a production cluster, I will need to discuss this. Moreover, we hit this LBUG once for the moment.&lt;/p&gt;

&lt;p&gt;Can we consider using a SystemTap script for debugging?&lt;/p&gt;</comment>
                            <comment id="127058" author="bzzz" created="Fri, 11 Sep 2015 09:34:14 +0000"  >&lt;p&gt;not ready to answer about SystemTap.. will try to find an existing state in the code signaling on the condition.&lt;/p&gt;</comment>
                            <comment id="127059" author="bruno.travouillon" created="Fri, 11 Sep 2015 09:43:28 +0000"  >&lt;p&gt;SystemTap is a short term solution for the production system. Is your debug patch available in gerrit? I can ask my engineering to consider its inclusion into our build. Thanks.&lt;/p&gt;</comment>
                            <comment id="127060" author="bzzz" created="Fri, 11 Sep 2015 09:48:04 +0000"  >&lt;p&gt;it&apos;s been in testing.. especially given your system is production.&lt;/p&gt;</comment>
                            <comment id="128905" author="bzzz" created="Wed, 30 Sep 2015 15:36:40 +0000"  >&lt;p&gt;please, tell what additional patches are used.&lt;/p&gt;</comment>
                            <comment id="129016" author="bruno.travouillon" created="Thu, 1 Oct 2015 17:09:25 +0000"  >&lt;p&gt;Here is the list of additional patches on top of 2.5.3.90:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;a href=&quot;#19889&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19889&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6471&quot; title=&quot;Unexpected Lustre Client LBUG in llog_write()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6471&quot;&gt;&lt;del&gt;LU-6471&lt;/del&gt;&lt;/a&gt; obdclass: fix llog_cat_cleanup() usage on client&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19670&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19670&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6392&quot; title=&quot;short read/write with stripe count &amp;gt; 1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6392&quot;&gt;&lt;del&gt;LU-6392&lt;/del&gt;&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6389&quot; title=&quot;read()/write() returning less than available bytes intermittently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6389&quot;&gt;&lt;del&gt;LU-6389&lt;/del&gt;&lt;/a&gt; llite: restart short read/write for normal IO&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#16013&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#16013&lt;/a&gt; update PATH of mount.lustre with e2fsprogs utilities directory&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#17470&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#17470&lt;/a&gt; make oss_num_treads max value a tunable of ost module&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#17550&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#17550&lt;/a&gt; add extents_stats_max_processes tunable&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#18158&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#18158&lt;/a&gt; Too many ll_inode_revalidate_fini errors in syslog&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19021&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19021&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5740&quot; title=&quot;Kernel upgrade [RHEL6.6 2.6.32-504.el6]&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5740&quot;&gt;&lt;del&gt;LU-5740&lt;/del&gt;&lt;/a&gt; kernel upgrade &lt;span class=&quot;error&quot;&gt;&amp;#91;RHEL6.6 2.6.32-504.el6&amp;#93;&lt;/span&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19191&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19191&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4582&quot; title=&quot;After failing over Lustre MGS node to the secondary, client mount fails with -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4582&quot;&gt;&lt;del&gt;LU-4582&lt;/del&gt;&lt;/a&gt; mgc: replace hard-coded MGC_ENQUEUE_LIMIT value&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19198&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19198&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5678&quot; title=&quot;kernel crash due to NULL pointer dereference in kiblnd_pool_alloc_node()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5678&quot;&gt;&lt;del&gt;LU-5678&lt;/del&gt;&lt;/a&gt; o2iblnd: connection refcount fix for kiblnd_post_rx&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#18496&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#18496&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5393&quot; title=&quot;LBUG: (ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5393&quot;&gt;&lt;del&gt;LU-5393&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: read i_size once to protect against race&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19427&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19427&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3727&quot; title=&quot;LBUG (llite_nfs.c:281:ll_get_parent()) ASSERTION(body-&amp;gt;valid &amp;amp; OBD_MD_FLID) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3727&quot;&gt;&lt;del&gt;LU-3727&lt;/del&gt;&lt;/a&gt; nfs: fix ll_get_parent() LBUG caused by permission&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19450&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19450&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5522&quot; title=&quot;ofd_prolong_extent_locks()) ASSERTION( lock-&amp;gt;l_flags &amp;amp; 0x0000000000000020ULL ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5522&quot;&gt;&lt;del&gt;LU-5522&lt;/del&gt;&lt;/a&gt; ldlm: remove expired lock from per-export list&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#18371&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#18371&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;del&gt;LU-5264&lt;/del&gt;&lt;/a&gt; obdclass: fix race during key quiescency&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#16790&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#16790&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6049&quot; title=&quot;General Protection Fault at echo_session_key_fini+0xa9&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6049&quot;&gt;&lt;del&gt;LU-6049&lt;/del&gt;&lt;/a&gt; obdclass: Add synchro in lu_context_key_degister()&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#19488&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#19488&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6084&quot; title=&quot;Tests are failed due to &amp;#39;recovery is aborted by hard timeout&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6084&quot;&gt;&lt;del&gt;LU-6084&lt;/del&gt;&lt;/a&gt; ptlrpc: prevent request timeout grow due to recovery&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;#18975&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;NF#18975&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5764&quot; title=&quot;Crash of MDS on &amp;quot;apparent buffer overflow&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5764&quot;&gt;&lt;del&gt;LU-5764&lt;/del&gt;&lt;/a&gt; proc: crash of mds on apparent buffer overflow&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="130361" author="bzzz" created="Wed, 14 Oct 2015 11:38:06 +0000"  >&lt;p&gt;sorry, I still have no a good theory on what happened. do you still have that vmcore? would it be possible to dump the content of the import.&lt;br/&gt;
this looks like a race where some RPC got a reference on the connection while it&apos;s been replaced by a new one.&lt;/p&gt;</comment>
                            <comment id="132657" author="bruno.travouillon" created="Wed, 4 Nov 2015 19:29:14 +0000"  >&lt;p&gt;Hi Alex,&lt;/p&gt;

&lt;p&gt;Here is the requested import. Sorry for the delay.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;crash&amp;gt; struct obd_import ffff880903ac1000
struct obd_import {                      
  imp_handle = {                         
    h_link = {                           
      next = 0xffff880ba8b11980,         
      prev = 0xffffc90018637e28          
    },                                   
    h_cookie = 4546027237220243948,      
    h_owner = 0x0,                       
    h_ops = 0xffffffffa06fef10,          
    h_rcu = {                            
      next = 0x0,                        
      func = 0                           
    },                                   
    h_lock = {                           
      raw_lock = {                       
        slock = 0                        
      }                                  
    },                                   
    h_size = 0,                          
    h_in = 1                             
  },                                     
  imp_refcount = {                       
    counter = 1                          
  },                                     
  imp_dlm_handle = {                     
    cookie = 0                           
  },                                     
  imp_connection = 0xffff880b157a7540,   
  imp_client = 0xffff8808d060e270,       
  imp_pinger_chain = {                   
    next = 0xffff880903ac1060,           
    prev = 0xffff880903ac1060            
  },                                     
  imp_zombie_chain = {                   
    next = 0xffff880903ac1070,           
    prev = 0xffff880903ac1070            
  },                                     
  imp_replay_list = {                    
    next = 0xffff880903ac1080,           
    prev = 0xffff880903ac1080            
  },                                     
  imp_sending_list = {                   
    next = 0xffff880903ac1090,           
    prev = 0xffff880903ac1090            
  },                                     
  imp_delayed_list = {                   
    next = 0xffff880903ac10a0,           
    prev = 0xffff880903ac10a0            
  },                                     
  imp_committed_list = {                 
    next = 0xffff880903ac10b0,           
    prev = 0xffff880903ac10b0            
  },                                     
  imp_replay_cursor = 0xffff880903ac10b0, 
  imp_obd = 0xffff8808d060e138,           
  imp_sec = 0xffffffffa09138c0,           
  imp_sec_mutex = {                       
    count = {                             
      counter = 1                         
    },                                    
    wait_lock = {                         
      raw_lock = {                        
        slock = 0                         
      }                                   
    },                                    
    wait_list = 0xffff880903ac10e0,       
    spin_mlock = 0x0,                     
    owner = 0x0                           
  },                                      
  imp_sec_expire = 0,                     
  imp_recovery_waitq = {                  
    lock = {                              
      raw_lock = {                        
        slock = 0                         
      }                                   
    },                                    
    task_list = {                         
      next = 0xffff880903ac1108,          
      prev = 0xffff880903ac1108           
    }                                     
  },                                      
  imp_inflight = {                        
    counter = 0                           
  },                                      
  imp_unregistering = {                   
    counter = 0                           
  },                                      
  imp_replay_inflight = {                 
    counter = 0                           
  },                                      
  imp_inval_count = {                     
    counter = 0                           
  },                                      
  imp_timeouts = {                        
    counter = 0                           
  },                                      
  imp_state = LUSTRE_IMP_FULL,            
  imp_replay_state = 0,                   
  imp_state_hist = {{                     
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }, {                                  
      ish_state = 0,                      
      ish_time = 0                        
    }},                                   
  imp_state_hist_idx = 0,                 
  imp_generation = 0,                     
  imp_conn_cnt = 0,                       
  imp_last_generation_checked = 0,        
  imp_last_replay_transno = 0,            
  imp_peer_committed_transno = 0,         
  imp_last_transno_checked = 0,           
  imp_remote_handle = {                   
    cookie = 12735384055159921237         
  },                                      
  imp_next_ping = 0,                      
  imp_last_success_conn = 0,              
  imp_conn_list = {                       
    next = 0xffff880903ac1278,            
    prev = 0xffff880903ac1278             
  },                                      
  imp_conn_current = 0x0,                 
  imp_lock = {                            
    raw_lock = {                          
      slock = 131074                      
    }                                     
  },                                      
  imp_no_timeout = 0,                     
  imp_invalid = 0,                        
  imp_deactive = 0,                       
  imp_replayable = 0,                     
  imp_dlm_fake = 1,                       
  imp_server_timeout = 0,                 
  imp_delayed_recovery = 0,               
  imp_no_lock_replay = 0,                 
  imp_vbr_failed = 0,                     
  imp_force_verify = 0,                   
  imp_force_next_verify = 0,              
  imp_pingable = 0,                       
  imp_resend_replay = 0,                  
  imp_no_pinger_recover = 0,              
  imp_need_mne_swab = 0,                  
  imp_force_reconnect = 0,                
  imp_connect_tried = 0,                  
  imp_connect_op = 0,                     
  imp_connect_data = {                    
    ocd_connect_flags = 0,                
    ocd_version = 0,                      
    ocd_grant = 0,                        
    ocd_index = 0,                        
    ocd_brw_size = 0,                     
    ocd_ibits_known = 0,                  
    ocd_blocksize = 0 &lt;span class=&quot;code-quote&quot;&gt;&apos;\000&apos;&lt;/span&gt;,             
    ocd_inodespace = 0 &lt;span class=&quot;code-quote&quot;&gt;&apos;\000&apos;&lt;/span&gt;,            
    ocd_grant_extent = 0,                 
    ocd_unused = 0,                       
    ocd_transno = 0,                      
    ocd_group = 0,                        
    ocd_cksum_types = 0,                  
    ocd_max_easize = 0,                   
    ocd_instance = 0,                     
    ocd_maxbytes = 0,                     
    padding1 = 0,                         
    padding2 = 0,                         
    padding3 = 0,                         
    padding4 = 0,                         
    padding5 = 0,                         
    padding6 = 0,                         
    padding7 = 0,                         
    padding8 = 0,                         
    padding9 = 0,                         
    paddingA = 0,                         
    paddingB = 0,                         
    paddingC = 0,                         
    paddingD = 0,                         
    paddingE = 0,                         
    paddingF = 0                          
  },                                      
  imp_connect_flags_orig = 0,             
  imp_connect_error = 0,                  
  imp_msg_magic = 198183891,              
  imp_msghdr_flags = 3,                   
  imp_rq_pool = 0x0,                      
  imp_at = {                              
    iat_portal = {0, 0, 0, 0, 0, 0, 0, 0}, 
    iat_net_latency = {                    
      at_binstart = 0,                     
      at_hist = {0, 0, 0, 0},              
      at_flags = 0,                        
      at_current = 0,                      
      at_worst_ever = 0,                   
      at_worst_time = 1433652063,          
      at_lock = {                          
        raw_lock = {                       
          slock = 65537                    
        }                                  
      }                                    
    },                                     
    iat_service_estimate = {{              
        at_binstart = 0,                   
        at_hist = {0, 0, 0, 0},            
        at_flags = 1,                      
        at_current = 5,                    
        at_worst_ever = 5,                 
        at_worst_time = 1433652063,        
        at_lock = {                        
          raw_lock = {                     
            slock = 65537                  
          }                                
        }                                  
      }, {                                 
        at_binstart = 0,                   
        at_hist = {0, 0, 0, 0},            
        at_flags = 1,                      
        at_current = 5,                    
        at_worst_ever = 5,                 
        at_worst_time = 1433652063,        
        at_lock = {                        
          raw_lock = {                     
            slock = 65537                  
          }                                
        }                                  
      }, {                                 
        at_binstart = 0,                   
        at_hist = {0, 0, 0, 0},            
        at_flags = 1,                      
        at_current = 5,                    
        at_worst_ever = 5,                 
        at_worst_time = 1433652063,        
        at_lock = {                        
          raw_lock = {                     
            slock = 65537                  
          }                                
        }                                  
      }, {                                 
        at_binstart = 0,
        at_hist = {0, 0, 0, 0},
        at_flags = 1,
        at_current = 5,
        at_worst_ever = 5,
        at_worst_time = 1433652063,
        at_lock = {
          raw_lock = {
            slock = 65537
          }
        }
      }, {
        at_binstart = 0,
        at_hist = {0, 0, 0, 0},
        at_flags = 1,
        at_current = 5,
        at_worst_ever = 5,
        at_worst_time = 1433652063,
        at_lock = {
          raw_lock = {
            slock = 65537
          }
        }
      }, {
        at_binstart = 0,
        at_hist = {0, 0, 0, 0},
        at_flags = 1,
        at_current = 5,
        at_worst_ever = 5,
        at_worst_time = 1433652063,
        at_lock = {
          raw_lock = {
            slock = 65537
          }
        }
      }, {
        at_binstart = 0,
        at_hist = {0, 0, 0, 0},
        at_flags = 1,
        at_current = 5,
        at_worst_ever = 5,
        at_worst_time = 1433652063,
        at_lock = {
          raw_lock = {
            slock = 65537
          }
        }
      }, {
        at_binstart = 0,
        at_hist = {0, 0, 0, 0},
        at_flags = 1,
        at_current = 5,
        at_worst_ever = 5,
        at_worst_time = 1433652063,
        at_lock = {
          raw_lock = {
            slock = 65537
          }
        }
      }}
  },
  imp_last_reply_time = 0
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="136696" author="spiechurski" created="Thu, 17 Dec 2015 14:09:34 +0000"  >&lt;p&gt;Hi Alex,&lt;/p&gt;

&lt;p&gt;Do you have any news or progress on this ticket?&lt;/p&gt;</comment>
                            <comment id="140344" author="spiechurski" created="Thu, 28 Jan 2016 14:24:15 +0000"  >&lt;p&gt;Alex, we don&apos;t have any news for almost 3 months.&lt;br/&gt;
Can we have a (even small) update ?&lt;/p&gt;</comment>
                            <comment id="140345" author="bzzz" created="Thu, 28 Jan 2016 14:39:34 +0000"  >&lt;p&gt;I&apos;m sorry Sebastien, I&apos;m not able to reproduce this and all my hypothesis failed..&lt;/p&gt;</comment>
                            <comment id="141963" author="spiechurski" created="Thu, 11 Feb 2016 14:22:48 +0000"  >&lt;p&gt;Hi Alex.&lt;br/&gt;
OK, so if you have no more ideas, and we don&apos;t get more information or data to work on, I guess we will not be able to make progress on this one.&lt;br/&gt;
Since the crash was seen only once in the last 6 months, we don&apos;t expect to see it again anytime soon.&lt;br/&gt;
I propose to close this as unresolved.&lt;/p&gt;</comment>
                            <comment id="141966" author="pjones" created="Thu, 11 Feb 2016 14:26:01 +0000"  >&lt;p&gt;Sebastien&lt;/p&gt;

&lt;p&gt;This does seem reasonable under the circumstances&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="19522" name="import" size="13323" author="bruno.travouillon" created="Wed, 4 Nov 2015 19:30:46 +0000"/>
                            <attachment id="18608" name="note" size="21978" author="bruno.travouillon" created="Wed, 12 Aug 2015 15:00:27 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxkb3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>