<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:11:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7721] lfsck: slab &apos;size-1048576&apos; exhaust memory (oom-killer)</title>
                <link>https://jira.whamcloud.com/browse/LU-7721</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error occurred during soak testing of build &apos;20160126&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160126&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160126&lt;/a&gt;). DNE is enabled.&lt;br/&gt;
MDTs had been formatted with ldiskfs, OSTs with zfs.&lt;br/&gt;
No faults were injected during the soak test; only application load and execution of lfsck were imposed on the test cluster.&lt;/p&gt;

&lt;p&gt;Sequence of events:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Jan 27 05:44:56 - Started the &lt;tt&gt;lfsck&lt;/tt&gt; command on the primary MDS (&lt;tt&gt;lola-8&lt;/tt&gt;):
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/li&gt;
	&lt;li&gt;Jan 27 05:49   - OSS node  &lt;tt&gt;lola-5&lt;/tt&gt; hit LBUG (see &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7720&quot; title=&quot;osd_object.c:925:osd_attr_set()) ASSERTION( dt_object_exists(dt)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7720&quot;&gt;&lt;del&gt;LU-7720&lt;/del&gt;&lt;/a&gt;)&lt;/li&gt;
	&lt;li&gt;Jan 27 08:46 - Rebooted &lt;tt&gt;lola-5&lt;/tt&gt;, remounted OSTs, enabled lfsck debugging, and increased the debug buffer to 512MB;&lt;br/&gt;
 the number of blocked &lt;tt&gt;ost_*&lt;/tt&gt; threads kept increasing.&lt;br/&gt;
 A huge number of debug logs were printed before the oom-killer started:
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Call Trace:
 [&amp;lt;ffffffff8106cc43&amp;gt;] ? dequeue_entity+0x113/0x2e0
 [&amp;lt;ffffffff8152bd26&amp;gt;] __mutex_lock_slowpath+0x96/0x210
 [&amp;lt;ffffffffa0fcbe7b&amp;gt;] ? ofd_seq_load+0xbb/0xa90 [ofd]
 [&amp;lt;ffffffff8152b84b&amp;gt;] mutex_lock+0x2b/0x50
 [&amp;lt;ffffffffa0fbff18&amp;gt;] ofd_create_hdl+0xc28/0x2640 [ofd]
 [&amp;lt;ffffffffa093a66b&amp;gt;] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
 [&amp;lt;ffffffffa093a7a6&amp;gt;] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
 [&amp;lt;ffffffffa093a8f1&amp;gt;] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
 [&amp;lt;ffffffffa09a4f9c&amp;gt;] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
 [&amp;lt;ffffffffa094c201&amp;gt;] ptlrpc_main+0xe41/0x1910 [ptlrpc]
 [&amp;lt;ffffffff8152a39e&amp;gt;] ? thread_return+0x4e/0x7d0
 [&amp;lt;ffffffffa094b3c0&amp;gt;] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
 [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
 [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
 [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20

LustreError: dumping log to /tmp/lustre-log.1453949036.15397
Pid: 15443, comm: ll_ost00_065
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;--&amp;gt; The debug log file (&lt;tt&gt;/tmp/lustre-log.1453949036.15397&lt;/tt&gt;) is attached.&lt;/p&gt;&lt;/li&gt;
	&lt;li&gt;Jan 27 18:45 - oom-killer started on OSS node &lt;tt&gt;lola-5&lt;/tt&gt;; the node crashed 3 minutes later&lt;/li&gt;
	&lt;li&gt;Memory was exhausted by the slab &apos;size-1048576&apos;, which consumed ~27GB&lt;br/&gt;
(see archive: lola-5-oom-killer-2.tar.bz2)&lt;/li&gt;
	&lt;li&gt;Jan 28 03:59 - lfsck command still not finished (see mds-lfsck-status-nslayout.log.bz2, mds-lfsck-status-oi.log.bz2, oss-lfsck-status.log.bz2)&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment>lola&lt;br/&gt;
build: master branch, 2.7.65-38-g607f691 ; 607f6919ea67b101796630d4b55649a12ea0e859</environment>
        <key id="34340">LU-7721</key>
            <summary>lfsck: slab &apos;size-1048576&apos; exhaust memory (oom-killer)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Thu, 28 Jan 2016 13:48:12 +0000</created>
                <updated>Fri, 29 Jan 2016 05:37:06 +0000</updated>
                            <resolved>Fri, 29 Jan 2016 05:35:25 +0000</resolved>
                                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="140357" author="yong.fan" created="Thu, 28 Jan 2016 16:01:43 +0000"  >&lt;p&gt;According to the log file messages-lola-5.log.bz2, there are four kinds of threads hung there:&lt;/p&gt;

&lt;p&gt;1) OUT handlers for attr set. These RPC handlers hit the ASSERT() described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6656&quot; title=&quot;Non atomic allocation under spin lock in qmt_glimpse_lock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6656&quot;&gt;&lt;del&gt;LU-6656&lt;/del&gt;&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7720&quot; title=&quot;osd_object.c:925:osd_attr_set()) ASSERTION( dt_object_exists(dt)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7720&quot;&gt;&lt;del&gt;LU-7720&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 27 08:46:18 lola-5 kernel: LustreError: 7807:0:(osd_object.c:925:osd_attr_set()) ASSERTION( dt_object_exists(dt) ) failed: 
Jan 27 08:46:18 lola-5 kernel: LustreError: 7807:0:(osd_object.c:925:osd_attr_set()) LBUG
Jan 27 08:46:18 lola-5 kernel: Pid: 7807, comm: ll_ost_out00_00
Jan 27 08:46:18 lola-5 kernel: 
Jan 27 08:46:18 lola-5 kernel: Call Trace:
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa05c5875&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa05c5e77&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa0b25af5&amp;gt;] osd_attr_set+0xdd5/0xe40 [osd_zfs]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa070e795&amp;gt;] ? keys_fill+0xd5/0x1b0 [obdclass]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa02da916&amp;gt;] ? spl_kmem_alloc+0x96/0x1a0 [spl]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa09b2033&amp;gt;] out_tx_attr_set_exec+0xa3/0x480 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa09a849a&amp;gt;] out_tx_end+0xda/0x5c0 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa09ae364&amp;gt;] out_handle+0x11c4/0x19a0 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffff8152b83e&amp;gt;] ? mutex_lock+0x1e/0x50
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa099d6fa&amp;gt;] ? req_can_reconstruct+0x6a/0x120 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa09a4f9c&amp;gt;] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa094c201&amp;gt;] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffff8152a39e&amp;gt;] ? thread_return+0x4e/0x7d0
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffffa094b3c0&amp;gt;] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
Jan 27 08:46:18 lola-5 kernel: [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2) IO server threads. They are blocked when starting a transaction on ZFS.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 27 08:56:22 lola-5 kernel: LNet: Service thread pid 7776 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jan 27 08:56:22 lola-5 kernel: Pid: 7776, comm: ll_ost_io01_003
Jan 27 08:56:22 lola-5 kernel: 
Jan 27 08:56:22 lola-5 kernel: Call Trace:
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8117591d&amp;gt;] ? kmem_cache_alloc_node_trace+0x1cd/0x200
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8109ee6e&amp;gt;] ? prepare_to_wait_exclusive+0x4e/0x80
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa02e178d&amp;gt;] cv_wait_common+0x11d/0x130 [spl]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8109ec20&amp;gt;] ? autoremove_wake_function+0x0/0x40
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8100bc0e&amp;gt;] ? apic_timer_interrupt+0xe/0x20
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa02e17f5&amp;gt;] __cv_wait+0x15/0x20 [spl]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa0356d4d&amp;gt;] dmu_tx_wait+0x21d/0x400 [zfs]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8152b844&amp;gt;] ? mutex_lock+0x24/0x50
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa0357121&amp;gt;] dmu_tx_assign+0xa1/0x570 [zfs]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa0b18f5d&amp;gt;] osd_trans_start+0xed/0x430 [osd_zfs]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa0fccc4c&amp;gt;] ofd_trans_start+0x7c/0x100 [ofd]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa0fd4993&amp;gt;] ofd_commitrw_write+0x583/0x10f0 [ofd]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa0fd5abf&amp;gt;] ofd_commitrw+0x5bf/0xbf0 [ofd]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa06ec921&amp;gt;] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa099cfd4&amp;gt;] obd_commitrw+0x114/0x380 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa09a6790&amp;gt;] tgt_brw_write+0xc70/0x1540 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8105c256&amp;gt;] ? enqueue_task+0x66/0x80
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8105872d&amp;gt;] ? check_preempt_curr+0x6d/0x90
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff81064a6e&amp;gt;] ? try_to_wake_up+0x24e/0x3e0
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa093e070&amp;gt;] ? lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa08f35d0&amp;gt;] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa09a4f9c&amp;gt;] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa094c201&amp;gt;] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8152a39e&amp;gt;] ? thread_return+0x4e/0x7d0
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffffa094b3c0&amp;gt;] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
Jan 27 08:56:22 lola-5 kernel: [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;3) OUT handlers for orphan cleanup. They are blocked when starting a transaction on ZFS.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 27 17:51:32 lola-5 kernel: Pid: 8262, comm: ll_ost02_051
Jan 27 17:51:32 lola-5 kernel: 
Jan 27 17:51:32 lola-5 kernel: Call Trace:
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa02e178d&amp;gt;] cv_wait_common+0x11d/0x130 [spl]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8109ec20&amp;gt;] ? autoremove_wake_function+0x0/0x40
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa02e17f5&amp;gt;] __cv_wait+0x15/0x20 [spl]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa039877b&amp;gt;] txg_wait_open+0x8b/0xd0 [zfs]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa0356f27&amp;gt;] dmu_tx_wait+0x3f7/0x400 [zfs]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa036b5da&amp;gt;] ? dsl_dir_tempreserve_space+0xca/0x190 [zfs]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8152b83e&amp;gt;] ? mutex_lock+0x1e/0x50
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa0357121&amp;gt;] dmu_tx_assign+0xa1/0x570 [zfs]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa0b18f5d&amp;gt;] osd_trans_start+0xed/0x430 [osd_zfs]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa0fcef10&amp;gt;] ofd_object_destroy+0x280/0x8e0 [ofd]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa0fc8f0d&amp;gt;] ofd_destroy_by_fid+0x35d/0x620 [ofd]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa090abc0&amp;gt;] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa090c530&amp;gt;] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa05d0d01&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa0fc0d13&amp;gt;] ofd_create_hdl+0x1a23/0x2640 [ofd]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa093a66b&amp;gt;] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa09a4f9c&amp;gt;] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa094c201&amp;gt;] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8152a39e&amp;gt;] ? thread_return+0x4e/0x7d0
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffffa094b3c0&amp;gt;] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
Jan 27 17:51:32 lola-5 kernel: [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;4) OUT handlers for create. They are blocked at mutex_lock().&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8152bd26&amp;gt;] __mutex_lock_slowpath+0x96/0x210
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa0fcbe7b&amp;gt;] ? ofd_seq_load+0xbb/0xa90 [ofd]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8152b84b&amp;gt;] mutex_lock+0x2b/0x50
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa0fbff18&amp;gt;] ofd_create_hdl+0xc28/0x2640 [ofd]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa093a66b&amp;gt;] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa093a7a6&amp;gt;] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa093a8f1&amp;gt;] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa09a4f9c&amp;gt;] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa094c201&amp;gt;] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8106c560&amp;gt;] ? pick_next_task_fair+0xd0/0x130
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8152a126&amp;gt;] ? schedule+0x176/0x3a0
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffffa094b3c0&amp;gt;] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
Jan 27 17:59:44 lola-5 kernel: [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because more and more RPC service threads were blocked, more and more RAM was occupied, finally leading to the OOM. Since the threads in cases 1)/2)/3) may hold a mutex while blocked, it is not strange that case 4) happened. So the key issue is not why the mutex is held, but why the ZFS transactions hang there.&lt;/p&gt;

&lt;p&gt;According to the stack info, it looks similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6923&quot; title=&quot;writing process hung at txg_wait_open&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6923&quot;&gt;LU-6923&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="140384" author="jgmitter" created="Thu, 28 Jan 2016 18:12:21 +0000"  >&lt;p&gt;Hi Fan Yong,&lt;br/&gt;
Can you take care of this one?&lt;br/&gt;
Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="140487" author="yong.fan" created="Fri, 29 Jan 2016 05:34:40 +0000"  >&lt;p&gt;As explained above, the key issue is that the zfs transactions were blocked for some unknown reason. According to the kernel stack trace, it should be another failure instance of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6923&quot; title=&quot;writing process hung at txg_wait_open&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6923&quot;&gt;LU-6923&lt;/a&gt; which has no solution yet.&lt;/p&gt;</comment>
                            <comment id="140488" author="yong.fan" created="Fri, 29 Jan 2016 05:35:25 +0000"  >&lt;p&gt;It is another failure instance of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6923&quot; title=&quot;writing process hung at txg_wait_open&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6923&quot;&gt;LU-6923&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="31259">LU-6923</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="20215" name="console-lola-5.log.bz2" size="243186" author="heckes" created="Thu, 28 Jan 2016 13:54:40 +0000"/>
                            <attachment id="20216" name="lfsck-proc-list.bz2" size="475" author="heckes" created="Thu, 28 Jan 2016 13:57:27 +0000"/>
                            <attachment id="20217" name="lola-5-oom-killer-2.tar.bz2" size="1334501" author="heckes" created="Thu, 28 Jan 2016 13:57:27 +0000"/>
                            <attachment id="20218" name="lustre-log.1453949036.15397.bz2" size="1142" author="heckes" created="Thu, 28 Jan 2016 13:57:27 +0000"/>
                            <attachment id="20219" name="mds-lfsck-status-nslayout.log.bz2" size="1879" author="heckes" created="Thu, 28 Jan 2016 13:57:27 +0000"/>
                            <attachment id="20220" name="mds-lfsck-status-oi.log.bz2" size="812" author="heckes" created="Thu, 28 Jan 2016 13:57:27 +0000"/>
                            <attachment id="20222" name="messages-lola-5.log.bz2" size="269720" author="heckes" created="Thu, 28 Jan 2016 13:58:01 +0000"/>
                            <attachment id="20221" name="oss-lfsck-status.log.bz2" size="1172" author="heckes" created="Thu, 28 Jan 2016 13:57:27 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxzm7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>