<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:10:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-816] Possible bug/deadlock in the Lustre lock algorithm/protocol may leave multiple Clients/processes blocked forever</title>
                <link>https://jira.whamcloud.com/browse/LU-816</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>
&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Several Bull customers (CEA, TGCC,...) are reporting error messages exactly as described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-142&quot; title=&quot;system hang when running replay-single or replay-dual with three clients&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-142&quot;&gt;&lt;del&gt;LU-142&lt;/del&gt;&lt;/a&gt;, except that they occur on connections between clients and the OSS, instead of between clients and the MDS.&lt;br/&gt;
These customers run Lustre 2.0.0.1 Bull, which does not include the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-142&quot; title=&quot;system hang when running replay-single or replay-dual with three clients&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-142&quot;&gt;&lt;del&gt;LU-142&lt;/del&gt;&lt;/a&gt; patch.&lt;br/&gt;
Do you think it is the same problem as described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-142&quot; title=&quot;system hang when running replay-single or replay-dual with three clients&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-142&quot;&gt;&lt;del&gt;LU-142&lt;/del&gt;&lt;/a&gt;, so that we only need to include the corresponding patch in our delivery, or is it a similar problem in another part of the code, needing an additional patch?&lt;/p&gt;

&lt;p&gt;Here are traces collected by our on site support on a customer site:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
Users reported hung applications/jobs, mainly in Slurm&apos;s &quot;Completing&quot; state.

Logs on affected Clients/nodes have plenty of :
&quot;LustreError: 11-0: an error occurred while communicating with &amp;lt;OSS_nid&amp;gt;. The ost_connect operation failed with -16&quot; msgs.

To find the details of the failing connection on the Client side, we use:
# grep current /proc/fs/lustre/osc/*/state | grep -v FULL
--&amp;gt;&amp;gt; one OST connection will show a &quot;CONNECTING&quot; state.

Then on the identified OSS/Server, we find a lot of the following msgs for the original Client, and sometimes also for others:
&quot;Lustre: &amp;lt;pid:0&amp;gt;:(ldlm_lib.c:841:target_handle_connect()) &amp;lt;OST-name&amp;gt;: refuse reconnection from &amp;lt;Client_nid&amp;gt;@&amp;lt;portal&amp;gt; to 0x...&quot;
&quot;LustreError: &amp;lt;pid:0&amp;gt;:(ldlm_lib.c:2123:target_send_reply_msg()) @@@ processing error (-16) ....&quot;

In the same OSS log there are also messages of the type: &quot;Lustre: &amp;lt;pid:0&amp;gt;:(client.c:1763:ptlrpc_expire_one_request()) @@@ Request ... sent from &amp;lt;OST_name&amp;gt; to NID &amp;lt;other_Client_nid&amp;gt;@&amp;lt;portal&amp;gt; has timed out for slow reply ...&quot;.

On the other/newly identified Client, logs contain repeating msgs of the type:
&quot;Lustre: &amp;lt;pid:0&amp;gt;:(service.c:1040:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-150) , not sending early reply&quot;

#consequences:
The only way to unblock the situation is to crash/dump the other/newly identified Client!

#details:
To come in further comments/add-ons.

&lt;/pre&gt;
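For reference, the client-side check quoted above (grep on the osc state files) can be sketched as a small script. This is a minimal sketch only; the /proc paths and the exact format of the state line are assumptions about this Lustre version, not a confirmed interface.

```python
# Minimal sketch of the check above. The /proc layout and the state-line
# format ("current_state: FULL") are assumptions about this Lustre version.
import glob

def stuck_imports(state_texts):
    """Given {name: state-file text}, return the names whose current
    state is not FULL (e.g. CONNECTING, as in the traces above)."""
    stuck = {}
    for name, text in state_texts.items():
        for line in text.splitlines():
            if "current" in line and "FULL" not in line:
                stuck[name] = line.strip()
    return stuck

def scan_client():
    # Mirrors: grep current /proc/fs/lustre/osc/*/state | grep -v FULL
    texts = {}
    for path in glob.glob("/proc/fs/lustre/osc/*/state"):
        with open(path) as fh:
            texts[path] = fh.read()
    return stuck_imports(texts)
```

scan_client() would be run on an affected client; stuck_imports() is separated out so the filtering logic can be exercised without a live Lustre mount.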
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="12309">LU-816</key>
<summary>Possible bug/deadlock in the Lustre lock algorithm/protocol may leave multiple Clients/processes blocked forever</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="lustre-bull">Lustre Bull</reporter>
                        <labels>
                    </labels>
                <created>Wed, 2 Nov 2011 14:38:08 +0000</created>
                <updated>Sat, 8 Mar 2014 00:08:33 +0000</updated>
                            <resolved>Sat, 8 Mar 2014 00:08:33 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="22292" author="pjones" created="Wed, 2 Nov 2011 15:11:58 +0000"  >&lt;p&gt;Fan Yong&lt;/p&gt;

&lt;p&gt;Could you please advise whether this is the same issue as previously observed?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="22448" author="adegremont" created="Thu, 3 Nov 2011 15:55:27 +0000"  >&lt;p&gt;As the patch in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-142&quot; title=&quot;system hang when running replay-single or replay-dual with three clients&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-142&quot;&gt;&lt;del&gt;LU-142&lt;/del&gt;&lt;/a&gt; only touches mdt_getattr_name_lock(), it is very unlikely that this patch will fix the OST-related issue here.&lt;/p&gt;

&lt;p&gt;My understanding is that the same kind of LDLM deadlock appeared here. That&apos;s why the same kind of messages are shown, but the problem seems different.&lt;/p&gt;

&lt;p&gt;Fan Yong, please tell me if I misunderstood the problem.&lt;/p&gt;</comment>
                            <comment id="22467" author="yong.fan" created="Thu, 3 Nov 2011 22:46:21 +0000"  >&lt;p&gt;I am sure it is not the same issue as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-142&quot; title=&quot;system hang when running replay-single or replay-dual with three clients&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-142&quot;&gt;&lt;del&gt;LU-142&lt;/del&gt;&lt;/a&gt;. It is better to dump OSS debug_log/stack for further investigation, otherwise it is difficult to say what happened.&lt;/p&gt;</comment>
                            <comment id="22680" author="lustre-bull" created="Tue, 8 Nov 2011 11:00:15 +0000"  >
&lt;p&gt;Below is the analysis done by on line support:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
And these are the significant stacks I found on the Client owning the Lustre lock:

All the &quot;callback&quot; threads (&quot;ldlm_cb_&amp;lt;id&amp;gt;&quot;) had been stuck for a long time with the same kind of stack:
===================================================
schedule()
__mutex_lock_slowpath()
mutex_lock()
cl_lock_mutex_get()
osc_ldlm_glimpse_ast()
ldlm_callback_handler()
ptlrpc_server_handle_request()
ptlrpc_main()
kernel_thread()
===================================================

while only one &quot;ldlm_bl_&amp;lt;id&amp;gt;&quot; thread is stuck with the following stack:
===================================================
schedule()
io_schedule()
sync_page()
__wait_on_bit_lock()
__lock_page()
vvp_page_own()
cl_page_own0()
cl_page_own()
cl_page_gang_lookup()
cl_lock_page_out()
osc_lock_flush()
osc_lock_cancel()
cl_lock_cancel0()
cl_lock_cancel()
osc_ldlm_blocking()
ldlm_handle_bl_callback()
ldlm_bl_thread_main()
kernel_thread()
===================================================

So it seems that this &quot;ldlm_bl_&amp;lt;id&amp;gt;&quot; thread started to cancel the Lustre lock and flush its associated pages just when the Server/OSS decided to reclaim it (hence the &quot;ldlm_cb_&amp;lt;id&amp;gt;&quot; threads), and we encountered some kind of remote deadlock also impacting/blocking other Clients waiting to be granted this same lock.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="22694" author="yong.fan" created="Tue, 8 Nov 2011 12:30:00 +0000"  >&lt;p&gt;===================================================&lt;br/&gt;
schedule()&lt;br/&gt;
__mutex_lock_slowpath()&lt;br/&gt;
mutex_lock()&lt;br/&gt;
cl_lock_mutex_get()&lt;br/&gt;
osc_ldlm_glimpse_ast()&lt;br/&gt;
ldlm_callback_handler()&lt;br/&gt;
ptlrpc_server_handle_request()&lt;br/&gt;
ptlrpc_main()&lt;br/&gt;
kernel_thread()&lt;br/&gt;
===================================================&lt;/p&gt;

&lt;p&gt;The above stack means that all the glimpse callbacks (ldlm_cb_xx) were blocked trying to acquire the mutex on some cl_lock on the client, because that mutex was held by the ldlm_bl_xx thread, which was canceling that cl_lock. On the other hand, the ldlm_bl_xx thread was trying to flush dirty pages to the OSS before canceling the cl_lock, but it hung with the mutex held for some unknown reason. It then seemed that all the threads on the client side were hung.&lt;/p&gt;

&lt;p&gt;Currently, because we do not have Lustre debug logs, it is not easy to say why sync_page is blocked. But we have found some similar bugs before; one of them is &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-337&quot; title=&quot;Processes stuck in sync_page on lustre client&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-337&quot;&gt;&lt;del&gt;LU-337&lt;/del&gt;&lt;/a&gt; (&lt;a href=&quot;http://jira.whamcloud.com/browse/LU-337&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;http://jira.whamcloud.com/browse/LU-337&lt;/a&gt;). It may be helpful for you; please try the patch:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,880&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,880&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="22821" author="sebastien.buisson" created="Thu, 10 Nov 2011 11:34:52 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;As the system is in production, we would like to be sure of the fix before we propose it to the customer.&lt;br/&gt;
Which Lustre debug level do you need to carry out your analysis?&lt;/p&gt;

&lt;p&gt;Moreover, do you think it is OK if debug logs are activated only after the problem has occurred?&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="23107" author="yong.fan" created="Wed, 16 Nov 2011 22:19:54 +0000"  >&lt;p&gt;First, please check whether quota is activated on the system. If it is not, this is not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-337&quot; title=&quot;Processes stuck in sync_page on lustre client&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-337&quot;&gt;&lt;del&gt;LU-337&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Anyway, &quot;-1&quot; level debug is better for both client and OSS while the system runs. Activating such debugging only after the problem occurs helps very little.&lt;/p&gt;

&lt;p&gt;Currently, I suspect it is &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-337&quot; title=&quot;Processes stuck in sync_page on lustre client&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-337&quot;&gt;&lt;del&gt;LU-337&lt;/del&gt;&lt;/a&gt;. So if they think that collecting such debug logs may affect system performance too much, please collect client-side debug logs first, to reduce the impact on the whole system.&lt;/p&gt;

&lt;p&gt;On the other hand, can you give me a log of all client-side thread stacks from when the problem occurred? You can get them with &quot;echo t &amp;gt; /proc/sysrq-trigger&quot;. Thanks!&lt;/p&gt;</comment>
                            <comment id="23466" author="patrick.valentin" created="Mon, 28 Nov 2011 13:07:42 +0000"  >&lt;p&gt;Hi,&lt;br/&gt;
below is the answer provided by on-site support. I have also attached the file (crash trace) they provided.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Quotas are not used nor active at Tera-100.&lt;br/&gt;
You will find attached the &quot;foreach_bt_cartan1121&quot; file containing all Client thread stacks (via &quot;bt -t&quot;) from when the problem occurred.&lt;/p&gt;&lt;/blockquote&gt;</comment>
                            <comment id="23594" author="yong.fan" created="Thu, 1 Dec 2011 11:48:02 +0000"  >&lt;p&gt;From your log, it is obvious that all the hung &quot;ldlm_cb_xxx&quot; threads are blocked because &quot;osc_ldlm_glimpse_ast()&quot; waits in &quot;cl_lock_mutex_get()&quot; on the cl_lock. That mutex is held by &quot;poncetr_%%A78_1&quot;, which is trying to cancel the cl_lock with the mutex held; but for some unknown reason, the cl_lock cancel cannot finish.&lt;/p&gt;

&lt;p&gt;I have some concern about a possible deadlock: suppose all the service threads on the OST are busy processing glimpse_ast(), but glimpse_ast() is blocked by the client-side mutex as described above. Then, when a lock cancel RPC arrives at the OST, what happens? If it has to wait, we have a deadlock.&lt;/p&gt;

&lt;p&gt;Jay, I am not quite sure about that; please comment. I also doubt whether glimpse_ast should block on the client side; it does not in b1_8.&lt;/p&gt;</comment>
                            <comment id="23626" author="jay" created="Fri, 2 Dec 2011 01:36:33 +0000"  >&lt;p&gt;It looks like the lock is being canceled, but the cancel is blocked on a page lock. Several clio issues were fixed in the 2.1 release. Can you please tell us which patches you have applied for this customer?&lt;/p&gt;</comment>
                            <comment id="26423" author="patrick.valentin" created="Thu, 12 Jan 2012 08:22:36 +0000"  >&lt;p&gt;Here is the list of patches that were present in the customer&apos;s Lustre release.&lt;br/&gt;
This corresponds to the Bull delivery identified as &quot;T-2_0_0-lustrebull-EFIX7_AE1_1&quot;, produced on October 4, 2011.&lt;/p&gt;

&lt;p&gt;bz16919&lt;br/&gt;
bz20687&lt;br/&gt;
bz21732&lt;br/&gt;
bz21122&lt;br/&gt;
bz21804&lt;br/&gt;
bz22078&lt;br/&gt;
bz22360&lt;br/&gt;
bz22375&lt;br/&gt;
bz22421&lt;br/&gt;
bz22683&lt;br/&gt;
bz23035&lt;br/&gt;
bz23120&lt;br/&gt;
bz23123&lt;br/&gt;
bz23289&lt;br/&gt;
bz23298&lt;br/&gt;
bz23357&lt;br/&gt;
bz23399&lt;br/&gt;
bz23460&lt;br/&gt;
bz24010&lt;br/&gt;
bz24291&lt;br/&gt;
bz24420&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-81&quot; title=&quot;Some JBD2 journaling deadlock at BULL&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-81&quot;&gt;&lt;del&gt;LU-81&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-91&quot; title=&quot;Impossible to use quotas on RHEL6.0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-91&quot;&gt;&lt;del&gt;LU-91&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-122&quot; title=&quot;Revert bug 21122 since it causes deadlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-122&quot;&gt;&lt;del&gt;LU-122&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-130&quot; title=&quot;Kernel crash on lustre 2.0 client (page fault in ll_file_read, NULL pointer dereference)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-130&quot;&gt;&lt;del&gt;LU-130&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-148&quot; title=&quot;ll_readpage has to unlock vmpage by any means&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-148&quot;&gt;&lt;del&gt;LU-148&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-185&quot; title=&quot;LBUG: (cl_page.c:1362:cl_page_completion()) !(pg-&amp;gt;cp_flags &amp;amp; CPF_READ_COMPLETED) ASSERTION(0) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-185&quot;&gt;&lt;del&gt;LU-185&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-190&quot; title=&quot;random mode opencreate will LBUG lustre client&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-190&quot;&gt;&lt;del&gt;LU-190&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-255&quot; title=&quot;use ext4 features by default for newly formatted filesystems&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-255&quot;&gt;&lt;del&gt;LU-255&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-275&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-275&quot;&gt;&lt;del&gt;LU-275&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-300&quot; title=&quot;Oops in cl_page_put() during execve()/page-fault on a binary mapped from a Lustre-filesystem and executed by a parallel application&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-300&quot;&gt;&lt;del&gt;LU-300&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-328&quot; title=&quot;OSS pseudo-hang due to (struct filter_obd *)-&amp;gt;fo_llog_list_lock deadlock upon OSTs warm restart/recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-328&quot;&gt;&lt;del&gt;LU-328&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-361&quot; title=&quot;Lustre Client crashes due to ASSERTION(!request-&amp;gt;rq_replay) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-361&quot;&gt;&lt;del&gt;LU-361&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-369&quot; title=&quot;ASSERTION(oti &amp;amp;&amp;amp; oti-&amp;gt;oti_thread &amp;amp;&amp;amp; oti-&amp;gt;oti_thread-&amp;gt;t_watchdog) failed in quota_chk_acq_common()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-369&quot;&gt;&lt;del&gt;LU-369&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-394&quot; title=&quot;LND failure casued by discontiguous KIOV pages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-394&quot;&gt;&lt;del&gt;LU-394&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-416&quot; title=&quot;Many processes hung consuming a lot of CPU in Lustre-Client page-cache lookups&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-416&quot;&gt;&lt;del&gt;LU-416&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-418&quot; title=&quot;LNET layer code paths with critical sections protected with the LNET main spinlock can cause clients pseudo-hang situations with a heavy impact on Lustre operations&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-418&quot;&gt;&lt;del&gt;LU-418&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-435&quot; title=&quot;unknow error in page fault when running sanity test_30c&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-435&quot;&gt;&lt;del&gt;LU-435&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-437&quot; title=&quot;Client hang with spinning ldlm_bl_* and ll_imp_inval threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-437&quot;&gt;&lt;del&gt;LU-437&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-442&quot; title=&quot;Client LBUG - (osc_request.c:3087:osc_set_lock_data_with_check()) ASSERTION(lock-&amp;gt;l_ast_data == NULL || lock-&amp;gt;l_ast_data == data) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-442&quot;&gt;&lt;del&gt;LU-442&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-484&quot; title=&quot;LASSERT(inode-&amp;gt;i_nlink &amp;gt; 0) failed in osd_handler.c:osd_object_ref_del()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-484&quot;&gt;&lt;del&gt;LU-484&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
LU-542&lt;br/&gt;
LU-585&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-601&quot; title=&quot;kernel BUG at fs/jbd2/transaction.c:1030&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-601&quot;&gt;&lt;del&gt;LU-601&lt;/del&gt;&lt;/a&gt; patch_set_7&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-613&quot; title=&quot;Lustre-Client dead-lock during binary exec() over Lustre FS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-613&quot;&gt;&lt;del&gt;LU-613&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
LU-651&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-685&quot; title=&quot;Wide busy lock in kiblnd_pool_alloc_node&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-685&quot;&gt;&lt;del&gt;LU-685&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Jira tickets integrated in subsequent Bull efix deliveries since October 4, 2011 are the following:&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-234&quot; title=&quot;OOM killer causes node hang&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-234&quot;&gt;&lt;del&gt;LU-234&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-333&quot; title=&quot;Lustre client procfs stats: read_bytes does not record the number of bytes transfered from the fs.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-333&quot;&gt;&lt;del&gt;LU-333&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-399&quot; title=&quot;mkfs.lustre: The resize maximum must be greater than the filesystem size.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-399&quot;&gt;&lt;del&gt;LU-399&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-481&quot; title=&quot;sanity test_119d fails (ASSERTION((struct cl_page *)vmpage-&amp;gt;private != slice-&amp;gt;cpl_page) failed)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-481&quot;&gt;&lt;del&gt;LU-481&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-543&quot; title=&quot;Missing UNLINK record on overwritting rename&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-543&quot;&gt;&lt;del&gt;LU-543&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-601&quot; title=&quot;kernel BUG at fs/jbd2/transaction.c:1030&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-601&quot;&gt;&lt;del&gt;LU-601&lt;/del&gt;&lt;/a&gt; patch_set_13&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-687&quot; title=&quot;Application is OOM-killed during page-fault resolution on its binary over Lustre when there is plenty of memory available&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-687&quot;&gt;&lt;del&gt;LU-687&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-815&quot; title=&quot;BUG: unable to handle kernel NULL pointer dereference&amp;quot; in lprocfs_rd_import()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-815&quot;&gt;&lt;del&gt;LU-815&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-857&quot; title=&quot;Lustre client tolerates enforced SELinux.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-857&quot;&gt;&lt;del&gt;LU-857&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="26564" author="jay" created="Fri, 13 Jan 2012 19:31:01 +0000"  >&lt;p&gt;For some unknown reason, the client had difficulty grabbing a page lock while it was canceling a lock. How often do you see this problem? If possible, I&apos;d like to take a look at the kernel log on the OSS side, especially to see whether there are any eviction messages.&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="39498" author="patrick.valentin" created="Tue, 29 May 2012 06:49:31 +0000"  >&lt;p&gt;On-site support reports that the problem has not occurred again since the installation, one month ago, of the efix containing the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1274&quot; title=&quot;Client threads block for sometime before being evicted and can never reconnect afterward&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1274&quot;&gt;&lt;del&gt;LU-1274&lt;/del&gt;&lt;/a&gt; patch.&lt;/p&gt;</comment>
                            <comment id="39502" author="pjones" created="Tue, 29 May 2012 09:05:45 +0000"  >&lt;p&gt;ok thanks Patrick&lt;/p&gt;</comment>
                            <comment id="40436" author="lustre-bull" created="Tue, 12 Jun 2012 13:21:30 +0000"  >&lt;p&gt;The Lustre 2.1.1 Bull release containing the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1274&quot; title=&quot;Client threads block for sometime before being evicted and can never reconnect afterward&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1274&quot;&gt;&lt;del&gt;LU-1274&lt;/del&gt;&lt;/a&gt; patch has been installed at several customer sites.&lt;br/&gt;
The AWE customer reports that the problem described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1274&quot; title=&quot;Client threads block for sometime before being evicted and can never reconnect afterward&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1274&quot;&gt;&lt;del&gt;LU-1274&lt;/del&gt;&lt;/a&gt; has not occurred since the efix installation a few weeks ago.&lt;br/&gt;
But the CEA customer, which is deploying the same efix, reports that the problem initially described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-816&quot; title=&quot;Possible bug/dead-lock in Lustre-Lock algorithm/protocol may lead to multiple Clients/processes to blocked for ever&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-816&quot;&gt;&lt;del&gt;LU-816&lt;/del&gt;&lt;/a&gt; and declared a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1274&quot; title=&quot;Client threads block for sometime before being evicted and can never reconnect afterward&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1274&quot;&gt;&lt;del&gt;LU-1274&lt;/del&gt;&lt;/a&gt; has reoccurred in the last few days. So I have transferred the latest syslog file they provided from one of the OSS servers (uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-816&quot; title=&quot;Possible bug/dead-lock in Lustre-Lock algorithm/protocol may lead to multiple Clients/processes to blocked for ever&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-816&quot;&gt;&lt;del&gt;LU-816&lt;/del&gt;&lt;/a&gt;/cartan.log2). As this syslog is rather old, I have asked them to provide a new copy of the syslog on both client and OSS sides, and all the thread stacks on the OSS side.&lt;/p&gt;</comment>
                            <comment id="40564" author="pjones" created="Thu, 14 Jun 2012 10:16:31 +0000"  >&lt;p&gt;Bull now believes this to be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-948&quot; title=&quot;Client recovery hang&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-948&quot;&gt;&lt;del&gt;LU-948&lt;/del&gt;&lt;/a&gt; and is testing out the patch.&lt;/p&gt;</comment>
                            <comment id="78782" author="jfc" created="Sat, 8 Mar 2014 00:08:33 +0000"  >&lt;p&gt;The last comment indicates that a patch was being tested.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="10636" name="foreach_bt_cartan1121" size="316447" author="patrick.valentin" created="Mon, 28 Nov 2011 13:02:54 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvsnb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8545</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>