<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:37:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3889]  LBUG: (osc_lock.c:497:osc_lock_upcall()) ASSERTION( lock-&gt;cll_state &gt;= CLS_QUEUING ) </title>
                <link>https://jira.whamcloud.com/browse/LU-3889</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This assertion is hit on a CentOS client system running master.  It&apos;s also been noticed on Cray SLES clients running 2.4.&lt;/p&gt;

&lt;p&gt;This is fairly easy to reproduce on CentOS.  I&apos;ll be attaching a log of this with debug=-1 set.  (I was also running a special debug patch for this bug called rdebug, so you may see some extra output from that.)&lt;/p&gt;

&lt;p&gt;Two things are needed: a reproducer script and memory pressure on the system.&lt;/p&gt;

&lt;p&gt;The reproducer is the following shell script (this was originally a test for a different bug, so I&apos;m not sure if every step is needed); run it in a folder with at least a few thousand files in it:  &lt;span class=&quot;error&quot;&gt;&amp;#91;It may work with a smaller number of files; I&amp;#39;m just describing how I&amp;#39;ve reproduced it.&amp;#93;&lt;/span&gt;&lt;br/&gt;
&#8212;&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; idx in $(seq 0 10000); &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt;
    time ls -laR &amp;gt; /dev/&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;
    touch somefile
    rm -f somefile
    echo $idx: $(date +%T) $(grep MemFree /proc/meminfo)
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#8212;&lt;br/&gt;
I used the small C program below to create the memory pressure. Start the reproducer script above, and then run this as well. &lt;/p&gt;

&lt;p&gt;Hold down enter and watch the test script output as free memory drops. Once you&apos;re down to a small amount free, the total amount of free memory will stop dropping.  Keep holding down enter to maintain the memory pressure, and the bug will happen after a few moments.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt; /* for malloc() */

&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; main()
{
    &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; i;
    &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt;* junk;

start: i = 0;

    &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt;(i &amp;lt; 50) { 
        printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;Malloc!\n&quot;&lt;/span&gt;); 
        junk = malloc(1024*1024*1024); 
        junk[0] = i; 
        i++; 
    }

    printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;Mallocced 50 GB. Press enter to malloc another 50.\n&quot;&lt;/span&gt;);
    printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;Note: This seems to use roughly 10 MB of real memory each time.\n&quot;&lt;/span&gt;);
    getchar();
    &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; start;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;p&gt;Rahul Deshmukh of Xyratex is looking at this with us, and these are his initial thoughts:&lt;br/&gt;
As per my understanding of the code, the osc_lock_enqueue() function enqueues the&lt;br/&gt;
lock and does not wait for network communication. After the reply from the server, we&lt;br/&gt;
execute the callback function, i.e. osc_lock_upcall(), for the lock enqueued&lt;br/&gt;
through osc_lock_enqueue().&lt;/p&gt;

&lt;p&gt;In this case, after a successful enqueue and before we get the reply from the server&lt;br/&gt;
(or the call to osc_lock_upcall()), I see in the log that we unused the lock,&lt;br/&gt;
and hence the LBUG.&lt;/p&gt;

&lt;p&gt;I will investigate more and update accordingly.&lt;/p&gt;</description>
                <environment>CentOS 6.4 running fairly recent master.</environment>
        <key id="20796">LU-3889</key>
            <summary> LBUG: (osc_lock.c:497:osc_lock_upcall()) ASSERTION( lock-&gt;cll_state &gt;= CLS_QUEUING ) </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                            <label>HB</label>
                            <label>mn4</label>
                    </labels>
                <created>Thu, 5 Sep 2013 17:28:23 +0000</created>
                <updated>Mon, 16 Jun 2014 19:16:20 +0000</updated>
                            <resolved>Sat, 4 Jan 2014 14:42:24 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                    <version>Lustre 2.6.0</version>
                    <version>Lustre 2.4.2</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>1</votes>
                                    <watches>26</watches>
                                                                            <comments>
                            <comment id="65943" author="paf" created="Fri, 6 Sep 2013 14:37:06 +0000"  >&lt;p&gt;The system where this was replicated is running the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3027&quot; title=&quot;Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3027&quot;&gt;&lt;del&gt;LU-3027&lt;/del&gt;&lt;/a&gt;, ie: &lt;a href=&quot;http://review.whamcloud.com/#/c/7481/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7481/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I may try the other, not accepted, patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3027&quot; title=&quot;Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3027&quot;&gt;&lt;del&gt;LU-3027&lt;/del&gt;&lt;/a&gt; to see if it makes a difference:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/7482/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7482/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="65955" author="green" created="Fri, 6 Sep 2013 16:15:05 +0000"  >&lt;p&gt;This patch might be worth a try: &lt;a href=&quot;http://review.whamcloud.com/7569&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7569&lt;/a&gt; (I hit this and other issues too and plan to try it). Meanwhile, we reverted patch 7481 from the b2_4 tree pending investigation.&lt;/p&gt;</comment>
                            <comment id="65972" author="paf" created="Fri, 6 Sep 2013 19:07:26 +0000"  >&lt;p&gt;Oleg,&lt;/p&gt;

&lt;p&gt;With patch 7569 in place, I can no longer reproduce this bug as described above.  Thank you!&lt;/p&gt;</comment>
                            <comment id="65977" author="paf" created="Fri, 6 Sep 2013 20:40:37 +0000"  >&lt;p&gt;Oleg: One more question.  Did 7481 cause any regressions/problems you are aware of, or was it reverted only because it didn&apos;t resolve the issue?&lt;/p&gt;</comment>
                            <comment id="66063" author="spitzcor" created="Mon, 9 Sep 2013 15:26:15 +0000"  >&lt;p&gt;Patrick, Yes, #7481 was removed for a reason.  The change was removed from b2_4 with the commit description, &quot;This unexpectedly led to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7834&quot; title=&quot;parallel-scale-nfsv4 test_compilebench: IOError: [Errno 28] No space left on device&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7834&quot;&gt;LU-7834&lt;/a&gt;, which appears to be an old problem.&quot;  However, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7834&quot; title=&quot;parallel-scale-nfsv4 test_compilebench: IOError: [Errno 28] No space left on device&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7834&quot;&gt;LU-7834&lt;/a&gt; isn&apos;t a valid ticket.&lt;/p&gt;</comment>
                            <comment id="67265" author="adilger" created="Mon, 23 Sep 2013 17:51:16 +0000"  >&lt;p&gt;Closing this as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3027&quot; title=&quot;Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3027&quot;&gt;&lt;del&gt;LU-3027&lt;/del&gt;&lt;/a&gt;, since that has a patch that has fixed the problem reported here.&lt;/p&gt;</comment>
                            <comment id="68089" author="sarah" created="Tue, 1 Oct 2013 17:57:08 +0000"  >&lt;p&gt;Reopening this issue since we hit the same error on an FC18 client with build lustre-master #1687; build #1687 has the fix from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3027&quot; title=&quot;Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3027&quot;&gt;&lt;del&gt;LU-3027&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/3fe5c386-26f8-11e3-94b1-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/3fe5c386-26f8-11e3-94b1-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;mds console&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;14:29:29:LustreError: 4824:0:(cl_lock.c:1414:cl_unuse_try()) result = -108, this is unlikely!
14:29:29:nfsd: non-standard errno: -108
14:29:29:Lustre: lustre-OST0001-osc-ffff88007bd9dc00: Connection restored to lustre-OST0001 (at 10.10.4.253@tcp)
14:29:29:nfsd: non-standard errno: -108
14:29:29:LustreError: 2767:0:(osc_lock.c:511:osc_lock_upcall()) ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ) failed: 
14:29:29:LustreError: 2767:0:(osc_lock.c:511:osc_lock_upcall()) LBUG
14:29:29:Pid: 2767, comm: ptlrpcd_0
14:29:29:
14:29:29:Call Trace:
14:29:29: [&amp;lt;ffffffffa0478895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
14:29:29: [&amp;lt;ffffffffa0478e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
14:29:29: [&amp;lt;ffffffffa0a05e4a&amp;gt;] osc_lock_upcall+0x44a/0x5f0 [osc]
14:29:29: [&amp;lt;ffffffffa0a05a00&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
14:29:29: [&amp;lt;ffffffffa09e67a6&amp;gt;] osc_enqueue_fini+0x106/0x240 [osc]
14:29:29: [&amp;lt;ffffffffa09eb1e2&amp;gt;] osc_enqueue_interpret+0xe2/0x1e0 [osc]
14:29:29: [&amp;lt;ffffffffa076bda4&amp;gt;] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
14:29:29: [&amp;lt;ffffffffa079801b&amp;gt;] ptlrpcd_check+0x53b/0x560 [ptlrpc]
14:29:29: [&amp;lt;ffffffffa079853b&amp;gt;] ptlrpcd+0x20b/0x370 [ptlrpc]
14:29:29: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
14:29:29: [&amp;lt;ffffffffa0798330&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
14:29:29: [&amp;lt;ffffffff81096a36&amp;gt;] kthread+0x96/0xa0
14:29:29: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
14:29:30: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
14:29:30: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
14:29:30:
14:29:30:Kernel panic - not syncing: LBUG
14:29:30:Pid: 2767, comm: ptlrpcd_0 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
14:29:30:Call Trace:
14:29:30: [&amp;lt;ffffffff8150de58&amp;gt;] ? panic+0xa7/0x16f
14:29:31: [&amp;lt;ffffffffa0478eeb&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
14:29:31: [&amp;lt;ffffffffa0a05e4a&amp;gt;] ? osc_lock_upcall+0x44a/0x5f0 [osc]
14:29:31: [&amp;lt;ffffffffa0a05a00&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
14:29:31: [&amp;lt;ffffffffa09e67a6&amp;gt;] ? osc_enqueue_fini+0x106/0x240 [osc]
14:29:31: [&amp;lt;ffffffffa09eb1e2&amp;gt;] ? osc_enqueue_interpret+0xe2/0x1e0 [osc]
14:29:31: [&amp;lt;ffffffffa076bda4&amp;gt;] ? ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
14:29:31: [&amp;lt;ffffffffa079801b&amp;gt;] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
14:29:31: [&amp;lt;ffffffffa079853b&amp;gt;] ? ptlrpcd+0x20b/0x370 [ptlrpc]
14:29:31: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
14:29:31: [&amp;lt;ffffffffa0798330&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
14:29:31: [&amp;lt;ffffffff81096a36&amp;gt;] ? kthread+0x96/0xa0
14:29:31: [&amp;lt;ffffffff8100c0ca&amp;gt;] ? child_rip+0xa/0x20
14:29:32: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
14:29:32: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
14:29:32:Initializing cgroup subsys cpuset
14:29:32:Initializing cgroup subsys cpu
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="68473" author="green" created="Mon, 7 Oct 2013 04:47:05 +0000"  >&lt;p&gt;After some research, I believe the cause of this other occurrence is different from the first. Unfortunately, we still do not have any crashdumps from our test cluster, which is unacceptable in my opinion.&lt;br/&gt;
As a result, we are unable to inspect the Lustre debug log, and we also cannot see what the actual value of cll_state is.&lt;/p&gt;</comment>
                            <comment id="68690" author="paf" created="Wed, 9 Oct 2013 18:20:42 +0000"  >&lt;p&gt;Oleg - Do you think the new patch from Jinshan at &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3027&quot; title=&quot;Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3027&quot;&gt;&lt;del&gt;LU-3027&lt;/del&gt;&lt;/a&gt; (&lt;a href=&quot;http://review.whamcloud.com/#/c/7841/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7841/&lt;/a&gt;) covers this second case?&lt;/p&gt;</comment>
                            <comment id="70086" author="sarah" created="Mon, 28 Oct 2013 23:49:10 +0000"  >&lt;p&gt;Hit this bug in an interop test between a 2.4.1 server and a 2.5 client:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/166bcee0-3eba-11e3-a21b-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/166bcee0-3eba-11e3-a21b-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;server: 2.4.1 RHEL6 ldiskfs&lt;br/&gt;
client: lustre-b2_5 build #2 RHEL6 ldiskfs&lt;/p&gt;

&lt;p&gt;client 1 console shows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00:35:41:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-32vm5,client-32vm6.lab.whamcloud.com DURATION=900 == 00:35:31 (1382772931)
00:35:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
00:35:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
00:37:46:LustreError: 4025:0:(osc_lock.c:511:osc_lock_upcall()) ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ) failed: 
00:37:46:LustreError: 4025:0:(osc_lock.c:511:osc_lock_upcall()) LBUG
00:37:47:Pid: 4025, comm: ptlrpcd_1
00:37:47:
00:37:47:Call Trace:
00:37:47: [&amp;lt;ffffffffa1214895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
00:37:48: [&amp;lt;ffffffffa1214e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
00:37:48: [&amp;lt;ffffffffa07b0e5a&amp;gt;] osc_lock_upcall+0x44a/0x5f0 [osc]
00:37:48: [&amp;lt;ffffffffa05d3000&amp;gt;] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
00:37:49: [&amp;lt;ffffffffa07b0a10&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
00:37:49: [&amp;lt;ffffffffa07917a6&amp;gt;] osc_enqueue_fini+0x106/0x240 [osc]
00:37:50: [&amp;lt;ffffffffa0796202&amp;gt;] osc_enqueue_interpret+0xe2/0x1e0 [osc]
00:37:51: [&amp;lt;ffffffffa05c3e04&amp;gt;] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
00:37:51: [&amp;lt;ffffffffa05ef20b&amp;gt;] ptlrpcd_check+0x53b/0x560 [ptlrpc]
00:37:51: [&amp;lt;ffffffffa05ef72b&amp;gt;] ptlrpcd+0x20b/0x370 [ptlrpc]
00:37:51: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
00:37:52: [&amp;lt;ffffffffa05ef520&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
00:37:52: [&amp;lt;ffffffff81096a36&amp;gt;] kthread+0x96/0xa0
00:37:52: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
00:37:53: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
00:37:54: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
00:37:54:
00:37:54:Kernel panic - not syncing: LBUG
00:37:54:Pid: 4025, comm: ptlrpcd_1 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
00:37:55:Call Trace:
00:37:55: [&amp;lt;ffffffff8150da18&amp;gt;] ? panic+0xa7/0x16f
00:37:55: [&amp;lt;ffffffffa1214eeb&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
00:37:56: [&amp;lt;ffffffffa07b0e5a&amp;gt;] ? osc_lock_upcall+0x44a/0x5f0 [osc]
00:37:56: [&amp;lt;ffffffffa05d3000&amp;gt;] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
00:37:56: [&amp;lt;ffffffffa07b0a10&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
00:37:56: [&amp;lt;ffffffffa07917a6&amp;gt;] ? osc_enqueue_fini+0x106/0x240 [osc]
00:37:57: [&amp;lt;ffffffffa0796202&amp;gt;] ? osc_enqueue_interpret+0xe2/0x1e0 [osc]
00:37:57: [&amp;lt;ffffffffa05c3e04&amp;gt;] ? ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
00:37:58: [&amp;lt;ffffffffa05ef20b&amp;gt;] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
00:37:58: [&amp;lt;ffffffffa05ef72b&amp;gt;] ? ptlrpcd+0x20b/0x370 [ptlrpc]
00:37:59: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
00:37:59: [&amp;lt;ffffffffa05ef520&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
00:37:59: [&amp;lt;ffffffff81096a36&amp;gt;] ? kthread+0x96/0xa0
00:38:00: [&amp;lt;ffffffff8100c0ca&amp;gt;] ? child_rip+0xa/0x20
00:38:00: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
00:38:01: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
00:38:01:Initializing cgroup subsys cpuset
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt; </comment>
                            <comment id="70981" author="amk" created="Thu, 7 Nov 2013 16:22:43 +0000"  >&lt;p&gt;Cray is still seeing this LBUG in both 2.4.1 and 2.5 with the patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3027&quot; title=&quot;Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3027&quot;&gt;&lt;del&gt;LU-3027&lt;/del&gt;&lt;/a&gt; applied:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/7569/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7569/&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/7841/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7841/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rate of occurrence has increased since these patches landed, but there have been many other changes to the systems and testing, so I can&apos;t say the increase is related to the patches.&lt;/p&gt;

&lt;p&gt;Dumps are available if you want them. &lt;/p&gt;
</comment>
                            <comment id="71190" author="sarah" created="Sat, 9 Nov 2013 00:50:36 +0000"  >&lt;p&gt;Hit this error in racer test&lt;br/&gt;
server and client: lustre-master build #1751 RHEL6 ldiskfs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/fca9c502-47e7-11e3-a445-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/fca9c502-47e7-11e3-a445-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;client console&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/racer/racer.sh /mnt/lustre/racer 
15:10:34:LustreError: 4980:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 16966 of inode ffff88006806a6b8 failed -28
15:10:34:LustreError: 4982:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 16966 of inode ffff88006806a6b8 failed -28
15:10:34:LustreError: 16217:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 15 of inode ffff88007d0366b8 failed -28
15:10:35:LustreError: 4631:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 0 of inode ffff88007a4fd638 failed -28
15:10:35:LustreError: 4631:0:(vvp_io.c:1079:vvp_io_commit_write()) Skipped 1 previous similar message
15:10:35:LustreError: 16904:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 1 of inode ffff88007c6fc678 failed -28
15:10:35:LustreError: 18922:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 0 of inode ffff88007d73cb38 failed -28
15:10:36:LustreError: 18922:0:(vvp_io.c:1079:vvp_io_commit_write()) Skipped 5 previous similar messages
15:10:36:LustreError: 4613:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 0 of inode ffff880069823b38 failed -28
15:10:37:LustreError: 4613:0:(vvp_io.c:1079:vvp_io_commit_write()) Skipped 3 previous similar messages
15:10:37:LustreError: 17029:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 10571 of inode ffff880069ec6b38 failed -28
15:10:37:LustreError: 17029:0:(vvp_io.c:1079:vvp_io_commit_write()) Skipped 5 previous similar messages
15:10:38:LustreError: 4613:0:(vvp_io.c:1079:vvp_io_commit_write()) Write page 0 of inode ffff88002f769178 failed -28
15:10:38:LustreError: 4613:0:(vvp_io.c:1079:vvp_io_commit_write()) Skipped 14 previous similar messages
15:10:38:LustreError: 30383:0:(osc_lock.c:511:osc_lock_upcall()) ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ) failed: 
15:10:40:LustreError: 30383:0:(osc_lock.c:511:osc_lock_upcall()) LBUG
15:10:40:Pid: 30383, comm: ptlrpcd_1
15:10:40:
15:10:40:Call Trace:
15:10:40: [&amp;lt;ffffffffa0834895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
15:10:41: [&amp;lt;ffffffffa0834e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
15:10:41: [&amp;lt;ffffffffa04a032a&amp;gt;] osc_lock_upcall+0x44a/0x5f0 [osc]
15:10:41: [&amp;lt;ffffffffa0e242b0&amp;gt;] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
15:10:42: [&amp;lt;ffffffffa049fee0&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
15:10:42: [&amp;lt;ffffffffa04807a6&amp;gt;] osc_enqueue_fini+0x106/0x240 [osc]
15:10:43: [&amp;lt;ffffffffa0485282&amp;gt;] osc_enqueue_interpret+0xe2/0x1e0 [osc]
15:10:43: [&amp;lt;ffffffffa0e15034&amp;gt;] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
15:10:43: [&amp;lt;ffffffffa0e4053b&amp;gt;] ptlrpcd_check+0x53b/0x560 [ptlrpc]
15:10:44: [&amp;lt;ffffffffa0e40a5b&amp;gt;] ptlrpcd+0x20b/0x370 [ptlrpc]
15:10:44: [&amp;lt;ffffffff81063990&amp;gt;] ? default_wake_function+0x0/0x20
15:10:44: [&amp;lt;ffffffffa0e40850&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
15:10:44: [&amp;lt;ffffffff81096a36&amp;gt;] kthread+0x96/0xa0
15:10:44: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
15:10:44: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
15:10:45: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
15:10:45:
15:10:46:Kernel panic - not syncing: LBUG
15:10:46:Pid: 30383, comm: ptlrpcd_1 Not tainted 2.6.32-358.23.2.el6.x86_64 #1
15:10:46:Call Trace:
15:10:46: [&amp;lt;ffffffff8150daac&amp;gt;] ? panic+0xa7/0x16f
15:10:46: [&amp;lt;ffffffffa0834eeb&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
15:10:47: [&amp;lt;ffffffffa04a032a&amp;gt;] ? osc_lock_upcall+0x44a/0x5f0 [osc]
15:10:48: [&amp;lt;ffffffffa0e242b0&amp;gt;] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
15:10:48: [&amp;lt;ffffffffa049fee0&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
15:10:48: [&amp;lt;ffffffffa04807a6&amp;gt;] ? osc_enqueue_fini+0x106/0x240 [osc]
15:10:49: [&amp;lt;ffffffffa0485282&amp;gt;] ? osc_enqueue_interpret+0xe2/0x1e0 [osc]
15:10:49: [&amp;lt;ffffffffa0e15034&amp;gt;] ? ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
15:10:49: [&amp;lt;ffffffffa0e4053b&amp;gt;] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
15:10:49: [&amp;lt;ffffffffa0e40a5b&amp;gt;] ? ptlrpcd+0x20b/0x370 [ptlrpc]
15:10:50: [&amp;lt;ffffffff81063990&amp;gt;] ? default_wake_function+0x0/0x20
15:10:50: [&amp;lt;ffffffffa0e40850&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
15:10:50: [&amp;lt;ffffffff81096a36&amp;gt;] ? kthread+0x96/0xa0
15:10:50: [&amp;lt;ffffffff8100c0ca&amp;gt;] ? child_rip+0xa/0x20
15:10:50: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
15:10:51: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
15:10:52:Initializing cgroup subsys cpuset
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="71194" author="cliffw" created="Sat, 9 Nov 2013 01:19:01 +0000"  >&lt;p&gt;Hit this error attempting to test ZFS on Hyperion, running IOR single-shared-file&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;2013-11-08 17:12:17 LustreError: 78127:0:(osc_lock.c:511:osc_lock_upcall()) ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ) failed:
2013-11-08 17:12:17 LustreError: 78134:0:(osc_lock.c:511:osc_lock_upcall()) ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ) failed:
2013-11-08 17:12:17 LustreError: 78134:0:(osc_lock.c:511:osc_lock_upcall()) LBUG
2013-11-08 17:12:17 Pid: 78134, comm: ptlrpcd_7
2013-11-08 17:12:17
2013-11-08 17:12:17 Call Trace:
2013-11-08 17:12:17  [&amp;lt;ffffffffa056b895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-11-08 17:12:17  [&amp;lt;ffffffffa056be97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0dc532a&amp;gt;] osc_lock_upcall+0x44a/0x5f0 [osc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0ae62b0&amp;gt;] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0dc4ee0&amp;gt;] ? osc_lock_upcall+0x0/0x5f0 [osc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0da57a6&amp;gt;] osc_enqueue_fini+0x106/0x240 [osc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0daa282&amp;gt;] osc_enqueue_interpret+0xe2/0x1e0 [osc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0ad7034&amp;gt;] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0b0253b&amp;gt;] ptlrpcd_check+0x53b/0x560 [ptlrpc]
2013-11-08 17:12:17  [&amp;lt;ffffffffa0b02a5b&amp;gt;] ptlrpcd+0x20b/0x370 [ptlrpc]
2013-11-08 17:12:17  [&amp;lt;ffffffff81063990&amp;gt;] ? default_wake_function+0x0/0x20
2013-11-08 17:12:17  [&amp;lt;ffffffffa0b02850&amp;gt;] ? ptlrpcd+0x0/0x370 [ptlrpc]
2013-11-08 17:12:17  [&amp;lt;ffffffff81096a36&amp;gt;] kthread+0x96/0xa0
2013-11-08 17:12:17  [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
2013-11-08 17:12:17  [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
2013-11-08 17:12:17  [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
2013-11-08 17:12:17
2013-11-08 17:12:17 LustreError: dumping log to /tmp/lustre-log.1383959537.78134
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="71259" author="cliffw" created="Mon, 11 Nov 2013 19:50:20 +0000"  >&lt;p&gt;I am repeating the IOR test, but calling &apos;sync&apos; on the clients every 180 seconds. So far I have not hit the LBUG; it was quite repeatable previously.&lt;/p&gt;</comment>
                            <comment id="71263" author="utopiabound" created="Mon, 11 Nov 2013 20:10:53 +0000"  >&lt;p&gt;Fix issues with parallel-scale&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/8234&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8234&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="71370" author="jay" created="Tue, 12 Nov 2013 20:37:16 +0000"  >&lt;p&gt;There is almost no log for this issue. Is it possible to reproduce it with the latest build and collect logs? Finding a reproducible path would also be useful for verifying the patch in the future.&lt;/p&gt;</comment>
                            <comment id="71372" author="paf" created="Tue, 12 Nov 2013 20:43:09 +0000"  >&lt;p&gt;Jinshan - Our earlier reproducer (found in the ticket description) was fixed by the earlier patches.&lt;/p&gt;

&lt;p&gt;Since getting those patches, Cray has only been hitting this unpredictably, during large test runs, and we haven&apos;t hit it with more than default debugging enabled.  We&apos;ve tried one of these runs with full debugging and didn&apos;t hit it.&lt;/p&gt;</comment>
                            <comment id="71390" author="paf" created="Wed, 13 Nov 2013 02:54:29 +0000"  >&lt;p&gt;A bit of good news.  I hit this with +dlmtrace and +rpctrace enabled on master from today (with an unrelated patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4185&quot; title=&quot;Incorrect permission handling when creating existing directories at ICHEC&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4185&quot;&gt;&lt;del&gt;LU-4185&lt;/del&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;I&apos;ve uploaded the dump to ftp.whamcloud.com in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3889&quot; title=&quot; LBUG: (osc_lock.c:497:osc_lock_upcall()) ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ) &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3889&quot;&gt;&lt;del&gt;LU-3889&lt;/del&gt;&lt;/a&gt;.  The dump is called lu-3889-131112-client.tar.gz.&lt;/p&gt;

&lt;p&gt;Edit:&lt;br/&gt;
I hit this bug by running racer.sh.&lt;/p&gt;

&lt;p&gt;Edit again:&lt;br/&gt;
I was eventually able to get the logs, it just took a bit over an hour to extract the 1 GB log buffer from the dump.  That&apos;s probably much larger than is needed, but I&apos;d rather too large than too small.&lt;br/&gt;
The logs are now up as well:&lt;br/&gt;
lu-3889-131112-client_logs.tar.gz&lt;/p&gt;</comment>
                            <comment id="71393" author="jay" created="Wed, 13 Nov 2013 04:54:50 +0000"  >&lt;p&gt;Hi Patrick,&lt;/p&gt;

&lt;p&gt;Thank you, I&apos;ll take a look at the log. Can you please tell me what parameters you used to run the racer?&lt;/p&gt;

&lt;p&gt;Jinshan&lt;/p&gt;</comment>
                            <comment id="71394" author="paf" created="Wed, 13 Nov 2013 05:16:29 +0000"  >&lt;p&gt;Jinshan - I ran it without any parameters.  I simply edited it to change the time limit so it would run indefinitely, and executed it (the racer.sh found in lustre/tests/racer/racer.sh).&lt;/p&gt;

&lt;p&gt;I hit the bug after a bit less than two hours with just +dlmtrace and +rpctrace.  Earlier, I ran for about 12 hours with debug=-1 and didn&apos;t hit the bug.&lt;/p&gt;</comment>
                            <comment id="71459" author="jay" created="Wed, 13 Nov 2013 19:42:38 +0000"  >&lt;p&gt;I can reproduce this issue and I&apos;m working on it now.&lt;/p&gt;</comment>
                            <comment id="72303" author="aboyko" created="Tue, 26 Nov 2013 14:37:59 +0000"  >&lt;p&gt;Hi, the root cause is bad lock cleanup in the situation where a process gets a fatal signal. Here is the fault analysis.&lt;br/&gt;
1. lov_lock_enqueue() enqueues the sublocks.&lt;br/&gt;
2. osc_enqueue() sends the RPC to the server.&lt;br/&gt;
3. cl_enqueue_locked()-&amp;gt;cl_lock_state_wait() finishes with -ERESTARTSYS.&lt;br/&gt;
4. cl_enqueue_locked()-&amp;gt;cl_unuse_try() unuses the locks, and the ENQUEUED lock becomes NEW.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00000001:29.0:1385413027.287708:0:26152:0:(cl_lock.c:1366:cl_unuse_try()) Process entered
00000020:00010000:29.0:1385413027.287709:0:26152:0:(cl_lock.c:150:cl_lock_trace0()) unuse lock: ffff880e6998b078@(3 ffff880e2ad1d0c0 1 2 0 2 1 0)(ffff880e6795ff08/1/0) at cl_unuse_try():1367
00000020:00010000:29.0:1385413027.287721:0:26152:0:(cl_lock.c:150:cl_lock_trace0()) hold release lock: ffff880e6998b078@(4 ffff880e2ad1d0c0 1 0 0 3 0 0)(ffff880e6795ff08/1/0) at cl_lock_hold_release():907
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;5. A reply arrives from the server.&lt;br/&gt;
6. osc_lock_upcall() hits ASSERTION( lock-&amp;gt;cll_state &amp;gt;= CLS_QUEUING ).&lt;/p&gt;

&lt;p&gt;The main problem is that the lock state changes from ENQUEUED to NEW before the reply has arrived, and this change happens in the error handling for cl_enqueue_locked(). The cl_unuse_try() call in cl_enqueue_locked() was introduced by this patch &lt;a href=&quot;http://review.whamcloud.com/#/c/2654&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/2654&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Jinshan, what do you think? What is the best way to fix this?&lt;/p&gt;</comment>
                            <comment id="72345" author="jay" created="Tue, 26 Nov 2013 21:32:02 +0000"  >&lt;p&gt;Hi Boyko,&lt;/p&gt;

&lt;p&gt;I think you&apos;re right. The root cause of this issue is that a CLS_ENQUEUED lock was unused, which confused the code at osc_lock_upcall(). Thanks for looking at it.&lt;/p&gt;

&lt;p&gt;I pushed a patch at &lt;a href=&quot;http://review.whamcloud.com/8405&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8405&lt;/a&gt;, please give it a try.&lt;/p&gt;

&lt;p&gt;Jinshan&lt;/p&gt;
</comment>
                            <comment id="72356" author="paf" created="Tue, 26 Nov 2013 23:16:28 +0000"  >&lt;p&gt;Jinshan, Alex,&lt;/p&gt;

&lt;p&gt;We just tested this patch with racer, and hit the following LBUG:&lt;/p&gt;

&lt;p&gt;2013-11-26T17:06:49.095392-06:00 c1-0c0s1n1 LustreError: 23630:0:(cl_lock.c:1114:cl_use_try()) ASSERTION( result != -38 ) failed:&lt;br/&gt;
2013-11-26T17:06:49.095417-06:00 c1-0c0s1n1 LustreError: 23630:0:(cl_lock.c:1114:cl_use_try()) LBUG&lt;br/&gt;
2013-11-26T17:06:49.095442-06:00 c1-0c0s1n1 Pid: 23630, comm: cat&lt;br/&gt;
2013-11-26T17:06:49.120814-06:00 c1-0c0s1n1 Call Trace:&lt;br/&gt;
2013-11-26T17:06:49.120830-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81005db9&amp;gt;&amp;#93;&lt;/span&gt; try_stack_unwind+0x169/0x1b0&lt;br/&gt;
2013-11-26T17:06:49.120842-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81004849&amp;gt;&amp;#93;&lt;/span&gt; dump_trace+0x89/0x450&lt;br/&gt;
2013-11-26T17:06:49.146152-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa02298d7&amp;gt;&amp;#93;&lt;/span&gt; libcfs_debug_dumpstack+0x57/0x80 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.146167-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0229e37&amp;gt;&amp;#93;&lt;/span&gt; lbug_with_loc+0x47/0xc0 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.146186-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa037ee8b&amp;gt;&amp;#93;&lt;/span&gt; cl_use_try+0x29b/0x2d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.171468-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa037f01d&amp;gt;&amp;#93;&lt;/span&gt; cl_enqueue_try+0x15d/0x320 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.171484-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa037feaf&amp;gt;&amp;#93;&lt;/span&gt; cl_enqueue_locked+0x7f/0x1f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.196788-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0380ace&amp;gt;&amp;#93;&lt;/span&gt; cl_lock_request+0x7e/0x270 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.196815-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa03860ce&amp;gt;&amp;#93;&lt;/span&gt; cl_io_lock+0x39e/0x5d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.196832-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa03863a2&amp;gt;&amp;#93;&lt;/span&gt; cl_io_loop+0xa2/0x1b0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.222102-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0811d8f&amp;gt;&amp;#93;&lt;/span&gt; ll_file_io_generic+0x3bf/0x5f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.222126-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa081249b&amp;gt;&amp;#93;&lt;/span&gt; ll_file_aio_read+0x23b/0x290 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.247420-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa08130aa&amp;gt;&amp;#93;&lt;/span&gt; ll_file_read+0x1fa/0x290 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.247451-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81143868&amp;gt;&amp;#93;&lt;/span&gt; vfs_read+0xc8/0x180&lt;br/&gt;
2013-11-26T17:06:49.247481-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81143a25&amp;gt;&amp;#93;&lt;/span&gt; sys_read+0x55/0x90&lt;br/&gt;
2013-11-26T17:06:49.247497-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8138d7ab&amp;gt;&amp;#93;&lt;/span&gt; system_call_fastpath+0x16/0x1b&lt;br/&gt;
2013-11-26T17:06:49.247509-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;00002aaaaad98190&amp;gt;&amp;#93;&lt;/span&gt; 0x2aaaaad98190&lt;br/&gt;
2013-11-26T17:06:49.386314-06:00 c1-0c0s1n1 Kernel panic - not syncing: LBUG&lt;br/&gt;
2013-11-26T17:06:49.437010-06:00 c1-0c0s1n1 Pid: 23630, comm: cat Tainted: P            3.0.80-0.5.1_1.0501.7664-cray_ari_c #1&lt;br/&gt;
2013-11-26T17:06:49.437035-06:00 c1-0c0s1n1 Call Trace:&lt;br/&gt;
2013-11-26T17:06:49.437055-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81005db9&amp;gt;&amp;#93;&lt;/span&gt; try_stack_unwind+0x169/0x1b0&lt;br/&gt;
2013-11-26T17:06:49.462343-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81004849&amp;gt;&amp;#93;&lt;/span&gt; dump_trace+0x89/0x450&lt;br/&gt;
2013-11-26T17:06:49.462372-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100581c&amp;gt;&amp;#93;&lt;/span&gt; show_trace_log_lvl+0x5c/0x80&lt;br/&gt;
2013-11-26T17:06:49.462388-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81005855&amp;gt;&amp;#93;&lt;/span&gt; show_trace+0x15/0x20&lt;br/&gt;
2013-11-26T17:06:49.462408-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff813829db&amp;gt;&amp;#93;&lt;/span&gt; dump_stack+0x79/0x84&lt;br/&gt;
2013-11-26T17:06:49.462419-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81382a7a&amp;gt;&amp;#93;&lt;/span&gt; panic+0x94/0x1d2&lt;br/&gt;
2013-11-26T17:06:49.487630-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0229e9b&amp;gt;&amp;#93;&lt;/span&gt; lbug_with_loc+0xab/0xc0 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.487654-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa037ee8b&amp;gt;&amp;#93;&lt;/span&gt; cl_use_try+0x29b/0x2d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.513026-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa037f01d&amp;gt;&amp;#93;&lt;/span&gt; cl_enqueue_try+0x15d/0x320 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.513051-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa037feaf&amp;gt;&amp;#93;&lt;/span&gt; cl_enqueue_locked+0x7f/0x1f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.513067-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0380ace&amp;gt;&amp;#93;&lt;/span&gt; cl_lock_request+0x7e/0x270 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.538364-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa03860ce&amp;gt;&amp;#93;&lt;/span&gt; cl_io_lock+0x39e/0x5d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.538389-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa03863a2&amp;gt;&amp;#93;&lt;/span&gt; cl_io_loop+0xa2/0x1b0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.538410-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0811d8f&amp;gt;&amp;#93;&lt;/span&gt; ll_file_io_generic+0x3bf/0x5f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.563613-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa081249b&amp;gt;&amp;#93;&lt;/span&gt; ll_file_aio_read+0x23b/0x290 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.563637-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa08130aa&amp;gt;&amp;#93;&lt;/span&gt; ll_file_read+0x1fa/0x290 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
2013-11-26T17:06:49.588991-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81143868&amp;gt;&amp;#93;&lt;/span&gt; vfs_read+0xc8/0x180&lt;br/&gt;
2013-11-26T17:06:49.589016-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81143a25&amp;gt;&amp;#93;&lt;/span&gt; sys_read+0x55/0x90&lt;br/&gt;
2013-11-26T17:06:49.589034-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8138d7ab&amp;gt;&amp;#93;&lt;/span&gt; system_call_fastpath+0x16/0x1b&lt;br/&gt;
2013-11-26T17:06:49.589046-06:00 c1-0c0s1n1 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;00002aaaaad98190&amp;gt;&amp;#93;&lt;/span&gt; 0x2aaaaad9818f&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;We&apos;re trying to get debug logs from that attempt now.&lt;/p&gt;

&lt;p&gt;One note of caution: Due to a network problem on the system, we&apos;re getting a lot of EBUSYs as well.  The LBUG seems more likely to be related to the patch, but I just wanted you to know.&lt;/p&gt;</comment>
                            <comment id="72359" author="jay" created="Tue, 26 Nov 2013 23:48:33 +0000"  >&lt;p&gt;did you apply other patches?&lt;/p&gt;</comment>
                            <comment id="72360" author="paf" created="Tue, 26 Nov 2013 23:52:09 +0000"  >&lt;p&gt;This is a Cray version of 2.5 - At this point, it&apos;s extremely close to the recent 2.5 release, but with this patch and that from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4152&quot; title=&quot; layout locks can cause deadlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4152&quot;&gt;&lt;del&gt;LU-4152&lt;/del&gt;&lt;/a&gt;, which should be unrelated.&lt;/p&gt;

&lt;p&gt;We&apos;re still trying to get debug logs, I&apos;ll let you know if we can get something there.&lt;/p&gt;

&lt;p&gt;Update: We&apos;ve had to table our attempt to get debug logs for tonight.  The soonest we&apos;ll have more is tomorrow, sorry.&lt;br/&gt;
We have a dump if you&apos;d like to take a look, but the debug was set to default levels, so I don&apos;t know if it will help much.&lt;/p&gt;</comment>
                            <comment id="72366" author="jay" created="Wed, 27 Nov 2013 03:32:13 +0000"  >&lt;p&gt;From the message, it seems like -&amp;gt;clo_use is not defined, which is impossible. Can you please print out the lock information with cl_lock_print()?&lt;/p&gt;

&lt;p&gt;Can you also provide me a test program you&apos;re running to reproduce this issue?&lt;/p&gt;</comment>
                            <comment id="72377" author="shadow" created="Wed, 27 Nov 2013 08:23:24 +0000"  >&lt;p&gt;Jay,&lt;/p&gt;

&lt;p&gt;the reproducer is likely racer.&lt;/p&gt;</comment>
                            <comment id="72393" author="paf" created="Wed, 27 Nov 2013 15:17:43 +0000"  >&lt;p&gt;Jinshan,&lt;/p&gt;

&lt;p&gt;Alex is right, we&apos;re using racer.  Our first priority this morning is starting testing on a clean system to see what happens, once that&apos;s going I&apos;ll look at printing out the lock info.&lt;/p&gt;

&lt;p&gt;Here&apos;s a more detailed description of how we ran racer, with some of the environment variables replaced with command line options:&lt;br/&gt;
racer.sh -t 3600 -T 7 -f 20 -l -d $TMPDIR/race.3&lt;br/&gt;
^-- t is time, T is threads, f is files, -d is directory, and -l is an option we added telling it it&apos;s on Lustre and to run the Lustre specific tests.&lt;/p&gt;</comment>
                            <comment id="72403" author="shadow" created="Wed, 27 Nov 2013 16:06:30 +0000"  >&lt;p&gt;Patrick,&lt;/p&gt;

&lt;p&gt;could you try a test patch without the changes in osc? Alex Boyko has been working on a test to verify that all the races are solved by the patch, so we expect to have verification in the next few days.&lt;/p&gt;</comment>
                            <comment id="72404" author="paf" created="Wed, 27 Nov 2013 16:14:00 +0000"  >&lt;p&gt;Alex - Sure.  Same patch, just only the LOV changes?&lt;/p&gt;

&lt;p&gt;With the original patch as proposed by Jinshan, our testing on the system without the network issues is going well.&lt;/p&gt;


&lt;p&gt;I left out one line of log from the earlier problem:&lt;br/&gt;
2013-11-26T17:06:48.862188-06:00 c1-0c0s1n1 LustreError: 23427:0:(lcommon_cl.c:1209:cl_file_inode_init()) Failure to initialize cl object &lt;span class=&quot;error&quot;&gt;&amp;#91;0x20a750c50:0xac35:0x0&amp;#93;&lt;/span&gt;: -16&lt;br/&gt;
We were getting these repeatedly, due to the network problem on our system.&lt;/p&gt;</comment>
                            <comment id="72416" author="jay" created="Wed, 27 Nov 2013 18:39:24 +0000"  >&lt;p&gt;let me try to reproduce it.&lt;/p&gt;</comment>
                            <comment id="72424" author="paf" created="Wed, 27 Nov 2013 19:07:25 +0000"  >&lt;p&gt;Jinshan - We just tried for about 4 hours on ~20 clients with your full version of the patch and didn&apos;t hit any problems at all.&lt;/p&gt;

&lt;p&gt;We&apos;re going to try without the OSC changes next.&lt;/p&gt;</comment>
                            <comment id="72428" author="shadow" created="Wed, 27 Nov 2013 19:35:08 +0000"  >&lt;p&gt;Jay,&lt;/p&gt;

&lt;p&gt;can you explain why you introduced an OBD_FAIL_LOCK_STATE_WAIT_INTR but no tests use it?&lt;/p&gt;</comment>
                            <comment id="72441" author="jay" created="Wed, 27 Nov 2013 20:37:11 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Jinshan - We just tried for about 4 hours on ~20 clients with your full version of the patch and didn&apos;t hit any problems at all.&lt;/p&gt;

&lt;p&gt;We&apos;re going to try without the OSC changes next.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Sorry, I&apos;m confused: with which version did you not see the problem, and which version caused the assertion?&lt;/p&gt;</comment>
                            <comment id="72443" author="paf" created="Wed, 27 Nov 2013 20:43:26 +0000"  >&lt;p&gt;Jinshan,&lt;/p&gt;

&lt;p&gt;We tested your version of the patch yesterday on a system which had network problems that were causing EBUSY over and over.  While testing on that system, we hit the assertion I described in this comment: &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-3889?focusedCommentId=72356&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-72356&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/browse/LU-3889?focusedCommentId=72356&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-72356&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This morning, we tested for four hours with your version of the patch, on a different system (without network problems), and have had no problems.&lt;/p&gt;

&lt;p&gt;We are now testing the version of the patch without the OSC changes.  We&apos;ve been testing for about an hour, and have had no problems yet.  We&apos;re going to test for a few more hours today, I will let you know if we see anything.&lt;/p&gt;</comment>
                            <comment id="72448" author="paf" created="Wed, 27 Nov 2013 22:33:06 +0000"  >&lt;p&gt;We did about 3 hours of testing on the version of the patch without OSC changes.  No problems were seen.&lt;/p&gt;

&lt;p&gt;We have a 24 hour general test run scheduled on one of our systems for this weekend.  We&apos;re currently planning to test the version without OSC changes as suggested by Shadow, but if a new patch is generated, I could change the test run to test that instead.&lt;/p&gt;</comment>
                            <comment id="72553" author="jay" created="Sun, 1 Dec 2013 23:41:16 +0000"  >&lt;p&gt;Hi Patrick, it&apos;s fine to take out the OSC changes.&lt;/p&gt;</comment>
                            <comment id="72602" author="paf" created="Mon, 2 Dec 2013 15:30:21 +0000"  >&lt;p&gt;Unfortunately, our test run failed due to unrelated reasons.  We&apos;re going to do another later this week, I&apos;ll update with results as I have them.&lt;/p&gt;</comment>
                            <comment id="72680" author="aboyko" created="Tue, 3 Dec 2013 06:16:08 +0000"  >&lt;p&gt;I have added a regression test for this issue: &lt;a href=&quot;http://review.whamcloud.com/8463&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8463&lt;/a&gt;. I think it should be included in the patch, but note that the test is based on the ASSERT at osc_lock_upcall().&lt;/p&gt;</comment>
                            <comment id="73302" author="jay" created="Wed, 11 Dec 2013 18:59:23 +0000"  >&lt;p&gt;Hi Boyko, I merged your patch into patch 8405, please take a look and thank you for your work.&lt;/p&gt;

&lt;p&gt;Jinshan&lt;/p&gt;</comment>
                            <comment id="73834" author="yujian" created="Thu, 19 Dec 2013 12:17:16 +0000"  >&lt;p&gt;Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_4/69/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_4/69/&lt;/a&gt; (2.4.2 RC1)&lt;br/&gt;
MDSCOUNT=4&lt;/p&gt;

&lt;p&gt;racer test hit the same failure:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/06c7507e-6875-11e3-a9a3-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/06c7507e-6875-11e3-a9a3-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;parallel-scale-nfsv4 test iorssf hit the same failure:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/d1809a06-6879-11e3-a9a3-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/d1809a06-6879-11e3-a9a3-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="73916" author="yujian" created="Fri, 20 Dec 2013 07:46:47 +0000"  >&lt;p&gt;Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_4/69/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_4/69/&lt;/a&gt; (2.4.2 RC1)&lt;br/&gt;
Distro/Arch: RHEL6.4/x86_64&lt;br/&gt;
MDSCOUNT=1&lt;/p&gt;

&lt;p&gt;racer test hit the same failure:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/d15a7052-68ff-11e3-ab68-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/d15a7052-68ff-11e3-ab68-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/1cb24ab0-691f-11e3-8dc5-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/1cb24ab0-691f-11e3-8dc5-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="74228" author="yujian" created="Thu, 2 Jan 2014 07:59:21 +0000"  >&lt;p&gt;Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_5/5&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_5/5&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6.4/x86_64&lt;br/&gt;
MDSCOUNT=1&lt;/p&gt;

&lt;p&gt;racer test hit the same failure:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/c3410662-7362-11e3-8412-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/c3410662-7362-11e3-8412-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="74323" author="bogl" created="Fri, 3 Jan 2014 23:14:21 +0000"  >&lt;p&gt;in b2_5: &lt;a href=&quot;http://review.whamcloud.com/8717&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8717&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="74335" author="pjones" created="Sat, 4 Jan 2014 14:42:24 +0000"  >&lt;p&gt;Landed for 2.5.1 and 2.6&lt;/p&gt;</comment>
                            <comment id="75116" author="simmonsja" created="Thu, 16 Jan 2014 18:49:13 +0000"  >&lt;p&gt;Also hit this bug for 2.4&lt;/p&gt;</comment>
                            <comment id="76562" author="niu" created="Mon, 10 Feb 2014 02:46:06 +0000"  >&lt;p&gt;for b2_4: &lt;a href=&quot;http://review.whamcloud.com/9194&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9194&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="86724" author="utopiabound" created="Mon, 16 Jun 2014 19:16:20 +0000"  >&lt;p&gt;Patch 8234 never made it into b2_5, while 7778 (with a typo) did:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/10731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/10731&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="22504">LU-4394</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="18086">LU-3027</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="18086">LU-3027</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="19275">LU-3433</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13432" name="osc_lock_LBUG-lu-3889.log.tar.gz" size="646561" author="paf" created="Thu, 5 Sep 2013 17:31:00 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw0d3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10138</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>