<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:52:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5525] ASSERTION( new_lock-&gt;l_readers + new_lock-&gt;l_writers == 0 ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-5525</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After upgrading to lustre 2.4.2-14chaos (see github.com/chaos/lustre), we soon hit the following assertion on one of our MDS nodes:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;mdt_handler.c:3652:mdt_intent_lock_replace()) ASSERTION( new_lock-&amp;gt;l_readers + new_lock-&amp;gt;l_writers == 0 ) failed&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Perhaps most significantly, this tag of our lustre tree includes the patch entitled:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LU-4584 mdt: ensure orig lock is found in hash upon resend&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;James Simmons reported this assertion when he tested the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch, but Bruno &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-4584?focusedCommentId=82590&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-82590&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;made the evaluation&lt;/a&gt; that the assertion was unrelated to the patch.&lt;/p&gt;

&lt;p&gt;Whether it is related or not, we need to fix the problem.&lt;/p&gt;</description>
                <environment>MDS node, Lustre 2.4.2-14chaos, ZFS OBD</environment>
        <key id="26093">LU-5525</key>
            <summary>ASSERTION( new_lock-&gt;l_readers + new_lock-&gt;l_writers == 0 ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 20 Aug 2014 21:55:36 +0000</created>
                <updated>Wed, 13 Oct 2021 03:03:31 +0000</updated>
                            <resolved>Wed, 13 Oct 2021 03:03:31 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="92082" author="morrone" created="Wed, 20 Aug 2014 22:01:20 +0000"  >&lt;p&gt;Here is the backtrace for the mdt01_006 thread that hit the assertion:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;mdt_intent_lock_replace
mdt_intent_reint
mdt_intent_policy
ldlm_lock_enqueue
ldlm_handle_enqueue0
mdt_enqueue
mdt_handle_common
mds_regular_handle
ptlrpc_server_handle_request
ptlrpc_main

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="92084" author="pjones" created="Wed, 20 Aug 2014 22:35:20 +0000"  >&lt;p&gt;Bruno&lt;/p&gt;

&lt;p&gt;Can you please advise on this ticket?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="92142" author="bfaccini" created="Thu, 21 Aug 2014 14:53:03 +0000"  >&lt;p&gt;Yes, I am investigating the long history around this issue ...&lt;/p&gt;</comment>
                            <comment id="92190" author="morrone" created="Fri, 22 Aug 2014 02:14:06 +0000"  >&lt;p&gt;We have hit this assertion 3 times in just over 24 hours on the MDS of our largest production filesystem.  I reverted the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; in the hope of getting us back into a reasonably operational state again.  We should probably know in a day or two if the problem is gone.&lt;/p&gt;</comment>
                            <comment id="92293" author="bfaccini" created="Mon, 25 Aug 2014 07:42:04 +0000"  >&lt;p&gt;Hello Chris,&lt;/p&gt;

&lt;p&gt;I checked my notes for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; and this LBUG occurred without my patch being applied.&lt;/p&gt;

&lt;p&gt;I suspect this will not be possible for you, but just in case: did you run with some debug levels enabled in the Lustre trace at the time of these 3 crashes? If yes, could you extract and provide at least one of them? Or better, could you provide a crash-dump?&lt;/p&gt;

&lt;p&gt;Also, and I know it is not an easy question, but do you have any idea what work-load may have caused this?&lt;/p&gt;

&lt;p&gt;I am trying to set up a platform to reproduce in-house, so any details about your configuration are of interest (number of MDSs/MDTs, DNE?, number of OSSs/OSTs, OSTs indexing, default striping, ...).&lt;/p&gt;

&lt;p&gt;Thanks again and in advance for your help.&lt;/p&gt;</comment>
                            <comment id="92331" author="morrone" created="Mon, 25 Aug 2014 17:45:06 +0000"  >&lt;p&gt;This is not a machine for which we can provide logs or crash dumps.  No, we don&apos;t know the details of the various workloads going on at the time.&lt;/p&gt;

&lt;p&gt;The filesystem has 768 OSTs on 768 OSS nodes.  Default stripe value of 1.  Some users do use stripe counts at various widths, some over 700.&lt;/p&gt;

&lt;p&gt;I don&apos;t know what your notes say, but I checked and we crashed the MDS fifteen times during the three days that we were running with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch.  We then rebooted the MDS onto Lustre version 2.4.2-14.1chaos (which does nothing but revert the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch), and we have not seen this ticket&apos;s assertion in over three days.&lt;/p&gt;

&lt;p&gt;While not conclusive, the circumstantial evidence points strongly at the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch either introducing the bug or making it much easier to hit.&lt;/p&gt;

&lt;p&gt;Here is the list of 2.4.2-14chaos to 2.4.2-14.1chaos changes:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ git log --oneline 2.4.2-14chaos..2.4.2-14.1chaos
f28b8cc Revert &quot;LU-4584 mdt: ensure orig lock is found in hash upon resend&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can find those tags at github.com/chaos/lustre.&lt;/p&gt;</comment>
                            <comment id="92348" author="simmonsja" created="Mon, 25 Aug 2014 20:17:04 +0000"  >&lt;p&gt;I hit this bug using the latest b2_5 branch with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2827&quot; title=&quot;mdt_intent_fixup_resent() cannot find the proper lock in hash&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2827&quot;&gt;&lt;del&gt;LU-2827&lt;/del&gt;&lt;/a&gt; patch. The good news for you is that I can share my crash dumps. A simple simul run on our Cray test bed produced this Oops. I uploaded the vmcore and dmesg to ftp.whamcloud.com/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5525&quot; title=&quot;ASSERTION( new_lock-&amp;gt;l_readers + new_lock-&amp;gt;l_writers == 0 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5525&quot;&gt;&lt;del&gt;LU-5525&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="92400" author="bfaccini" created="Tue, 26 Aug 2014 09:10:04 +0000"  >&lt;p&gt;Hello James, thanks for this crash-dump already! But I also need the corresponding vmlinux and Lustre modules to allow the crash tool to run on it. By &quot;the latest b2_5 branch with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2827&quot; title=&quot;mdt_intent_fixup_resent() cannot find the proper lock in hash&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2827&quot;&gt;&lt;del&gt;LU-2827&lt;/del&gt;&lt;/a&gt; patch&quot;, do you mean you used one of our recent Jenkins builds to get this crash? If yes, can you point me to which one? And if not, can you also xfer the 2x kernel-debuginfo&lt;span class=&quot;error&quot;&gt;&amp;#91;-common&amp;#93;&lt;/span&gt; RPMs and also the lustre-modules (or the lustre-debuginfo) RPM?&lt;br/&gt;
BTW, I checked the vmcore-dmesg.txt file you provided with the crash-dump, and its content seems to indicate that you ran with both a kernel and Lustre distro you built locally, right?&lt;/p&gt;</comment>
                            <comment id="92422" author="simmonsja" created="Tue, 26 Aug 2014 14:28:58 +0000"  >&lt;p&gt;Sorry, I forgot the &lt;b&gt;debug&lt;/b&gt; rpms. I just uploaded them to the same ftp spot. ORNL would really like to go to 2.5 ASAP.&lt;/p&gt;

&lt;p&gt;Yes, I built b2_5 at commit a43e0e4ce4b57240540e8d35a8ba44e203c70ae1 with some additional patches. We tag our kernels to keep our automated rpm update system from stomping on them. They are basically the RHEL Lustre-patched kernels. We also build our own kernel rpms to enable certain things like timestamps in the printk.&lt;/p&gt;</comment>
                            <comment id="92593" author="bfaccini" created="Wed, 27 Aug 2014 12:48:03 +0000"  >&lt;p&gt;James, the &quot;crash&quot; tool claims &quot;WARNING: kernel version inconsistency between vmlinux and dumpfile&quot; and gives up due to further errors ...&lt;br/&gt;
Are you sure that at the time of the crash you were running with this &quot;2.6.32-431.17.1.el6.wc&quot; kernel version, for which you provided the debuginfo RPMs? BTW, strings output from vmcore and vmcore-dmesg.txt reports &quot;2.6.32-358.23.2.el6.atlas&quot;...&lt;/p&gt;</comment>
                            <comment id="92596" author="bfaccini" created="Wed, 27 Aug 2014 14:21:00 +0000"  >&lt;p&gt;Chris, I am really sorry that my b2_4 patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; seems to be causing you so much trouble.&lt;br/&gt;
This LBUG did not occur when running auto-tests and your &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; reproducer against the Jenkins build for patch-set #7 of Gerrit change #9488, and I am still unable to reproduce it when running with a local build made of tag 2.4.3-RC1 + the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch (ie, patch-set #7 of Gerrit change #9488). &lt;br/&gt;
I am currently building a Lustre version based on 2.4.2-14chaos to see if I can reproduce in-house, but I wanted to ask you if you tried to run your &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; reproducer against it already?&lt;/p&gt;</comment>
                            <comment id="92599" author="simmonsja" created="Wed, 27 Aug 2014 14:28:16 +0000"  >&lt;p&gt;Oops, you are right. I saw this problem with our 2.4 image which did contain the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch. Sorry for the confusion. I pushed the proper debug rpms this time.&lt;/p&gt;</comment>
                            <comment id="92618" author="bfaccini" created="Wed, 27 Aug 2014 16:30:48 +0000"  >&lt;p&gt;Chris, can you help and tell me the kernel version you use/build with 2.4.2-14chaos? I am having problems patching the 2.6.32-358.23.2 kernel version we use to build our latest b2_4/2.4.3-RC1 branch/tag.&lt;/p&gt;</comment>
                            <comment id="92635" author="simmonsja" created="Wed, 27 Aug 2014 17:51:06 +0000"  >&lt;p&gt;I can report I don&apos;t see this issue with 2.5 servers. I see another problem instead.&lt;/p&gt;</comment>
                            <comment id="92656" author="morrone" created="Wed, 27 Aug 2014 20:43:12 +0000"  >&lt;blockquote&gt;&lt;p&gt;I wanted to ask you if you tried to run your &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; reproducer against it already ?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Yes.  The &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch did appear to help with the evictions on unlink.&lt;/p&gt;</comment>
                            <comment id="92713" author="bfaccini" created="Thu, 28 Aug 2014 15:10:51 +0000"  >&lt;p&gt;Ok, I found that the 2.6.32-431.3.1 kernel is compatible with the 2.4.2-14chaos kernel patches ...&lt;/p&gt;</comment>
                            <comment id="92790" author="bfaccini" created="Fri, 29 Aug 2014 10:38:20 +0000"  >&lt;p&gt;But now I encounter issues during the make step. Chris, do you use special tricks to build from your source tree?&lt;/p&gt;</comment>
                            <comment id="92791" author="bfaccini" created="Fri, 29 Aug 2014 10:54:35 +0000"  >&lt;p&gt;James,&lt;br/&gt;
It&apos;s too bad that at the time of this crash you ran with a Lustre debug mask made of only &quot;D_IOCTL+D_NETERROR+D_WARNING+D_ERROR+D_EMERG+D_HA+D_CONFIG+D_CONSOLE&quot;, which did not additionally contain at least &quot;D_RPCTRACE+D_DLMTRACE&quot; ... That would have greatly helped in navigating the crash-dump.&lt;/p&gt;</comment>
                            <comment id="92930" author="bfaccini" created="Mon, 1 Sep 2014 16:47:19 +0000"  >&lt;p&gt;Chris,&lt;br/&gt;
Please, can you answer my previous request and detail any specific procedure to build from the LLNL source tree/git?&lt;/p&gt;</comment>
                            <comment id="93038" author="morrone" created="Tue, 2 Sep 2014 20:31:13 +0000"  >&lt;blockquote&gt;&lt;p&gt;But now I encounter issues during the make step. Chris, do you use special tricks to build from your source tree?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Maybe?  Could you be more specific about the commands you issued and the problem that you saw?  If you had trouble in the liblustre part, try adding &lt;tt&gt;--disable-liblustre&lt;/tt&gt; to the configure command line.&lt;/p&gt;</comment>
                            <comment id="93067" author="bfaccini" created="Wed, 3 Sep 2014 00:13:13 +0000"  >&lt;p&gt;I first cloned the chaos git, then selected the 2.4.2-14chaos tag.&lt;br/&gt;
I then applied our usual procedure described at &quot;https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821&quot; to generate a ldiskfs-based (sorry, I forgot to mention this ...) server build, and got these errors during the &quot;make rpms&quot; step:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
...............

Wrote: /home/bruno.faccini/kernel/rpmbuild/SRPMS/lustre-iokit-1.4.0-1.src.rpm
make[1]: Leaving directory `/home/bruno.faccini/lustre-chaos/lustre-iokit&apos;
rpmbuilddir=`mktemp -t -d lustre-build-$USER-XXXXXXXX`; \
        make  \
                rpmbuilddir=&quot;$rpmbuilddir&quot; \
                rpm-local || exit 1; \
        rpmbuild \
                --define &quot;_tmppath $rpmbuilddir/TMP&quot; \
                --define &quot;_topdir $rpmbuilddir&quot; \
                --define &quot;build_src_rpm 1&quot; \
                --define &quot;dist %{nil}&quot; \
        -ts lustre-2.4.2.tar.gz || exit 1; \
        cp $rpmbuilddir/SRPMS/*.src.rpm . || exit 1; \
        rm -f -R $rpmbuilddir
make[1]: Entering directory `/home/bruno.faccini/lustre-chaos&apos;
make[1]: Leaving directory `/home/bruno.faccini/lustre-chaos&apos;
error: Macro %rel_down has empty body
error: Macro %rel_build has empty body
error: line 96: Empty tag: Release:
make: *** [srpm] Error 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So I suspect I am not using the right procedure for your tree, since we don&apos;t have such a &quot;rpmbuilddir&quot; variable/macro in our build infrastructure, which plays with/changes the &quot;_topdir&quot; setting ...&lt;/p&gt;</comment>
                            <comment id="93238" author="morrone" created="Thu, 4 Sep 2014 20:54:45 +0000"  >&lt;p&gt;Looks like a bug.  You can work around it by using --with-downstream-release:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ sh autogen.sh
$ ./configure --with-downstream-release=14chaos --disable-liblustre
$ make rpms
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you are trying to reproduce what we see, we use zfs not ldiskfs.&lt;/p&gt;

&lt;p&gt;Also, is reproduction likely to be an effective approach to this problem?  What do we know about the code, and how do we intend to reproduce the issue?&lt;/p&gt;</comment>
                            <comment id="93707" author="green" created="Wed, 10 Sep 2014 16:23:14 +0000"  >&lt;p&gt;Ok. I think I got to the root of this (using my patch &lt;a href=&quot;http://review.whamcloud.com/11842&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/11842&lt;/a&gt; to make every request trigger a resend, which is great for testing this code that is rarely hit in our testing but gets hit quite a bit on larger systems now that LLNL added the patch to shrink client-supplied buffers).&lt;/p&gt;

&lt;p&gt;First of all, the code as is should have failed with this assert on the very first resend, but did not due to a bug. This was fixed in our tree, but not in 2.4 ( &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3428&quot; title=&quot;Fix &amp;#39;uninitialized variables&amp;#39; errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3428&quot;&gt;&lt;del&gt;LU-3428&lt;/del&gt;&lt;/a&gt;) &lt;a href=&quot;http://review.whamcloud.com/6511&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6511&lt;/a&gt; - so you need this patch first.&lt;/p&gt;

&lt;p&gt;The assertion itself is due to a logic flaw. It then becomes clear that the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt; patch you are carrying is wrong, in particular this part of it:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;--- a/lustre/mdt/mdt_open.c
+++ b/lustre/mdt/mdt_open.c
@@ -1710,6 +1710,10 @@ int mdt_reint_open(struct mdt_thread_info *info, struct m
                /* the open lock might already be gotten in
                 * mdt_intent_fixup_resent */
                LASSERT(lustre_msg_get_flags(req-&amp;gt;rq_reqmsg) &amp;amp; MSG_RESENT);
+
+               /* take again a ref on old lock found to be resent */
+               ldlm_lock_addref(&amp;amp;lhc-&amp;gt;mlh_reg_lh, lhc-&amp;gt;mlh_reg_mode);
+
                if (create_flags &amp;amp; MDS_OPEN_LOCK)
                        mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
        } else {
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The reason is that mdt_intent_lock_replace assumes the lock has already been &quot;replaced&quot; into the client export, so it does not need any of those references: it cannot go away, because it is already &quot;owned&quot; by the (not yet aware) client.&lt;/p&gt;

&lt;p&gt;With this part removed (and that other uninitialized-variable fix from above) I am no longer hitting the assertion or the terrible lock deadlocks on resend.&lt;br/&gt;
That said, my racer testing on your tree cannot complete, as other issues long since fixed in the master tree get in the way, like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4725&quot; title=&quot;wrong lock ordering in rename leads to deadlocks&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4725&quot;&gt;&lt;del&gt;LU-4725&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5144&quot; title=&quot;rename vs link deadlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5144&quot;&gt;&lt;del&gt;LU-5144&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At least this class of problems (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2827&quot; title=&quot;mdt_intent_fixup_resent() cannot find the proper lock in hash&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2827&quot;&gt;&lt;del&gt;LU-2827&lt;/del&gt;&lt;/a&gt; related) should be extinguished for you now, and should hold you over until you are ready to move to a newer release that has the more comprehensive LU-2827 patch with all of its afterfixes.&lt;/p&gt;</comment>
                            <comment id="93708" author="green" created="Wed, 10 Sep 2014 16:38:22 +0000"  >&lt;p&gt;I now refreshed &lt;a href=&quot;http://review.whamcloud.com/#/c/9488/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9488/&lt;/a&gt; to a more correct version.&lt;br/&gt;
Also updated the open_by_fid part to be aware of resends.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="23009">LU-4584</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="19248">LU-3428</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="26600">LU-5632</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 10 Nov 2014 21:55:36 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwu73:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15383</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 20 Aug 2014 21:55:36 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>