[LU-5525] ASSERTION( new_lock->l_readers + new_lock->l_writers == 0 ) failed Created: 20/Aug/14  Updated: 13/Oct/21  Resolved: 13/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: Bruno Faccini (Inactive)
Resolution: Duplicate Votes: 0
Labels: llnl
Environment:

MDS node, Lustre 2.4.2-14chaos, ZFS OBD


Issue Links:
Related
is related to LU-4584 Lock revocation process fails consist... Resolved
is related to LU-3428 Fix 'uninitialized variables' errors Resolved
is related to LU-5632 ldlm_lock_addref()) ASSERTION( lock !... Resolved
Severity: 3
Rank (Obsolete): 15383

 Description   

After upgrading to lustre 2.4.2-14chaos (see github.com/chaos/lustre), we soon hit the following assertion on one of our MDS nodes:

mdt_handler.c:3652:mdt_intent_lock_replace()) ASSERTION( new_lock->l_readers + new_lock->l_writers == 0 ) failed

Perhaps most significantly, this tag of our lustre tree includes the patch entitled:

LU-4584 mdt: ensure orig lock is found in hash upon resend

James Simmons reported this assertion when he tested the LU-4584 patch, but Bruno's evaluation was that the assertion was unrelated to the patch.

Whether it is related or not, we need to fix the problem.



 Comments   
Comment by Christopher Morrone [ 20/Aug/14 ]

Here is the backtrace for the mdt01_006 thread that hit the assertion:

mdt_intent_lock_replace
mdt_intent_reint
mdt_intent_policy
ldlm_lock_enqueue
ldlm_handle_enqueue0
mdt_enqueue
mdt_handle_common
mds_regular_handle
ptlrpc_server_handle_request
ptlrpc_main

Comment by Peter Jones [ 20/Aug/14 ]

Bruno

Can you please advise on this ticket?

Thanks

Peter

Comment by Bruno Faccini (Inactive) [ 21/Aug/14 ]

Yes, I am investigating the long history around this issue ...

Comment by Christopher Morrone [ 22/Aug/14 ]

We have hit this assertion 3 times in just over 24 hours on the MDS of our largest production filesystem. I reverted the patch for LU-4584 in the hope of getting us back into a reasonably operational state again. We should probably know in a day or two if the problem is gone.

Comment by Bruno Faccini (Inactive) [ 25/Aug/14 ]

Hello Chris,

I checked my notes for LU-4584 and this LBUG occurred without my patch being applied.

I suspect this will not be possible for you, but just in case: did you run with any debug levels enabled in the Lustre trace at the time of these three crashes? If yes, could you extract and provide at least one of the traces? Or better, could you provide a crash dump?

Also, and I know it is not an easy question, but do you have any idea what workload may have caused this?

I am trying to set up a platform to reproduce this in-house, so any details about your configuration are of interest (number of MDSs/MDTs, DNE?, number of OSSs/OSTs, OST indexing, default striping, ...).

Thanks again and in advance for your help.

Comment by Christopher Morrone [ 25/Aug/14 ]

This is not a machine for which we can provide logs or crash dumps. No, we don't know the details of the various workloads going on at the time.

The filesystem has 768 OSTs on 768 OSS nodes, with a default stripe count of 1. Some users use stripe counts of various widths, some over 700.

I don't know what your notes say, but I checked and we crashed the MDS fifteen times during the three days that we were running with the LU-4584 patch. We then rebooted the MDS onto Lustre version 2.4.2-14.1chaos (which does nothing but revert the LU-4584 patch), and we have not seen this ticket's assertion in over three days.

While not conclusive, the circumstantial evidence points strongly at the LU-4584 patch either introducing the bug or making it much easier to hit.

Here is the list of 2.4.2-14chaos to 2.4.2-14.1chaos changes:

$ git log --oneline 2.4.2-14chaos..2.4.2-14.1chaos
f28b8cc Revert "LU-4584 mdt: ensure orig lock is found in hash upon resend"

You can find those tags at github.com/chaos/lustre.

Comment by James A Simmons [ 25/Aug/14 ]

I hit this bug using the latest b2_5 branch with the LU-2827 patch. Good news for you is that I can share my crash dumps. A simple simul run on our Cray test bed produced this Oops. I uploaded the vmcore and dmesg to ftp.whamcloud.com/uploads/LU-5525.

Comment by Bruno Faccini (Inactive) [ 26/Aug/14 ]

Hello James, thanks for this crash dump! But I also need the corresponding vmlinux and Lustre modules to allow the crash tool to run on it. By "the latest b2_5 branch with the LU-2827 patch", do you mean you used one of our recent Jenkins builds to get this crash? If yes, can you point me to which one? And if not, can you also transfer the two kernel-debuginfo[-common] RPMs and the lustre-modules (or the lustre-debuginfo) RPM?
BTW, I checked the vmcore-dmesg.txt file you provided with the crash dump, and its content seems to indicate that you ran with both a kernel and a Lustre distro that you built locally, right?

Comment by James A Simmons [ 26/Aug/14 ]

Sorry, I forgot the debug RPMs. I just uploaded them to the same FTP spot. ORNL would really like to go to 2.5 ASAP.

Yes, I built b2_5 at commit a43e0e4ce4b57240540e8d35a8ba44e203c70ae1 with some additional patches. We tag our kernels to prevent our automated RPM update system from stomping on them. They are basically the RHEL Lustre-patched kernels. We also build our own kernel RPMs to enable certain things like timestamps in the printk output.

Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ]

James, "crash" tool claims "WARNING: kernel version inconsistency between vmlinux and dumpfile" and gives up due to further errors ...
Are you sure that at the time of crash you were running with this "2.6.32-431.17.1.el6.wc" kernel version for which you provided the debuginfo RPMs ? BTW, strings out from vmcore and vmcore-dmesg.txt report "2.6.32-358.23.2.el6.atlas"...

Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ]

Chris, I am really sorry that my b2_4 patch for LU-4584 seems to be causing you so much trouble.
This LBUG did not occur when running the auto-tests and your LU-4584 reproducer against the Jenkins build for patch-set #7 of Gerrit change #9488, and I am still unable to reproduce it when running with a local build made from tag 2.4.3-RC1 plus the LU-4584 patch (i.e., patch-set #7 of Gerrit change #9488).
I am currently building a Lustre version based on 2.4.2-14chaos to see if I can reproduce it in-house, but I wanted to ask whether you have already tried to run your LU-4584 reproducer against it?

Comment by James A Simmons [ 27/Aug/14 ]

Oops, you are right. I saw this problem with our 2.4 image which did contain the LU-4584 patch. Sorry for the confusion. I pushed the proper debug rpms this time.

Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ]

Chris, can you tell me which kernel version you use/build with 2.4.2-14chaos? I am having problems patching the 2.6.32-358.23.2 kernel version we use to build our latest b2_4/2.4.3-RC1 branch/tag.

Comment by James A Simmons [ 27/Aug/14 ]

I can report I don't see this issue with 2.5 servers. I see another problem instead.

Comment by Christopher Morrone [ 27/Aug/14 ]

I wanted to ask whether you have already tried to run your LU-4584 reproducer against it?

Yes. The LU-4584 patch did appear to help with the evictions on unlink.

Comment by Bruno Faccini (Inactive) [ 28/Aug/14 ]

Ok, I found that the 2.6.32-431.3.1 kernel is compatible with the 2.4.2-14chaos kernel patches ...

Comment by Bruno Faccini (Inactive) [ 29/Aug/14 ]

But now I encounter issues during the make step. Chris, do you use any special tricks to build from your source tree?

Comment by Bruno Faccini (Inactive) [ 29/Aug/14 ]

James,
It's too bad that at the time of this crash you ran with a Lustre debug mask of only "D_IOCTL+D_NETERROR+D_WARNING+D_ERROR+D_EMERG+D_HA+D_CONFIG+D_CONSOLE", without at least "D_RPCTRACE+D_DLMTRACE" in addition ... That would have greatly helped in navigating the crash dump.

Comment by Bruno Faccini (Inactive) [ 01/Sep/14 ]

Chris,
Please, can you answer my previous request and detail any specific procedure needed to build from the LLNL source tree/git?

Comment by Christopher Morrone [ 02/Sep/14 ]

But now I encounter issues during the make step. Chris, do you use any special tricks to build from your source tree?

Maybe? Could you be more specific about the commands you issued and the problem that you saw? If you had trouble in the liblustre part, try adding --disable-liblustre to the configure command line.

Comment by Bruno Faccini (Inactive) [ 03/Sep/14 ]

I first cloned the chaos git tree, then checked out the 2.4.2-14chaos tag.
Then I applied our usual procedure, described at "https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821", to generate an ldiskfs-based (sorry, I forgot to mention this ...) server build, and got these errors during the "make rpms" step:

...............

Wrote: /home/bruno.faccini/kernel/rpmbuild/SRPMS/lustre-iokit-1.4.0-1.src.rpm
make[1]: Leaving directory `/home/bruno.faccini/lustre-chaos/lustre-iokit'
rpmbuilddir=`mktemp -t -d lustre-build-$USER-XXXXXXXX`; \
        make  \
                rpmbuilddir="$rpmbuilddir" \
                rpm-local || exit 1; \
        rpmbuild \
                --define "_tmppath $rpmbuilddir/TMP" \
                --define "_topdir $rpmbuilddir" \
                --define "build_src_rpm 1" \
                --define "dist %{nil}" \
        -ts lustre-2.4.2.tar.gz || exit 1; \
        cp $rpmbuilddir/SRPMS/*.src.rpm . || exit 1; \
        rm -f -R $rpmbuilddir
make[1]: Entering directory `/home/bruno.faccini/lustre-chaos'
make[1]: Leaving directory `/home/bruno.faccini/lustre-chaos'
error: Macro %rel_down has empty body
error: Macro %rel_build has empty body
error: line 96: Empty tag: Release:
make: *** [srpm] Error 1

So I suspect I am not using the right procedure for your tree, since we don't have such a "rpmbuilddir" variable/macro in our build infrastructure that plays with/changes the "_topdir" setting ...

Comment by Christopher Morrone [ 04/Sep/14 ]

Looks like a bug. You can work around it by using --with-downstream-release:

$ sh autogen.sh
$ ./configure --with-downstream-release=14chaos --disable-liblustre
$ make rpms

If you are trying to reproduce what we see, we use zfs not ldiskfs.

Also, is reproduction likely to be an effective approach to this problem? What do we know about the code, and how do we intend to reproduce the issue?

Comment by Oleg Drokin [ 10/Sep/14 ]

Ok. I think I got to the root of this (using my patch http://review.whamcloud.com/11842 to make every request trigger a resend, which is great for testing this code path that is rarely hit in our testing, but gets hit quite a bit on larger systems now that LLNL added the patch to shrink client-supplied buffers).

First of all, the code as-is should have failed with this assert on the very first resend, but did not due to a bug. This was fixed in our tree, but not in 2.4 (LU-3428, http://review.whamcloud.com/6511), so you need this patch first.

The assertion itself is due to a logic flaw. It then becomes clear that the LU-4584 patch you are carrying is wrong, in particular this part of it:

--- a/lustre/mdt/mdt_open.c
+++ b/lustre/mdt/mdt_open.c
@@ -1710,6 +1710,10 @@ int mdt_reint_open(struct mdt_thread_info *info, struct m
                /* the open lock might already be gotten in
                 * mdt_intent_fixup_resent */
                LASSERT(lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT);
+
+               /* take again a ref on old lock found to be resent */
+               ldlm_lock_addref(&lhc->mlh_reg_lh, lhc->mlh_reg_mode);
+
                if (create_flags & MDS_OPEN_LOCK)
                        mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
        } else {

The reason is that mdt_intent_lock_replace() assumes the lock has already been "replaced" into the client export, so it does not need any of those references: the lock cannot go away because it is already "owned" by the (not yet aware) client.
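
To illustrate the mechanism, here is a minimal sketch (simplified struct and stand-in helpers, not the real Lustre code paths): ldlm_lock_addref() bumps l_readers or l_writers on the resent lock, and mdt_intent_lock_replace() then asserts that both counters are zero before handing the lock back to the client, so the extra addref added by the patch trips the LASSERT.

#include <assert.h>

#define LASSERT(cond) assert(cond)   /* stand-in for the Lustre macro */

/* Heavily simplified stand-in for struct ldlm_lock: only the two
 * reference counters involved in the failed assertion. */
struct ldlm_lock {
        int l_readers;  /* references held in read (PR/CR) modes */
        int l_writers;  /* references held in write (PW/EX) modes */
};

/* Sketch of what ldlm_lock_addref() does to the counters. */
static void sketch_lock_addref(struct ldlm_lock *lock, int read_mode)
{
        if (read_mode)
                lock->l_readers++;
        else
                lock->l_writers++;
}

/* Sketch of the check in mdt_intent_lock_replace(): the resent lock is
 * assumed to already be "owned" by the client export, so it must carry
 * no extra references at this point. */
static void sketch_intent_lock_replace(struct ldlm_lock *new_lock)
{
        LASSERT(new_lock->l_readers + new_lock->l_writers == 0);
        /* ... splice new_lock into the reply for the client ... */
}

With the extra addref on the resend path, one of the counters is non-zero by the time the replace step runs, which is exactly the LBUG reported above.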

With this part removed (and the uninitialized-variable fix from above), I am no longer hitting the assertion or having terrible lock deadlocks on resend from the start, even though my racer testing on your tree cannot complete because other issues that were long ago fixed in the master tree, like LU-4725 and LU-5144, are getting in the way.

At least this class of problems (LU-2827 related) should be extinguished for you now and should hold you over until you are ready to move to a newer release that has the more comprehensive LU-2827 patch with all of its after-fixes.

Comment by Oleg Drokin [ 10/Sep/14 ]

I have now refreshed http://review.whamcloud.com/#/c/9488/ to a more correct version.
I also updated the open_by_fid path to be aware of resends.

Generated at Sat Feb 10 01:52:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.