[LU-5525] ASSERTION( new_lock->l_readers + new_lock->l_writers == 0 ) failed Created: 20/Aug/14 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Christopher Morrone | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | llnl |
| Environment: |
MDS node, Lustre 2.4.2-14chaos, ZFS OBD |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15383 |
| Description |
|
After upgrading to Lustre 2.4.2-14chaos (see github.com/chaos/lustre), we soon hit the following assertion on one of our MDS nodes:

mdt_handler.c:3652:mdt_intent_lock_replace()) ASSERTION( new_lock->l_readers + new_lock->l_writers == 0 ) failed

Perhaps most significantly, this tag of our lustre tree includes the patch entitled:

LU-4584 mdt: ensure orig lock is found in hash upon resend

James Simmons reported this assertion when he tested the LU-4584 patch. Whether it is related or not, we need to fix the problem. |
| Comments |
| Comment by Christopher Morrone [ 20/Aug/14 ] |
|
Here is the backtrace for the mdt01_006 thread that hit the assertion:

mdt_intent_lock_replace
mdt_intent_reint
mdt_intent_policy
ldlm_lock_enqueue
ldlm_handle_enqueue0
mdt_enqueue
mdt_handle_common
mds_regular_handle
ptlrpc_server_handle_request
ptlrpc_main |
| Comment by Peter Jones [ 20/Aug/14 ] |
|
Bruno, can you please advise on this ticket? Thanks, Peter |
| Comment by Bruno Faccini (Inactive) [ 21/Aug/14 ] |
|
Yes, I am investigating the long history around this issue ... |
| Comment by Christopher Morrone [ 22/Aug/14 ] |
|
We have hit this assertion 3 times in just over 24 hours on the MDS of our largest production filesystem. I reverted the patch for LU-4584. |
| Comment by Bruno Faccini (Inactive) [ 25/Aug/14 ] |
|
Hello Chris, I checked my notes for LU-4584. I suspect this will not be possible for you, but just in case: did you run with any debug levels enabled in the Lustre trace at the time of these 3 crashes? If yes, could you extract and provide at least one of the traces? Or better, could you provide a crash-dump? Also, and I know it is not an easy question, but do you have any idea what workload may have caused this? I am trying to set up a platform to reproduce in-house, so any details about your configuration are of interest (number of MDSs/MDTs, DNE?, number of OSSs/OSTs, OSTs indexing, default striping, ...). Thanks again and in advance for your help. |
| Comment by Christopher Morrone [ 25/Aug/14 ] |
|
This is not a machine for which we can provide logs or crash dumps. No, we don't know the details of the various workloads going on at the time. The filesystem has 768 OSTs on 768 OSS nodes, with a default stripe count of 1. Some users do use stripe counts of various widths, some over 700.

I don't know what your notes say, but I checked and we crashed the MDS fifteen times during the three days that we were running with the LU-4584 patch. While not conclusive, the circumstantial evidence points strongly at the LU-4584 patch.

Here is the list of 2.4.2-14chaos to 2.4.2-14.1chaos changes:

$ git log --oneline 2.4.2-14chaos..2.4.2-14.1chaos
f28b8cc Revert "LU-4584 mdt: ensure orig lock is found in hash upon resend"

You can find those tags at github.com/chaos/lustre. |
| Comment by James A Simmons [ 25/Aug/14 ] |
|
I hit this bug using the latest b2_5 branch with the LU-4584 patch. |
| Comment by Bruno Faccini (Inactive) [ 26/Aug/14 ] |
|
Hello James, thanks for this crash-dump already! But I also need the corresponding vmlinux and Lustre modules to allow the crash tool to run on it. By "the latest b2_5 branch with the LU-4584 patch", can you tell me more precisely which commit you built and which kernel you ran? |
| Comment by James A Simmons [ 26/Aug/14 ] |
|
Sorry, I forgot the debug rpms. I just uploaded them to the same ftp spot. ORNL would really like to go to 2.5 ASAP. Yes, I built b2_5 at commit a43e0e4ce4b57240540e8d35a8ba44e203c70ae1 with some additional patches. We tag our kernels to avoid our automated rpm update system stomping on them; they are basically the RHEL Lustre-patched kernels. We also build our own kernel rpms to enable certain things like timestamps in printk. |
| Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ] |
|
James, "crash" tool claims "WARNING: kernel version inconsistency between vmlinux and dumpfile" and gives up due to further errors ... |
| Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ] |
|
Chris, I am really sorry about the fact that my b2_4 patch for LU-4584 appears to be the cause of these crashes. |
| Comment by James A Simmons [ 27/Aug/14 ] |
|
Oops, you are right. I saw this problem with our 2.4 image, which did contain the LU-4584 patch. |
| Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ] |
|
Chris, can you tell me which kernel version you use/build with 2.4.2-14chaos? I am having problems when patching the 2.6.32-358.23.2 kernel version we use to build our latest b2_4/2.4.3-RC1 branch/tag. |
| Comment by James A Simmons [ 27/Aug/14 ] |
|
I can report I don't see this issue with 2.5 servers. I see another problem instead. |
| Comment by Christopher Morrone [ 27/Aug/14 ] |
Yes. The |
| Comment by Bruno Faccini (Inactive) [ 28/Aug/14 ] |
|
Ok, I found that the 2.6.32-431.3.1 kernel is compatible with the 2.4.2-14chaos kernel patches ... |
| Comment by Bruno Faccini (Inactive) [ 29/Aug/14 ] |
|
But now I encounter issues during the make step. Chris, do you use any special tricks to build from your source tree? |
| Comment by Bruno Faccini (Inactive) [ 29/Aug/14 ] |
|
James, |
| Comment by Bruno Faccini (Inactive) [ 01/Sep/14 ] |
|
Chris, |
| Comment by Christopher Morrone [ 02/Sep/14 ] |
Maybe? Could you be more specific about the commands you issued and the problem that you saw? If you had trouble in the liblustre part, try adding --disable-liblustre to the configure command line. |
| Comment by Bruno Faccini (Inactive) [ 03/Sep/14 ] |
|
I first cloned the chaos git tree, then checked out the 2.4.2-14chaos tag.
...............
Wrote: /home/bruno.faccini/kernel/rpmbuild/SRPMS/lustre-iokit-1.4.0-1.src.rpm
make[1]: Leaving directory `/home/bruno.faccini/lustre-chaos/lustre-iokit'
rpmbuilddir=`mktemp -t -d lustre-build-$USER-XXXXXXXX`; \
make \
rpmbuilddir="$rpmbuilddir" \
rpm-local || exit 1; \
rpmbuild \
--define "_tmppath $rpmbuilddir/TMP" \
--define "_topdir $rpmbuilddir" \
--define "build_src_rpm 1" \
--define "dist %{nil}" \
-ts lustre-2.4.2.tar.gz || exit 1; \
cp $rpmbuilddir/SRPMS/*.src.rpm . || exit 1; \
rm -f -R $rpmbuilddir
make[1]: Entering directory `/home/bruno.faccini/lustre-chaos'
make[1]: Leaving directory `/home/bruno.faccini/lustre-chaos'
error: Macro %rel_down has empty body
error: Macro %rel_build has empty body
error: line 96: Empty tag: Release:
make: *** [srpm] Error 1
So I suspect I am not using the right procedure for your tree, since we don't have such a "rpmbuilddir" variable/macro in our own build infrastructure, which plays with/changes the "_topdir" setting ... |
| Comment by Christopher Morrone [ 04/Sep/14 ] |
|
Looks like a bug. You can work around it by using --with-downstream-release:

$ sh autogen.sh
$ ./configure --with-downstream-release=14chaos --disable-liblustre
$ make rpms

If you are trying to reproduce what we see, we use zfs, not ldiskfs. Also, is reproduction likely to be an effective approach to this problem? What do we know about the code, and how do we intend to reproduce the issue? |
| Comment by Oleg Drokin [ 10/Sep/14 ] |
|
Ok. I think I got to the root of this (using my patch http://review.whamcloud.com/11842 to make every request trigger a resend, which is great for testing this code that is rarely hit in our testing, but gets hit quite a bit on larger systems now that LLNL added the patch to shrink client-supplied buffers).

First of all, the code as-is should have failed with this assert on the very first resend, but did not due to a bug. This was fixed in our tree, but not in 2.4. The assertion itself is due to a logic flaw. Then it becomes clear that the patch should not contain this hunk:

--- a/lustre/mdt/mdt_open.c
+++ b/lustre/mdt/mdt_open.c
@@ -1710,6 +1710,10 @@ int mdt_reint_open(struct mdt_thread_info *info, struct m
/* the open lock might already be gotten in
* mdt_intent_fixup_resent */
LASSERT(lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT);
+
+ /* take again a ref on old lock found to be resent */
+ ldlm_lock_addref(&lhc->mlh_reg_lh, lhc->mlh_reg_mode);
+
if (create_flags & MDS_OPEN_LOCK)
mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
} else {
The reason for that is that mdt_intent_lock_replace() assumes the lock has already been "replaced" into the client export, so it does not need any of those references - it cannot go away because it is already "owned" by the (not yet aware) client. With this part removed (and that other uninitialized-var fix from above) I am no longer hitting the assertion or having terrible lock deadlocks on resend from the start. At least this class of problems should now be gone. |
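To make the flaw concrete, here is a minimal user-space sketch (toy_lock, toy_addref and toy_intent_lock_replace are invented names; this is not the Lustre code, whose ldlm reference counting is far more involved) of why an extra reference taken on the resent lock trips the readers+writers assertion in the replace step:

#include <assert.h>

/* Toy stand-in for the reader/writer reference counts kept on a lock. */
struct toy_lock {
        int l_readers;
        int l_writers;
};

enum toy_mode { TOY_READ, TOY_WRITE };

/* Rough analogue of taking a reader/writer reference on a lock. */
static void toy_addref(struct toy_lock *lk, enum toy_mode mode)
{
        if (mode == TOY_READ)
                lk->l_readers++;
        else
                lk->l_writers++;
}

/* Rough analogue of mdt_intent_lock_replace(): the lock is assumed to be
 * already "owned" by the client export, so no extra references may remain. */
static void toy_intent_lock_replace(struct toy_lock *lk)
{
        assert(lk->l_readers + lk->l_writers == 0);
        /* ... hand the lock back to the client ... */
}

int main(void)
{
        struct toy_lock resent_lock = { 0, 0 };

        /* The extra reference taken on the resent open lock (the hunk above)
         * leaves a count behind ... */
        toy_addref(&resent_lock, TOY_WRITE);

        /* ... so the replace step asserts, just as the MDS did. */
        toy_intent_lock_replace(&resent_lock);
        return 0;
}

Run as written, the toy aborts on the assert, mirroring the LASSERT the MDS hits; dropping the toy_addref() call (the analogue of removing the hunk above) lets it run to completion.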
| Comment by Oleg Drokin [ 10/Sep/14 ] |
|
I now refreshed http://review.whamcloud.com/#/c/9488/ to a more correct version. |