[LU-8907] LBUG: (namei.c:816:ll_create_node()) ASSERTION( list_empty(&inode->i_dentry) ) failed Created: 05/Dec/16 Updated: 07/Jun/17 Resolved: 07/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Piechurski | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | p4b | ||
| Environment: |
RHEL 6 Bull kernel 2.6.32-642.4.2.el6.Bull.100.x86_64 |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
One Lustre client crashed right after a recovery with the following messages: 2016-10-29 16:00:01 Lustre: DEBUG MARKER: Sat Oct 29 16:00:01 2016 2016-10-29 16:00:01 2016-10-29 16:01:07 Lustre: 17766:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1477749640/real 1477749640] req@ffff8811e1b9a400 x1547883906151520/t0(0) o101->scratch3-MDT0000-mdc-ffff88205ec91400@JO.BOO.AL.IL@o2ib2:12/10 lens 632/1136 e 0 to 1 dl 1477749667 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1 2016-10-29 16:01:07 Lustre: 17766:0:(client.c:1942:ptlrpc_expire_one_request()) Skipped 12 previous similar messages 2016-10-29 16:01:07 Lustre: scratch3-MDT0000-mdc-ffff88205ec91400: Connection to scratch3-MDT0000 (at JO.BOO.AL.IL@o2ib2) was lost; in progress operations using this service will wait for recovery to complet e 2016-10-29 16:01:07 Lustre: Skipped 12 previous similar messages 2016-10-29 16:01:07 Lustre: scratch3-MDT0000-mdc-ffff88205ec91400: Connection restored to scratch3-MDT0000 (at JO.BOO.AL.IL@o2ib2) 2016-10-29 16:01:07 Lustre: Skipped 12 previous similar messages 2016-10-29 16:01:07 LustreError: 17766:0:(namei.c:816:ll_create_node()) ASSERTION( list_empty(&inode->i_dentry) ) failed: 2016-10-29 16:01:07 LustreError: 17766:0:(namei.c:816:ll_create_node()) LBUG The stack is as follows: PID: 17766 TASK: ffff8819ab5b0ab0 CPU: 6 COMMAND: "rsync" #0 [ffff88189c697b20] machine_kexec at ffffffff8103ff4b #1 [ffff88189c697b80] crash_kexec at ffffffff810cfce2 #2 [ffff88189c697c50] panic at ffffffff81546ce9 #3 [ffff88189c697cd0] lbug_with_loc at ffffffffa067beeb [libcfs] #4 [ffff88189c697cf0] ll_create_nd at ffffffffa0dbf854 [lustre] #5 [ffff88189c697d70] vfs_create at ffffffff811a7946 #6 [ffff88189c697db0] do_filp_open at ffffffff811ab75e #7 [ffff88189c697f20] do_sys_open at ffffffff81194e87 #8 [ffff88189c697f70] sys_open at ffffffff81194f90 #9 [ffff88189c697f80] system_call_fastpath at ffffffff8100b0d2 RIP: 0000003c88adb480 RSP: 00007fffd85b6bc8 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff8100b0d2 RCX: 0000000000000001 RDX: 0000000000000180 RSI: 00000000000000c2 RDI: 00007fffd85b7d70 RBP: 00007fffd85b7d9b R8: 7a672e7761722e30 R9: 00000000ffffffff R10: 0000000000000001 R11: 0000000000000246 R12: ffffffff81194f90 R13: ffff88189c697f78 R14: 00007fffd85b7d9c R15: 0000000000000000 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
Even though the intent is to create the inode with a dentry in the filesystem namespace (/somepath/.tmpfile), the inode structure field i_dentry is populated with one dentry pointing at /.lustre/fid/[0x298cd542a:0x3b3d:0x0]. Strangely enough, even though the crash occured in ll_create_nd() -> ll_create_it() -> ll_create_node, before the call to d_instantiate() in ll_create_nd() -> ll_create_it(), the fid actually really points back to the requested location: # lfs fid2path /mountpoint [0x298cd542a:0x3b3d:0x0] /somepath/.tmpfile I can't upload the crashdump due to site restriction, but will be happy to give more informations as you request them. |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 05/Dec/16 ] |
|
Hello Sebastien, |
| Comment by Sebastien Piechurski [ 06/Dec/16 ] |
|
Hi Bruno,
For sure, I'd be happy to work with you on this ! |
| Comment by Bruno Faccini (Inactive) [ 06/Dec/16 ] |
|
Ok, then I will try to contact you offline to see how/when/where we can schedule a RDV. |
| Comment by Sebastien Piechurski [ 08/Dec/16 ] |
|
I forgot to mention that the intent dentry which should not had time to be instantiated (since the assert crashed before the call to d_instantiate()) is really /somepath/.tmpfile (the file actually exists on the fs currently).
|
| Comment by Sebastien Piechurski [ 03/Jan/17 ] |
|
Hi Bruno, and Happy New year to you ! Were you able to discuss the data we extracted from the dump with your colleagues on this issue ? |
| Comment by Bruno Faccini (Inactive) [ 09/Jan/17 ] |
|
Thank you Seb and I wish the same for you and your family! |
| Comment by Bruno Faccini (Inactive) [ 09/Jan/17 ] |
|
Oops, forget my previous question, the conversion can be found in the LBUG text ... |
| Comment by Bruno Faccini (Inactive) [ 09/Jan/17 ] |
|
Hello Seb, |
| Comment by Bruno Faccini (Inactive) [ 11/Jan/17 ] |
|
Hello Seb, did you see my previous update/query? |
| Comment by Bruno Faccini (Inactive) [ 12/Jan/17 ] |
|
Hmm first attempt to reproduce looks interesting (promising?). Lustre: DEBUG MARKER: -----============= acceptance-small: replay-single ============----- Wed Jan 11 16:40:48 UTC 2017 Lustre: DEBUG MARKER: only running test 70b Lustre: DEBUG MARKER: excepting tests: 61d 33a 33b 89 62 Lustre: DEBUG MARKER: skipping tests SLOW=no: 1 2 3 4 6 12 16 44a 44b 65 66 67 68 Lustre: DEBUG MARKER: Using TIMEOUT=20 LustreError: 27268:0:(llite_lib.c:1225:ll_clear_inode()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed: LustreError: 27268:0:(llite_lib.c:1225:ll_clear_inode()) LBUG Pid: 27268, comm: rm Call Trace: [<ffffffffa03fe895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [<ffffffffa03fee97>] lbug_with_loc+0x47/0xb0 [libcfs] [<ffffffffa0f64e72>] ll_clear_inode+0x8e2/0x920 [lustre] [<ffffffffa0f63300>] ? ll_delete_inode+0x0/0x1e0 [lustre] [<ffffffff811a69fc>] clear_inode+0xac/0x140 [<ffffffffa0f6338d>] ll_delete_inode+0x8d/0x1e0 [lustre] [<ffffffffa0f34000>] ? return_if_equal+0x0/0x30 [lustre] [<ffffffff811a70fe>] generic_delete_inode+0xde/0x1d0 [<ffffffff811a7255>] generic_drop_inode+0x65/0x80 [<ffffffff811a60a2>] iput+0x62/0x70 [<ffffffffa0f3590a>] ll_d_iput+0xfa/0x820 [lustre] [<ffffffff811a2c7c>] dentry_iput+0x7c/0x100 [<ffffffff811a2df1>] d_kill+0x31/0x60 [<ffffffff811a49cc>] dput+0x7c/0x150 [<ffffffff8119b1d9>] do_rmdir+0x199/0x1f0 [<ffffffff81196df6>] ? final_putname+0x26/0x50 [<ffffffff810e2257>] ? audit_syscall_entry+0x1d7/0x200 [<ffffffff8119b25d>] sys_unlinkat+0x2d/0x40 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b which is only reported in But unfortunately, kdump over the LAN/nfs has failed : …………….. mapping eth0 to eth0 ADDRCONF(NETDEV_UP): eth0: link is not ready udhcpc (v1.15.1) started Sending discover... e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Sending discover... Sending discover... Unable to get a DHCP address retsd 5:0:0:0: [sdf] Synchronizing SCSI cache ry... No lease, failing eth0 failed to come up Saving to remote location eagle-2.eagle.hpdd.intel.com:/export/scratch mount: RPC: Remote system error - Network is unreachable sd 4:0:0:0: [sde] Synchronizing SCSI cache sd 3:0:0:0: [sdd] Synchronizing SCSI cache sd 2:0:0:0: [sdc] Synchronizing SCSI cache sd 1:0:0:0: [sdb] Synchronizing SCSI cache sd 0:0:0:0: [sda] Synchronizing SCSI cache e1000e 0000:02:00.0: PCI INT A disabled e1000e 0000:01:00.0: PCI INT A disabled Restarting system. machine restart |
| Comment by Sebastien Piechurski [ 12/Jan/17 ] |
|
Hi Bruno,
Here is the list of patches we apply to our servers kernel: 0003-Applied-Bull_lmigr-x86_64.el6.0.7.patch-for-RHEL-2.6.patch 0006-Applied-Bull_311740-GENERIC_ALLOCATOR-4.patch 0008-Applied-Bull_Dynamic_Power_Limitation_2.6.32-573.patch 0011-Applied-Bull_mce75xx_2.6.32-504.diff.patch 0012-Applied-Bull_improve-x86_64.csum.patch.patch 0017-Applied-raid5-mmp-unplug-dev-rhel6.Bull.3.patch 0018-lustre-blkdev_tunables-2.6-rhel6.patch.patch 0021-lustre-bh_lru_size_config.patch.patch 0022-quota-replace-dqptr-sem.patch 0029-Applied-proc_pid_limits.patch.patch 0030-Applied-patch-Bull_oomprotect.diff.patch 0055-quota-avoid-dqget-calls.patch 0058-fix-CONFIG_XEN.patch 0059-ocfs2-getname.patch 0090-kernel_H-2.6.32-573.1.1-config-x86_64.patch
For the lustre code, this is based on 2.5.3.90 (commit 35bb8577c83f787451cd9842a07ac670cb6f4d3f), with some patches that I will send you by mail. |
| Comment by Bruno Faccini (Inactive) [ 17/Jan/17 ] |
|
Thanks Seb, I don't see any Kernel/Lustre patch that could be related to the problem. Concerning the other LBUG, related to On the other hand, I have been able to reproduce your LBUG by introducing the following tempo just before the LBUG : [root@eagle-28 lustre-release]# git status
# On branch b2_5
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: lustre/include/obd_support.h
# modified: lustre/llite/namei.c
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# lustre/include/obd_support.h.bfi
# lustre/llite/namei.c.bfi
# lustre/scripts/make_META.pl
no changes added to commit (use "git add" and/or "git commit -a")
[root@eagle-28 lustre-release]#
[root@eagle-28 lustre-release]# git diff
diff --git a/lustre/include/obd_support.h b/lustre/include/obd_support.h
index 37bee22..a21f8a0 100644
--- a/lustre/include/obd_support.h
+++ b/lustre/include/obd_support.h
@@ -430,6 +430,7 @@ int obd_alloc_fail(const void *ptr, const char *name, const char *type,
#define OBD_FAIL_MDC_GETATTR_ENQUEUE 0x803
#define OBD_FAIL_MDC_RPCS_SEM 0x804
#define OBD_FAIL_MDC_LIGHTWEIGHT 0x805
+#define OBD_FAIL_MDC_CREATE_PAUSE 0x805
#define OBD_FAIL_MGS 0x900
#define OBD_FAIL_MGS_ALL_REQUEST_NET 0x901
diff --git a/lustre/llite/namei.c b/lustre/llite/namei.c
index 39cd6d5..1cf3518 100644
--- a/lustre/llite/namei.c
+++ b/lustre/llite/namei.c
@@ -813,6 +813,8 @@ static struct inode *ll_create_node(struct inode *dir, const char *name,
if (rc)
GOTO(out, inode = ERR_PTR(rc));
+ OBD_FAIL_TIMEOUT(OBD_FAIL_MDC_CREATE_PAUSE, 5);
+
LASSERT(ll_d_hlist_empty(&inode->i_dentry));
/* We asked for a lock on the directory, but were granted a
[root@eagle-28 lustre-release]#
and by running the following snippet : [root@eagle-28 lustre-release]# date; lctl set_param fail_loc=0x80000805 ; echo foofour > /mnt/lustre/foo8 & sleep 2 ; date ; fid=$(lfs changelog lustre-MDT0000 | grep ' foo8' | sed 's/.* t=//' | sed 's/ p=.*//') ; date ; cat /mnt/lustre/.lustre/fid/$fid ; date
Tue Jan 17 16:31:09 UTC 2017
fail_loc=0x80000805
[1] 27152
Tue Jan 17 16:31:11 UTC 2017
Tue Jan 17 16:31:11 UTC 2017
Tue Jan 17 16:31:11 UTC 2017
[root@eagle-28 lustre-release]#
Message from syslogd@eagle-28 at Jan 17 16:31:14 ...
kernel:LustreError: 27152:0:(namei.c:818:ll_create_node()) ASSERTION( list_empty(&inode->i_dentry) ) failed:
Message from syslogd@eagle-28 at Jan 17 16:31:14 ...
kernel:LustreError: 27152:0:(namei.c:818:ll_create_node()) LBUG
So this seems to mean that, as I have suspected from the start of our joint analysis, there is a potential race during file creation (including directory/dentry populate) and concurrent access using .lustre/fid method that is allowed to a ChangeLog reader, some kind of a More to come from crash-dump/debug-log analysis, and will also check if problem is still alive (like the same LBUG!) in master. |
| Comment by Gerrit Updater [ 18/Jan/17 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/24976 |
| Comment by Bruno Faccini (Inactive) [ 18/Jan/17 ] |
|
I have just pushed https://review.whamcloud.com/24976 to b2_5 branch. It introduces sanity/test_161d sub-test in order to check that the reproducer I have identified also fails in the auto-tests environment in a first time. And then, with next patch-set that removing LBUG could fix and not introduce any regression. Next, will also check, using the same way, if problem also exists in master. |
| Comment by Sebastien Piechurski [ 02/Feb/17 ] |
|
Hi Bruno, did you see the problem reproduced in the auto tests environment ? |
| Comment by Gerrit Updater [ 07/Feb/17 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/25296 |
| Comment by Bruno Faccini (Inactive) [ 08/Feb/17 ] |
|
Yes, new sanity/test_161d sub-test is also a reproducer in auto-tests environment and it also demonstrate that problem is still alive in master! LASSERT(ll_d_hlist_empty(&inode->i_dentry)); in ll_create_node() and exposes it to full tests suites. |
| Comment by Sebastien Piechurski [ 02/Mar/17 ] |
|
Hello Bruno,
What is the next step for this patch ? Are we waiting for reviewers ? |
| Comment by Gerrit Updater [ 26/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25296/ |
| Comment by Bruno Faccini (Inactive) [ 27/Mar/17 ] |
|
Hello Seb, |
| Comment by Sebastien Piechurski [ 07/Jun/17 ] |
|
Thanks Bruno. The patch was backported to our branch. This ticket can be closed. |
| Comment by Peter Jones [ 07/Jun/17 ] |
|
Thanks |
| Comment by Peter Jones [ 07/Jun/17 ] |
|
Landed for 2.10 |