Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.7.0
-
None
-
Versions of 2.6.54 on clients & servers.
Cray SLES11SP3 clients, CentOS servers (2.6.32-431.5.1.el6.x86_64).
Most recent commit on clients:
Ie7a2a98be8cc97db9af7a64476c06fc7321544eb
http://review.whamcloud.com/12142
Most recent commit on servers:
If24443955290b091fd22905dfb74b0d6a6d1b4e8
http://review.whamcloud.com/12490Versions of 2.6.54 on clients & servers. Cray SLES11SP3 clients, CentOS servers (2.6.32-431.5.1.el6.x86_64). Most recent commit on clients: Ie7a2a98be8cc97db9af7a64476c06fc7321544eb http://review.whamcloud.com/12142 Most recent commit on servers: If24443955290b091fd22905dfb74b0d6a6d1b4e8 http://review.whamcloud.com/12490
-
3
-
16451
Description
While doing a general purpose test of master with DNE II (2 MDSes with 3 MDTs each, 6 total MDTs), we did an ls to check the status of something, and our client LBUGged. We turned up debug on a different client, did the ls again, and it crashed as well:
2014-11-06T20:08:43.974554-06:00 c1-0c0s0n3 LustreError: 3797:0:(statahead.c:262:sa_kill()) ASSERTION( !list_empty(&entry->se_list) ) failed:
2014-11-06T20:08:43.974591-06:00 c1-0c0s0n3 LustreError: 3797:0:(statahead.c:262:sa_kill()) LBUG
2014-11-06T20:08:43.974597-06:00 c1-0c0s0n3 Pid: 3797, comm: ls
2014-11-06T20:08:43.974604-06:00 c1-0c0s0n3 Call Trace:
2014-11-06T20:08:43.974617-06:00 c1-0c0s0n3 [<ffffffff81006591>] try_stack_unwind+0x161/0x1a0
2014-11-06T20:08:43.974622-06:00 c1-0c0s0n3 [<ffffffff81004de9>] dump_trace+0x89/0x440
2014-11-06T20:08:43.974632-06:00 c1-0c0s0n3 [<ffffffffa0176897>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
2014-11-06T20:08:43.974639-06:00 c1-0c0s0n3 [<ffffffffa0176de7>] lbug_with_loc+0x47/0xc0 [libcfs]
2014-11-06T20:08:44.015207-06:00 c1-0c0s0n3 [<ffffffffa0750512>] sa_put+0x332/0x370 [lustre]
2014-11-06T20:08:44.015232-06:00 c1-0c0s0n3 [<ffffffffa0752b6c>] do_statahead_enter+0xfdc/0x1cb0 [lustre]
2014-11-06T20:08:44.015238-06:00 c1-0c0s0n3 [<ffffffffa073a695>] ll_lookup_it+0x7b5/0x1b70 [lustre]
2014-11-06T20:08:44.015245-06:00 c1-0c0s0n3 [<ffffffffa073bad5>] ll_lookup_nd+0x85/0x570 [lustre]
2014-11-06T20:08:44.015250-06:00 c1-0c0s0n3 [<ffffffff81148edc>] d_alloc_and_lookup+0x4c/0x80
2014-11-06T20:08:44.015261-06:00 c1-0c0s0n3 [<ffffffff8114a57e>] do_lookup+0x29e/0x3a0
2014-11-06T20:08:44.015273-06:00 c1-0c0s0n3 [<ffffffff8114cdb3>] path_lookupat+0xc3/0x5e0
2014-11-06T20:08:44.015281-06:00 c1-0c0s0n3 [<ffffffff8114d305>] do_path_lookup+0x35/0xd0
2014-11-06T20:08:44.015287-06:00 c1-0c0s0n3 [<ffffffff8114e043>] user_path_at_empty+0x83/0xb0
2014-11-06T20:08:44.015303-06:00 c1-0c0s0n3 [<ffffffff8114e081>] user_path_at+0x11/0x20
2014-11-06T20:08:44.015311-06:00 c1-0c0s0n3 [<ffffffff81142d85>] vfs_fstatat+0x55/0x90
2014-11-06T20:08:44.054257-06:00 c1-0c0s0n3 [<ffffffff81142e2e>] vfs_lstat+0x1e/0x20
2014-11-06T20:08:44.054275-06:00 c1-0c0s0n3 [<ffffffff81142e54>] sys_newlstat+0x24/0x50
2014-11-06T20:08:44.054281-06:00 c1-0c0s0n3 [<ffffffff8133776b>] system_call_fastpath+0x16/0x1b
2014-11-06T20:08:44.054293-06:00 c1-0c0s0n3 [<00007ffff78b9515>] 0x7ffff78b9515
I'll make the client dump available in a few minutes.
I see that the patch seems to have been applied, though we seem to be having a similar problem. We're using 2.6.93 on the clients and 2.6.92 on the servers. The problem can be reproduced reliably on our end, so I'd be happy to provide additional logs/diagnostic information (Is there a standard procedure for this? I couldn't seem to find anything via google.) I've included the kernel log messages that were dumped to the console as well as the stack trace below.
kernel:LustreError: 13007:0:(statahead.c:262:sa_kill()) ASSERTION( !list_empty(&entry->se_list) ) failed:
kernel:LustreError: 13007:0:(statahead.c:262:sa_kill()) LBUG
PID: 13007 TASK: ffff880b0aea2040 CPU: 25 COMMAND: "rsync"
#0 [ffff881c20ef16f0] machine_kexec at ffffffff8103b5bb
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
#1 [ffff881c20ef1750] crash_kexec at ffffffff810c9852
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/kernel/kexec.c: 1106
#2 [ffff881c20ef1820] panic at ffffffff8152927e
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/kernel/panic.c: 111
#3 [ffff881c20ef18a0] lbug_with_loc at ffffffffa03c5eeb [libcfs]
/usr/src/debug/lustre-2.6.93/libcfs/libcfs/linux/linux-debug.c: 175
#4 [ffff881c20ef18c0] revalidate_statahead_dentry at ffffffffa0a0e31d [lustre]
/usr/src/debug/lustre-2.6.93/lustre/llite/statahead.c: 263
#5 [ffff881c20ef1a00] ll_statahead at ffffffffa0a0e642 [lustre]
/usr/src/debug/lustre-2.6.93/libcfs/include/libcfs/libcfs_debug.h: 219
#6 [ffff881c20ef1ae0] ll_lookup_it at ffffffffa09f8907 [lustre]
/usr/src/debug/lustre-2.6.93/lustre/llite/namei.c: 541
#7 [ffff881c20ef1ba0] ll_lookup_nd at ffffffffa09f9029 [lustre]
/usr/src/debug/lustre-2.6.93/lustre/llite/namei.c: 771
#8 [ffff881c20ef1bf0] do_lookup at ffffffff8119dc65
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/namei.c: 1063
#9 [ffff881c20ef1c50] __link_path_walk at ffffffff8119e8f4
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/namei.c: 1239
#10 [ffff881c20ef1d30] path_walk at ffffffff8119f40a
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/namei.c: 558
#11 [ffff881c20ef1d70] filename_lookup at ffffffff8119f61b
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/namei.c: 1375
#12 [ffff881c20ef1db0] user_path_at at ffffffff811a0747
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/namei.c: 1597
#13 [ffff881c20ef1e80] vfs_fstatat at ffffffff81193bc0
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/stat.c: 84
#14 [ffff881c20ef1ee0] vfs_lstat at ffffffff81193c7e
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/stat.c: 107
#15 [ffff881c20ef1ef0] sys_newlstat at ffffffff81193ca4
/usr/src/debug/kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/fs/stat.c: 257
#16 [ffff881c20ef1f80] system_call_fastpath at ffffffff8100b072
/usr/src/debug////////kernel-2.6.32-504.3.3.el6/linux-2.6.32-504.3.3.el6.x86_64/arch/x86/kernel/entry_64.S: 489
RIP: 00000031f78daf75 RSP: 00007fff51e03da8 RFLAGS: 00010206
RAX: 0000000000000006 RBX: ffffffff8100b072 RCX: 00007fff51e040b0
RDX: 00007fff51dffb10 RSI: 00007fff51dffb10 RDI: 00007fff51e00ba0
RBP: 0000000000000000 R8: 00007fff51e00be3 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff51e00ba0
R13: 00007fff51dffb10 R14: 00007fff51dffb10 R15: 00007fff51e00ba0
ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b