Details
Bug
Resolution: Cannot Reproduce
Critical
None
Lustre 2.1.5
None
Linux 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP
3
9700
Description
We have a kernel crash on Lustre Client 2.1.5 with the following assertion:
LustreError: 31091:0:(lov_io.c:214:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
LustreError: 31091:0:(lov_io.c:214:lov_sub_get()) LBUG
It is very similar to:
Has this bug been fixed in 2.4? If so, are there any plans to fix it in 2.1? And is there a workaround (perhaps a configuration change) that avoids the error without upgrading?
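For readers unfamiliar with the assertion, here is a minimal sketch of the invariant it expresses: a stripe index used for an I/O must be smaller than the stripe count that I/O was set up with. This is not Lustre source code. The function names `raid0_stripe_index` and `sub_get`, the 1 MiB stripe size, and the round-robin offset mapping are assumptions chosen for illustration. The mismatched stripe counts in the last call are only one illustrative way to trip the check; the ticket does not establish why the real condition failed.

```python
def raid0_stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Map a file byte offset to a stripe index for a RAID0-style layout.

    Assumes simple round-robin striping: consecutive stripe_size chunks
    go to stripes 0, 1, ..., stripe_count - 1, then wrap around.
    """
    if stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("stripe_size and stripe_count must be positive")
    return (offset // stripe_size) % stripe_count


def sub_get(stripe: int, io_stripe_count: int) -> int:
    """Hypothetical stand-in for the bounds check reported in lov_sub_get()."""
    # Mirrors the failed condition: stripe < lio->lis_stripe_count
    if not stripe < io_stripe_count:
        raise AssertionError(
            f"stripe {stripe} out of range for stripe count {io_stripe_count}"
        )
    return stripe


if __name__ == "__main__":
    stripe_size = 1 << 20  # 1 MiB, an assumed value

    # Index computed against a 4-stripe layout ...
    stripe = raid0_stripe_index(3 * stripe_size, stripe_size, 4)

    # ... passes the check for an I/O set up with 4 stripes,
    sub_get(stripe, 4)

    # but trips it if the I/O was set up with only 2 stripes.
    try:
        sub_get(stripe, 2)
    except AssertionError as exc:
        print("assertion failed:", exc)
```

In the kernel the equivalent failure is fatal: the log shows the assertion followed by LBUG and a panic instead of a recoverable exception.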
[root@r03 lustre_2.1.5]# crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux /var/crash/127.0.0.1-2013-08-13-10\:15\:56/vmcore
crash 6.0.4-2.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2013-08-13-10:15:56/vmcore [PARTIAL DUMP]
CPUS: 16
DATE: Tue Aug 13 10:14:51 2013
UPTIME: 4 days, 12:04:11
LOAD AVERAGE: 0.00, 0.11, 0.12
TASKS: 513
NODENAME: r03
RELEASE: 2.6.32-279.19.1.el6_lustre.x86_64
VERSION: #1 SMP Wed Mar 20 16:37:18 PDT 2013
MACHINE: x86_64 (2400 Mhz)
MEMORY: 12 GB
PANIC: "Kernel panic - not syncing: LBUG"
PID: 31091
COMMAND: "lrvfarmd"
TASK: ffff88013cd3b500 [THREAD_INFO: ffff880149fd4000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash> log
LustreError: 31091:0:(lov_io.c:214:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
LustreError: 31091:0:(lov_io.c:214:lov_sub_get()) LBUG
Pid: 31091, comm: lrvfarmd
Call Trace:
[<ffffffffa034a785>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa034ad97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa099e93f>] lov_sub_get+0x47f/0x6f0 [lov]
[<ffffffffa0998cfc>] lov_page_init_raid0+0x14c/0x770 [lov]
[<ffffffff812754b4>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa0995a54>] lov_page_init+0x54/0xe0 [lov]
[<ffffffffa04a415c>] cl_page_find0+0x1cc/0x850 [obdclass]
[<ffffffffa04a4811>] cl_page_find+0x11/0x20 [obdclass]
[<ffffffffa0a591d2>] ll_cl_init+0x152/0x560 [lustre]
[<ffffffff8116b858>] ? mem_cgroup_cache_charge+0x118/0x130
[<ffffffffa0a5962a>] ll_readpage+0x4a/0x200 [lustre]
[<ffffffff811117ec>] generic_file_aio_read+0x1fc/0x700
[<ffffffff8109672f>] ? up+0x2f/0x50
[<ffffffffa0a80cdb>] vvp_io_read_start+0x13b/0x3e0 [lustre]
[<ffffffffa04ac23a>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa04b0a7c>] cl_io_loop+0xcc/0x190 [obdclass]
[<ffffffffa0a31047>] ll_file_io_generic+0x3a7/0x560 [lustre]
[<ffffffffa0a31339>] ll_file_aio_read+0x139/0x2c0 [lustre]
[<ffffffffa0a317f9>] ll_file_read+0x169/0x2a0 [lustre]
[<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
[<ffffffff81176df1>] sys_read+0x51/0x90
[<ffffffff814ed03e>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: LBUG
Pid: 31091, comm: lrvfarmd Not tainted 2.6.32-279.19.1.el6_lustre.x86_64 #1
Call Trace:
[<ffffffff814e9811>] ? panic+0xa0/0x168
[<ffffffffa034adeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa099e93f>] ? lov_sub_get+0x47f/0x6f0 [lov]
[<ffffffffa0998cfc>] ? lov_page_init_raid0+0x14c/0x770 [lov]
[<ffffffff812754b4>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa0995a54>] ? lov_page_init+0x54/0xe0 [lov]
[<ffffffffa04a415c>] ? cl_page_find0+0x1cc/0x850 [obdclass]
[<ffffffffa04a4811>] ? cl_page_find+0x11/0x20 [obdclass]
[<ffffffffa0a591d2>] ? ll_cl_init+0x152/0x560 [lustre]
[<ffffffff8116b858>] ? mem_cgroup_cache_charge+0x118/0x130
[<ffffffffa0a5962a>] ? ll_readpage+0x4a/0x200 [lustre]
[<ffffffff811117ec>] ? generic_file_aio_read+0x1fc/0x700
[<ffffffff8109672f>] ? up+0x2f/0x50
[<ffffffffa0a80cdb>] ? vvp_io_read_start+0x13b/0x3e0 [lustre]
[<ffffffffa04ac23a>] ? cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa04b0a7c>] ? cl_io_loop+0xcc/0x190 [obdclass]
[<ffffffffa0a31047>] ? ll_file_io_generic+0x3a7/0x560 [lustre]
[<ffffffffa0a31339>] ? ll_file_aio_read+0x139/0x2c0 [lustre]
[<ffffffffa0a317f9>] ? ll_file_read+0x169/0x2a0 [lustre]
[<ffffffff81176cb5>] ? vfs_read+0xb5/0x1a0
[<ffffffff81176df1>] ? sys_read+0x51/0x90
[<ffffffff814ed03e>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
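To make the call path in the traces above easier to read, the following sketch parses frame lines of the form `[<address>] symbol+0xoffset/0xsize [module]` and prints the symbols from the outermost caller inward. The names `FRAME_RE`, `Frame`, `parse_frame`, and `call_path` are hypothetical helpers, not part of crash or Lustre. The sketch assumes that a `?` before the symbol marks a frame the kernel could not verify as part of the call chain, and it skips such frames.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Set

# Matches lines such as:
#   [<ffffffffa099e93f>] lov_sub_get+0x47f/0x6f0 [lov]
#   [<ffffffff812754b4>] ? call_rwsem_down_read_failed+0x14/0x30
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int            # offset of the return address inside the function
    size: int              # total size of the function
    module: Optional[str]  # None for symbols built into the kernel image
    unreliable: bool       # True when the line carries a "?" marker


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one call-trace line; return None if it is not a frame line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        address=int(m["addr"], 16),
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        unreliable=m["unreliable"] is not None,
    )


def call_path(lines: Iterable[str], modules: Optional[Set[str]] = None) -> List[str]:
    """Return symbols from outermost caller to innermost, skipping "?" frames."""
    frames = [f for f in map(parse_frame, lines) if f and not f.unreliable]
    if modules is not None:
        frames = [f for f in frames if f.module in modules]
    return [f.symbol for f in reversed(frames)]


if __name__ == "__main__":
    trace = [
        "[<ffffffffa099e93f>] lov_sub_get+0x47f/0x6f0 [lov]",
        "[<ffffffffa0998cfc>] lov_page_init_raid0+0x14c/0x770 [lov]",
        "[<ffffffff812754b4>] ? call_rwsem_down_read_failed+0x14/0x30",
        "[<ffffffffa0995a54>] lov_page_init+0x54/0xe0 [lov]",
        "[<ffffffff81176df1>] sys_read+0x51/0x90",
    ]
    print(" -> ".join(call_path(trace)))
```

The first trace in the log is the one to feed to this helper. In the second trace, printed after the panic, every frame carries the `?` marker, so `call_path` would return an empty list for it.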