Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Fix Version/s: None
- Affects Version/s: Lustre 2.1.5
- Component/s: None
- Environment: Linux 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP
- Severity: 3
- Rank: 13522
Description
We hit a kernel crash on a Lustre 2.1.5 client; the crash-utility session and kernel log follow:
[root@r01 ~]# crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux /var/crash/127.0.0.1-2014-04-11-17\:36\:14/vmcore
crash 6.0.4-2.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2014-04-11-17:36:14/vmcore [PARTIAL DUMP]
CPUS: 16
DATE: Fri Apr 11 17:35:10 2014
UPTIME: 12 days, 06:09:56
LOAD AVERAGE: 1.22, 1.18, 1.57
TASKS: 604
NODENAME: r01
RELEASE: 2.6.32-279.19.1.el6_lustre.x86_64
VERSION: #1 SMP Wed Mar 20 16:37:18 PDT 2013
MACHINE: x86_64 (2400 Mhz)
MEMORY: 12 GB
PANIC: ""
PID: 28331
COMMAND: "ldlm_bl_00"
TASK: ffff880334904aa0 [THREAD_INFO: ffff88023c770000]
CPU: 6
STATE: TASK_RUNNING (PANIC)
crash> log
...
Pid: 28331, comm: ldlm_bl_00 Not tainted 2.6.32-279.19.1.el6_lustre.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
RIP: 0010:[<ffffffffa04de478>] [<ffffffffa04de478>] cl_lock_put+0x118/0x490 [obdclass]
RSP: 0018:ffff88023c771c40 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 5a5a5a5a5a5a5a5a RCX: ffff8801410d37b8
RDX: ffffffffa04ff485 RSI: 5a5a5a5a5a5a5a5a RDI: ffff880181fc3930
RBP: ffff88023c771c70 R08: ffffffffa04ef540 R09: 00000000000002f4
R10: 00000000deadbeef R11: 0000000000000000 R12: ffff880181fc3930
R13: ffff880181fc3930 R14: ffff88018be93420 R15: ffff88023c771ca0
FS: 00007f63a402f700(0000) GS:ffff8801c5840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003b1faabc30 CR3: 00000001cf1a7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ldlm_bl_00 (pid: 28331, threadinfo ffff88023c770000, task ffff880334904aa0)
Stack:
ffff88023c771c70 ffff880132aaf000 ffff88018be93420 ffff880181fc3930
<d> ffff88018be93420 ffff88023c771ca0 ffff88023c771ce0 ffffffffa0953b30
<d> ffff880200000000 ffff8801410d37b8 ffff8801410d37b8 00000002a0393092
Call Trace:
[<ffffffffa0953b30>] osc_ldlm_blocking_ast+0xb0/0x380 [osc]
[<ffffffffa05e4cc0>] ldlm_cancel_callback+0x60/0x100 [ptlrpc]
[<ffffffffa05ff14b>] ldlm_cli_cancel_local+0x7b/0x380 [ptlrpc]
[<ffffffffa0602fd8>] ldlm_cli_cancel+0x58/0x3a0 [ptlrpc]
[<ffffffffa0952af1>] osc_lock_cancel+0xe1/0x1b0 [osc]
[<ffffffffa04d544d>] ? cl_env_nested_get+0x5d/0xc0 [obdclass]
[<ffffffffa04db225>] cl_lock_cancel0+0x75/0x160 [obdclass]
[<ffffffffa04dbf0b>] cl_lock_cancel+0x13b/0x140 [obdclass]
[<ffffffffa0953bba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc]
[<ffffffffa0606123>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
[<ffffffffa0606561>] ldlm_bl_thread_main+0x281/0x3d0 [ptlrpc]
[<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[<ffffffffa06062e0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa06062e0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
[<ffffffffa06062e0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 00 00 00 00 00 c7 05 5c db 06 00 01 00 00 00 48 c7 c7 a0 bf 54 a0 8b 13 4c 8b 45 08 31 c0 e8 d0 d9 ea ff eb 12 66 0f 1f 44 00 00 <48> 8b 43 28 48 8b 40 08 4c 8b 68 18 f0 ff 0b 0f 94 c0 84 c0 74
RIP [<ffffffffa04de478>] cl_lock_put+0x118/0x490 [obdclass]
RSP <ffff88023c771c40>
Cluster configuration:
Lustre Server MGS/MDS - mmp-2
Lustre Servers OSS - n11, n12, n13, n14, n15, n21, n22, n23, n24, n25
Lustre Clients - r01, r02, r03, r04, mmp-1, vn-1, cln01, cln02, cln03, cln04
(refer to the diagram "20140113 - Hardware Diagram v0.1_R3.gif" in attachment)
Environment:
Linux 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP
Mount points:
OSS:
/dev/md11 on /lustre/ost type lustre (rw,noauto,_netdev,abort_recov)
MGS/MDS:
/dev/lustre_mgs on /lustre/mgs type lustre (rw,noauto,_netdev,abort_recov)
/dev/lustre_mdt1 on /lustre/mdt1 type lustre (rw,noauto,_netdev,abort_recov)
Clients (r01, r02, r03, r04, mmp-1, vn-1):
mmp-2@tcp:mmp-1@tcp:/lustre1 on /array1 type lustre (rw,noauto,_netdev,noflock,abort_recov,lazystatfs)
Clients (cln01, cln02, cln03, cln04):
mmp-2@tcp:mmp-1@tcp:/lustre1 on /array1 type lustre (rw,noauto,_netdev,localflock,abort_recov,lazystatfs)
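For completeness, the client mounts above correspond to a command of roughly this shape (a sketch using the node and fsname values from this report; mmp-2 is the primary MGS NID, mmp-1 the failover NID):

```shell
# Sketch of the r01/r02/r03/r04/mmp-1/vn-1 client mount shown above.
mount -t lustre -o noauto,_netdev,noflock,abort_recov,lazystatfs \
    mmp-2@tcp:mmp-1@tcp:/lustre1 /array1
```

The cln01-cln04 clients differ only in using localflock instead of noflock: localflock provides client-local (not cluster-coherent) flock semantics, while noflock disables flock support entirely.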
Stripe config:
[root@mmp-1 ~]# lfs getstripe /array1/.
/array1/.
stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
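The default layout reported above (one stripe of 1 MiB, OST chosen by the MDS) would be set with something like the following sketch (flag spellings may vary slightly across lfs versions):

```shell
# Sketch: set the reported default layout on the filesystem root.
# -c 1: one stripe; -S 1M: 1 MiB stripe size; -i -1: let the MDS pick the OST.
lfs setstripe -c 1 -S 1M -i -1 /array1
```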
kdump config:
core_collector makedumpfile -c --message-level 1 -d 31
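The core_collector line above normally lives in /etc/kdump.conf; a minimal sketch of the matching configuration (path is the RHEL 6 default crash directory):

```shell
# /etc/kdump.conf (sketch)
path /var/crash
# -c: compress pages; -d 31: exclude zero, cache, cache-private, user and free pages
core_collector makedumpfile -c --message-level 1 -d 31
```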
A crash dump file is available; we are ready to upload it if you need it for analysis.
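If it helps triage, the same vmcore can be inspected further with standard crash(8) commands, e.g. (a sketch; ffff880181fc3930 is the pointer held in RDI/R12 above, likely the cl_lock argument to cl_lock_put, and struct decoding assumes the obdclass module debuginfo is loaded):

```shell
crash> bt 28331                         # full backtrace of the ldlm_bl_00 thread
crash> struct cl_lock ffff880181fc3930  # decode the suspect lock object
crash> kmem ffff880181fc3930            # slab state of the suspect object
```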