Details
Bug
Resolution: Fixed
Major
Lustre 2.3.0
None
3
3
4498
Description
James A Simmons said:
After finishing the stat test, I attempted to launch a job on the machine this morning, and it killed the node with:
[2012-07-26 08:07:19][c0-0c0s0n2]LustreError: 10556:0:(lov_lock.c:273:lov_subresult()) ASSERTION( rc <= 0 || rc == CLO_REPEAT || rc == CLO_WAIT ) failed:
[2012-07-26 08:07:19][c0-0c0s0n2]LustreError: 10556:0:(lov_lock.c:273:lov_subresult()) LBUG
[2012-07-26 08:07:19][c0-0c0s0n2]Pid: 10556, comm: ls
[2012-07-26 08:07:19][c0-0c0s0n2]Call Trace:
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa01237d2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0123dc2>] lbug_with_loc+0x42/0xa0 [libcfs]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07a7ff1>] lov_subresult+0x181/0x190 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07ab2bf>] lov_lock_enqueue+0xcf/0x7c0 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1732>] cl_glimpse_size0+0x172/0x180 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b7a8>] ll_inode_revalidate_it+0xf8/0x1a0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b894>] ll_getattr_it+0x44/0x170 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b9fc>] ll_getattr+0x3c/0x40 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fdd03>] vfs_getattr+0x23/0x40
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe028>] vfs_fstatat+0x68/0x80
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe059>] vfs_lstat+0x19/0x20
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe1df>] sys_newlstat+0x1f/0x50
[2012-07-26 08:07:19][c0-0c0s0n2] [<00007f901dea36c5>] 0x7f901dea36c5
[2012-07-26 08:07:19][c0-0c0s0n2]Kernel panic - not syncing: LBUG
[2012-07-26 08:07:19][c0-0c0s0n2]Pid: 10556, comm: ls Tainted: P 2.6.32.45-0.3.2_1.0400.6453-cray_gem_s #1
[2012-07-26 08:07:19][c0-0c0s0n2]Call Trace:
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81006e37>] show_trace_log_lvl+0x57/0x70
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81006e60>] show_trace+0x10/0x20
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8140319c>] dump_stack+0x72/0x7b
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8140321a>] panic+0x75/0x13d
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0123e13>] lbug_with_loc+0x93/0xa0 [libcfs]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07a7ff1>] lov_subresult+0x181/0x190 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07ab2bf>] lov_lock_enqueue+0xcf/0x7c0 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1732>] cl_glimpse_size0+0x172/0x180 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b7a8>] ll_inode_revalidate_it+0xf8/0x1a0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b894>] ll_getattr_it+0x44/0x170 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b9fc>] ll_getattr+0x3c/0x40 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fdd03>] vfs_getattr+0x23/0x40
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe028>] vfs_fstatat+0x68/0x80
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8100272b>] system_call_fastpath+0x16/0x1b
[2012-07-26 08:07:19][c0-0c0s0n2] [<00007f901dea36c5>] 0x7f901dea36c5
[2012-07-26 08:09:32][c0-0c0s0n3]BUG: unable to handle kernel NULL pointer dereference at 000000000000028e
Further, James found more issues:
[2012-07-26 15:10:40][c0-0c0s0n2]LustreError: 6383:0:(cl_lock.c:1462:cl_wait_try()) ASSERTION( lock->cll_state == CLS_ENQUEUED || lock->cll_state == CLS_HELD || lock->cll_state == CLS_INTRANSIT ) failed:
[2012-07-26 15:10:40][c0-0c0s0n2]LustreError: 6383:0:(cl_lock.c:1462:cl_wait_try()) LBUG
[2012-07-26 15:10:40][c0-0c0s0n2]Pid: 6383, comm: ll_sa_6382
[2012-07-26 15:10:40][c0-0c0s0n2]Call Trace:
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa01237d2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0123dc2>] lbug_with_loc+0x42/0xa0 [libcfs]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa03905d0>] cl_wait_try+0x1e0/0x300 [obdclass]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa06ff223>] osc_lock_upcall+0x1f3/0x610 [osc]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa06de262>] osc_enqueue_base+0x2e2/0x560 [osc]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa06fdf93>] osc_lock_enqueue+0x223/0x8d0 [osc]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa07ac36e>] lov_lock_enqueue+0x17e/0x7e0 [lov]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08e2185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08e2732>] cl_glimpse_size0+0x172/0x180 [lustre]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08db1f3>] ll_agl_trigger+0xb3/0x340 [lustre]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08dfe85>] ll_statahead_thread+0x455/0x25a0 [lustre]
[2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffff810035ba>] child_rip+0xa/0x20
[2012-07-26 15:10:40][c0-0c0s0n2]Kernel panic - not syncing: LBUG