After finishing the stat test, I attempted to launch a job on the machine this morning, and it crashed the node with:
[2012-07-26 08:07:19][c0-0c0s0n2]LustreError: 10556:0:(lov_lock.c:273:lov_subresult()) ASSERTION( rc <= 0 || rc == CLO_REPEAT || rc == CLO_WAIT ) failed:
[2012-07-26 08:07:19][c0-0c0s0n2]LustreError: 10556:0:(lov_lock.c:273:lov_subresult()) LBUG
[2012-07-26 08:07:19][c0-0c0s0n2]Pid: 10556, comm: ls
[2012-07-26 08:07:19][c0-0c0s0n2]Call Trace:
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa01237d2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0123dc2>] lbug_with_loc+0x42/0xa0 [libcfs]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07a7ff1>] lov_subresult+0x181/0x190 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07ab2bf>] lov_lock_enqueue+0xcf/0x7c0 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1732>] cl_glimpse_size0+0x172/0x180 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b7a8>] ll_inode_revalidate_it+0xf8/0x1a0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b894>] ll_getattr_it+0x44/0x170 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b9fc>] ll_getattr+0x3c/0x40 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fdd03>] vfs_getattr+0x23/0x40
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe028>] vfs_fstatat+0x68/0x80
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe059>] vfs_lstat+0x19/0x20
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe1df>] sys_newlstat+0x1f/0x50
[2012-07-26 08:07:19][c0-0c0s0n2] [<00007f901dea36c5>] 0x7f901dea36c5
[2012-07-26 08:07:19][c0-0c0s0n2]Kernel panic - not syncing: LBUG
[2012-07-26 08:07:19][c0-0c0s0n2]Pid: 10556, comm: ls Tainted: P 2.6.32.45-0.3.2_1.0400.6453-cray_gem_s #1
[2012-07-26 08:07:19][c0-0c0s0n2]Call Trace:
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81006e37>] show_trace_log_lvl+0x57/0x70
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81006e60>] show_trace+0x10/0x20
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8140319c>] dump_stack+0x72/0x7b
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8140321a>] panic+0x75/0x13d
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0123e13>] lbug_with_loc+0x93/0xa0 [libcfs]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07a7ff1>] lov_subresult+0x181/0x190 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07ab2bf>] lov_lock_enqueue+0xcf/0x7c0 [lov]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1732>] cl_glimpse_size0+0x172/0x180 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b7a8>] ll_inode_revalidate_it+0xf8/0x1a0 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b894>] ll_getattr_it+0x44/0x170 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b9fc>] ll_getattr+0x3c/0x40 [lustre]
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fdd03>] vfs_getattr+0x23/0x40
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe028>] vfs_fstatat+0x68/0x80
[2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8100272b>] system_call_fastpath+0x16/0x1b
[2012-07-26 08:07:19][c0-0c0s0n2] [<00007f901dea36c5>] 0x7f901dea36c5
[2012-07-26 08:09:32][c0-0c0s0n3]BUG: unable to handle kernel NULL pointer dereference at 000000000000028e
A new ticket has been opened for tracking AGL bugs:
http://jira.whamcloud.com/browse/LU-1683
All AGL-related bugs will be moved to that ticket from now on.