Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.3.0
    • Lustre 2.3.0
    • None
    • 3
    • 3
    • 4498

    Description

      James A Simmons said:

      After finishing the stat test I attempted to launch a job this morning on the machine and it killed the node with:

      [2012-07-26 08:07:19][c0-0c0s0n2]LustreError: 10556:0:(lov_lock.c:273:lov_subresult()) ASSERTION( rc <= 0 || rc == CLO_REPEAT || rc == CLO_WAIT ) failed:
      [2012-07-26 08:07:19][c0-0c0s0n2]LustreError: 10556:0:(lov_lock.c:273:lov_subresult()) LBUG
      [2012-07-26 08:07:19][c0-0c0s0n2]Pid: 10556, comm: ls
      [2012-07-26 08:07:19][c0-0c0s0n2]Call Trace:
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa01237d2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0123dc2>] lbug_with_loc+0x42/0xa0 [libcfs]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07a7ff1>] lov_subresult+0x181/0x190 [lov]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07ab2bf>] lov_lock_enqueue+0xcf/0x7c0 [lov]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1732>] cl_glimpse_size0+0x172/0x180 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b7a8>] ll_inode_revalidate_it+0xf8/0x1a0 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b894>] ll_getattr_it+0x44/0x170 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b9fc>] ll_getattr+0x3c/0x40 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fdd03>] vfs_getattr+0x23/0x40
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe028>] vfs_fstatat+0x68/0x80
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe059>] vfs_lstat+0x19/0x20
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe1df>] sys_newlstat+0x1f/0x50
      [2012-07-26 08:07:19][c0-0c0[2012-07-26 08:07:19][c0-0c0s0n2] [<00007f901dea36c5>] 0x7f901dea36c5
      [2012-07-26 08:07:19][c0-0c0s0n2]Kernel panic - not syncing: LBUG
      [2012-07-26 08:07:19][c0-0c0s0n2]Pid: 10556, comm: ls Tainted: P 2.6.32.45-0.3.2_1.0400.6453-cray_gem_s #1
      [2012-07-26 08:07:19][c0-0c0s0n2]Call Trace:
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81006e37>] show_trace_log_lvl+0x57/0x70
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff81006e60>] show_trace+0x10/0x20
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8140319c>] dump_stack+0x72/0x7b
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff8140321a>] panic+0x75/0x13d
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0123e13>] lbug_with_loc+0x93/0xa0 [libcfs]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07a7ff1>] lov_subresult+0x181/0x190 [lov]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa07ab2bf>] lov_lock_enqueue+0xcf/0x7c0 [lov]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa08e1732>] cl_glimpse_size0+0x172/0x180 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b7a8>] ll_inode_revalidate_it+0xf8/0x1a0 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b894>] ll_getattr_it+0x44/0x170 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffffa089b9fc>] ll_getattr+0x3c/0x40 [lustre]
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fdd03>] vfs_getattr+0x23/0x40
      [2012-07-26 08:07:19][c0-0c0s0n2] [<ffffffff810fe028>] vfs_fstatat+0x68/0x80
      s0n2] [<ffffffff8100272b>] system_call_fastpath+0x16/0x1b
      [2012-07-26 08:07:19][c0-0c0s0n2] [<00007f901dea36c5>] 0x7f901dea36c5
      [2012-07-26 08:09:32][c0-0c0s0n3]BUG: unable to handle kernel NULL pointer dereference at 000000000000028e
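
For readers unfamiliar with the check that fired, the condition quoted in the LBUG message above can be sketched as a standalone predicate. This is only an illustration of the quoted expression, not the Lustre source: the numeric values given to `CLO_WAIT` and `CLO_REPEAT` are assumptions made for this sketch, and `subresult_rc_is_expected` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: placeholder values for illustration only. The real
 * definitions live in the Lustre client headers and may differ. */
enum {
    CLO_WAIT   = 1,
    CLO_REPEAT = 2,
};

/* Hypothetical helper that mirrors the expression in the log:
 *   rc <= 0 || rc == CLO_REPEAT || rc == CLO_WAIT
 * It returns true when rc is a value the assertion would accept. */
static bool subresult_rc_is_expected(int rc)
{
    return rc <= 0 || rc == CLO_REPEAT || rc == CLO_WAIT;
}
```

Read this way, the trace shows that `lov_subresult()` was handed a positive return code other than the two it accepts; the log does not say which value that was.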

      Further, James found more issues:

      [2012-07-26 15:10:40][c0-0c0s0n2]LustreError: 6383:0:(cl_lock.c:1462:cl_wait_try()) ASSERTION( lock->cll_state == CLS_ENQUEUED || lock->cll_state == CLS_HELD || lock->cll_state == CLS_INTRANSIT ) failed:
      [2012-07-26 15:10:40][c0-0c0s0n2]LustreError: 6383:0:(cl_lock.c:1462:cl_wait_try()) LBUG
      [2012-07-26 15:10:40][c0-0c0s0n2]Pid: 6383, comm: ll_sa_6382
      [2012-07-26 15:10:40][c0-0c0s0n2]Call Trace:
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffff810072e9>] try_stack_unwind+0x149/0x190
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffff81005ca0>] dump_trace+0x90/0x300
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa01237d2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0123dc2>] lbug_with_loc+0x42/0xa0 [libcfs]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa03905d0>] cl_wait_try+0x1e0/0x300 [obdclass]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa06ff223>] osc_lock_upcall+0x1f3/0x610 [osc]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa06de262>] osc_enqueue_base+0x2e2/0x560 [osc]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa06fdf93>] osc_lock_enqueue+0x223/0x8d0 [osc]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa07ac36e>] lov_lock_enqueue+0x17e/0x7e0 [lov]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0392b3e>] cl_enqueue_try+0x12e/0x310 [obdclass]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa0395347>] cl_enqueue_locked+0x77/0x1e0 [obdclass]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa039560e>] cl_lock_request+0x15e/0x260 [obdclass]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08e2185>] cl_glimpse_lock+0x155/0x4b0 [lustre]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08e2732>] cl_glimpse_size0+0x172/0x180 [lustre]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08db1f3>] ll_agl_trigger+0xb3/0x340 [lustre]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffffa08dfe85>] ll_statahead_thread+0x455/0x25a0 [lustre]
      [2012-07-26 15:10:40][c0-0c0s0n2] [<ffffffff810035ba>] child_rip+0xa/0x20
      [2012-07-26 15:10:40][c0-0c0s0n2]Kernel panic - not syncing: LBUG
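
The second assertion is a check on the lock state at the point where `cl_wait_try()` is entered. A minimal sketch of that condition follows. The three accepted state names come from the log line above; the extra `CLS_OTHER` state, the enum ordering, and the helper name `wait_state_is_expected` are assumptions added only so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: a reduced state list for illustration. Only CLS_ENQUEUED,
 * CLS_HELD and CLS_INTRANSIT appear in the log; CLS_OTHER stands in for
 * every state the assertion rejects. */
enum cl_lock_state_sketch {
    CLS_OTHER,
    CLS_ENQUEUED,
    CLS_HELD,
    CLS_INTRANSIT,
};

/* Hypothetical helper that mirrors the quoted expression:
 *   state == CLS_ENQUEUED || state == CLS_HELD || state == CLS_INTRANSIT */
static bool wait_state_is_expected(enum cl_lock_state_sketch state)
{
    return state == CLS_ENQUEUED ||
           state == CLS_HELD ||
           state == CLS_INTRANSIT;
}
```

In the trace, the check failed in the statahead thread (`ll_sa_6382`) on the path `osc_lock_upcall()` to `cl_wait_try()`, so the lock was in some state outside these three when the upcall ran.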

          People

            yong.fan nasf (Inactive)
            yong.fan nasf (Inactive)