LU-1046: fsstress - Watchdog detected hard LOCKUP on cpu 0

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.2.0
    • Fix Version/s: Lustre 2.2.0
    • Labels: None
    • Environment: 2.1.55 - Hyperion
    • Severity: 3
    • Rank (Obsolete): 4745

      Description

      Running fsstress: a single-client run passes, but a 10-client run exited with the error below.

      I am not certain this is a Lustre error.
      Stack:
      ----------
      2012-01-26 16:28:27 Lustre: 6811:0:(cmm_object.c:686:cml_rename_warn()) cml_rename failed for mdo_rename, should revoke: [mo_po [0x200000400:0x18a:0x0]] [mo_pn [0x200000400:0x18a:0x0]] [lf [0x2000013a1:0x14938:0x0]] [sname fstest_b2f496891449bc8de19b7849182f2acc] [mo_t [0x2000013a1:0x149a9:0x0]] [tname fstest_30283864daf0badb6075fdf78d2da862] [err -39]
      2012-01-26 16:28:28 Lustre: 6654:0:(cmm_object.c:686:cml_rename_warn()) cml_rename failed for mdo_rename, should revoke: [mo_po [0x200000400:0x18a:0x0]] [mo_pn [0x200000400:0x18a:0x0]] [lf [0x2000013a1:0x14938:0x0]] [sname fstest_b2f496891449bc8de19b7849182f2acc] [mo_t [0x2000013a1:0x149a9:0x0]] [tname fstest_30283864daf0badb6075fdf78d2da862] [err -39]

      <ConMan> Console [hyperion-rst6] departed by <root@localhost> on pts/131 at 01-26 16:44.
      2012-01-26 16:44:25 Lustre: 6594:0:(ldlm_lib.c:909:target_handle_connect()) MGS: connection from 7215a55e-8798-a29d-836c-c59c854cdb88@192.168.115.141@o2ib t0 exp (null) cur 1327625065 last 0
      2012-01-26 16:44:25 Lustre: 6594:0:(ldlm_lib.c:909:target_handle_connect()) Skipped 12 previous similar messages
      2012-01-26 16:45:29 Lustre: 6817:0:(ldlm_lib.c:909:target_handle_connect()) lustre-MDT0000: connection from c013f4e3-7f4f-0cba-76c9-53313da4adf0@192.168.115.130@o2ib1 t0 exp (null) cur 1327625129 last 0
      2012-01-26 16:45:29 Lustre: 6817:0:(ldlm_lib.c:909:target_handle_connect()) Skipped 42 previous similar messages
      2012-01-26 16:57:22 Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
      2012-01-26 16:57:22 Pid: 6147, comm: ptlrpcd_8 Not tainted 2.6.32-220.el6_lustre.x86_64 #1
      2012-01-26 16:57:22 Call Trace:
      2012-01-26 16:57:22 <NMI> [<ffffffff814ec701>] ? panic+0x78/0x143
      2012-01-26 16:57:22 [<ffffffff810d8fad>] ? watchdog_overflow_callback+0xcd/0xd0
      2012-01-26 16:57:22 [<ffffffff8110a89d>] ? __perf_event_overflow+0x9d/0x230
      2012-01-26 16:57:22 [<ffffffff8110ae54>] ? perf_event_overflow+0x14/0x20
      2012-01-26 16:57:22 [<ffffffff8101e096>] ? intel_pmu_handle_irq+0x336/0x550
      2012-01-26 16:57:22 [<ffffffff814f22d6>] ? kprobe_exceptions_notify+0x16/0x430
      2012-01-26 16:57:22 [<ffffffff814f0db9>] ? perf_event_nmi_handler+0x39/0xb0
      2012-01-26 16:57:22 [<ffffffff814f2905>] ? notifier_call_chain+0x55/0x80
      2012-01-26 16:57:22 [<ffffffff814f296a>] ? atomic_notifier_call_chain+0x1a/0x20
      2012-01-26 16:57:22 [<ffffffff81096bce>] ? notify_die+0x2e/0x30
      2012-01-26 16:57:22 [<ffffffff814f0583>] ? do_nmi+0x173/0x2b0
      2012-01-26 16:57:22 [<ffffffff814efe90>] ? nmi+0x20/0x30
      2012-01-26 16:57:22 [<ffffffff814ef5f2>] ? _spin_lock_irqsave+0x32/0x40
      2012-01-26 16:57:22 <<EOE>> [<ffffffff810517f2>] ? __wake_up+0x32/0x70
      2012-01-26 16:57:22 [<ffffffffa040175a>] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
      2012-01-26 16:57:22 [<ffffffffa05b3bd9>] ? ldlm_cb_interpret+0x179/0x1b0 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffffa05cc9d0>] ? ptlrpc_check_set+0x5d0/0x1ac0 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffff8107c0ec>] ? lock_timer_base+0x3c/0x70
      2012-01-26 16:57:22 [<ffffffffa0600a03>] ? ptlrpcd_check+0x5f3/0x610 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffffa060221b>] ? ptlrpcd+0x24b/0x3c0 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
      2012-01-26 16:57:22 [<ffffffffa0601fd0>] ? ptlrpcd+0x0/0x3c0 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      2012-01-26 16:57:22 [<ffffffffa0601fd0>] ? ptlrpcd+0x0/0x3c0 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffffa0601fd0>] ? ptlrpcd+0x0/0x3c0 [ptlrpc]
      2012-01-26 16:57:22 [<ffffffff8100c140>] ? child_rip+0x0/0x20
      2012-01-26 16:57:22 Initializing cgroup subsys cpuset
      -------------
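
When reading a trace like the one above, it can help to list which frames belong to loadable modules (here `libcfs` and `ptlrpc`) and which belong to the core kernel. The following is a minimal sketch, assuming the console format shown in this ticket; `parse_frames` and `module_frames` are hypothetical helper names for illustration, not part of Lustre or any kernel tooling.

```python
import re

# Matches one call-trace frame in the console format used above, e.g.
#   [<ffffffffa05b3bd9>] ? ldlm_cb_interpret+0x179/0x1b0 [ptlrpc]
# The trailing "[module]" tag is optional; core-kernel frames have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text):
    """Return (symbol, module) pairs in trace order; module is None for core-kernel frames."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def module_frames(trace_text):
    """Keep only the frames that carry a module tag."""
    return [(sym, mod) for sym, mod in parse_frames(trace_text) if mod is not None]


if __name__ == "__main__":
    sample = (
        "2012-01-26 16:57:22 <<EOE>> [<ffffffff810517f2>] ? __wake_up+0x32/0x70\n"
        "2012-01-26 16:57:22 [<ffffffffa040175a>] ? cfs_waitq_signal+0x1a/0x20 [libcfs]\n"
        "2012-01-26 16:57:22 [<ffffffffa05b3bd9>] ? ldlm_cb_interpret+0x179/0x1b0 [ptlrpc]\n"
    )
    for symbol, module in module_frames(sample):
        print(f"{module}: {symbol}")
```

Lines that are not trace frames (timestamps only, Lustre log messages) are skipped, so the whole console capture can be passed in unchanged.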

              People

              • Assignee: yong.fan nasf (Inactive)
              • Reporter: cliffw Cliff White (Inactive)