  Lustre / LU-8418

node fails to kdump after lbug crash

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3

    Description

      A customer reported that hitting an LBUG() does not always produce a crash dump.
      Sometimes the system hangs after the LBUG message appears in the logs.

      No kdump was generated from this crash.

      The logs show that the panic was not clean: after the LBUG, the node kept emitting log messages, including CPU soft lockup reports.

      Eventually a crash dump was triggered manually through the serial console.

      It showed that the LBUG thread hung in a memory allocation while all CPUs were spinning, trying to acquire a spinlock held by the LBUG thread itself.

      PID: 38035  TASK: ffff880c83804100  CPU: 7   COMMAND: "mdt_446"
       #0 [ffff880cbaf7d6b0] schedule at ffffffff814ea122
       #1 [ffff880cbaf7d778] __cond_resched at ffffffff81061b4a
       #2 [ffff880cbaf7d798] _cond_resched at ffffffff814eab30
       #3 [ffff880cbaf7d7a8] kmem_cache_alloc_notrace at ffffffff8115f385
       #4 [ffff880cbaf7d7d8] call_usermodehelper_setup at ffffffff81089e5d
       #5 [ffff880cbaf7d828] libcfs_run_upcall at ffffffffa04ee9c0 [libcfs]
       #6 [ffff880cbaf7d8a8] libcfs_run_lbug_upcall at ffffffffa04eed5d [libcfs]
       #7 [ffff880cbaf7d928] lbug_with_loc at ffffffffa04eee38 [libcfs]
       #8 [ffff880cbaf7d948] ldlm_export_flock_put at ffffffffa07e137a [ptlrpc]
       #9 [ffff880cbaf7d968] cfs_hash_bd_del_locked at ffffffffa04ffab1 [libcfs]
      #10 [ffff880cbaf7d998] cfs_hash_del at ffffffffa0502811 [libcfs]
      #11 [ffff880cbaf7d9e8] ldlm_flock_blocking_unlink at ffffffffa07e1b82 [ptlrpc]
      #12 [ffff880cbaf7d9f8] ldlm_process_flock_lock at ffffffffa07e25a2 [ptlrpc]
      #13 [ffff880cbaf7daf8] ldlm_reprocess_queue at ffffffffa07b6132 [ptlrpc]
      #14 [ffff880cbaf7db48] ldlm_process_flock_lock at ffffffffa07e265f [ptlrpc]
      #15 [ffff880cbaf7dc48] ldlm_lock_enqueue at ffffffffa07b7533 [ptlrpc]
      #16 [ffff880cbaf7dca8] ldlm_handle_enqueue0 at ffffffffa07df0ef [ptlrpc]
      #17 [ffff880cbaf7dd18] mdt_enqueue at ffffffffa0d18a16 [mdt]
      #18 [ffff880cbaf7dd38] mdt_handle_common at ffffffffa0d0bffa [mdt]
      #19 [ffff880cbaf7dd88] mdt_regular_handle at ffffffffa0d0ceb5 [mdt]
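
The pattern in this backtrace (the task scheduling away in a frame newer than `lbug_with_loc`) can be checked mechanically when triaging similar dumps. The following is a minimal sketch, not part of the original report: the names `Frame`, `parse_bt` and `slept_after_lbug` are hypothetical, and the check is only a heuristic on frame order. It does not prove that a spinlock was held at the time.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int             # frame number; #0 is the innermost call
    symbol: str            # function name
    module: Optional[str]  # kernel module, or None for core kernel symbols


# Matches one frame line of crash `bt` output, for example:
#   "#5 [ffff880cbaf7d828] libcfs_run_upcall at ffffffffa04ee9c0 [libcfs]"
FRAME_RE = re.compile(
    r"#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_bt(text: str) -> List[Frame]:
    """Extract the frames from a backtrace, skipping non-frame lines."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(int(m.group(1)), m.group(2), m.group(3)))
    return frames


def slept_after_lbug(frames: List[Frame]) -> bool:
    """True if `schedule` appears in a frame newer than `lbug_with_loc`."""
    symbols = [f.symbol for f in frames]
    if "schedule" not in symbols or "lbug_with_loc" not in symbols:
        return False
    # Lower list position means a more recent call.
    return symbols.index("schedule") < symbols.index("lbug_with_loc")


# Frames #0 to #10 copied from the backtrace in this report.
BT = """\
 #0 [ffff880cbaf7d6b0] schedule at ffffffff814ea122
 #1 [ffff880cbaf7d778] __cond_resched at ffffffff81061b4a
 #2 [ffff880cbaf7d798] _cond_resched at ffffffff814eab30
 #3 [ffff880cbaf7d7a8] kmem_cache_alloc_notrace at ffffffff8115f385
 #4 [ffff880cbaf7d7d8] call_usermodehelper_setup at ffffffff81089e5d
 #5 [ffff880cbaf7d828] libcfs_run_upcall at ffffffffa04ee9c0 [libcfs]
 #6 [ffff880cbaf7d8a8] libcfs_run_lbug_upcall at ffffffffa04eed5d [libcfs]
 #7 [ffff880cbaf7d928] lbug_with_loc at ffffffffa04eee38 [libcfs]
 #8 [ffff880cbaf7d948] ldlm_export_flock_put at ffffffffa07e137a [ptlrpc]
 #9 [ffff880cbaf7d968] cfs_hash_bd_del_locked at ffffffffa04ffab1 [libcfs]
#10 [ffff880cbaf7d998] cfs_hash_del at ffffffffa0502811 [libcfs]
"""

frames = parse_bt(BT)
print(slept_after_lbug(frames))
```

A positive result only flags the dump for closer inspection; confirming which lock the thread holds still requires reading the source for the frames below `lbug_with_loc`.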
      

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Alexander Zarochentsev (zam)