Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.8.0
-
3
Description
A customer reported that hitting an LBUG() does not always produce a crash dump.
Sometimes the system hangs after the LBUG message appears in the logs.
No kdump was generated from this crash.
The logs show that the panic was not clean: after the LBUG, the node kept emitting log messages, including CPU soft lockups.
Eventually a crash dump was triggered manually through the serial console.
It showed that the LBUG thread hung in a memory allocation while all CPUs were spinning, trying to acquire a spinlock held by the LBUG thread itself.
PID: 38035  TASK: ffff880c83804100  CPU: 7  COMMAND: "mdt_446"
 #0 [ffff880cbaf7d6b0] schedule at ffffffff814ea122
 #1 [ffff880cbaf7d778] __cond_resched at ffffffff81061b4a
 #2 [ffff880cbaf7d798] _cond_resched at ffffffff814eab30
 #3 [ffff880cbaf7d7a8] kmem_cache_alloc_notrace at ffffffff8115f385
 #4 [ffff880cbaf7d7d8] call_usermodehelper_setup at ffffffff81089e5d
 #5 [ffff880cbaf7d828] libcfs_run_upcall at ffffffffa04ee9c0 [libcfs]
 #6 [ffff880cbaf7d8a8] libcfs_run_lbug_upcall at ffffffffa04eed5d [libcfs]
 #7 [ffff880cbaf7d928] lbug_with_loc at ffffffffa04eee38 [libcfs]
 #8 [ffff880cbaf7d948] ldlm_export_flock_put at ffffffffa07e137a [ptlrpc]
 #9 [ffff880cbaf7d968] cfs_hash_bd_del_locked at ffffffffa04ffab1 [libcfs]
#10 [ffff880cbaf7d998] cfs_hash_del at ffffffffa0502811 [libcfs]
#11 [ffff880cbaf7d9e8] ldlm_flock_blocking_unlink at ffffffffa07e1b82 [ptlrpc]
#12 [ffff880cbaf7d9f8] ldlm_process_flock_lock at ffffffffa07e25a2 [ptlrpc]
#13 [ffff880cbaf7daf8] ldlm_reprocess_queue at ffffffffa07b6132 [ptlrpc]
#14 [ffff880cbaf7db48] ldlm_process_flock_lock at ffffffffa07e265f [ptlrpc]
#15 [ffff880cbaf7dc48] ldlm_lock_enqueue at ffffffffa07b7533 [ptlrpc]
#16 [ffff880cbaf7dca8] ldlm_handle_enqueue0 at ffffffffa07df0ef [ptlrpc]
#17 [ffff880cbaf7dd18] mdt_enqueue at ffffffffa0d18a16 [mdt]
#18 [ffff880cbaf7dd38] mdt_handle_common at ffffffffa0d0bffa [mdt]
#19 [ffff880cbaf7dd88] mdt_regular_handle at ffffffffa0d0ceb5 [mdt]
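
For triage, the pattern in this trace (a thread reaching schedule while apparently still inside a locked section) can be checked mechanically. The following is only a rough sketch, not part of Lustre or the crash utility: the helper names `parse_frames` and `slept_under_lock` are hypothetical, the sample is an abridged excerpt of the frames above, and treating a `_locked` suffix as evidence that a lock is held is a naming-convention assumption that the trace itself does not prove.

```python
import re

# Matches one frame of `crash` bt output, e.g.
#   #7 [ffff880cbaf7d928] lbug_with_loc at ffffffffa04eee38 [libcfs]
FRAME_RE = re.compile(
    r"#(\d+) \[([0-9a-f]+)\] (\S+) at ([0-9a-f]+)(?: \[(\w+)\])?"
)

def parse_frames(bt_text):
    """Return (index, function, module) tuples; module is 'kernel' if absent."""
    return [
        (int(m.group(1)), m.group(3), m.group(5) or "kernel")
        for m in FRAME_RE.finditer(bt_text)
    ]

def slept_under_lock(frames, sleep_func="schedule", lock_marker="_locked"):
    """Heuristic (assumption): report True when sleep_func was reached from
    a caller whose name contains lock_marker. Frame #0 is the innermost
    frame, so callers carry higher indices."""
    sleep_idx = [i for i, func, _ in frames if func == sleep_func]
    lock_idx = [i for i, func, _ in frames if lock_marker in func]
    return bool(sleep_idx and lock_idx and min(sleep_idx) < max(lock_idx))

# Abridged excerpt of the backtrace above (frames 0, 3, 7 and 9 only).
sample = (
    "#0 [ffff880cbaf7d6b0] schedule at ffffffff814ea122 "
    "#3 [ffff880cbaf7d7a8] kmem_cache_alloc_notrace at ffffffff8115f385 "
    "#7 [ffff880cbaf7d928] lbug_with_loc at ffffffffa04eee38 [libcfs] "
    "#9 [ffff880cbaf7d968] cfs_hash_bd_del_locked at ffffffffa04ffab1 [libcfs]"
)

frames = parse_frames(sample)
print(slept_under_lock(frames))
```

A positive result only flags the trace for a closer look; confirming which lock is held still requires inspecting the dump itself.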