Details
Type: Bug
Resolution: Fixed
Priority: Blocker
None
None
Environment: kernel-3.10.0-1160.11.1, lustre-2.12.6_2.llnl-1.ch6.x86_64
Severity: 3
Description
Hard LOCKUP and panic. This is most frequently observed on OSTs, either after mdtest completes or after an OST mount. It happens a few seconds after the "deleting orphan objects from" console log messages.
This appears to be caused by changes in kernel timer behavior introduced between kernel-3.10.0-1160.6.1 and kernel-3.10.0-1160.11.1.
Fix in progress. See https://bugzilla.redhat.com/show_bug.cgi?id=1914011
For brevity, only the bottoms of the stacks are listed below.
Kernel panic - not syncing: Hard LOCKUP
CPU: 14 PID: 0 Comm: swapper/14 Kdump: loaded Tainted: P W OE ------------ 3.10.0-1160.11.1.1chaos.ch6.x86_64 #1
...
Call Trace:
<NMI> [<ffffffffa47ae072>] dump_stack+0x19/0x1b
[<ffffffffa47a71e7>] panic+0xe8/0x21f
...
[<ffffffffa40b1edc>] ? run_timer_softirq+0xbc/0x370
<EOE> <IRQ> [<ffffffffa40a82fd>] __do_softirq+0xfd/0x2c0
[<ffffffffa47c56ec>] call_softirq+0x1c/0x30
[<ffffffffa4030995>] do_softirq+0x65/0xa0
[<ffffffffa40a86d5>] irq_exit+0x105/0x110
[<ffffffffa47c6c88>] smp_apic_timer_interrupt+0x48/0x60
[<ffffffffa47c31ba>] apic_timer_interrupt+0x16a/0x170
<EOI> [<ffffffffa40b3113>] ? get_next_timer_interrupt+0x103/0x270
[<ffffffffa45eace7>] ? cpuidle_enter_state+0x57/0xd0
[<ffffffffa45eae3e>] cpuidle_idle_call+0xde/0x270
[<ffffffffa403919e>] arch_cpu_idle+0xe/0xc0
[<ffffffffa410856a>] cpu_startup_entry+0x14a/0x1e0
[<ffffffffa405cbb7>] start_secondary+0x207/0x280
[<ffffffffa40000d5>] start_cpu+0x5/0x14

Another trace we see is quite similar to what is happening on CPU 21 in the original BZ:

Call Trace:
<NMI> [<ffffffff85fae072>] dump_stack+0x19/0x1b
[<ffffffff85fa71e7>] panic+0xe8/0x21f
...
[<ffffffff8591f4e8>] ? native_queued_spin_lock_slowpath+0x158/0x200
<EOE> [<ffffffff85fa7dd2>] queued_spin_lock_slowpath+0xb/0xf
[<ffffffff85fb7197>] _raw_spin_lock_irqsave+0x47/0x50
[<ffffffff858b1b8b>] lock_timer_base.isra.38+0x2b/0x50
[<ffffffff858b244f>] try_to_del_timer_sync+0x2f/0x90
[<ffffffff858b2502>] del_timer_sync+0x52/0x60
[<ffffffff85fb1920>] schedule_timeout+0x180/0x320
[<ffffffff858b1870>] ? requeue_timers+0x1f0/0x1f0