Lustre / LU-14341

hard LOCKUP lustre servers with kernel-3.10.0-1160.11.1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Environment: kernel-3.10.0-1160.11.1
      lustre-2.12.6_2.llnl-1.ch6.x86_64
    • Severity: 3

    Description

      hard LOCKUP and panic. Most frequently observed on OSTs after mdtest completes or after OST mount, a few seconds after the "deleting orphan objects from" console log messages.

      This appeared to be due to kernel timer behavior changes introduced between kernel-3.10.0-1160.6.1 and kernel-3.10.0-1160.11.1. 

      Fix in progress.  See https://bugzilla.redhat.com/show_bug.cgi?id=1914011

      For brevity, only the bottoms of the stacks are listed below.

      Kernel panic - not syncing: Hard LOCKUP
      CPU: 14 PID: 0 Comm: swapper/14 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-1160.11.1.1chaos.ch6.x86_64 #1
      ...
      Call Trace:
       <NMI>  [<ffffffffa47ae072>] dump_stack+0x19/0x1b
       [<ffffffffa47a71e7>] panic+0xe8/0x21f
      ...
       [<ffffffffa40b1edc>] ? run_timer_softirq+0xbc/0x370
       <EOE>  <IRQ>  [<ffffffffa40a82fd>] __do_softirq+0xfd/0x2c0
       [<ffffffffa47c56ec>] call_softirq+0x1c/0x30
       [<ffffffffa4030995>] do_softirq+0x65/0xa0
       [<ffffffffa40a86d5>] irq_exit+0x105/0x110
       [<ffffffffa47c6c88>] smp_apic_timer_interrupt+0x48/0x60
       [<ffffffffa47c31ba>] apic_timer_interrupt+0x16a/0x170
       <EOI>  [<ffffffffa40b3113>] ? get_next_timer_interrupt+0x103/0x270
       [<ffffffffa45eace7>] ? cpuidle_enter_state+0x57/0xd0
       [<ffffffffa45eae3e>] cpuidle_idle_call+0xde/0x270
       [<ffffffffa403919e>] arch_cpu_idle+0xe/0xc0
       [<ffffffffa410856a>] cpu_startup_entry+0x14a/0x1e0
       [<ffffffffa405cbb7>] start_secondary+0x207/0x280
       [<ffffffffa40000d5>] start_cpu+0x5/0x14
       
      Another trace we see is quite similar to what is happening on CPU 21 in the original BZ.
      
      Call Trace:
       <NMI>  [<ffffffff85fae072>] dump_stack+0x19/0x1b
       [<ffffffff85fa71e7>] panic+0xe8/0x21f
      ...
       [<ffffffff8591f4e8>] ? native_queued_spin_lock_slowpath+0x158/0x200
       <EOE>  [<ffffffff85fa7dd2>] queued_spin_lock_slowpath+0xb/0xf
       [<ffffffff85fb7197>] _raw_spin_lock_irqsave+0x47/0x50
       [<ffffffff858b1b8b>] lock_timer_base.isra.38+0x2b/0x50
       [<ffffffff858b244f>] try_to_del_timer_sync+0x2f/0x90
       [<ffffffff858b2502>] del_timer_sync+0x52/0x60
       [<ffffffff85fb1920>] schedule_timeout+0x180/0x320
       [<ffffffff858b1870>] ? requeue_timers+0x1f0/0x1f0 
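
      To make the mechanism concrete, here is a rough, hedged sketch. In these 3.10-based kernels the per-CPU timer wheel catches base->timer_jiffies up to jiffies one tick at a time inside __run_timers(), and base->lock is held with interrupts off for the whole catch-up (it is only dropped around expired callbacks). If that delta ever becomes very large, the softirq can sit on the lock for longer than the ~10 s hard-lockup window, and every other CPU that touches a timer (mod_timer(), del_timer_sync(), ...) spins in lock_timer_base(), which is what the second stack above shows. The userspace program below only models the catch-up arithmetic; it is not kernel code, and names such as fake_timer_base, run_one_tick() and catch_up() are made up for illustration.

      /*
       * Illustrative userspace model only -- NOT kernel code.  It mimics the
       * shape of the __run_timers() catch-up loop: timer_jiffies is advanced
       * one tick at a time until it reaches jiffies.  In the kernel every
       * iteration also walks a timer-wheel bucket and may cascade timers,
       * all while base->lock is held with IRQs off.
       */
      #include <stdio.h>
      #include <stdint.h>

      #define HZ 1000                 /* RHEL7 x86_64 default tick rate */
      #define WATCHDOG_THRESH 10      /* default hard-lockup window, seconds */

      struct fake_timer_base {
              uint64_t timer_jiffies; /* last tick this base has processed */
      };

      /* Stand-in for processing one tick's bucket; the real work is much
       * more expensive than a bare increment. */
      static void run_one_tick(struct fake_timer_base *base)
      {
              base->timer_jiffies++;
      }

      /* Analogue of: while (time_after_eq(jiffies, base->timer_jiffies)) */
      static uint64_t catch_up(struct fake_timer_base *base, uint64_t jiffies)
      {
              uint64_t n = 0;

              while (base->timer_jiffies < jiffies) {
                      run_one_tick(base);
                      n++;
              }
              return n;
      }

      int main(void)
      {
              /* Suppose a CPU's timer base fell ~30 minutes behind. */
              uint64_t jiffies = 30ULL * 60 * HZ;
              struct fake_timer_base base = { .timer_jiffies = 0 };

              printf("catch-up loop would run %llu iterations\n",
                     (unsigned long long)catch_up(&base, jiffies));
              printf("in the kernel those run under base->lock with IRQs off;\n"
                     "past ~%d s, CPUs spinning in lock_timer_base() trip the\n"
                     "NMI hard-lockup watchdog\n", WATCHDOG_THRESH);
              return 0;
      }

      The point of the model is only that the iteration count, and therefore the lock hold time in the kernel, scales linearly with how far timer_jiffies has fallen behind.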


          Activity


            adilger Andreas Dilger added a comment -

            Is there a copy of that patch somewhere? I was going to revert it and see if that fixes the problem.

            jfilizetti Jeremy Filizetti added a comment -

            In the digging I did, I found no indication of an upstream patch for that. It looks like it was a Red Hat-only change to address some customer issue.

            adilger Andreas Dilger added a comment -

            What is the commit hash for that patch? I can't find it in the upstream kernel.

            jfilizetti Jeremy Filizetti added a comment -

            My investigation found it was introduced in 3.10.0-1160.8.1.el7 by:

            • [kernel] timer: Fix lockup in __run_timers() caused by large jiffies/timer_jiffies delta (Waiman Long) [1849716]
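
            As a side note for anyone checking their own builds: that changelog entry can be confirmed directly from the installed kernel package. The NVR below is only an example; substitute whatever kernel package is actually installed on the node.

            # hedged example -- adjust the kernel NVR to what is installed
            rpm -q --changelog kernel-3.10.0-1160.11.1.el7 | grep 'timer_jiffies delta'

            If the change is present, the grep matches the line quoted above; per the comment, it first appears in the 3.10.0-1160.8.1.el7 changelog stanza.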

            sthiell Stephane Thiell added a comment -

            FYI, I tried Lustre (server) with kernel-3.10.0-1160.15.2.el7 and immediately got the same hard lockups, so it looks like this issue hasn't been fixed by Red Hat yet...

            Does anyone know if this incompatibility with Lustre since 3.10.0-1160.11.1 was introduced by the following patch?

            Loop in __run_timers() because base->timer_jiffies is very far behind causes a lockup condition. (BZ#1849716)
            ofaaland Olaf Faaland added a comment -

            Peter,

            Sorry I didn't see your comment earlier.  Yes, this ticket is for the benefit of others.

            pjones Peter Jones added a comment -

            Olaf

            Do I read this correctly that you are mostly just opening this ticket for the benefit of others and not expecting action on our part at this time?

            Peter

            ofaaland Olaf Faaland added a comment - edited

            Related to LU-14226

            sthiell Stephane Thiell added a comment -

            We are seeing the same hard lockups on new servers (Dell R6525) with Lustre 2.12.6 on CentOS 7.9, using kernel 3.10.0-1160.11.1.el7 + the Lustre patches for rhel7.9. It is very easy to trigger; a single client doing moderate I/O is enough. I was just about to report it too! Olaf was faster.

            Example crash dump:

            crash> ps | grep ">"
            ...
            >  4588      2  51  ffff9c8d1dbed280  RU   0.0       0      0  [ll_ost01_000]
            >  4589      2  28  ffff9c8d1dbee300  RU   0.0       0      0  [ll_ost01_001]
            >  4590      2  60  ffff9c8d2dfc8000  RU   0.0       0      0  [ll_ost01_002]
            >  4651      2  50  ffff9c8d15ea5280  RU   0.0       0      0  [ll_ost01_005]
            >  4655      2  18  ffff9c73e47f8000  RU   0.0       0      0  [ll_ost01_008]
            >  4656      2  17  ffff9c73e47fd280  RU   0.0       0      0  [ll_ost01_009]
            >  4663      2  16  ffff9c8d349ab180  RU   0.0       0      0  [ll_ost_io01_006]

            I don't have access to this Red Hat BZ, but I can also see "lock_timer_base" in one of the backtraces (for 4588 ll_ost01_000):

            crash> bt 4588
            PID: 4588   TASK: ffff9c8d1dbed280  CPU: 51  COMMAND: "ll_ost01_000"
             #0 [ffff9cb1ff0c8b58] crash_nmi_callback at ffffffffad85855b
             #1 [ffff9cb1ff0c8b68] nmi_panic_self_stop at ffffffffad858c86
             #2 [ffff9cb1ff0c8b80] nmi_panic at ffffffffad89aefb
             #3 [ffff9cb1ff0c8b90] watchdog_overflow_callback at ffffffffad94ef81
             #4 [ffff9cb1ff0c8ba8] __perf_event_overflow at ffffffffad9a8b07
             #5 [ffff9cb1ff0c8be0] perf_event_overflow at ffffffffad9b2304
             #6 [ffff9cb1ff0c8bf0] x86_pmu_handle_irq at ffffffffad805595
             #7 [ffff9cb1ff0c8e20] amd_pmu_handle_irq at ffffffffad8066f5
             #8 [ffff9cb1ff0c8e38] perf_event_nmi_handler at ffffffffadf8b031
             #9 [ffff9cb1ff0c8e58] nmi_handle at ffffffffadf8c93c
            #10 [ffff9cb1ff0c8eb0] do_nmi at ffffffffadf8cb5d
            #11 [ffff9cb1ff0c8ef0] end_repeat_nmi at ffffffffadf8bd9c
                [exception RIP: native_queued_spin_lock_slowpath+464]
                RIP: ffffffffad917920  RSP: ffff9c8d2dfbf6e8  RFLAGS: 00000002
                RAX: 0000000000810101  RBX: 0000000000000282  RCX: 0000000000000001
                RDX: 0000000000000101  RSI: 0000000000000001  RDI: ffff9cb1ff013940
                RBP: ffff9c8d2dfbf6e8   R8: 0000000000000101   R9: 00000000000000c0
                R10: 0000000000000000  R11: 0000000000000000  R12: ffff9cb1ff013940
                R13: ffffffffc158a138  R14: ffff9c8d2dfbf758  R15: 0000000000000000
                ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
            --- <NMI exception stack> ---
            #12 [ffff9c8d2dfbf6e8] native_queued_spin_lock_slowpath at ffffffffad917920
            #13 [ffff9c8d2dfbf6f0] queued_spin_lock_slowpath at ffffffffadf7bfe5
            #14 [ffff9c8d2dfbf700] _raw_spin_lock_irqsave at ffffffffadf8a777
            #15 [ffff9c8d2dfbf718] lock_timer_base at ffffffffad8adbeb
            #16 [ffff9c8d2dfbf748] mod_timer at ffffffffad8aed14
            #17 [ffff9c8d2dfbf798] __ldlm_add_waiting_lock at ffffffffc146fac7 [ptlrpc]
            #18 [ffff9c8d2dfbf7c0] ldlm_add_waiting_lock at ffffffffc146ffb3 [ptlrpc]
            #19 [ffff9c8d2dfbf7e8] ldlm_server_blocking_ast at ffffffffc147533c [ptlrpc]
            #20 [ffff9c8d2dfbf828] tgt_blocking_ast at ffffffffc15018a9 [ptlrpc]
            #21 [ffff9c8d2dfbf8e8] ldlm_work_bl_ast_lock at ffffffffc144a26c [ptlrpc]
            #22 [ffff9c8d2dfbf970] ptlrpc_set_wait at ffffffffc1490f42 [ptlrpc]
            #23 [ffff9c8d2dfbfa18] ldlm_run_ast_work at ffffffffc144e145 [ptlrpc]
            #24 [ffff9c8d2dfbfa48] ldlm_handle_conflict_lock at ffffffffc144ef50 [ptlrpc]
            #25 [ffff9c8d2dfbfa80] ldlm_lock_enqueue at ffffffffc144f5e0 [ptlrpc]
            #26 [ffff9c8d2dfbfaf8] ldlm_cli_enqueue_local at ffffffffc146a227 [ptlrpc]
            #27 [ffff9c8d2dfbfb98] ofd_destroy_by_fid at ffffffffc1bafb21 [ofd]
            #28 [ffff9c8d2dfbfc70] ofd_destroy_hdl at ffffffffc1ba6797 [ofd]
            #29 [ffff9c8d2dfbfcd0] tgt_request_handle at ffffffffc1506f1a [ptlrpc]
            #30 [ffff9c8d2dfbfd58] ptlrpc_server_handle_request at ffffffffc14ab88b [ptlrpc]
            #31 [ffff9c8d2dfbfdf8] ptlrpc_main at ffffffffc14af1f4 [ptlrpc]
            #32 [ffff9c8d2dfbfec8] kthread at ffffffffad8c5e71
            #33 [ffff9c8d2dfbff50] ret_from_fork_nospec_begin at ffffffffadf94ddd
            
            ofaaland Olaf Faaland added a comment - edited

            Red Hat has a fix, which we tested successfully on our system. The patch is moving through their release process. See their Bugzilla for details.

            ofaaland Olaf Faaland added a comment -

            For my reference, my local ticket is TOSS-4985.

            People

              Assignee: pjones Peter Jones
              Reporter: ofaaland Olaf Faaland
              Votes: 1
              Watchers: 17
