Lustre / LU-14341

hard LOCKUP lustre servers with kernel-3.10.0-1160.11.1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Environment: kernel-3.10.0-1160.11.1
      lustre-2.12.6_2.llnl-1.ch6.x86_64
    • Severity: 3

    Description

      hard LOCKUP and panic. Most frequently observed on OSTs after mdtest completes or after OST mount, a few seconds after the "deleting orphan objects from" console log messages.

      This appeared to be due to kernel timer behavior changes introduced between kernel-3.10.0-1160.6.1 and kernel-3.10.0-1160.11.1. 

      Fix in progress.  See https://bugzilla.redhat.com/show_bug.cgi?id=1914011

      For brevity, only the bottoms of the stacks are listed below.

      Kernel panic - not syncing: Hard LOCKUP
      CPU: 14 PID: 0 Comm: swapper/14 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-1160.11.1.1chaos.ch6.x86_64 #1
      ...
      Call Trace:
       <NMI>  [<ffffffffa47ae072>] dump_stack+0x19/0x1b
       [<ffffffffa47a71e7>] panic+0xe8/0x21f
      ...
       [<ffffffffa40b1edc>] ? run_timer_softirq+0xbc/0x370
       <EOE>  <IRQ>  [<ffffffffa40a82fd>] __do_softirq+0xfd/0x2c0
       [<ffffffffa47c56ec>] call_softirq+0x1c/0x30
       [<ffffffffa4030995>] do_softirq+0x65/0xa0
       [<ffffffffa40a86d5>] irq_exit+0x105/0x110
       [<ffffffffa47c6c88>] smp_apic_timer_interrupt+0x48/0x60
       [<ffffffffa47c31ba>] apic_timer_interrupt+0x16a/0x170
       <EOI>  [<ffffffffa40b3113>] ? get_next_timer_interrupt+0x103/0x270
       [<ffffffffa45eace7>] ? cpuidle_enter_state+0x57/0xd0
       [<ffffffffa45eae3e>] cpuidle_idle_call+0xde/0x270
       [<ffffffffa403919e>] arch_cpu_idle+0xe/0xc0
       [<ffffffffa410856a>] cpu_startup_entry+0x14a/0x1e0
       [<ffffffffa405cbb7>] start_secondary+0x207/0x280
       [<ffffffffa40000d5>] start_cpu+0x5/0x14
       
      Another trace we see is quite similar to what is happening on CPU 21 in the original BZ.
      
      Call Trace:
       <NMI>  [<ffffffff85fae072>] dump_stack+0x19/0x1b
       [<ffffffff85fa71e7>] panic+0xe8/0x21f
      ...
       [<ffffffff8591f4e8>] ? native_queued_spin_lock_slowpath+0x158/0x200
       <EOE>  [<ffffffff85fa7dd2>] queued_spin_lock_slowpath+0xb/0xf
       [<ffffffff85fb7197>] _raw_spin_lock_irqsave+0x47/0x50
       [<ffffffff858b1b8b>] lock_timer_base.isra.38+0x2b/0x50
       [<ffffffff858b244f>] try_to_del_timer_sync+0x2f/0x90
       [<ffffffff858b2502>] del_timer_sync+0x52/0x60
       [<ffffffff85fb1920>] schedule_timeout+0x180/0x320
       [<ffffffff858b1870>] ? requeue_timers+0x1f0/0x1f0 
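
      To make the mechanism concrete, here is a rough, hedged sketch. In these 3.10-based kernels the per-CPU timer wheel catches base->timer_jiffies up to jiffies one tick at a time inside __run_timers(), and base->lock is held with interrupts off for the whole catch-up (it is only dropped around expired callbacks). If that delta ever becomes very large, the softirq can sit on the lock for longer than the ~10 s hard-lockup window, and every other CPU that touches a timer (mod_timer(), del_timer_sync(), ...) spins in lock_timer_base(), which is what the second stack above shows. The userspace program below only models the catch-up arithmetic; it is not kernel code, and names such as fake_timer_base, run_one_tick() and catch_up() are made up for illustration.

      /*
       * Illustrative userspace model only -- NOT kernel code.  It mimics the
       * shape of the __run_timers() catch-up loop: timer_jiffies is advanced
       * one tick at a time until it reaches jiffies.  In the kernel every
       * iteration also walks a timer-wheel bucket and may cascade timers,
       * all while base->lock is held with IRQs off.
       */
      #include <stdio.h>
      #include <stdint.h>

      #define HZ 1000                 /* RHEL7 x86_64 default tick rate */
      #define WATCHDOG_THRESH 10      /* default hard-lockup window, seconds */

      struct fake_timer_base {
              uint64_t timer_jiffies; /* last tick this base has processed */
      };

      /* Stand-in for processing one tick's bucket; the real work is much
       * more expensive than a bare increment. */
      static void run_one_tick(struct fake_timer_base *base)
      {
              base->timer_jiffies++;
      }

      /* Analogue of: while (time_after_eq(jiffies, base->timer_jiffies)) */
      static uint64_t catch_up(struct fake_timer_base *base, uint64_t jiffies)
      {
              uint64_t n = 0;

              while (base->timer_jiffies < jiffies) {
                      run_one_tick(base);
                      n++;
              }
              return n;
      }

      int main(void)
      {
              /* Suppose a CPU's timer base fell ~30 minutes behind. */
              uint64_t jiffies = 30ULL * 60 * HZ;
              struct fake_timer_base base = { .timer_jiffies = 0 };

              printf("catch-up loop would run %llu iterations\n",
                     (unsigned long long)catch_up(&base, jiffies));
              printf("in the kernel those run under base->lock with IRQs off;\n"
                     "past ~%d s, CPUs spinning in lock_timer_base() trip the\n"
                     "NMI hard-lockup watchdog\n", WATCHDOG_THRESH);
              return 0;
      }

      The point of the model is only that the iteration count, and therefore the lock hold time in the kernel, scales linearly with how far timer_jiffies has fallen behind.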


          Activity


            adilger Andreas Dilger added a comment -

            Is there a copy of that patch somewhere? I was going to revert it and see if that fixes the problem.

            jfilizetti Jeremy Filizetti added a comment -

            In the digging I did, I found no indication of an upstream patch for that. It looks like it was a Red Hat-only change to address some customer issue.

            adilger Andreas Dilger added a comment -

            What is the commit hash for that patch? I can't find it in the upstream kernel.

            jfilizetti Jeremy Filizetti added a comment -

            My investigation found it was introduced in 3.10.0-1160.8.1.el7 by:

            • [kernel] timer: Fix lockup in __run_timers() caused by large jiffies/timer_jiffies delta (Waiman Long) [1849716]
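
            As a side note for anyone checking their own builds: that changelog entry can be confirmed directly from the installed kernel package. The NVR below is only an example; substitute whatever kernel package is actually installed on the node.

            # hedged example -- adjust the kernel NVR to what is installed
            rpm -q --changelog kernel-3.10.0-1160.11.1.el7 | grep 'timer_jiffies delta'

            If the change is present, the grep matches the line quoted above; per the comment, it first appears in the 3.10.0-1160.8.1.el7 changelog stanza.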

            sthiell Stephane Thiell added a comment -

            FYI, I tried Lustre (server) with kernel-3.10.0-1160.15.2.el7 and immediately got the same hard lockups, so it looks like this issue hasn't been fixed by Red Hat yet...

            Does anyone know if this incompatibility with Lustre since 3.10.0-1160.11.1 was introduced by the following patch?

            Loop in __run_timers() because base->timer_jiffies is very far behind causes a lockup condition. (BZ#1849716)
            ofaaland Olaf Faaland added a comment -

            Peter,

            Sorry I didn't see your comment earlier.  Yes, this ticket is for the benefit of others.

            pjones Peter Jones added a comment -

            Olaf

            Do I read this correctly that you are mostly just opening this ticket for the benefit of others and not expecting action on our part at this time?

            Peter

            ofaaland Olaf Faaland added a comment - edited

            Related to LU-14226

            sthiell Stephane Thiell added a comment -

            We are seeing the same hard lockups on new servers (Dell R6525) with Lustre 2.12.6 on CentOS 7.9, using kernel 3.10.0-1160.11.1.el7 + the Lustre patches for rhel7.9. It is very easy to trigger; a single client doing moderate I/O is enough. I was just about to report it too! Olaf was faster.

            Example crash dump:

            crash> ps | grep ">"
            ...
            >  4588      2  51  ffff9c8d1dbed280  RU   0.0       0      0  [ll_ost01_000]
            >  4589      2  28  ffff9c8d1dbee300  RU   0.0       0      0  [ll_ost01_001]
            >  4590      2  60  ffff9c8d2dfc8000  RU   0.0       0      0  [ll_ost01_002]
            >  4651      2  50  ffff9c8d15ea5280  RU   0.0       0      0  [ll_ost01_005]
            >  4655      2  18  ffff9c73e47f8000  RU   0.0       0      0  [ll_ost01_008]
            >  4656      2  17  ffff9c73e47fd280  RU   0.0       0      0  [ll_ost01_009]
            >  4663      2  16  ffff9c8d349ab180  RU   0.0       0      0  [ll_ost_io01_006]

            I don't have access to this Red Hat BZ, but I can also see "lock_timer_base" in one of the backtraces (for 4588 ll_ost01_000):

            crash> bt 4588
            PID: 4588   TASK: ffff9c8d1dbed280  CPU: 51  COMMAND: "ll_ost01_000"
             #0 [ffff9cb1ff0c8b58] crash_nmi_callback at ffffffffad85855b
             #1 [ffff9cb1ff0c8b68] nmi_panic_self_stop at ffffffffad858c86
             #2 [ffff9cb1ff0c8b80] nmi_panic at ffffffffad89aefb
             #3 [ffff9cb1ff0c8b90] watchdog_overflow_callback at ffffffffad94ef81
             #4 [ffff9cb1ff0c8ba8] __perf_event_overflow at ffffffffad9a8b07
             #5 [ffff9cb1ff0c8be0] perf_event_overflow at ffffffffad9b2304
             #6 [ffff9cb1ff0c8bf0] x86_pmu_handle_irq at ffffffffad805595
             #7 [ffff9cb1ff0c8e20] amd_pmu_handle_irq at ffffffffad8066f5
             #8 [ffff9cb1ff0c8e38] perf_event_nmi_handler at ffffffffadf8b031
             #9 [ffff9cb1ff0c8e58] nmi_handle at ffffffffadf8c93c
            #10 [ffff9cb1ff0c8eb0] do_nmi at ffffffffadf8cb5d
            #11 [ffff9cb1ff0c8ef0] end_repeat_nmi at ffffffffadf8bd9c
                [exception RIP: native_queued_spin_lock_slowpath+464]
                RIP: ffffffffad917920  RSP: ffff9c8d2dfbf6e8  RFLAGS: 00000002
                RAX: 0000000000810101  RBX: 0000000000000282  RCX: 0000000000000001
                RDX: 0000000000000101  RSI: 0000000000000001  RDI: ffff9cb1ff013940
                RBP: ffff9c8d2dfbf6e8   R8: 0000000000000101   R9: 00000000000000c0
                R10: 0000000000000000  R11: 0000000000000000  R12: ffff9cb1ff013940
                R13: ffffffffc158a138  R14: ffff9c8d2dfbf758  R15: 0000000000000000
                ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
            --- <NMI exception stack> ---
            #12 [ffff9c8d2dfbf6e8] native_queued_spin_lock_slowpath at ffffffffad917920
            #13 [ffff9c8d2dfbf6f0] queued_spin_lock_slowpath at ffffffffadf7bfe5
            #14 [ffff9c8d2dfbf700] _raw_spin_lock_irqsave at ffffffffadf8a777
            #15 [ffff9c8d2dfbf718] lock_timer_base at ffffffffad8adbeb
            #16 [ffff9c8d2dfbf748] mod_timer at ffffffffad8aed14
            #17 [ffff9c8d2dfbf798] __ldlm_add_waiting_lock at ffffffffc146fac7 [ptlrpc]
            #18 [ffff9c8d2dfbf7c0] ldlm_add_waiting_lock at ffffffffc146ffb3 [ptlrpc]
            #19 [ffff9c8d2dfbf7e8] ldlm_server_blocking_ast at ffffffffc147533c [ptlrpc]
            #20 [ffff9c8d2dfbf828] tgt_blocking_ast at ffffffffc15018a9 [ptlrpc]
            #21 [ffff9c8d2dfbf8e8] ldlm_work_bl_ast_lock at ffffffffc144a26c [ptlrpc]
            #22 [ffff9c8d2dfbf970] ptlrpc_set_wait at ffffffffc1490f42 [ptlrpc]
            #23 [ffff9c8d2dfbfa18] ldlm_run_ast_work at ffffffffc144e145 [ptlrpc]
            #24 [ffff9c8d2dfbfa48] ldlm_handle_conflict_lock at ffffffffc144ef50 [ptlrpc]
            #25 [ffff9c8d2dfbfa80] ldlm_lock_enqueue at ffffffffc144f5e0 [ptlrpc]
            #26 [ffff9c8d2dfbfaf8] ldlm_cli_enqueue_local at ffffffffc146a227 [ptlrpc]
            #27 [ffff9c8d2dfbfb98] ofd_destroy_by_fid at ffffffffc1bafb21 [ofd]
            #28 [ffff9c8d2dfbfc70] ofd_destroy_hdl at ffffffffc1ba6797 [ofd]
            #29 [ffff9c8d2dfbfcd0] tgt_request_handle at ffffffffc1506f1a [ptlrpc]
            #30 [ffff9c8d2dfbfd58] ptlrpc_server_handle_request at ffffffffc14ab88b [ptlrpc]
            #31 [ffff9c8d2dfbfdf8] ptlrpc_main at ffffffffc14af1f4 [ptlrpc]
            #32 [ffff9c8d2dfbfec8] kthread at ffffffffad8c5e71
            #33 [ffff9c8d2dfbff50] ret_from_fork_nospec_begin at ffffffffadf94ddd
            
            ofaaland Olaf Faaland added a comment - edited

            Red Hat has a fix, which we tested successfully on our system. The patch is moving through their release process. See their Bugzilla for details.

            ofaaland Olaf Faaland added a comment -

            For my reference, my local ticket is TOSS-4985.

            People

              Assignee: pjones Peter Jones
              Reporter: ofaaland Olaf Faaland
              Votes: 1
              Watchers: 17
