Lustre / LU-11724

recovery tests crash with ‘page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)’

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0, Lustre 2.10.6
    • None
    • 3

    Description

      recovery-double-scale test_pairwise_fail, recovery-random-scale test_fail_client_mds and recovery-mds-scale test_failover_mds all have client crashes with similar information in the kernel crash log.

      The failover test session results at https://testing.whamcloud.com/test_sessions/d1f52b33-1a69-47d0-a0c4-03e90d450320 show the following in the client's kernel-crash log for recovery-double-scale test_pairwise_fail:

      [  616.142886] Lustre: DEBUG MARKER: cat /tmp/client-load.pid
      [  709.882775] kswapd0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)
      [  709.882805] CPU: 0 PID: 32 Comm: kswapd0 Tainted: G        W  OE   N  4.4.155-94.50-default #1
      [  709.882806] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  709.882814]  0000000000000000 ffffffff8132ba20 0000000000000000 ffff88007fc03ce8
      [  709.882819]  ffffffff8119ca92 0108002000000030 0000000000002d60 0000000000000000
      [  709.882820]  00000000000011e0 ffffffff810b79bb ffff88007ca6c4c0 ffff88007fc03d28
      [  709.882821] Call Trace:
      [  709.882926]  [<ffffffff81019aa9>] dump_trace+0x59/0x340
      [  709.882930]  [<ffffffff81019e7a>] show_stack_log_lvl+0xea/0x170
      [  709.882932]  [<ffffffff8101ac21>] show_stack+0x21/0x40
      [  709.882953]  [<ffffffff8132ba20>] dump_stack+0x5c/0x7c
      [  709.882983]  [<ffffffff8119ca92>] warn_alloc_failed+0xe2/0x150
      [  709.883006]  [<ffffffff8119cf0b>] __alloc_pages_nodemask+0x40b/0xb70
      [  709.883010]  [<ffffffff8119d7aa>] __alloc_page_frag+0x10a/0x120
      [  709.883029]  [<ffffffff81512302>] __napi_alloc_skb+0x82/0xd0
      [  709.883076]  [<ffffffffa02ad3a4>] cp_rx_poll+0x1b4/0x550 [8139cp]
      [  709.883097]  [<ffffffff81521fec>] net_rx_action+0x15c/0x370
      [  709.883112]  [<ffffffff8108632c>] __do_softirq+0xec/0x300
      [  709.883124]  [<ffffffff810867fa>] irq_exit+0xfa/0x110
      [  709.883145]  [<ffffffff816201e1>] do_IRQ+0x51/0xe0
      [  709.883162]  [<ffffffff8161d7c2>] common_interrupt+0xc2/0xc2
      [  709.886555] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
      [  709.886555] 
      [  709.886560] Leftover inexact backtrace:
                     
      [  709.886583]  <IRQ>  <EOI>  [<ffffffff811a9525>] ? shrink_inactive_list+0x195/0x4f0
      [  709.886585]  [<ffffffff811a951f>] ? shrink_inactive_list+0x18f/0x4f0
      [  709.886586]  [<ffffffff811aa3bb>] ? shrink_zone_memcg+0x2bb/0x6a0
      [  709.886588]  [<ffffffff811aa857>] ? shrink_zone+0xb7/0x260
      [  709.886590]  [<ffffffff811ab9ae>] ? kswapd+0x48e/0x920
      [  709.886591]  [<ffffffff811ab520>] ? mem_cgroup_shrink_node_zone+0x150/0x150
      [  709.886597]  [<ffffffff8109fde9>] ? kthread+0xc9/0xe0
      [  709.886603]  [<ffffffff816185a0>] ? thread_return+0x23/0x5d3
      [  709.886605]  [<ffffffff8109fd20>] ? kthread_park+0x50/0x50
      [  709.886607]  [<ffffffff8161d045>] ? ret_from_fork+0x55/0x80
      [  709.886608]  [<ffffffff8109fd20>] ? kthread_park+0x50/0x50
      … 
      

      A similar call trace appears in the kernel-crash log for recovery-mds-scale test_failover_mds:

      [  750.405629] Lustre: 13389:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      [  757.774431] swapper/0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)
      [  757.774436] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OE   N  4.4.155-94.50-default #1
      [  757.774437] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  757.774440]  0000000000000000 ffffffff8132ba20 0000000000000000 ffff88007fc03ce8
      [  757.774442]  ffffffff8119ca92 0108002000000030 00000000000003f5 00000000000003f5
      [  757.774443]  00000000000003f5 00000000000003f5 0000000000000400 00000000000003f5
      [  757.774444] Call Trace:
      [  757.774458]  [<ffffffff81019aa9>] dump_trace+0x59/0x340
      [  757.774461]  [<ffffffff81019e7a>] show_stack_log_lvl+0xea/0x170
      [  757.774464]  [<ffffffff8101ac21>] show_stack+0x21/0x40
      [  757.774468]  [<ffffffff8132ba20>] dump_stack+0x5c/0x7c
      [  757.774472]  [<ffffffff8119ca92>] warn_alloc_failed+0xe2/0x150
      [  757.774476]  [<ffffffff8119cf0b>] __alloc_pages_nodemask+0x40b/0xb70
      [  757.774479]  [<ffffffff8119d7aa>] __alloc_page_frag+0x10a/0x120
      [  757.774483]  [<ffffffff81512302>] __napi_alloc_skb+0x82/0xd0
      [  757.774490]  [<ffffffffa02ad3a4>] cp_rx_poll+0x1b4/0x550 [8139cp]
      [  757.774497]  [<ffffffff81521fec>] net_rx_action+0x15c/0x370
      [  757.774502]  [<ffffffff8108632c>] __do_softirq+0xec/0x300
      [  757.774504]  [<ffffffff810867fa>] irq_exit+0xfa/0x110
      [  757.774510]  [<ffffffff816201e1>] do_IRQ+0x51/0xe0
      [  757.774514]  [<ffffffff8161d7c2>] common_interrupt+0xc2/0xc2
      [  757.776763] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
      [  757.776763] 
      [  757.776764] Leftover inexact backtrace:
                     
      [  757.776768]  <IRQ>  <EOI>  [<ffffffff81020e80>] ? idle_notifier_unregister+0x20/0x20
      [  757.776772]  [<ffffffff81061272>] ? native_safe_halt+0x2/0x10
      [  757.776773]  [<ffffffff81020e98>] ? default_idle+0x18/0xd0
      [  757.776776]  [<ffffffff810c5db1>] ? cpu_startup_entry+0x2f1/0x390
      [  757.776780]  [<ffffffff81f8b0c7>] ? start_kernel+0x4c8/0x4d3
      [  757.776781]  [<ffffffff81f8aa03>] ? set_init_arg+0x50/0x50
      [  757.776783]  [<ffffffff81f8a120>] ? early_idt_handler_array+0x120/0x120
      [  757.776785]  [<ffffffff81f8a719>] ? x86_64_start_kernel+0x147/0x156
       

      Similar kernel crashes are recorded at:
      https://testing.whamcloud.com/test_sets/ba10c41e-e438-11e8-bfe1-52540065bddc
      https://testing.whamcloud.com/test_sets/5b6d69e0-e43c-11e8-86c0-52540065bddc
      https://testing.whamcloud.com/test_sets/a2b39c4c-e929-11e8-815b-52540065bddc

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)