Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.10.6
Labels: None
Severity: 3
Description
recovery-double-scale test_pairwise_fail, recovery-random-scale test_fail_client_mds and recovery-mds-scale test_failover_mds all have client crashes with similar information in the kernel crash log.
Looking at the failover test session results at https://testing.whamcloud.com/test_sessions/d1f52b33-1a69-47d0-a0c4-03e90d450320, we see the following in the recovery-double-scale test_pairwise_fail kernel-crash log on the client:
[ 616.142886] Lustre: DEBUG MARKER: cat /tmp/client-load.pid
[ 709.882775] kswapd0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)
[ 709.882805] CPU: 0 PID: 32 Comm: kswapd0 Tainted: G W OE N 4.4.155-94.50-default #1
[ 709.882806] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 709.882814] 0000000000000000 ffffffff8132ba20 0000000000000000 ffff88007fc03ce8
[ 709.882819] ffffffff8119ca92 0108002000000030 0000000000002d60 0000000000000000
[ 709.882820] 00000000000011e0 ffffffff810b79bb ffff88007ca6c4c0 ffff88007fc03d28
[ 709.882821] Call Trace:
[ 709.882926] [<ffffffff81019aa9>] dump_trace+0x59/0x340
[ 709.882930] [<ffffffff81019e7a>] show_stack_log_lvl+0xea/0x170
[ 709.882932] [<ffffffff8101ac21>] show_stack+0x21/0x40
[ 709.882953] [<ffffffff8132ba20>] dump_stack+0x5c/0x7c
[ 709.882983] [<ffffffff8119ca92>] warn_alloc_failed+0xe2/0x150
[ 709.883006] [<ffffffff8119cf0b>] __alloc_pages_nodemask+0x40b/0xb70
[ 709.883010] [<ffffffff8119d7aa>] __alloc_page_frag+0x10a/0x120
[ 709.883029] [<ffffffff81512302>] __napi_alloc_skb+0x82/0xd0
[ 709.883076] [<ffffffffa02ad3a4>] cp_rx_poll+0x1b4/0x550 [8139cp]
[ 709.883097] [<ffffffff81521fec>] net_rx_action+0x15c/0x370
[ 709.883112] [<ffffffff8108632c>] __do_softirq+0xec/0x300
[ 709.883124] [<ffffffff810867fa>] irq_exit+0xfa/0x110
[ 709.883145] [<ffffffff816201e1>] do_IRQ+0x51/0xe0
[ 709.883162] [<ffffffff8161d7c2>] common_interrupt+0xc2/0xc2
[ 709.886555] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
[ 709.886555]
[ 709.886560] Leftover inexact backtrace:
[ 709.886583] <IRQ> <EOI> [<ffffffff811a9525>] ? shrink_inactive_list+0x195/0x4f0
[ 709.886585] [<ffffffff811a951f>] ? shrink_inactive_list+0x18f/0x4f0
[ 709.886586] [<ffffffff811aa3bb>] ? shrink_zone_memcg+0x2bb/0x6a0
[ 709.886588] [<ffffffff811aa857>] ? shrink_zone+0xb7/0x260
[ 709.886590] [<ffffffff811ab9ae>] ? kswapd+0x48e/0x920
[ 709.886591] [<ffffffff811ab520>] ? mem_cgroup_shrink_node_zone+0x150/0x150
[ 709.886597] [<ffffffff8109fde9>] ? kthread+0xc9/0xe0
[ 709.886603] [<ffffffff816185a0>] ? thread_return+0x23/0x5d3
[ 709.886605] [<ffffffff8109fd20>] ? kthread_park+0x50/0x50
[ 709.886607] [<ffffffff8161d045>] ? ret_from_fork+0x55/0x80
[ 709.886608] [<ffffffff8109fd20>] ? kthread_park+0x50/0x50
…
A similar call trace appears in the kernel-crash log for recovery-mds-scale test_failover_mds:
[ 750.405629] Lustre: 13389:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[ 757.774431] swapper/0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)
[ 757.774436] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W OE N 4.4.155-94.50-default #1
[ 757.774437] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 757.774440] 0000000000000000 ffffffff8132ba20 0000000000000000 ffff88007fc03ce8
[ 757.774442] ffffffff8119ca92 0108002000000030 00000000000003f5 00000000000003f5
[ 757.774443] 00000000000003f5 00000000000003f5 0000000000000400 00000000000003f5
[ 757.774444] Call Trace:
[ 757.774458] [<ffffffff81019aa9>] dump_trace+0x59/0x340
[ 757.774461] [<ffffffff81019e7a>] show_stack_log_lvl+0xea/0x170
[ 757.774464] [<ffffffff8101ac21>] show_stack+0x21/0x40
[ 757.774468] [<ffffffff8132ba20>] dump_stack+0x5c/0x7c
[ 757.774472] [<ffffffff8119ca92>] warn_alloc_failed+0xe2/0x150
[ 757.774476] [<ffffffff8119cf0b>] __alloc_pages_nodemask+0x40b/0xb70
[ 757.774479] [<ffffffff8119d7aa>] __alloc_page_frag+0x10a/0x120
[ 757.774483] [<ffffffff81512302>] __napi_alloc_skb+0x82/0xd0
[ 757.774490] [<ffffffffa02ad3a4>] cp_rx_poll+0x1b4/0x550 [8139cp]
[ 757.774497] [<ffffffff81521fec>] net_rx_action+0x15c/0x370
[ 757.774502] [<ffffffff8108632c>] __do_softirq+0xec/0x300
[ 757.774504] [<ffffffff810867fa>] irq_exit+0xfa/0x110
[ 757.774510] [<ffffffff816201e1>] do_IRQ+0x51/0xe0
[ 757.774514] [<ffffffff8161d7c2>] common_interrupt+0xc2/0xc2
[ 757.776763] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
[ 757.776763]
[ 757.776764] Leftover inexact backtrace:
[ 757.776768] <IRQ> <EOI> [<ffffffff81020e80>] ? idle_notifier_unregister+0x20/0x20
[ 757.776772] [<ffffffff81061272>] ? native_safe_halt+0x2/0x10
[ 757.776773] [<ffffffff81020e98>] ? default_idle+0x18/0xd0
[ 757.776776] [<ffffffff810c5db1>] ? cpu_startup_entry+0x2f1/0x390
[ 757.776780] [<ffffffff81f8b0c7>] ? start_kernel+0x4c8/0x4d3
[ 757.776781] [<ffffffff81f8aa03>] ? set_init_arg+0x50/0x50
[ 757.776783] [<ffffffff81f8a120>] ? early_idt_handler_array+0x120/0x120
[ 757.776785] [<ffffffff81f8a719>] ? x86_64_start_kernel+0x147/0x156
There are similar kernel crashes at:
https://testing.whamcloud.com/test_sets/ba10c41e-e438-11e8-bfe1-52540065bddc
https://testing.whamcloud.com/test_sets/5b6d69e0-e43c-11e8-86c0-52540065bddc
https://testing.whamcloud.com/test_sets/a2b39c4c-e929-11e8-815b-52540065bddc
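Both traces above share the same signature: an order:0 GFP_ATOMIC page allocation failure reached through cp_rx_poll in the 8139cp module. To check whether another kernel-crash log matches, something like the following Python sketch could be used. It is only an illustration, not part of the Lustre test framework: the function name find_alloc_failures and the regular expressions are assumptions based on the log format shown in this ticket.

```python
import re

# Header of a page allocation failure as it appears in the logs above, e.g.
# "[ 709.882775] kswapd0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)"
FAILURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)\((?P<flags>[^)]*)\)"
)

# A stack frame that belongs to a loadable module, e.g.
# "cp_rx_poll+0x1b4/0x550 [8139cp]"
MODULE_FRAME_RE = re.compile(
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def find_alloc_failures(log_text):
    """Return one dict per page allocation failure found in log_text.

    Hypothetical helper: each entry records the timestamp, the task name,
    the allocation order, the GFP mode, and the first module frame that
    follows the failure (or None if no module frame is found before the
    next failure).
    """
    matches = list(FAILURE_RE.finditer(log_text))
    failures = []
    for i, m in enumerate(matches):
        # Only look for module frames up to the start of the next failure.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        frame = MODULE_FRAME_RE.search(log_text, m.end(), end)
        failures.append({
            "timestamp": float(m.group("ts")),
            "comm": m.group("comm"),
            "order": int(m.group("order")),
            "mode": int(m.group("mode"), 16),
            "flags": m.group("flags"),
            "module_frame": (frame.group("func"), frame.group("module"))
            if frame else None,
        })
    return failures
```

Calling find_alloc_failures() on the text of a kernel-crash log and comparing the order, flags and module_frame fields against the two traces in this ticket gives a quick first check. It does not replace reading the full trace.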