[LU-10812] recovery-random-scale test_fail_client_mds: test_fail_client_mds returned 4 Created: 13/Mar/18 Updated: 06/Aug/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/85ed0988-278a-11e8-9e0e-52540065bddc
test_fail_client_mds failed with the following error:
test_fail_client_mds returned 5
client and server: lustre-master tag-2.10.59 SLES12SP3
client 2 hit ll_ping: page allocation failure
console log
[ 972.544170] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=357 DURATION=86400 PERIOD=1200
[ 1005.987638] Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=357 DURATION=86400 PERIOD=1200
[ 1022.308773] Lustre: DEBUG MARKER: rc=0;
[ 1022.308773] val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1);
[ 1022.308773] if [[ $? -eq 0 && $val -ne 0 ]]; then
[ 1022.308773] echo $(hostname -s): $val;
[ 1022.308773] rc=$val;
[ 1022.308773] fi;
[ 1022.308773] exit $rc
[ 1022.357893] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh
[ 1022.566140] Lustre: DEBUG MARKER: /usr/sbin/lctl mark FAIL CLIENT onyx-47vm4...
[ 1023.976580] Lustre: DEBUG MARKER: FAIL CLIENT onyx-47vm4...
[ 1033.244860] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Starting failover on mds1 [ 1099.399100] Lustre: DEBUG MARKER: Starting failover on mds1 [ 1123.785219] Lustre: 1838:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1520989165/real 1520989165] req@ffff880040416c40 x1594871608337664/t0(0) o400->MGC10.2.8.216@tcp@10.2.8.217@tcp:26/25 lens 224/224 e 0 to 1 dl 1520989172 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1123.785238] LustreError: 166-1: MGC10.2.8.216@tcp: Connection to MGS (at 10.2.8.217@tcp) was lost; in progress operations using this service will fail [ 1158.804380] Lustre: 1838:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1520989206/real 1520989211] req@ffff880064c4d080 x1594871608390224/t0(0) o400->lustre-MDT0000-mdc-ffff88007a3ec800@10.2.8.217@tcp:12/10 lens 224/224 e 0 to 1 dl 1520989255 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [ 1158.804393] Lustre: lustre-MDT0000-mdc-ffff88007a3ec800: Connection to lustre-MDT0000 (at 10.2.8.217@tcp) was lost; in progress operations using this service will wait for recovery to complete [ 1178.804585] Lustre: Evicted from MGS (at MGC10.2.8.216@tcp_0) after server handle changed from 0x620469374b42c3ff to 0x3e7afd771be5d16d [ 1178.806122] Lustre: MGC10.2.8.216@tcp: Connection restored to MGC10.2.8.216@tcp_0 (at 10.2.8.216@tcp) [ 1178.809735] LustreError: 1836:0:(client.c:2972:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff88005eb3bc40 x1594871608322400/t8590014200(8590014200) o101->lustre-MDT0000-mdc-ffff88007a3ec800@10.2.8.216@tcp:12/10 lens 952/544 e 0 to 0 dl 1520989279 ref 2 fl Interpret:RP/4/0 rc 301/301 [ 1179.747452] Lustre: 1838:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1520989176/real 1520989176] req@ffff880069295cc0 x1594871608341840/t0(0) o400->lustre-MDT0000-mdc-ffff88007a3ec800@10.2.8.216@tcp:12/10 lens 224/224 e 0 to 1 dl 1520989225 ref 
1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1179.747464] Lustre: 1838:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 3 previous similar messages [ 1180.738657] Lustre: lustre-MDT0000-mdc-ffff88007a3ec800: Connection restored to 10.2.8.216@tcp (at 10.2.8.216@tcp) [ 1214.574355] Lustre: 1837:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1520989191/real 1520989191] req@ffff880060bc8380 x1594871608349568/t0(0) o400->lustre-MDT0000-mdc-ffff88007a3ec800@10.2.8.216@tcp:12/10 lens 224/224 e 0 to 1 dl 1520989240 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1214.574362] Lustre: 1837:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [ 1332.924289] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-47vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 [ 1333.499978] Lustre: DEBUG MARKER: onyx-47vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 [ 1366.096335] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Started client load: tar on onyx-47vm4 [ 1366.593473] Lustre: DEBUG MARKER: Started client load: tar on onyx-47vm4 [ 1368.645975] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK [ 1369.798370] Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK [ 1370.023893] Lustre: DEBUG MARKER: rc=0; [ 1370.023893] val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); [ 1370.023893] if [[ $? -eq 0 && $val -ne 0 ]]; then [ 1370.023893] echo $(hostname -s): $val; [ 1370.023893] rc=$val; [ 1370.023893] fi; [ 1370.023893] exit $rc [ 1370.080186] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh [ 1370.190671] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Number of failovers: [ 1370.190671] mds1 failed over 2 times and counting... 
[ 1370.405207] Lustre: DEBUG MARKER: Number of failovers: [ 1859.772170] ll_ping: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) [ 1859.772182] CPU: 0 PID: 1839 Comm: ll_ping Tainted: G OE N 4.4.114-94.11-default #1 [ 1859.772183] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1859.772189] 0000000000000000 ffffffff813274b0 0000000000000000 ffff88007fc03d00 [ 1859.772190] ffffffff8119a732 0108002000000030 000000000000008a 0000000000000400 [ 1859.772191] 0000000000000194 0000000000000003 0000000000000001 0000000000000000 [ 1859.772192] Call Trace: [ 1859.772207] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 1859.772211] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 1859.772213] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 1859.772219] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 1859.772223] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 1859.772226] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 1859.772229] [<ffffffff8119b45a>] __alloc_page_frag+0x10a/0x120 [ 1859.772235] [<ffffffff8150c0f2>] __napi_alloc_skb+0x82/0xd0 [ 1859.772242] [<ffffffffa02db354>] cp_rx_poll+0x1b4/0x550 [8139cp] [ 1859.772251] [<ffffffff8151aebc>] net_rx_action+0x15c/0x370 [ 1859.772255] [<ffffffff810852bc>] __do_softirq+0xec/0x300 [ 1859.772258] [<ffffffff8108578a>] irq_exit+0xfa/0x110 [ 1859.772265] [<ffffffff816181f1>] do_IRQ+0x51/0xe0 [ 1859.772268] [<ffffffff816156c9>] common_interrupt+0xc9/0xc9 [ 1859.774527] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b [ 1859.774528] [ 1859.774528] Leftover inexact backtrace: [ 1859.774528] [ 1859.774534] <IRQ> <EOI> [<ffffffff811a6820>] ? shrink_page_list+0x440/0x7f0 [ 1859.774536] [<ffffffff811a7190>] ? shrink_inactive_list+0x1f0/0x4f0 [ 1859.774538] [<ffffffff811a7fcb>] ? shrink_zone_memcg+0x2bb/0x6a0 [ 1859.774540] [<ffffffff811a8467>] ? shrink_zone+0xb7/0x260 [ 1859.774542] [<ffffffff811a896d>] ? do_try_to_free_pages+0x15d/0x450 [ 1859.774543] [<ffffffff811a8d1a>] ? 
try_to_free_pages+0xba/0x170 [ 1859.774545] [<ffffffff8119ad93>] ? __alloc_pages_nodemask+0x5f3/0xb80 [ 1859.774547] [<ffffffff811e94cd>] ? kmem_getpages+0x4d/0xf0 [ 1859.774548] [<ffffffff811eacc9>] ? fallback_alloc+0x199/0x260 [ 1859.774550] [<ffffffff811eb3f9>] ? kmem_cache_alloc+0x1f9/0x460 [ 1859.774598] [<ffffffffa0ad3dc6>] ? ptlrpc_request_cache_alloc+0x26/0x100 [ptlrpc] [ 1859.774621] [<ffffffffa0ad3ebe>] ? ptlrpc_request_alloc_internal+0x1e/0x420 [ptlrpc] [ 1859.774644] [<ffffffffa0adc408>] ? ptlrpc_request_alloc_pack+0x18/0x50 [ptlrpc] [ 1859.774669] [<ffffffffa0af7c8c>] ? ptlrpc_prep_ping+0x1c/0x40 [ptlrpc] [ 1859.774693] [<ffffffffa0af8105>] ? ptlrpc_pinger_main+0x335/0xa90 [ptlrpc] [ 1859.774696] [<ffffffff810ab000>] ? wake_up_q+0x80/0x80 [ 1859.774718] [<ffffffffa0af7dd0>] ? ptlrpc_obd_ping+0x120/0x120 [ptlrpc] [ 1859.774720] [<ffffffff8109e8f9>] ? kthread+0xc9/0xe0 [ 1859.774722] [<ffffffff8109e830>] ? kthread_park+0x50/0x50 [ 1859.774723] [<ffffffff81614f45>] ? ret_from_fork+0x55/0x80 [ 1859.774725] [<ffffffff8109e830>] ? 
kthread_park+0x50/0x50 [ 1860.067158] ll_ping: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) [ 1860.067163] CPU: 0 PID: 1839 Comm: ll_ping Tainted: G OE N 4.4.114-94.11-default #1 [ 1860.067170] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1860.067174] 0000000000000000 ffffffff813274b0 0000000000000000 ffff88007fc03d00 [ 1860.067175] ffffffff8119a732 0108002000000030 0000000000000005 0000000000000400 [ 1860.067177] 00000000000002f3 0000000000000001 0000000000000001 0000000000000000 [ 1860.067177] Call Trace: [ 1860.067191] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 1860.067195] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 1860.067197] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 1860.067201] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 1860.067205] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 1860.067209] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 1860.067212] [<ffffffff8119b45a>] __alloc_page_frag+0x10a/0x120 [ 1860.067216] [<ffffffff8150c0f2>] __napi_alloc_skb+0x82/0xd0 [ 1860.067222] [<ffffffffa02db354>] cp_rx_poll+0x1b4/0x550 [8139cp] [ 1860.067232] [<ffffffff8151aebc>] net_rx_action+0x15c/0x370 [ 1860.067236] [<ffffffff810852bc>] __do_softirq+0xec/0x300 [ 1860.067238] [<ffffffff8108578a>] irq_exit+0xfa/0x110 [ 1860.067245] [<ffffffff816181f1>] do_IRQ+0x51/0xe0 [ 1860.067248] [<ffffffff816156c9>] common_interrupt+0xc9/0xc9 [ 1860.069507] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b [ 1860.069508] |
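Much of the console log in this ticket is flattened onto a few long lines, which makes the traces hard to follow. The following is a minimal Python sketch, assuming every kernel entry begins with a `[ seconds.micros]` timestamp as in the log above, that splits such text back into one entry per line. The helper name `split_console_log` is hypothetical and is not part of Maloo or the Lustre test framework.

```python
import re

# A dmesg-style timestamp such as "[ 1859.772192]". Bracketed tokens like
# "[<ffffffff81019b59>]", "[ptlrpc]" or "[sent 1520989165/real ...]" do not
# match because they are not purely "digits.digits" inside the brackets.
TIMESTAMP = r"\[\s*\d+\.\d+\]"


def split_console_log(flat_text):
    """Split flattened console output into one string per timestamped entry."""
    # Break at whitespace that is immediately followed by a timestamp.
    parts = re.split(r"\s+(?=" + TIMESTAMP + ")", flat_text.strip())
    return [p for p in parts if p]


sample = (
    "[ 1859.772192] Call Trace: "
    "[ 1859.772207] [<ffffffff81019b59>] dump_trace+0x59/0x340 "
    "[ 1859.772211] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170"
)

for entry in split_console_log(sample):
    print(entry)
```

Text that precedes the first timestamp (for example the ticket prose) stays attached to the first returned element, so it may need to be trimmed separately.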
| Comments |
| Comment by Sarah Liu [ 14/Mar/18 ] |
[ 1369.798370] Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK [ 1370.023893] Lustre: DEBUG MARKER: rc=0; [ 1370.023893] val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); [ 1370.023893] if [[ $? -eq 0 && $val -ne 0 ]]; then [ 1370.023893] echo $(hostname -s): $val; [ 1370.023893] rc=$val; [ 1370.023893] fi; [ 1370.023893] exit $rc [ 1370.080186] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh [ 1370.190671] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Number of failovers: [ 1370.190671] mds1 failed over 2 times and counting... [ 1370.405207] Lustre: DEBUG MARKER: Number of failovers: [ 1859.772170] ll_ping: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) [ 1859.772182] CPU: 0 PID: 1839 Comm: ll_ping Tainted: G OE N 4.4.114-94.11-default #1 [ 1859.772183] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1859.772189] 0000000000000000 ffffffff813274b0 0000000000000000 ffff88007fc03d00 [ 1859.772190] ffffffff8119a732 0108002000000030 000000000000008a 0000000000000400 [ 1859.772191] 0000000000000194 0000000000000003 0000000000000001 0000000000000000 [ 1859.772192] Call Trace: [ 1859.772207] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 1859.772211] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 1859.772213] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 1859.772219] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 1859.772223] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 1859.772226] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 1859.772229] [<ffffffff8119b45a>] __alloc_page_frag+0x10a/0x120 [ 1859.772235] [<ffffffff8150c0f2>] __napi_alloc_skb+0x82/0xd0 [ 1859.772242] [<ffffffffa02db354>] cp_rx_poll+0x1b4/0x550 [8139cp] [ 1859.772251] [<ffffffff8151aebc>] net_rx_action+0x15c/0x370 [ 1859.772255] [<ffffffff810852bc>] __do_softirq+0xec/0x300 [ 1859.772258] [<ffffffff8108578a>] irq_exit+0xfa/0x110 [ 1859.772265] [<ffffffff816181f1>] do_IRQ+0x51/0xe0 [ 1859.772268] 
[<ffffffff816156c9>] common_interrupt+0xc9/0xc9 [ 1859.774527] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b [ 1859.774528] [ 1859.774528] Leftover inexact backtrace: [ 1859.774528] [ 1859.774534] <IRQ> <EOI> [<ffffffff811a6820>] ? shrink_page_list+0x440/0x7f0 [ 1859.774536] [<ffffffff811a7190>] ? shrink_inactive_list+0x1f0/0x4f0 [ 1859.774538] [<ffffffff811a7fcb>] ? shrink_zone_memcg+0x2bb/0x6a0 [ 1859.774540] [<ffffffff811a8467>] ? shrink_zone+0xb7/0x260 [ 1859.774542] [<ffffffff811a896d>] ? do_try_to_free_pages+0x15d/0x450 [ 1859.774543] [<ffffffff811a8d1a>] ? try_to_free_pages+0xba/0x170 [ 1859.774545] [<ffffffff8119ad93>] ? __alloc_pages_nodemask+0x5f3/0xb80 [ 1859.774547] [<ffffffff811e94cd>] ? kmem_getpages+0x4d/0xf0 [ 1859.774548] [<ffffffff811eacc9>] ? fallback_alloc+0x199/0x260 [ 1859.774550] [<ffffffff811eb3f9>] ? kmem_cache_alloc+0x1f9/0x460 [ 1859.774598] [<ffffffffa0ad3dc6>] ? ptlrpc_request_cache_alloc+0x26/0x100 [ptlrpc] [ 1859.774621] [<ffffffffa0ad3ebe>] ? ptlrpc_request_alloc_internal+0x1e/0x420 [ptlrpc] [ 1859.774644] [<ffffffffa0adc408>] ? ptlrpc_request_alloc_pack+0x18/0x50 [ptlrpc] [ 1859.774669] [<ffffffffa0af7c8c>] ? ptlrpc_prep_ping+0x1c/0x40 [ptlrpc] [ 1859.774693] [<ffffffffa0af8105>] ? ptlrpc_pinger_main+0x335/0xa90 [ptlrpc] [ 1859.774696] [<ffffffff810ab000>] ? wake_up_q+0x80/0x80 [ 1859.774718] [<ffffffffa0af7dd0>] ? ptlrpc_obd_ping+0x120/0x120 [ptlrpc] [ 1859.774720] [<ffffffff8109e8f9>] ? kthread+0xc9/0xe0 [ 1859.774722] [<ffffffff8109e830>] ? kthread_park+0x50/0x50 [ 1859.774723] [<ffffffff81614f45>] ? ret_from_fork+0x55/0x80 [ 1859.774725] [<ffffffff8109e830>] ? 
kthread_park+0x50/0x50 [ 1860.067158] ll_ping: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) [ 1860.067163] CPU: 0 PID: 1839 Comm: ll_ping Tainted: G OE N 4.4.114-94.11-default #1 [ 1860.067170] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1860.067174] 0000000000000000 ffffffff813274b0 0000000000000000 ffff88007fc03d00 [ 1860.067175] ffffffff8119a732 0108002000000030 0000000000000005 0000000000000400 [ 1860.067177] 00000000000002f3 0000000000000001 0000000000000001 0000000000000000 [ 1860.067177] Call Trace: [ 1860.067191] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 1860.067195] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 1860.067197] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 1860.067201] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 1860.067205] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 1860.067209] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 1860.067212] [<ffffffff8119b45a>] __alloc_page_frag+0x10a/0x120 [ 1860.067216] [<ffffffff8150c0f2>] __napi_alloc_skb+0x82/0xd0 [ 1860.067222] [<ffffffffa02db354>] cp_rx_poll+0x1b4/0x550 [8139cp] [ 1860.067232] [<ffffffff8151aebc>] net_rx_action+0x15c/0x370 [ 1860.067236] [<ffffffff810852bc>] __do_softirq+0xec/0x300 [ 1860.067238] [<ffffffff8108578a>] irq_exit+0xfa/0x110 [ 1860.067245] [<ffffffff816181f1>] do_IRQ+0x51/0xe0 [ 1860.067248] [<ffffffff816156c9>] common_interrupt+0xc9/0xc9 [ 1860.069507] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b [ 1860.069508] [ 1860.069508] Leftover inexact backtrace: [ 1860.069508] [ 1860.069514] <IRQ> <EOI> [<ffffffff811a71d6>] ? shrink_inactive_list+0x236/0x4f0 [ 1860.069516] [<ffffffff811a71d2>] ? shrink_inactive_list+0x232/0x4f0 [ 1860.069518] [<ffffffff811a7fcb>] ? shrink_zone_memcg+0x2bb/0x6a0 [ 1860.069520] [<ffffffff811a8467>] ? shrink_zone+0xb7/0x260 [ 1860.069522] [<ffffffff811a896d>] ? do_try_to_free_pages+0x15d/0x450 [ 1860.069524] [<ffffffff811a8d1a>] ? try_to_free_pages+0xba/0x170 [ 1860.069525] [<ffffffff8119ad93>] ? 
__alloc_pages_nodemask+0x5f3/0xb80 [ 1860.069528] [<ffffffff811e94cd>] ? kmem_getpages+0x4d/0xf0 [ 1860.069529] [<ffffffff811eacc9>] ? fallback_alloc+0x199/0x260 [ 1860.069530] [<ffffffff811eb3f9>] ? kmem_cache_alloc+0x1f9/0x460 [ 1860.069581] [<ffffffffa0ad3dc6>] ? ptlrpc_request_cache_alloc+0x26/0x100 [ptlrpc] [ 1860.069603] [<ffffffffa0ad3ebe>] ? ptlrpc_request_alloc_internal+0x1e/0x420 [ptlrpc] [ 1860.069626] [<ffffffffa0adc408>] ? ptlrpc_request_alloc_pack+0x18/0x50 [ptlrpc] [ 1860.069651] [<ffffffffa0af7c8c>] ? ptlrpc_prep_ping+0x1c/0x40 [ptlrpc] [ 1860.069675] [<ffffffffa0af8105>] ? ptlrpc_pinger_main+0x335/0xa90 [ptlrpc] [ 1860.069678] [<ffffffff810ab000>] ? wake_up_q+0x80/0x80 [ 1860.069700] [<ffffffffa0af7dd0>] ? ptlrpc_obd_ping+0x120/0x120 [ptlrpc] [ 1860.069702] [<ffffffff8109e8f9>] ? kthread+0xc9/0xe0 [ 1860.069704] [<ffffffff8109e830>] ? kthread_park+0x50/0x50 [ 1860.069705] [<ffffffff81614f45>] ? ret_from_fork+0x55/0x80 [ 1860.069707] [<ffffffff8109e830>] ? kthread_park+0x50/0x50 [ 1860.325324] ll_ping: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK) [ 1860.325328] CPU: 0 PID: 1839 Comm: ll_ping Tainted: G OE N 4.4.114-94.11-default #1 [ 1860.325328] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1860.325332] 0000000000000000 ffffffff813274b0 0000000000000000 ffff8800372638b0 [ 1860.325333] ffffffff8119a732 0128402000000004 0000000400000000 ffff88007fc19828 [ 1860.325335] ffff88007ffcf490 ffff88007fc19828 0000000400000000 ffff880037263938 [ 1860.325335] Call Trace: [ 1860.325347] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 1860.325351] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 1860.325353] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 1860.325358] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 1860.325362] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 1860.325365] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 1860.325369] [<ffffffff811e94cd>] kmem_getpages+0x4d/0xf0 [ 1860.325372] 
[<ffffffff811ead35>] fallback_alloc+0x205/0x260 [ 1860.325376] [<ffffffff811eb856>] kmem_cache_alloc_trace+0x1f6/0x460 [ 1860.325381] [<ffffffff8123792b>] wb_start_writeback+0x3b/0xe0 [ 1860.325384] [<ffffffff81237ed6>] wakeup_flusher_threads+0xc6/0x150 [ 1860.325388] [<ffffffff811a8a51>] do_try_to_free_pages+0x241/0x450 [ 1860.325391] [<ffffffff811a8d1a>] try_to_free_pages+0xba/0x170 [ 1860.325394] [<ffffffff8119ad93>] __alloc_pages_nodemask+0x5f3/0xb80 [ 1860.325396] [<ffffffff811e94cd>] kmem_getpages+0x4d/0xf0 [ 1860.325398] [<ffffffff811eacc9>] fallback_alloc+0x199/0x260 [ 1860.325401] [<ffffffff811eb3f9>] kmem_cache_alloc+0x1f9/0x460 [ 1860.325452] [<ffffffffa0ad3dc6>] ptlrpc_request_cache_alloc+0x26/0x100 [ptlrpc] [ 1860.325484] [<ffffffffa0ad3ebe>] ptlrpc_request_alloc_internal+0x1e/0x420 [ptlrpc] [ 1860.325509] [<ffffffffa0adc408>] ptlrpc_request_alloc_pack+0x18/0x50 [ptlrpc] [ 1860.325537] [<ffffffffa0af7c8c>] ptlrpc_prep_ping+0x1c/0x40 [ptlrpc] [ 1860.325563] [<ffffffffa0af8105>] ptlrpc_pinger_main+0x335/0xa90 [ptlrpc] [ 1860.325568] [<ffffffff8109e8f9>] kthread+0xc9/0xe0 [ 1860.325574] [<ffffffff81614f45>] ret_from_fork+0x55/0x80 [ 1860.327818] DWARF2 unwinder stuck at ret_from_fork+0x55/0x80 [ 1860.327818] [ 1860.327818] Leftover inexact backtrace: [ 1860.327818] [ 1860.327821] [<ffffffff8109e830>] ? 
kthread_park+0x50/0x50 [ 1860.327822] Mem-Info: [ 1860.327826] active_anon:1857 inactive_anon:1875 isolated_anon:0 [ 1860.327826] active_file:67798 inactive_file:369012 isolated_file:0 [ 1860.327826] unevictable:20 dirty:0 writeback:512 unstable:0 [ 1860.327826] slab_reclaimable:2671 slab_unreclaimable:22061 [ 1860.327826] mapped:4296 shmem:2177 pagetables:887 bounce:0 [ 1860.327826] free:0 free_pcp:4 free_cma:0 [ 1860.327831] Node 0 DMA free:0kB min:376kB low:468kB high:560kB active_anon:32kB inactive_anon:88kB active_file:708kB inactive_file:6696kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:72kB shmem:92kB slab_reclaimable:32kB slab_unreclaimable:7856kB kernel_stack:16kB pagetables:28kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1176144 all_unreclaimable? yes [ 1860.327833] lowmem_reserve[]: 0 1837 1837 1837 1837 [ 1860.327837] Node 0 DMA32 free:0kB min:44676kB low:55844kB high:67012kB active_anon:7396kB inactive_anon:7412kB active_file:270484kB inactive_file:1469352kB unevictable:80kB isolated(anon):0kB isolated(file):0kB present:2080744kB managed:1900872kB mlocked:80kB dirty:0kB writeback:2048kB mapped:17112kB shmem:8616kB slab_reclaimable:10652kB slab_unreclaimable:80388kB kernel_stack:2688kB pagetables:3520kB unstable:0kB bounce:0kB free_pcp:16kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:12966040 all_unreclaimable? 
yes [ 1860.327839] lowmem_reserve[]: 0 0 0 0 0 [ 1860.327843] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 1860.327848] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 1860.327849] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 1860.327850] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1860.327850] 7606 total pagecache pages [ 1860.327851] 169 pages in swap cache [ 1860.327852] Swap cache stats: add 8260, delete 8091, find 594/907 [ 1860.327853] Free swap = 14307180kB [ 1860.327853] Total swap = 14338044kB [ 1860.327853] 524184 pages RAM [ 1860.327854] 0 pages HighMem/MovableOnly [ 1860.327854] 44990 pages reserved [ 1860.327854] 0 pages hwpoisoned [ 1860.327865] ll_ping: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK) |
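The Mem-Info dump above reports `free:0kB` for both the DMA and DMA32 zones while their `min` watermarks are 376kB and 44676kB. The following is a minimal Python sketch, assuming the per-zone lines keep the `Node N ZONE free:…kB min:…kB low:…kB high:…kB` layout seen in this log, that lists zones whose free memory is below the min watermark. The name `zones_below_min` is hypothetical and does not come from any kernel or Lustre tool.

```python
import re

# Matches the per-zone summary, e.g.
# "Node 0 DMA32 free:0kB min:44676kB low:55844kB high:67012kB ..."
# The buddy-list lines ("Node 0 DMA: 0*4kB ...") and hugepage lines do not
# match because the zone name is not followed by " free:".
ZONE_RE = re.compile(
    r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB high:(\d+)kB"
)


def zones_below_min(log_text):
    """Return (node, zone, free_kb, min_kb) for zones with free < min."""
    hits = []
    for m in ZONE_RE.finditer(log_text):
        node, zone = int(m.group(1)), m.group(2)
        free_kb, min_kb = int(m.group(3)), int(m.group(4))
        if free_kb < min_kb:
            hits.append((node, zone, free_kb, min_kb))
    return hits


# Values copied from the Mem-Info section of this ticket (lines truncated).
sample = (
    "Node 0 DMA free:0kB min:376kB low:468kB high:560kB active_anon:32kB "
    "Node 0 DMA32 free:0kB min:44676kB low:55844kB high:67012kB active_anon:7396kB "
    "Node 0 DMA: 0*4kB 0*8kB = 0kB"
)

for node, zone, free_kb, min_kb in zones_below_min(sample):
    print(f"node {node} zone {zone}: free {free_kb} kB < min {min_kb} kB")
```

With both zones at zero free pages, an order-0 `GFP_ATOMIC` request has nothing to draw on, which is consistent with the failures logged here; the sketch only reports the numbers and does not diagnose the cause.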
| Comment by Sarah Liu [ 14/Mar/18 ] |
|
Same error in recovery-mds-scale: https://testing.hpdd.intel.com/test_sets/85e9815a-278a-11e8-9e0e-52540065bddc |
| Comment by James Nunez (Inactive) [ 22/Mar/18 ] |
|
Similar issue, but with dd having the page allocation error; https://testing.hpdd.intel.com/test_sets/794b9dac-2cf3-11e8-9e0e-52540065bddc [ 756.319053] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Starting failover on mds1 [ 800.372031] dd: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) [ 800.372040] CPU: 0 PID: 19516 Comm: dd Tainted: G OE N 4.4.114-94.11-default #1 [ 800.372041] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 800.372045] 0000000000000000 ffffffff813274b0 0000000000000000 ffff88007fc03d00 [ 800.372046] ffffffff8119a732 0108002000000030 0000000000016f00 0000000000000000 [ 800.372048] 0000000000000082 ffff88007fc16440 ffff88007fc03cf8 0000000000000000 [ 800.372048] Call Trace: [ 800.372092] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 800.372098] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 800.372101] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 800.372113] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 800.372131] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 800.372141] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 800.372144] [<ffffffff8119b45a>] __alloc_page_frag+0x10a/0x120 [ 800.372156] [<ffffffff8150c0f2>] __napi_alloc_skb+0x82/0xd0 [ 800.372175] [<ffffffffa0357354>] cp_rx_poll+0x1b4/0x550 [8139cp] [ 800.372192] [<ffffffff8151aebc>] net_rx_action+0x15c/0x370 [ 800.372202] [<ffffffff810852bc>] __do_softirq+0xec/0x300 [ 800.372210] [<ffffffff8108578a>] irq_exit+0xfa/0x110 [ 800.372224] [<ffffffff816181f1>] do_IRQ+0x51/0xe0 [ 800.372232] [<ffffffff816156c9>] common_interrupt+0xc9/0xc9 [ 800.374978] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b [ 800.374978] [ 800.374979] Leftover inexact backtrace: [ 800.374979] [ 800.374991] <IRQ> <EOI> [<ffffffff811a726e>] ? shrink_inactive_list+0x2ce/0x4f0 [ 800.374993] [<ffffffff811a742f>] ? shrink_inactive_list+0x48f/0x4f0 [ 800.374995] [<ffffffff811a7fcb>] ? shrink_zone_memcg+0x2bb/0x6a0 [ 800.374997] [<ffffffff811a8467>] ? 
shrink_zone+0xb7/0x260 [ 800.374999] [<ffffffff811a896d>] ? do_try_to_free_pages+0x15d/0x450 [ 800.375001] [<ffffffff811a8d1a>] ? try_to_free_pages+0xba/0x170 [ 800.375002] [<ffffffff8119ad93>] ? __alloc_pages_nodemask+0x5f3/0xb80 [ 800.375012] [<ffffffff811e27ff>] ? alloc_pages_current+0x7f/0x100 [ 800.375018] [<ffffffff8119368d>] ? pagecache_get_page+0x4d/0x1d0 [ 800.375056] [<ffffffffa0e86c1d>] ? ll_write_begin+0xed/0xbd0 [lustre] [ 800.375058] [<ffffffff811927b5>] ? generic_perform_write+0xc5/0x1b0 [ 800.375063] [<ffffffff81223aeb>] ? file_update_time+0x3b/0xf0 [ 800.375065] [<ffffffff81194524>] ? __generic_file_write_iter+0x184/0x1c0 [ 800.375081] [<ffffffffa0d2c241>] ? lov_object_maxbytes+0x31/0x40 [lov] [ 800.375094] [<ffffffffa0e976ad>] ? vvp_io_write_start+0x44d/0x740 [lustre] [ 800.375140] [<ffffffffa092f4f2>] ? cl_lock_request+0x62/0x1d0 [obdclass] [ 800.375146] [<ffffffffa0d14557>] ? lov_io_call.isra.5+0x77/0x120 [lov] [ 800.375166] [<ffffffffa0931278>] ? cl_io_start+0x58/0x110 [obdclass] [ 800.375184] [<ffffffffa09332e4>] ? cl_io_loop+0x104/0xc30 [obdclass] [ 800.375195] [<ffffffffa0e4f160>] ? ll_file_io_generic+0x490/0xb50 [lustre] [ 800.375205] [<ffffffffa0e4fab1>] ? ll_file_write_iter+0xe1/0x3d0 [lustre] [ 800.375212] [<ffffffff8120a360>] ? __vfs_write+0xd0/0x140 [ 800.375213] [<ffffffff8120afbd>] ? vfs_write+0x9d/0x190 [ 800.375215] [<ffffffff81617b99>] ? stuff_rsb+0x59/0xf0 [ 800.375217] [<ffffffff8120c032>] ? SyS_write+0x42/0xa0 [ 800.375219] [<ffffffff81617b68>] ? stuff_rsb+0x28/0xf0 [ 800.375220] [<ffffffff81614b0a>] ? 
entry_SYSCALL_64_fastpath+0x1e/0xb6 [ 800.791280] kswapd0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) [ 800.791287] CPU: 0 PID: 32 Comm: kswapd0 Tainted: G OE N 4.4.114-94.11-default #1 [ 800.791288] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 800.791300] 0000000000000000 ffffffff813274b0 0000000000000000 ffff88007fc03d00 [ 800.791301] ffffffff8119a732 0108002000000030 00000000000003e0 0000000000000400 [ 800.791303] 00000000000003fa 0000000000000002 0000000200000001 0000000000000001 [ 800.791303] Call Trace: [ 800.791321] [<ffffffff81019b59>] dump_trace+0x59/0x340 [ 800.791325] [<ffffffff81019f2a>] show_stack_log_lvl+0xea/0x170 [ 800.791328] [<ffffffff8101acd1>] show_stack+0x21/0x40 [ 800.791334] [<ffffffff813274b0>] dump_stack+0x5c/0x7c [ 800.791340] [<ffffffff8119a732>] warn_alloc_failed+0xe2/0x150 [ 800.791343] [<ffffffff8119aba9>] __alloc_pages_nodemask+0x409/0xb80 [ 800.791346] [<ffffffff8119b45a>] __alloc_page_frag+0x10a/0x120 [ 800.791352] [<ffffffff8150c0f2>] __napi_alloc_skb+0x82/0xd0 [ 800.791362] [<ffffffffa0357354>] cp_rx_poll+0x1b4/0x550 [8139cp] [ 800.791375] [<ffffffff8151aebc>] net_rx_action+0x15c/0x370 [ 800.791380] [<ffffffff810852bc>] __do_softirq+0xec/0x300 [ 800.791383] [<ffffffff8108578a>] irq_exit+0xfa/0x110 [ 800.791391] [<ffffffff816181f1>] do_IRQ+0x51/0xe0 [ 800.791395] [<ffffffff816156c9>] common_interrupt+0xc9/0xc9 [ 800.792004] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b [ 800.792004] [ 800.792004] Leftover inexact backtrace: [ 800.792004] [ 800.792004] <IRQ> <EOI> [<ffffffff811a71d6>] ? shrink_inactive_list+0x236/0x4f0 [ 800.792004] [<ffffffff811a71c6>] ? shrink_inactive_list+0x226/0x4f0 [ 800.792004] [<ffffffff811a7fcb>] ? shrink_zone_memcg+0x2bb/0x6a0 [ 800.792004] [<ffffffff811a8467>] ? shrink_zone+0xb7/0x260 [ 800.792004] [<ffffffff811a95be>] ? kswapd+0x48e/0x920 [ 800.792004] [<ffffffff811a9130>] ? mem_cgroup_shrink_node_zone+0x150/0x150 [ 800.792004] [<ffffffff8109e8f9>] ? 
kthread+0xc9/0xe0 [ 800.792004] [<ffffffff8109e830>] ? kthread_park+0x50/0x50 [ 800.792004] [<ffffffff81614f45>] ? ret_from_fork+0x55/0x80 [ 800.792004] [<ffffffff8109e830>] ? kthread_park+0x50/0x50
|
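The logs in this ticket show the same `page allocation failure` message from several tasks (`ll_ping`, `dd`, `kswapd0`). The following is a minimal Python sketch, assuming the `COMM: page allocation failure: order:N, mode:0x…(FLAGS)` line format seen above, that pulls those lines out of a console log and counts them per task. `allocation_failures` is a hypothetical helper name used only for illustration.

```python
import re
from collections import Counter

# Example line:
# "[ 800.372031] dd: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)"
FAIL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(\S+): page allocation failure: "
    r"order:(\d+), mode:(0x[0-9a-fA-F]+)\(([^)]*)\)"
)


def allocation_failures(log_text):
    """Return (timestamp, comm, order, mode, flags) for each failure line."""
    return [
        (float(ts), comm, int(order), mode, flags)
        for ts, comm, order, mode, flags in FAIL_RE.findall(log_text)
    ]


# Lines copied from the logs in this ticket.
sample = (
    "[ 800.372031] dd: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) "
    "[ 800.791280] kswapd0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC) "
    "[ 1860.325324] ll_ping: page allocation failure: order:0, "
    "mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK)"
)

failures = allocation_failures(sample)
per_task = Counter(comm for _, comm, _, _, _ in failures)
for comm, count in per_task.items():
    print(comm, count)
```

Because the regular expression does not depend on line breaks, it works on the flattened text as it appears in this export.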