
[LU-10319] recovery-random-scale, test_fail_client_mds: test_fail_client_mds returned 4

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Environment: onyx, failover
      servers: sles12sp3, ldiskfs, branch b2_10, v2.10.2.RC1, b50
      clients: sles12sp3, branch b2_10, v2.10.2.RC1, b50
    • Severity: 3

    Description

      This impacts the SLES client that runs the dd load during failover recovery tests.

      Note: SLES out-of-memory was first seen with LU-9601.

      recovery-mds-scale: https://testing.hpdd.intel.com/test_sets/c95ce2ce-d41a-11e7-9840-52540065bddc

      Note: LBUG/LASSERT (LU-10221) was also seen during the first recovery test run in the failover group (recovery-mds-scale).

      From the client console (vm3):

      [ 2075.737415] jbd2/vda1-8 invoked oom-killer: gfp_mask=0x1420848(GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE), nodemask=0, order=0, oom_score_adj=0
      

      followed by a core dump.

      recovery-random-scale: https://testing.hpdd.intel.com/test_sets/c9603c80-d41a-11e7-9840-52540065bddc
      recovery-double-scale: https://testing.hpdd.intel.com/test_sets/c963786e-d41a-11e7-9840-52540065bddc

      The next two recovery tests run in the failover group (recovery-random-scale, recovery-double-scale) have page allocation failures:

      From the client console (vm3):

      [  960.559009] swapper/0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC)
      [  960.559012] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE   N  4.4.92-6.18-default #1
      [  960.559013] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  960.559016]  0000000000000000 ffffffff813211b0 0000000000000000 ffff88007fc03d00
      [  960.559018]  ffffffff81196022 0108002000000030 0000000000000000 0000000000000400
      [  960.559019]  ffff88007fc15f00 ffff88007fc03d28 ffff88007fc15fb8 ffff88007fc15f00
      [  960.559019] Call Trace:
      [  960.559056]  [<ffffffff81019b19>] dump_trace+0x59/0x310
      [  960.559059]  [<ffffffff81019eba>] show_stack_log_lvl+0xea/0x170
      [  960.559064]  [<ffffffff8101ac41>] show_stack+0x21/0x40
      [  960.559075]  [<ffffffff813211b0>] dump_stack+0x5c/0x7c
      [  960.559087]  [<ffffffff81196022>] warn_alloc_failed+0xe2/0x150
      [  960.559091]  [<ffffffff81196497>] __alloc_pages_nodemask+0x407/0xb80
      [  960.559093]  [<ffffffff81196d4a>] __alloc_page_frag+0x10a/0x120
      [  960.559104]  [<ffffffff81502e82>] __napi_alloc_skb+0x82/0xd0
      [  960.559110]  [<ffffffffa02b6334>] cp_rx_poll+0x1b4/0x540 [8139cp]
      [  960.559122]  [<ffffffff81511ae7>] net_rx_action+0x157/0x360
      [  960.559133]  [<ffffffff810826d2>] __do_softirq+0xe2/0x2e0
      [  960.559136]  [<ffffffff81082b8a>] irq_exit+0xfa/0x110
      [  960.559149]  [<ffffffff8160ce71>] do_IRQ+0x51/0xd0
      [  960.559152]  [<ffffffff8160ad0c>] common_interrupt+0x8c/0x8c
      [  960.560535] DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
      

      and

      [  138.384058] Leftover inexact backtrace:
      

      (many instances of this follow the page allocation traces)

      followed by core dumps.
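
      For reference, one way to see whether the client is exhausting low-order free pages around the time of these failures is to sample /proc/meminfo and /proc/buddyinfo during the run (the interval and the fields grepped below are only an example):

      # sample free memory and per-order free page lists every few seconds
      while true; do
          date
          grep -E 'MemFree|Cached|Slab|SReclaimable|SUnreclaim' /proc/meminfo
          cat /proc/buddyinfo
          sleep 5
      done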

Activity

adilger Andreas Dilger added a comment -

I think we need to get some information about what is consuming the memory here, either from the crash dump or by running "slabtop" and "watch cat /proc/meminfo" to see where all the memory is going. I suspect something is wrong with CLIO memory management if it can't handle a write to a single large file (e.g. is a single DLM lock for the whole file pinning all of the pages, so that they won't be freed until the lock is cancelled?).
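
A rough sketch of such a capture, logging periodically to files on the client during the dd load (log paths and interval are only examples):

      # log slab consumers and overall memory while the load runs
      while sleep 10; do
          echo "=== $(date) ===" >> /tmp/meminfo.log
          cat /proc/meminfo     >> /tmp/meminfo.log
          slabtop -o -s c | head -25 >> /tmp/slabtop.log
      done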
            green Oleg Drokin added a comment -

For the vmcore to be useful we also need a pointer to the kernel-debuginfo RPM and a pointer to the Lustre build, so we can get the files with symbols and load the dump in the crash tool.
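
For example, with the matching SLES kernel-default-debuginfo package installed, loading the dump would look roughly like this (the dump directory and Lustre debuginfo paths are illustrative, not taken from an actual run):

      # kernel version matches the one in the console log (4.4.92-6.18-default)
      crash /usr/lib/debug/boot/vmlinux-4.4.92-6.18-default.debug /var/crash/<dump-dir>/vmcore

      # inside crash, load symbols for a Lustre module from the matching build
      crash> mod -s ptlrpc <path-to-lustre-debuginfo>/ptlrpc.ko.debug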

jcasper James Casper (Inactive) added a comment -

Per Oleg: [Our SUSE contact] spoke with an mm guy who suggested tweaking some proc parameters, namely vm.min_free_kbytes.
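
For reference, checking and adjusting that parameter looks like this (the value shown is only an example, not a recommendation from SUSE):

      # check the current reserve
      sysctl vm.min_free_kbytes
      # raise it for the current boot; persist via /etc/sysctl.d if needed
      sysctl -w vm.min_free_kbytes=65536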

jcasper James Casper (Inactive) added a comment -

I confirmed with top that dd is using less than 1% of memory (but lots of CPU cycles).
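
(A quick way to cross-check the dd footprint outside of top, purely for illustration:)

      # resident set size and %mem of the dd process
      ps -C dd -o pid,rss,pmem,vsz,args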
            jcasper James Casper (Inactive) added a comment - edited

            Just looked at a system running recovery-mds-scale:

            trevis-37vm3:/mnt/lustre/d0.dd-trevis-37vm3 # ls -al
            total 2756636
            drwxr-xr-x 2 root root       4096 Dec  6 13:41 .
            drwxr-xr-x 5 root root       4096 Dec  6 13:42 ..
            -rw-r--r-- 1 root root 3032481792 Dec  6 13:42 dd-file
            

            So the dd client is working with a single large file. But memory may be freed after each 4K transfer.

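
For context, the client load here is a single dd writing 4 KB blocks into one large file; an approximation of the command is below (the count and path are illustrative, not taken from the test framework's dd load script):

      # single writer filling one large file in 4 KB blocks
      dd if=/dev/zero of=/mnt/lustre/d0.dd-$(hostname)/dd-file bs=4k count=1000000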

hongchao.zhang Hongchao Zhang added a comment -

This could be a duplicate of LU-10221; the symptom is similar.
            pjones Peter Jones added a comment -

            Hongchao

            Is this a distinct issue from LU-10221?

            Peter


People

  Assignee: hongchao.zhang Hongchao Zhang
  Reporter: jcasper James Casper (Inactive)