Details
Type: Bug
Resolution: Fixed
Priority: Minor
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/adfb4bd4-1978-11e9-8388-52540065bddc
The test_103b code runs 512 parallel bash processes to verify that different umask values are handled properly. On the x86 clients either not as much kernel debugging is enabled, or the smaller pages (== smaller stacks) don't cause as much grief. On ARM the client crashes because of slow allocation and OOM, with the following stack trace:
[ 5945.554571] bash: page allocation stalls for 18420ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
[ 5945.562347] bash cpuset=/ mems_allowed=0
[ 5945.564625] CPU: 1 PID: 20442 Comm: bash Kdump: loaded Tainted: G OE ------------ 4.14.0-115.2.2.el7a.aarch64 #1
[ 5945.578547] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 5945.586497] Call trace:
[ 5945.588107] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
[ 5945.599468] [<ffff00000808a074>] show_stack+0x24/0x2c
[ 5945.603148] [<ffff000008855c28>] dump_stack+0x84/0xa8
[ 5945.606676] [<ffff000008216e34>] warn_alloc+0x11c/0x1ac
[ 5945.614536] [<ffff000008217ddc>] __alloc_pages_nodemask+0xe90/0xec0
[ 5945.624463] [<ffff00000827bca4>] alloc_pages_vma+0x90/0x1c0
[ 5945.628873] [<ffff00000824b574>] wp_page_copy+0x94/0x670
[ 5945.633271] [<ffff00000824ea40>] do_wp_page+0xbc/0x63c
[ 5945.639748] [<ffff000008251868>] __handle_mm_fault+0x4d0/0x560
[ 5945.650364] [<ffff0000082519d8>] handle_mm_fault+0xe0/0x178
[ 5945.655960] [<ffff000008872dc4>] do_page_fault+0x1c4/0x3cc
[ 5945.663762] [<ffff0000080813e8>] do_mem_abort+0x64/0xe4
[ 5945.756137] Mem-Info:
[ 5945.759687] active_anon:4916 inactive_anon:4896 isolated_anon:584 active_file:65 inactive_file:50 isolated_file:0 unevictable:0 dirty:0 writeback:58 unstable:0 slab_reclaimable:353 slab_unreclaimable:2005 mapped:86 shmem:5 pagetables:4117 bounce:0 free:2810 free_pcp:10 free_cma:0
[ 5945.783426] Node 0 active_anon:307392kB inactive_anon:307648kB active_file:2752kB inactive_file:3200kB unevictable:0kB isolated(anon):37376kB isolated(file):0kB mapped:5504kB dirty:0kB writeback:2368kB shmem:320kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 5945.800403] Node 0 DMA free:195968kB min:75520kB low:94400kB high:113280kB active_anon:309184kB inactive_anon:311360kB active_file:4928kB inactive_file:4608kB unevictable:0kB writepending:0kB present:2097152kB managed:1537088kB mlocked:0kB kernel_stack:76544kB pagetables:263488kB bounce:0kB free_pcp:640kB local_pcp:320kB free_cma:0kB
[ 5945.817944] lowmem_reserve[]: 0 0 0
[ 5945.820200] Node 0 DMA: 1794*64kB (U) 236*128kB (U) 36*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (U) 0*4096kB 0*8192kB 1*16384kB (U) 1*32768kB (U) 0*65536kB 0*131072kB 0*262144kB 0*524288kB = 205952kB
[ 5945.830444] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 5945.835293] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
[ 5945.845101] 1568 total pagecache pages
[ 5945.850497] 1505 pages in swap cache
[ 5945.854516] Swap cache stats: add 131425, delete 129953, find 93983/140484
[ 5945.861924] Free swap = 208256kB
[ 5945.864822] Total swap = 2098112kB
[ 5945.867189] 32768 pages RAM
[ 5945.869354] 0 pages HighMem/MovableOnly
[ 5945.873040] 8751 pages reserved
[ 5945.876243] 0 pages hwpoisoned
[ 5979.408965] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 5979.414229] [ 1334] 0 1334 237 3 4 2 37 0 systemd-journal
[ 5979.419778] [ 1354] 0 1354 1282 0 4 2 43 0 lvmetad
[ 5979.425682] [ 1364] 0 1364 243 2 4 2 42 -1000 systemd-udevd
: :
[ 5979.569754] [11382] 0 11382 1739 0 4 2 15 0 run_test.sh
[ 5979.575016] [11652] 0 11652 1785 2 3 2 62 0 bash
[ 5979.579985] [19821] 0 19821 1785 1 3 2 62 0 bash
[ 5979.584878] [19822] 0 19822 1715 1 3 2 8 0 tee
[ 5979.589861] [20003] 0 20003 1828 2 4 2 104 0 bash
[ 5979.594729] [15391] 0 15391 1743 1 5 2 23 0 anacron
[ 5979.599854] [17647] 0 17647 1834 4 4 2 108 0 bash
[ 5979.604748] [17648] 0 17648 1715 1 4 2 9 0 tee
[ 5979.609712] [17832] 0 17832 1831 30 4 2 76 0 bash
[ 5979.614600] [17834] 0 17834 1831 0 4 2 109 0 bash
[ 5979.619561] [17835] 0 17835 1828 9 4 2 97 0 bash
[ 5979.624770] [17836] 0 17836 1831 23 4 2 89 0 bash
[ 5979.629739] [17841] 0 17841 1828 0 4 2 109 0 bash
: :
[ 5986.229441] [22230] 0 22230 1831 24 4 2 83 0 bash
[ 5986.234602] [22231] 0 22231 1828 26 4 2 79 0 bash
[ 5986.239474] [22232] 0 22232 1834 24 4 2 86 0 bash
[ 5986.244709] [22233] 0 22233 1831 15 4 2 92 0 bash
[ 5986.249630] [22234] 0 22234 1831 20 4 2 86 0 bash
[ 5986.254535] [22235] 0 22235 1834 22 4 2 88 0 bash
[ 5986.259377] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
It was initially a bit of a surprise that there was any swap in use, since Lustre runs in the kernel and cannot be swapped out. However, this space is consumed by the nearly 1000 bash processes running on the node, in addition to many lfs and rm processes.
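The failing pattern can be sketched with a simplified stand-in for test_103b (the names and structure below are illustrative, not the actual code from Lustre's sanity.sh): each umask check forks a full bash subshell, so hundreds of them in flight means hundreds of process stacks and page tables. On an ARM client with 64KB pages those per-process costs are several times larger than on a 4KB-page x86 client, which is what tips the node into OOM.

```shell
#!/bin/bash
# Simplified sketch of the test_103b fork pattern (hypothetical helper
# names; the real test lives in lustre/tests/sanity.sh).
DIR=$(mktemp -d)
NPROC=16        # the real test runs 512 parallel processes

check_umask() {
    local mask=$1
    # Each check is a forked subshell, i.e. a full bash process with its
    # own stack and page tables -- the source of the memory pressure.
    (
        umask $mask
        local f=$DIR/file.$mask
        touch $f
        local mode=$(stat -c %a $f)
        local expect=$(printf '%o' $((0666 & ~0$mask)))
        [ "$mode" = "$expect" ] || echo "umask $mask: got $mode, want $expect"
    ) &
}

for i in $(seq 0 $((NPROC - 1))); do
    check_umask $(printf '%03o' $i)
done
wait            # with 512 forks in flight, this is where the ARM client OOMed
rm -rf $DIR
```

Capping the number of concurrent forks (or batching the umask values through fewer worker processes) is the obvious way to keep the test's peak memory footprint bounded on large-page clients.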
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_103b - onyx-90vm17 crashed during sanity test_103b