Description
I hit three identical OOM panics during tests on eagle this weekend. All of them occurred on the ZFS OSS during sanity-benchmark test pios_fpp:
Lustre: DEBUG MARKER: == sanity-benchmark test pios_fpp: pios file per process == 06:54:04 (1414418044)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark \/usr\/bin\/pios -t 1,8,40 -n 1024 -c 1M -s 8M -o 16M -L fpp -p \/mnt\/lustre\/dpios_fpp.sanity-benchmark
Lustre: DEBUG MARKER: /usr/bin/pios -t 1,8,40 -n 1024 -c 1M -s 8M -o 16M -L fpp -p /mnt/lustre/dpios_fpp.sanity-benchmark
Lustre: lustre-OST0001: Slow creates, 128/256 objects created at a rate of 2/s
LNet: Service thread pid 3372 completed after 91.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
LNet: Skipped 15 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark \/usr\/bin\/pios -t 1,8,40 -n 1024 -c 1M -s 8M -o 16M -L fpp --verify -p \/mnt\/lustre\/dpios_fpp.sanity-benchmark
Lustre: DEBUG MARKER: /usr/bin/pios -t 1,8,40 -n 1024 -c 1M -s 8M -o 16M -L fpp --verify -p /mnt/lustre/dpios_fpp.sanity-benchmark
spl_kmem_cache/ invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
spl_kmem_cache/ cpuset=/ mems_allowed=0
Pid: 396, comm: spl_kmem_cache/ Tainted: P --------------- 2.6.32-431.29.2.el6_lustre.g9835a2a.x86_64 #1
Call Trace:
[<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
[<ffffffff81122b80>] ? dump_header+0x90/0x1b0
[<ffffffff81122cee>] ? check_panic_on_oom+0x4e/0x80
[<ffffffff811233db>] ? out_of_memory+0x1bb/0x3c0
[<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
[<ffffffff81167cea>] ? alloc_pages_current+0xaa/0x110
[<ffffffff8112d15e>] ? __get_free_pages+0xe/0x50
[<ffffffff8104ec85>] ? pte_alloc_one_kernel+0x15/0x20
[<ffffffff8114650b>] ? __pte_alloc_kernel+0x1b/0xc0
[<ffffffff81157769>] ? vmap_page_range_noflush+0x309/0x370
[<ffffffff81157802>] ? map_vm_area+0x32/0x50
[<ffffffff81159270>] ? __vmalloc_area_node+0x100/0x190
[<ffffffffa0115a09>] ? kv_alloc+0x59/0x60 [spl]
[<ffffffff811590fd>] ? __vmalloc_node+0xad/0x120
[<ffffffffa0115a09>] ? kv_alloc+0x59/0x60 [spl]
[<ffffffff811594e2>] ? __vmalloc+0x22/0x30
[<ffffffffa0115a09>] ? kv_alloc+0x59/0x60 [spl]
[<ffffffffa0115a49>] ? spl_cache_grow_work+0x39/0x2d0 [spl]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa01174a7>] ? taskq_thread+0x1e7/0x3f0 [spl]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa01172c0>] ? taskq_thread+0x0/0x3f0 [spl]
[<ffffffff8109abf6>] ? kthread+0x96/0xa0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:11 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:8559 slab_reclaimable:1482 slab_unreclaimable:12252
mapped:1 shmem:0 pagetables:1242 bounce:0
Node 0 DMA free:8352kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA32 free:25884kB min:44720kB low:55900kB high:67080kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:5928kB slab_unreclaimable:48988kB kernel_stack:3416kB pagetables:4968kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:100 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 2*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8352kB
Node 0 DMA32: 719*4kB 340*8kB 184*16kB 84*32kB 33*64kB 16*128kB 5*256kB 2*512kB 2*1024kB 1*2048kB 1*4096kB = 25884kB
20 total pagecache pages
0 pages in swap cache
Swap cache stats: add 5121, delete 5121, find 16/25
Free swap = 4108600kB
Total swap = 4128764kB
524284 pages RAM
43654 pages reserved
54 pages shared
465254 pages non-shared
In all cases, the panicked OSS had 1.8G of memory and was running build lustre-b2_5/96.
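As a quick sanity check of the figures in the Mem-Info dump above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the numbers are copied by hand from the log, the 4 kB page size is an assumption (the x86_64 default), and the helper names `zones_below_min` and `usable_gib` are made up for this note.

```python
# Values copied by hand from the Mem-Info section of the console log.
PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default

ZONES = {
    "DMA":   {"free_kb": 8352,  "min_kb": 332},
    "DMA32": {"free_kb": 25884, "min_kb": 44720},
}


def zones_below_min(zones):
    """Return the names of zones whose free memory is under the min watermark."""
    return [name for name, z in zones.items() if z["free_kb"] < z["min_kb"]]


def usable_gib(total_pages, reserved_pages, page_kb=PAGE_KB):
    """Convert (total - reserved) pages to GiB."""
    return (total_pages - reserved_pages) * page_kb / (1024 * 1024)


if __name__ == "__main__":
    # The global "free:8559" page count should match the per-zone free totals.
    total_free_kb = sum(z["free_kb"] for z in ZONES.values())
    print("free pages match zone totals:", 8559 * PAGE_KB == total_free_kb)
    print("zones below min watermark:", zones_below_min(ZONES))
    print("usable memory: %.2f GiB" % usable_gib(524284, 43654))
```

With the values from this log, only DMA32 is below its min watermark (25884kB free against 44720kB min), and the usable memory works out to roughly 1.8 GiB, consistent with the OSS size stated above.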