Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.8.0
-
None
-
autotest
-
3
-
9223372036854775807
Description
We’ve seen four recent hangs on the MDS during sanity test 60a due to OOM.
For https://testing.hpdd.intel.com/test_sets/383ea288-787c-11e5-b04c-5254006e85c2, we see the following from the console on the MDS:
23:58:40:Lustre: 1823:0:(llog_test.c:962:llog_test_7_sub()) 7_sub: records are not aligned, written 64452 from 64767 23:58:40:Lustre: 1823:0:(llog_test.c:1079:llog_test_7()) 7g: test llog_gen_rec 23:58:40:hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0 23:58:40:hald cpuset=/ mems_allowed=0 23:58:40:Pid: 1967, comm: hald Tainted: P -- ------------ 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 23:58:40:Call Trace: 23:58:40: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0 23:58:40: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0 23:58:40: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80 23:58:40: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 23:58:40: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 23:58:40: [<ffffffff815395e2>] ? io_schedule+0x92/0xc0 23:58:40: [<ffffffff811279c0>] ? sync_page_killable+0x0/0x40 23:58:40: [<ffffffff8117074a>] ? alloc_pages_vma+0x9a/0x150 23:58:40: [<ffffffff81163c72>] ? read_swap_cache_async+0xf2/0x160 23:58:40: [<ffffffff811647c9>] ? valid_swaphandles+0x69/0x160 23:58:40: [<ffffffff81163d67>] ? swapin_readahead+0x87/0xc0 23:58:40: [<ffffffff81152ead>] ? handle_pte_fault+0x6dd/0xb20 23:58:40: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 23:58:40: [<ffffffff81155fa6>] ? find_vma+0x46/0x80 23:58:40: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 23:58:40: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0 23:58:40: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20 23:58:40: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10 23:58:40: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 23:58:40: [<ffffffff8153c785>] ? page_fault+0x25/0x30 23:58:40:Mem-Info: 23:58:40:Node 0 DMA per-cpu: 23:58:40:CPU 0: hi: 0, btch: 1 usd: 0 23:58:40:CPU 1: hi: 0, btch: 1 usd: 0 23:58:40:Node 0 DMA32 per-cpu: 23:58:40:CPU 0: hi: 186, btch: 31 usd: 30 23:58:40:CPU 1: hi: 186, btch: 31 usd: 4 23:58:40:active_anon:97 inactive_anon:877 isolated_anon:0 23:58:40: active_file:3 inactive_file:55 isolated_file:0 23:58:40: unevictable:0 dirty:0 writeback:876 unstable:0 23:58:40: free:13242 slab_reclaimable:1788 slab_unreclaimable:442860 23:58:40: mapped:1 shmem:1 pagetables:681 bounce:0 23:58:40:Node 0 DMA free:8356kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7388kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes 23:58:40:lowmem_reserve[]: 0 2004 2004 2004 23:58:40:Node 0 DMA32 free:44612kB min:44720kB low:55900kB high:67080kB active_anon:388kB inactive_anon:3508kB active_file:12kB inactive_file:220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:3504kB mapped:4kB shmem:4kB slab_reclaimable:7152kB slab_unreclaimable:1764052kB kernel_stack:4112kB pagetables:2724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:137468 all_unreclaimable? yes 23:58:40:lowmem_reserve[]: 0 0 0 0 23:58:40:Node 0 DMA: 5*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8356kB 23:58:40:Node 0 DMA32: 527*4kB 271*8kB 147*16kB 109*32kB 69*64kB 47*128kB 20*256kB 11*512kB 7*1024kB 1*2048kB 1*4096kB = 44612kB 23:58:40:957 total pagecache pages 23:58:40:898 pages in swap cache 23:58:40:Swap cache stats: add 5911, delete 5013, find 54/85 23:58:40:Free swap = 4106092kB 23:58:40:Total swap = 4128764kB 23:58:40:524284 pages RAM 23:58:40:43736 pages reserved 23:58:40:88 pages shared 23:58:40:438370 pages non-shared 23:58:40:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name 23:58:40:[ 520] 0 520 2709 0 0 -17 -1000 udevd 23:58:40:[ 1706] 0 1706 6902 1 0 -17 -1000 auditd 23:58:40:[ 1740] 0 1740 62272 13 0 0 0 rsyslogd 23:58:40:[ 1774] 0 1774 4561 18 0 0 0 irqbalance 23:58:40:[ 1792] 32 1792 4744 1 0 0 0 rpcbind 23:58:40:[ 1814] 29 1814 5837 1 1 0 0 rpc.statd 23:58:40:[ 1848] 81 1848 6418 1 1 0 0 dbus-daemon 23:58:40:[ 1868] 0 1868 53919 1 1 0 0 ypbind 23:58:40:[ 1898] 0 1898 47233 1 1 0 0 cupsd 23:58:40:[ 1955] 0 1955 1020 0 1 0 0 acpid 23:58:40:[ 1967] 68 1967 10517 37 1 0 0 hald 23:58:40:[ 1968] 0 1968 5099 1 1 0 0 hald-runner 23:58:40:[ 2000] 0 2000 5629 1 1 0 0 hald-addon-inpu 23:58:40:[ 2010] 68 2010 4501 1 1 0 0 hald-addon-acpi 23:58:40:[ 2029] 0 2029 7646 1 1 0 0 zed 23:58:40:[ 2067] 0 2067 26827 0 0 0 0 rpc.rquotad 23:58:40:[ 2072] 0 2072 5417 0 0 0 0 rpc.mountd 23:58:40:[ 2116] 0 2116 6292 1 0 0 0 rpc.idmapd 23:58:40:[ 2163] 498 2163 57325 1 0 0 0 munged 23:58:40:[ 2274] 0 2274 16555 0 0 -17 -1000 sshd 23:58:40:[ 2285] 0 2285 5429 1 1 0 0 xinetd 23:58:40:[ 2317] 0 2317 22211 33 1 0 0 sendmail 23:58:40:[ 2326] 51 2326 20074 0 0 0 0 sendmail 23:58:40:[ 2354] 0 2354 29216 1 0 0 0 crond 23:58:40:[ 2369] 0 2369 5276 0 0 0 0 atd 23:58:40:[ 2382] 0 2382 1020 1 1 0 0 agetty 23:58:40:[ 2384] 0 2384 1016 1 0 0 0 mingetty 23:58:40:[ 2386] 0 2386 1016 1 1 0 0 mingetty 23:58:40:[ 2388] 0 2388 1016 1 1 0 0 mingetty 23:58:40:[ 2390] 0 2390 1016 1 1 0 0 mingetty 23:58:40:[ 2392] 0 2392 1016 1 1 0 0 mingetty 23:58:40:[ 2394] 0 2394 1016 1 1 0 0 mingetty 23:58:40:[ 2396] 0 2396 2664 0 0 -17 -1000 udevd 23:58:40:[ 2397] 0 2397 2662 0 0 -17 -1000 udevd 23:58:40:[ 2963] 38 2963 7689 1 1 0 0 ntpd 23:58:40:[ 1713] 0 1713 14747 1 1 0 0 in.mrshd 23:58:40:[ 1714] 0 1714 26515 1 1 0 0 bash 23:58:40:[ 1744] 0 1744 26515 0 0 0 0 bash 23:58:40:[ 1745] 0 1745 27594 1 1 0 0 sh 23:58:40:[ 1823] 0 1823 28024 1 0 0 0 lctl 23:58:40:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled 23:58:40: 23:58:40:Pid: 1967, comm: hald Tainted: P -- ------------ 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 23:58:40:Call Trace: 23:58:40: [<ffffffff815386d4>] ? panic+0xa7/0x16f 23:58:40: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0 23:58:40: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80 23:58:40: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 23:58:40: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 23:58:40: [<ffffffff815395e2>] ? io_schedule+0x92/0xc0 23:58:40: [<ffffffff811279c0>] ? sync_page_killable+0x0/0x40 23:58:40: [<ffffffff8117074a>] ? alloc_pages_vma+0x9a/0x150 23:58:40: [<ffffffff81163c72>] ? read_swap_cache_async+0xf2/0x160 23:58:40: [<ffffffff811647c9>] ? valid_swaphandles+0x69/0x160 23:58:40: [<ffffffff81163d67>] ? swapin_readahead+0x87/0xc0 23:58:40: [<ffffffff81152ead>] ? handle_pte_fault+0x6dd/0xb20 23:58:40: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 23:58:40: [<ffffffff81155fa6>] ? find_vma+0x46/0x80 23:58:40: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 23:58:40: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0 23:58:40: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20 23:58:40: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10 23:58:40: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 23:58:40: [<ffffffff8153c785>] ? page_fault+0x25/0x30
For https://testing.hpdd.intel.com/test_sets/0ad70b9c-7858-11e5-9a01-5254006e85c2, we see the following from the console on the MDS:
17:40:08:Lustre: 20451:0:(llog_test.c:1692:llog_test_10()) 10g: Cancel 65536 records, see one log zapped 17:40:08:irqbalance invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 17:40:08:irqbalance cpuset=/ mems_allowed=0 17:40:08:Pid: 1349, comm: irqbalance Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 17:40:08:Call Trace: 17:40:08: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0 17:40:08: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0 17:40:08: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80 17:40:08: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 17:40:08: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 17:40:08: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110 17:40:08: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90 17:40:08: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0 17:40:08: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500 17:40:08: [<ffffffff811522f4>] ? __do_fault+0x54/0x530 17:40:08: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20 17:40:08: [<ffffffff81066053>] ? perf_event_task_sched_out+0x33/0x70 17:40:08: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0 17:40:08: [<ffffffff810098a5>] ? __switch_to+0x285/0x340 17:40:08: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20 17:40:11: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 17:40:11: [<ffffffff81538dce>] ? thread_return+0x4e/0x7d0 17:40:11: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 17:40:11: [<ffffffff810a677d>] ? hrtimer_try_to_cancel+0x3d/0xd0 17:40:12: [<ffffffff810a6832>] ? hrtimer_cancel+0x22/0x30 17:40:12: [<ffffffff8153b463>] ? do_nanosleep+0x93/0xc0 17:40:12: [<ffffffff810a6904>] ? hrtimer_nanosleep+0xc4/0x180 17:40:12: [<ffffffff810a5730>] ? hrtimer_wakeup+0x0/0x30 17:40:12: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 17:40:12: [<ffffffff8153c785>] ? page_fault+0x25/0x30 17:40:12:Mem-Info: 17:40:12:Node 0 DMA per-cpu: 17:40:12:CPU 0: hi: 0, btch: 1 usd: 0 17:40:12:CPU 1: hi: 0, btch: 1 usd: 0 17:40:12:Node 0 DMA32 per-cpu: 17:40:12:CPU 0: hi: 186, btch: 31 usd: 13 17:40:12:CPU 1: hi: 186, btch: 31 usd: 55 17:40:12:active_anon:1008 inactive_anon:1027 isolated_anon:0 17:40:12: active_file:862 inactive_file:834 isolated_file:0 17:40:12: unevictable:0 dirty:0 writeback:0 unstable:0 17:40:12: free:13265 slab_reclaimable:2472 slab_unreclaimable:437874 17:40:12: mapped:1 shmem:1 pagetables:709 bounce:0 17:40:12:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:128kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7276kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:157 all_unreclaimable? yes 17:40:12:lowmem_reserve[]: 0 2004 2004 2004 17:40:12:Node 0 DMA32 free:44724kB min:44720kB low:55900kB high:67080kB active_anon:4032kB inactive_anon:4108kB active_file:3320kB inactive_file:3336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:9884kB slab_unreclaimable:1744220kB kernel_stack:3808kB pagetables:2836kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:430063 all_unreclaimable? yes 17:40:12:lowmem_reserve[]: 0 0 0 0 17:40:12:Node 0 DMA: 2*4kB 1*8kB 2*16kB 1*32kB 1*64kB 2*128kB 3*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8336kB 17:40:13:Node 0 DMA32: 1993*4kB 954*8kB 510*16kB 271*32kB 72*64kB 8*128kB 4*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44724kB 17:40:13:2723 total pagecache pages 17:40:13:1026 pages in swap cache 17:40:13:Swap cache stats: add 3678, delete 2652, find 0/0 17:40:13:Free swap = 4114052kB 17:40:13:Total swap = 4128764kB 17:40:13:524284 pages RAM 17:40:13:43736 pages reserved 17:40:13:1750 pages shared 17:40:13:438911 pages non-shared 17:40:13:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name 17:40:13:[ 509] 0 509 2663 0 0 -17 -1000 udevd 17:40:13:[ 1281] 0 1281 6899 1 0 -17 -1000 auditd 17:40:13:[ 1315] 0 1315 62273 54 1 0 0 rsyslogd 17:40:13:[ 1349] 0 1349 4561 1 0 0 0 irqbalance 17:40:13:[ 1367] 32 1367 4744 1 1 0 0 rpcbind 17:40:13:[ 1389] 29 1389 5837 1 1 0 0 rpc.statd 17:40:13:[ 1423] 81 1423 6418 1 1 0 0 dbus-daemon 17:40:13:[ 1443] 0 1443 53919 1 1 0 0 ypbind 17:40:13:[ 1477] 0 1477 47233 1 1 0 0 cupsd 17:40:13:[ 1534] 0 1534 1020 0 1 0 0 acpid 17:40:13:[ 1546] 68 1546 10521 1 1 0 0 hald 17:40:13:[ 1547] 0 1547 5099 1 1 0 0 hald-runner 17:40:13:[ 1579] 0 1579 5629 1 1 0 0 hald-addon-inpu 17:40:13:[ 1586] 68 1586 4501 1 1 0 0 hald-addon-acpi 17:40:13:[ 1635] 0 1635 26827 0 0 0 0 rpc.rquotad 17:40:13:[ 1640] 0 1640 5417 0 0 0 0 rpc.mountd 17:40:13:[ 1684] 0 1684 6292 1 0 0 0 rpc.idmapd 17:40:13:[ 1721] 498 1721 57325 1 0 0 0 munged 17:40:13:[ 1832] 0 1832 16555 0 0 -17 -1000 sshd 17:40:13:[ 1843] 0 1843 5429 16 1 0 0 xinetd 17:40:13:[ 1875] 0 1875 22211 6 0 0 0 sendmail 17:40:13:[ 1884] 51 1884 20074 0 0 0 0 sendmail 17:40:13:[ 1912] 0 1912 29215 1 0 0 0 crond 17:40:13:[ 1927] 0 1927 5276 0 0 0 0 atd 17:40:13:[ 1940] 0 1940 1020 1 1 0 0 agetty 17:40:13:[ 1942] 0 1942 1016 1 0 0 0 mingetty 17:40:13:[ 1944] 0 1944 1016 1 0 0 0 mingetty 17:40:13:[ 1946] 0 1946 1016 1 0 0 0 mingetty 17:40:13:[ 1948] 0 1948 1016 1 1 0 0 mingetty 17:40:13:[ 1950] 0 1950 2730 0 0 -17 -1000 udevd 17:40:13:[ 1951] 0 1951 2728 0 0 -17 -1000 udevd 17:40:13:[ 1952] 0 1952 1016 1 0 0 0 mingetty 17:40:13:[ 1954] 0 1954 1016 1 0 0 0 mingetty 17:40:13:[ 2706] 38 2706 7689 1 1 0 0 ntpd 17:40:13:[11506] 0 11506 4763 1 1 0 0 anacron 17:40:13:[20349] 0 20349 14749 161 1 0 0 in.mrshd 17:40:13:[20350] 0 20350 26524 58 0 0 0 bash 17:40:13:[20375] 0 20375 26524 58 1 0 0 bash 17:40:13:[20376] 0 20376 27603 617 1 0 0 sh 17:40:13:[20451] 0 20451 28026 64 0 0 0 lctl 17:40:13:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled 17:40:14: 17:40:14:Pid: 1349, comm: irqbalance Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 17:40:14:Call Trace: 17:40:14: [<ffffffff815386d4>] ? panic+0xa7/0x16f 17:40:14: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0 17:40:14: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80 17:40:14: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 17:40:14: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 17:40:14: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110 17:40:14: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90 17:40:14: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0 17:40:14: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500 17:40:14: [<ffffffff811522f4>] ? __do_fault+0x54/0x530 17:40:14: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20 17:40:14: [<ffffffff81066053>] ? perf_event_task_sched_out+0x33/0x70 17:40:14: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0 17:40:14: [<ffffffff810098a5>] ? __switch_to+0x285/0x340 17:40:14: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20 17:40:14: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 17:40:14: [<ffffffff81538dce>] ? thread_return+0x4e/0x7d0 17:40:14: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 17:40:14: [<ffffffff810a677d>] ? hrtimer_try_to_cancel+0x3d/0xd0 17:40:14: [<ffffffff810a6832>] ? hrtimer_cancel+0x22/0x30 17:40:14: [<ffffffff8153b463>] ? do_nanosleep+0x93/0xc0 17:40:14: [<ffffffff810a6904>] ? hrtimer_nanosleep+0xc4/0x180 17:40:14: [<ffffffff810a5730>] ? hrtimer_wakeup+0x0/0x30 17:40:14: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 17:40:14: [<ffffffff8153c785>] ? page_fault+0x25/0x30
For https://testing.hpdd.intel.com/test_sets/aee14c08-769a-11e5-b71e-5254006e85c2, we see the following from the console on the MDS:
13:02:02:Lustre: 20522:0:(llog_test.c:1443:llog_test_10()) 10c: write 131072 more log records 13:02:02:sendmail invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 13:02:02:sendmail cpuset=/ mems_allowed=0 13:02:02:Pid: 1827, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 13:02:02:Call Trace: 13:02:02: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0 13:02:02: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0 13:02:02: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80 13:02:02: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 13:02:02: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 13:02:02: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110 13:02:02: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90 13:02:02: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0 13:02:02: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500 13:02:02: [<ffffffff811522f4>] ? __do_fault+0x54/0x530 13:02:02: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20 13:02:02: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40 13:02:02: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60 13:02:02: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20 13:02:02: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 13:02:02: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 13:02:02: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10 13:02:02: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100 13:02:02: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150 13:02:02: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 13:02:02: [<ffffffff8153c785>] ? page_fault+0x25/0x30 13:02:02:Mem-Info: 13:02:02:Node 0 DMA per-cpu: 13:02:02:CPU 0: hi: 0, btch: 1 usd: 0 13:02:02:Node 0 DMA32 per-cpu: 13:02:02:CPU 0: hi: 186, btch: 31 usd: 115 13:02:02:active_anon:1685 inactive_anon:1752 isolated_anon:0 13:02:02: active_file:1244 inactive_file:1252 isolated_file:0 13:02:02: unevictable:0 dirty:0 writeback:0 unstable:0 13:02:02: free:13240 slab_reclaimable:2343 slab_unreclaimable:436182 13:02:02: mapped:1 shmem:1 pagetables:676 bounce:0 13:02:02:Node 0 DMA free:8352kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7384kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3 all_unreclaimable? yes 13:02:02:lowmem_reserve[]: 0 2004 2004 2004 13:02:02:Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB active_anon:6740kB inactive_anon:7008kB active_file:4976kB inactive_file:5004kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:9368kB slab_unreclaimable:1737344kB kernel_stack:3200kB pagetables:2704kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:374310 all_unreclaimable? yes 13:02:02:lowmem_reserve[]: 0 0 0 0 13:02:02:Node 0 DMA: 2*4kB 3*8kB 2*16kB 3*32kB 2*64kB 3*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8352kB 13:02:02:Node 0 DMA32: 2062*4kB 1079*8kB 497*16kB 232*32kB 79*64kB 11*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44608kB 13:02:02:4249 total pagecache pages 13:02:02:1752 pages in swap cache 13:02:02:Swap cache stats: add 2908, delete 1156, find 0/0 13:02:02:Free swap = 4117132kB 13:02:02:Total swap = 4128764kB 13:02:02:524284 pages RAM 13:02:02:43736 pages reserved 13:02:02:2548 pages shared 13:02:02:438391 pages non-shared 13:02:02:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name 13:02:02:[ 482] 0 482 2716 23 0 -17 -1000 udevd 13:02:02:[ 1235] 0 1235 6902 14 0 -17 -1000 auditd 13:02:02:[ 1269] 0 1269 62273 201 0 0 0 rsyslogd 13:02:02:[ 1320] 32 1320 4744 1 0 0 0 rpcbind 13:02:02:[ 1342] 29 1342 5837 1 0 0 0 rpc.statd 13:02:02:[ 1376] 81 1376 6444 1 0 0 0 dbus-daemon 13:02:02:[ 1398] 0 1398 53919 1 0 0 0 ypbind 13:02:02:[ 1430] 0 1430 47233 1 0 0 0 cupsd 13:02:02:[ 1486] 0 1486 1020 0 0 0 0 acpid 13:02:02:[ 1498] 68 1498 10523 1 0 0 0 hald 13:02:02:[ 1499] 0 1499 5099 1 0 0 0 hald-runner 13:02:02:[ 1531] 0 1531 5629 1 0 0 0 hald-addon-inpu 13:02:02:[ 1541] 68 1541 4501 1 0 0 0 hald-addon-acpi 13:02:02:[ 1587] 0 1587 26827 0 0 0 0 rpc.rquotad 13:02:02:[ 1592] 0 1592 5417 0 0 0 0 rpc.mountd 13:02:02:[ 1636] 0 1636 6292 1 0 0 0 rpc.idmapd 13:02:02:[ 1673] 498 1673 57325 139 0 0 0 munged 13:02:02:[ 1784] 0 1784 16555 0 0 -17 -1000 sshd 13:02:02:[ 1795] 0 1795 5429 18 0 0 0 xinetd 13:02:02:[ 1827] 0 1827 22210 23 0 0 0 sendmail 13:02:02:[ 1836] 51 1836 20074 13 0 0 0 sendmail 13:02:02:[ 1864] 0 1864 29216 10 0 0 0 crond 13:02:02:[ 1879] 0 1879 5276 0 0 0 0 atd 13:02:02:[ 1893] 0 1893 1020 2 0 0 0 agetty 13:02:02:[ 1894] 0 1894 1016 2 0 0 0 mingetty 13:02:02:[ 1896] 0 1896 1016 12 0 0 0 mingetty 13:02:02:[ 1898] 0 1898 1016 21 0 0 0 mingetty 13:02:02:[ 1900] 0 1900 1016 22 0 0 0 mingetty 13:02:02:[ 1902] 0 1902 2715 24 0 -17 -1000 udevd 13:02:02:[ 1903] 0 1903 2714 31 0 -17 -1000 udevd 13:02:02:[ 1904] 0 1904 1016 21 0 0 0 mingetty 13:02:02:[ 1906] 0 1906 1016 22 0 0 0 mingetty 13:02:02:[ 2658] 38 2658 7689 154 0 0 0 ntpd 13:02:02:[20420] 0 20420 14749 161 0 0 0 in.mrshd 13:02:02:[20421] 0 20421 26524 59 0 0 0 bash 13:02:02:[20446] 0 20446 26524 59 0 0 0 bash 13:02:02:[20447] 0 20447 27603 617 0 0 0 sh 13:02:02:[20522] 0 20522 28026 65 0 0 0 lctl 13:02:02:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled 13:02:02: 13:02:02:Pid: 1827, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 13:02:02:Call Trace: 13:02:02: [<ffffffff815386d4>] ? panic+0xa7/0x16f 13:02:02: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0 13:02:02: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80 13:02:02: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 13:02:02: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 13:02:02: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110 13:02:02: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90 13:02:02: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0 13:02:02: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500 13:02:02: [<ffffffff811522f4>] ? __do_fault+0x54/0x530 13:02:02: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20 13:02:02: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40 13:02:02: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60 13:02:02: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20 13:02:02: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 13:02:02: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 13:02:02: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10 13:02:02: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100 13:02:02: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150 13:02:02: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 13:02:02: [<ffffffff8153c785>] ? page_fault+0x25/0x30
For https://testing.hpdd.intel.com/test_sets/4790f2ba-767c-11e5-86b7-5254006e85c2, we see the following from the console on the MDS:
08:00:56:Lustre: 20445:0:(llog_test.c:1443:llog_test_10()) 10c: write 131072 more log records 08:00:56:sendmail invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 08:00:56:sendmail cpuset=/ mems_allowed=0 08:00:56:Pid: 1875, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 08:00:56:Call Trace: 08:00:56: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0 08:00:56: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0 08:00:56: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80 08:00:56: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 08:00:56: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 08:00:56: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110 08:00:56: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90 08:00:56: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0 08:00:56: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500 08:00:56: [<ffffffff811522f4>] ? __do_fault+0x54/0x530 08:00:56: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20 08:00:56: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40 08:00:56: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60 08:00:56: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20 08:00:56: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 08:00:56: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 08:00:56: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10 08:00:56: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100 08:00:56: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150 08:00:56: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 08:00:56: [<ffffffff8153c785>] ? page_fault+0x25/0x30 08:00:56:Mem-Info: 08:00:56:Node 0 DMA per-cpu: 08:00:56:CPU 0: hi: 0, btch: 1 usd: 0 08:00:56:CPU 1: hi: 0, btch: 1 usd: 0 08:00:56:Node 0 DMA32 per-cpu: 08:00:56:CPU 0: hi: 186, btch: 31 usd: 94 08:00:56:CPU 1: hi: 186, btch: 31 usd: 0 08:00:56:active_anon:2321 inactive_anon:2319 isolated_anon:0 08:00:56: active_file:1224 inactive_file:1294 isolated_file:0 08:00:56: unevictable:0 dirty:13 writeback:2015 unstable:0 08:00:56: free:13240 slab_reclaimable:2534 slab_unreclaimable:434564 08:00:56: mapped:18 shmem:14 pagetables:709 bounce:0 08:00:56:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7328kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1 all_unreclaimable? yes 08:00:56:lowmem_reserve[]: 0 2004 2004 2004 08:00:56:Node 0 DMA32 free:44624kB min:44720kB low:55900kB high:67080kB active_anon:9284kB inactive_anon:9276kB active_file:4896kB inactive_file:5176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:52kB writeback:8060kB mapped:72kB shmem:56kB slab_reclaimable:10132kB slab_unreclaimable:1730928kB kernel_stack:3680kB pagetables:2836kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:300165 all_unreclaimable? yes 08:00:56:lowmem_reserve[]: 0 0 0 0 08:00:56:Node 0 DMA: 2*4kB 2*8kB 2*16kB 2*32kB 3*64kB 3*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8376kB 08:00:56:Node 0 DMA32: 2124*4kB 1102*8kB 573*16kB 241*32kB 57*64kB 7*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44624kB 08:00:56:4866 total pagecache pages 08:00:56:2340 pages in swap cache 08:00:56:Swap cache stats: add 2340, delete 0, find 0/0 08:00:56:Free swap = 4119404kB 08:00:56:Total swap = 4128764kB 08:00:56:524284 pages RAM 08:00:56:43736 pages reserved 08:00:56:2566 pages shared 08:00:56:438172 pages non-shared 08:00:56:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name 08:00:56:[ 506] 0 506 2708 25 0 -17 -1000 udevd 08:00:56:[ 1281] 0 1281 6902 13 0 -17 -1000 auditd 08:00:56:[ 1315] 0 1315 62272 176 1 0 0 rsyslogd 08:00:56:[ 1349] 0 1349 4561 1 1 0 0 irqbalance 08:00:56:[ 1367] 32 1367 4744 1 0 0 0 rpcbind 08:00:56:[ 1389] 29 1389 5837 1 1 0 0 rpc.statd 08:00:56:[ 1423] 81 1423 6418 1 0 0 0 dbus-daemon 08:00:56:[ 1443] 0 1443 53919 1 1 0 0 ypbind 08:00:56:[ 1477] 0 1477 47233 1 1 0 0 cupsd 08:00:56:[ 1534] 0 1534 1020 0 1 0 0 acpid 08:00:56:[ 1546] 68 1546 10517 1 1 0 0 hald 08:00:56:[ 1547] 0 1547 5099 1 0 0 0 hald-runner 08:00:56:[ 1579] 0 1579 5629 1 1 0 0 hald-addon-inpu 08:00:56:[ 1589] 68 1589 4501 1 1 0 0 hald-addon-acpi 08:00:56:[ 1635] 0 1635 26827 0 0 0 0 rpc.rquotad 08:00:56:[ 1640] 0 1640 5417 0 0 0 0 rpc.mountd 08:00:56:[ 1684] 0 1684 6292 1 0 0 0 rpc.idmapd 08:00:56:[ 1721] 498 1721 57325 139 0 0 0 munged 08:00:56:[ 1832] 0 1832 16555 0 0 -17 -1000 sshd 08:00:56:[ 1843] 0 1843 5429 17 1 0 0 xinetd 08:00:56:[ 1875] 0 1875 22211 31 0 0 0 sendmail 08:00:56:[ 1884] 51 1884 20074 365 0 0 0 sendmail 08:00:56:[ 1912] 0 1912 29215 151 1 0 0 crond 08:00:56:[ 1927] 0 1927 5276 45 0 0 0 atd 08:00:56:[ 1940] 0 1940 1020 22 1 0 0 agetty 08:00:56:[ 1942] 0 1942 1016 21 1 0 0 mingetty 08:00:56:[ 1944] 0 1944 1016 21 1 0 0 mingetty 08:00:56:[ 1946] 0 1946 1016 21 1 0 0 mingetty 08:00:56:[ 1948] 0 1948 1016 22 1 0 0 mingetty 08:00:56:[ 1951] 0 1951 2710 35 0 -17 -1000 udevd 08:00:56:[ 1952] 0 1952 2708 32 0 -17 -1000 udevd 08:00:56:[ 1953] 0 1953 1016 21 0 0 0 mingetty 08:00:56:[ 1955] 0 1955 1016 21 0 0 0 mingetty 08:00:56:[ 2711] 38 2711 7689 156 1 0 0 ntpd 08:00:56:[21148] 0 21148 4763 55 1 0 0 anacron 08:00:56:[20343] 0 20343 14749 161 0 0 0 in.mrshd 08:00:56:[20344] 0 20344 26524 58 0 0 0 bash 08:00:56:[20369] 0 20369 26524 58 1 0 0 bash 08:00:56:[20370] 0 20370 27603 619 0 0 0 sh 08:00:56:[20445] 0 20445 28026 65 1 0 0 lctl 08:00:56:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled 08:00:56: 08:00:56:Pid: 1875, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1 08:00:56:Call Trace: 08:00:56: [<ffffffff815386d4>] ? panic+0xa7/0x16f 08:00:56: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0 08:00:56: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80 08:00:56: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0 08:00:56: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950 08:00:56: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110 08:00:56: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90 08:00:56: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0 08:00:56: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500 08:00:56: [<ffffffff811522f4>] ? __do_fault+0x54/0x530 08:00:56: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20 08:00:56: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40 08:00:56: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60 08:00:56: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20 08:00:56: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0 08:00:56: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 08:00:56: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10 08:00:56: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100 08:00:56: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150 08:00:56: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0 08:00:56: [<ffffffff8153c785>] ? page_fault+0x25/0x30