Lustre / LU-7329

sanity test_60a timeouts with “* invoking oom-killer”


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • None
    • autotest
    • Severity: 3
    • 9223372036854775807

    Description

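      We’ve seen four recent MDS failures during sanity test 60a caused by the node running out of memory (OOM). Because panic_on_oom is enabled on these nodes, each OOM panics the MDS, and the test then times out.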

      For https://testing.hpdd.intel.com/test_sets/383ea288-787c-11e5-b04c-5254006e85c2, we see the following from the console on the MDS:

      23:58:40:Lustre: 1823:0:(llog_test.c:962:llog_test_7_sub()) 7_sub: records are not aligned, written 64452 from 64767
      23:58:40:Lustre: 1823:0:(llog_test.c:1079:llog_test_7()) 7g: test llog_gen_rec
      23:58:40:hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
      23:58:40:hald cpuset=/ mems_allowed=0
      23:58:40:Pid: 1967, comm: hald Tainted: P           -- ------------    2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      23:58:40:Call Trace:
      23:58:40: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      23:58:40: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0
      23:58:40: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80
      23:58:40: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      23:58:40: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      23:58:40: [<ffffffff815395e2>] ? io_schedule+0x92/0xc0
      23:58:40: [<ffffffff811279c0>] ? sync_page_killable+0x0/0x40
      23:58:40: [<ffffffff8117074a>] ? alloc_pages_vma+0x9a/0x150
      23:58:40: [<ffffffff81163c72>] ? read_swap_cache_async+0xf2/0x160
      23:58:40: [<ffffffff811647c9>] ? valid_swaphandles+0x69/0x160
      23:58:40: [<ffffffff81163d67>] ? swapin_readahead+0x87/0xc0
      23:58:40: [<ffffffff81152ead>] ? handle_pte_fault+0x6dd/0xb20
      23:58:40: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      23:58:40: [<ffffffff81155fa6>] ? find_vma+0x46/0x80
      23:58:40: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      23:58:40: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0
      23:58:40: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20
      23:58:40: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      23:58:40: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      23:58:40: [<ffffffff8153c785>] ? page_fault+0x25/0x30
      23:58:40:Mem-Info:
      23:58:40:Node 0 DMA per-cpu:
      23:58:40:CPU    0: hi:    0, btch:   1 usd:   0
      23:58:40:CPU    1: hi:    0, btch:   1 usd:   0
      23:58:40:Node 0 DMA32 per-cpu:
      23:58:40:CPU    0: hi:  186, btch:  31 usd:  30
      23:58:40:CPU    1: hi:  186, btch:  31 usd:   4
      23:58:40:active_anon:97 inactive_anon:877 isolated_anon:0
      23:58:40: active_file:3 inactive_file:55 isolated_file:0
      23:58:40: unevictable:0 dirty:0 writeback:876 unstable:0
      23:58:40: free:13242 slab_reclaimable:1788 slab_unreclaimable:442860
      23:58:40: mapped:1 shmem:1 pagetables:681 bounce:0
      23:58:40:Node 0 DMA free:8356kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7388kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      23:58:40:lowmem_reserve[]: 0 2004 2004 2004
      23:58:40:Node 0 DMA32 free:44612kB min:44720kB low:55900kB high:67080kB active_anon:388kB inactive_anon:3508kB active_file:12kB inactive_file:220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:3504kB mapped:4kB shmem:4kB slab_reclaimable:7152kB slab_unreclaimable:1764052kB kernel_stack:4112kB pagetables:2724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:137468 all_unreclaimable? yes
      23:58:40:lowmem_reserve[]: 0 0 0 0
      23:58:40:Node 0 DMA: 5*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8356kB
      23:58:40:Node 0 DMA32: 527*4kB 271*8kB 147*16kB 109*32kB 69*64kB 47*128kB 20*256kB 11*512kB 7*1024kB 1*2048kB 1*4096kB = 44612kB
      23:58:40:957 total pagecache pages
      23:58:40:898 pages in swap cache
      23:58:40:Swap cache stats: add 5911, delete 5013, find 54/85
      23:58:40:Free swap  = 4106092kB
      23:58:40:Total swap = 4128764kB
      23:58:40:524284 pages RAM
      23:58:40:43736 pages reserved
      23:58:40:88 pages shared
      23:58:40:438370 pages non-shared
      23:58:40:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      23:58:40:[  520]     0   520     2709        0   0     -17         -1000 udevd
      23:58:40:[ 1706]     0  1706     6902        1   0     -17         -1000 auditd
      23:58:40:[ 1740]     0  1740    62272       13   0       0             0 rsyslogd
      23:58:40:[ 1774]     0  1774     4561       18   0       0             0 irqbalance
      23:58:40:[ 1792]    32  1792     4744        1   0       0             0 rpcbind
      23:58:40:[ 1814]    29  1814     5837        1   1       0             0 rpc.statd
      23:58:40:[ 1848]    81  1848     6418        1   1       0             0 dbus-daemon
      23:58:40:[ 1868]     0  1868    53919        1   1       0             0 ypbind
      23:58:40:[ 1898]     0  1898    47233        1   1       0             0 cupsd
      23:58:40:[ 1955]     0  1955     1020        0   1       0             0 acpid
      23:58:40:[ 1967]    68  1967    10517       37   1       0             0 hald
      23:58:40:[ 1968]     0  1968     5099        1   1       0             0 hald-runner
      23:58:40:[ 2000]     0  2000     5629        1   1       0             0 hald-addon-inpu
      23:58:40:[ 2010]    68  2010     4501        1   1       0             0 hald-addon-acpi
      23:58:40:[ 2029]     0  2029     7646        1   1       0             0 zed
      23:58:40:[ 2067]     0  2067    26827        0   0       0             0 rpc.rquotad
      23:58:40:[ 2072]     0  2072     5417        0   0       0             0 rpc.mountd
      23:58:40:[ 2116]     0  2116     6292        1   0       0             0 rpc.idmapd
      23:58:40:[ 2163]   498  2163    57325        1   0       0             0 munged
      23:58:40:[ 2274]     0  2274    16555        0   0     -17         -1000 sshd
      23:58:40:[ 2285]     0  2285     5429        1   1       0             0 xinetd
      23:58:40:[ 2317]     0  2317    22211       33   1       0             0 sendmail
      23:58:40:[ 2326]    51  2326    20074        0   0       0             0 sendmail
      23:58:40:[ 2354]     0  2354    29216        1   0       0             0 crond
      23:58:40:[ 2369]     0  2369     5276        0   0       0             0 atd
      23:58:40:[ 2382]     0  2382     1020        1   1       0             0 agetty
      23:58:40:[ 2384]     0  2384     1016        1   0       0             0 mingetty
      23:58:40:[ 2386]     0  2386     1016        1   1       0             0 mingetty
      23:58:40:[ 2388]     0  2388     1016        1   1       0             0 mingetty
      23:58:40:[ 2390]     0  2390     1016        1   1       0             0 mingetty
      23:58:40:[ 2392]     0  2392     1016        1   1       0             0 mingetty
      23:58:40:[ 2394]     0  2394     1016        1   1       0             0 mingetty
      23:58:40:[ 2396]     0  2396     2664        0   0     -17         -1000 udevd
      23:58:40:[ 2397]     0  2397     2662        0   0     -17         -1000 udevd
      23:58:40:[ 2963]    38  2963     7689        1   1       0             0 ntpd
      23:58:40:[ 1713]     0  1713    14747        1   1       0             0 in.mrshd
      23:58:40:[ 1714]     0  1714    26515        1   1       0             0 bash
      23:58:40:[ 1744]     0  1744    26515        0   0       0             0 bash
      23:58:40:[ 1745]     0  1745    27594        1   1       0             0 sh
      23:58:40:[ 1823]     0  1823    28024        1   0       0             0 lctl
      23:58:40:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      23:58:40:
      23:58:40:Pid: 1967, comm: hald Tainted: P           -- ------------    2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      23:58:40:Call Trace:
      23:58:40: [<ffffffff815386d4>] ? panic+0xa7/0x16f
      23:58:40: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0
      23:58:40: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80
      23:58:40: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      23:58:40: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      23:58:40: [<ffffffff815395e2>] ? io_schedule+0x92/0xc0
      23:58:40: [<ffffffff811279c0>] ? sync_page_killable+0x0/0x40
      23:58:40: [<ffffffff8117074a>] ? alloc_pages_vma+0x9a/0x150
      23:58:40: [<ffffffff81163c72>] ? read_swap_cache_async+0xf2/0x160
      23:58:40: [<ffffffff811647c9>] ? valid_swaphandles+0x69/0x160
      23:58:40: [<ffffffff81163d67>] ? swapin_readahead+0x87/0xc0
      23:58:40: [<ffffffff81152ead>] ? handle_pte_fault+0x6dd/0xb20
      23:58:40: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      23:58:40: [<ffffffff81155fa6>] ? find_vma+0x46/0x80
      23:58:40: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      23:58:40: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0
      23:58:40: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20
      23:58:40: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      23:58:40: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      23:58:40: [<ffffffff8153c785>] ? page_fault+0x25/0x30
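The Mem-Info summary in this dump points at kernel memory, not user processes. Anonymous and file-backed pages add up to only a few MB. slab_unreclaimable, by contrast, holds 442860 pages, which is roughly 1.7 GiB or about 84% of the 524284 pages of RAM.

The short Python sketch below makes that arithmetic explicit. It assumes 4 KiB pages (x86_64). parse_counters and summarize are illustrative helpers, not Lustre or kernel tooling. The log itself does not say which slab cache is growing.

```python
import re

# Mem-Info summary copied from the first MDS console dump; values are page counts.
MEMINFO = """\
active_anon:97 inactive_anon:877 isolated_anon:0
 active_file:3 inactive_file:55 isolated_file:0
 unevictable:0 dirty:0 writeback:876 unstable:0
 free:13242 slab_reclaimable:1788 slab_unreclaimable:442860
 mapped:1 shmem:1 pagetables:681 bounce:0
"""
TOTAL_RAM_PAGES = 524284  # from "524284 pages RAM"
PAGE_KB = 4               # assumption: 4 KiB pages on x86_64


def parse_counters(text: str) -> dict[str, int]:
    """Collect every name:value pair from the Mem-Info summary."""
    return {name: int(value) for name, value in re.findall(r"([a-z_]+):(\d+)", text)}


def summarize(counters: dict[str, int]) -> dict[str, float]:
    """Convert the relevant page counts to MiB and compute the slab share of RAM."""
    anon = counters["active_anon"] + counters["inactive_anon"]
    file_backed = counters["active_file"] + counters["inactive_file"]
    slab = counters["slab_unreclaimable"]
    return {
        "anon_mib": anon * PAGE_KB / 1024,
        "file_mib": file_backed * PAGE_KB / 1024,
        "slab_unreclaimable_mib": slab * PAGE_KB / 1024,
        "slab_unreclaimable_share": slab / TOTAL_RAM_PAGES,
    }


if __name__ == "__main__":
    s = summarize(parse_counters(MEMINFO))
    print(f"anon {s['anon_mib']:.1f} MiB, file {s['file_mib']:.1f} MiB, "
          f"slab_unreclaimable {s['slab_unreclaimable_mib']:.0f} MiB "
          f"({s['slab_unreclaimable_share']:.0%} of RAM)")
```

Run as a script, it prints "anon 3.8 MiB, file 0.2 MiB, slab_unreclaimable 1730 MiB (84% of RAM)". The other three dumps show the same shape, with slab_unreclaimable between 434564 and 442860 pages.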
      

      For https://testing.hpdd.intel.com/test_sets/0ad70b9c-7858-11e5-9a01-5254006e85c2, we see the following from the console on the MDS:

      17:40:08:Lustre: 20451:0:(llog_test.c:1692:llog_test_10()) 10g: Cancel 65536 records, see one log zapped
      17:40:08:irqbalance invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
      17:40:08:irqbalance cpuset=/ mems_allowed=0
      17:40:08:Pid: 1349, comm: irqbalance Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      17:40:08:Call Trace:
      17:40:08: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      17:40:08: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0
      17:40:08: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80
      17:40:08: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      17:40:08: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      17:40:08: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110
      17:40:08: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      17:40:08: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      17:40:08: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      17:40:08: [<ffffffff811522f4>] ? __do_fault+0x54/0x530
      17:40:08: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20
      17:40:08: [<ffffffff81066053>] ? perf_event_task_sched_out+0x33/0x70
      17:40:08: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0
      17:40:08: [<ffffffff810098a5>] ? __switch_to+0x285/0x340
      17:40:08: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20
      17:40:11: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      17:40:11: [<ffffffff81538dce>] ? thread_return+0x4e/0x7d0
      17:40:11: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      17:40:11: [<ffffffff810a677d>] ? hrtimer_try_to_cancel+0x3d/0xd0
      17:40:12: [<ffffffff810a6832>] ? hrtimer_cancel+0x22/0x30
      17:40:12: [<ffffffff8153b463>] ? do_nanosleep+0x93/0xc0
      17:40:12: [<ffffffff810a6904>] ? hrtimer_nanosleep+0xc4/0x180
      17:40:12: [<ffffffff810a5730>] ? hrtimer_wakeup+0x0/0x30
      17:40:12: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      17:40:12: [<ffffffff8153c785>] ? page_fault+0x25/0x30
      17:40:12:Mem-Info:
      17:40:12:Node 0 DMA per-cpu:
      17:40:12:CPU    0: hi:    0, btch:   1 usd:   0
      17:40:12:CPU    1: hi:    0, btch:   1 usd:   0
      17:40:12:Node 0 DMA32 per-cpu:
      17:40:12:CPU    0: hi:  186, btch:  31 usd:  13
      17:40:12:CPU    1: hi:  186, btch:  31 usd:  55
      17:40:12:active_anon:1008 inactive_anon:1027 isolated_anon:0
      17:40:12: active_file:862 inactive_file:834 isolated_file:0
      17:40:12: unevictable:0 dirty:0 writeback:0 unstable:0
      17:40:12: free:13265 slab_reclaimable:2472 slab_unreclaimable:437874
      17:40:12: mapped:1 shmem:1 pagetables:709 bounce:0
      17:40:12:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:128kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7276kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:157 all_unreclaimable? yes
      17:40:12:lowmem_reserve[]: 0 2004 2004 2004
      17:40:12:Node 0 DMA32 free:44724kB min:44720kB low:55900kB high:67080kB active_anon:4032kB inactive_anon:4108kB active_file:3320kB inactive_file:3336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:9884kB slab_unreclaimable:1744220kB kernel_stack:3808kB pagetables:2836kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:430063 all_unreclaimable? yes
      17:40:12:lowmem_reserve[]: 0 0 0 0
      17:40:12:Node 0 DMA: 2*4kB 1*8kB 2*16kB 1*32kB 1*64kB 2*128kB 3*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8336kB
      17:40:13:Node 0 DMA32: 1993*4kB 954*8kB 510*16kB 271*32kB 72*64kB 8*128kB 4*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44724kB
      17:40:13:2723 total pagecache pages
      17:40:13:1026 pages in swap cache
      17:40:13:Swap cache stats: add 3678, delete 2652, find 0/0
      17:40:13:Free swap  = 4114052kB
      17:40:13:Total swap = 4128764kB
      17:40:13:524284 pages RAM
      17:40:13:43736 pages reserved
      17:40:13:1750 pages shared
      17:40:13:438911 pages non-shared
      17:40:13:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      17:40:13:[  509]     0   509     2663        0   0     -17         -1000 udevd
      17:40:13:[ 1281]     0  1281     6899        1   0     -17         -1000 auditd
      17:40:13:[ 1315]     0  1315    62273       54   1       0             0 rsyslogd
      17:40:13:[ 1349]     0  1349     4561        1   0       0             0 irqbalance
      17:40:13:[ 1367]    32  1367     4744        1   1       0             0 rpcbind
      17:40:13:[ 1389]    29  1389     5837        1   1       0             0 rpc.statd
      17:40:13:[ 1423]    81  1423     6418        1   1       0             0 dbus-daemon
      17:40:13:[ 1443]     0  1443    53919        1   1       0             0 ypbind
      17:40:13:[ 1477]     0  1477    47233        1   1       0             0 cupsd
      17:40:13:[ 1534]     0  1534     1020        0   1       0             0 acpid
      17:40:13:[ 1546]    68  1546    10521        1   1       0             0 hald
      17:40:13:[ 1547]     0  1547     5099        1   1       0             0 hald-runner
      17:40:13:[ 1579]     0  1579     5629        1   1       0             0 hald-addon-inpu
      17:40:13:[ 1586]    68  1586     4501        1   1       0             0 hald-addon-acpi
      17:40:13:[ 1635]     0  1635    26827        0   0       0             0 rpc.rquotad
      17:40:13:[ 1640]     0  1640     5417        0   0       0             0 rpc.mountd
      17:40:13:[ 1684]     0  1684     6292        1   0       0             0 rpc.idmapd
      17:40:13:[ 1721]   498  1721    57325        1   0       0             0 munged
      17:40:13:[ 1832]     0  1832    16555        0   0     -17         -1000 sshd
      17:40:13:[ 1843]     0  1843     5429       16   1       0             0 xinetd
      17:40:13:[ 1875]     0  1875    22211        6   0       0             0 sendmail
      17:40:13:[ 1884]    51  1884    20074        0   0       0             0 sendmail
      17:40:13:[ 1912]     0  1912    29215        1   0       0             0 crond
      17:40:13:[ 1927]     0  1927     5276        0   0       0             0 atd
      17:40:13:[ 1940]     0  1940     1020        1   1       0             0 agetty
      17:40:13:[ 1942]     0  1942     1016        1   0       0             0 mingetty
      17:40:13:[ 1944]     0  1944     1016        1   0       0             0 mingetty
      17:40:13:[ 1946]     0  1946     1016        1   0       0             0 mingetty
      17:40:13:[ 1948]     0  1948     1016        1   1       0             0 mingetty
      17:40:13:[ 1950]     0  1950     2730        0   0     -17         -1000 udevd
      17:40:13:[ 1951]     0  1951     2728        0   0     -17         -1000 udevd
      17:40:13:[ 1952]     0  1952     1016        1   0       0             0 mingetty
      17:40:13:[ 1954]     0  1954     1016        1   0       0             0 mingetty
      17:40:13:[ 2706]    38  2706     7689        1   1       0             0 ntpd
      17:40:13:[11506]     0 11506     4763        1   1       0             0 anacron
      17:40:13:[20349]     0 20349    14749      161   1       0             0 in.mrshd
      17:40:13:[20350]     0 20350    26524       58   0       0             0 bash
      17:40:13:[20375]     0 20375    26524       58   1       0             0 bash
      17:40:13:[20376]     0 20376    27603      617   1       0             0 sh
      17:40:13:[20451]     0 20451    28026       64   0       0             0 lctl
      17:40:13:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      17:40:14:
      17:40:14:Pid: 1349, comm: irqbalance Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      17:40:14:Call Trace:
      17:40:14: [<ffffffff815386d4>] ? panic+0xa7/0x16f
      17:40:14: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0
      17:40:14: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80
      17:40:14: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      17:40:14: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      17:40:14: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110
      17:40:14: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      17:40:14: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      17:40:14: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      17:40:14: [<ffffffff811522f4>] ? __do_fault+0x54/0x530
      17:40:14: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20
      17:40:14: [<ffffffff81066053>] ? perf_event_task_sched_out+0x33/0x70
      17:40:14: [<ffffffff81043f28>] ? pvclock_clocksource_read+0x58/0xd0
      17:40:14: [<ffffffff810098a5>] ? __switch_to+0x285/0x340
      17:40:14: [<ffffffff81042fbc>] ? kvm_clock_read+0x1c/0x20
      17:40:14: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      17:40:14: [<ffffffff81538dce>] ? thread_return+0x4e/0x7d0
      17:40:14: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      17:40:14: [<ffffffff810a677d>] ? hrtimer_try_to_cancel+0x3d/0xd0
      17:40:14: [<ffffffff810a6832>] ? hrtimer_cancel+0x22/0x30
      17:40:14: [<ffffffff8153b463>] ? do_nanosleep+0x93/0xc0
      17:40:14: [<ffffffff810a6904>] ? hrtimer_nanosleep+0xc4/0x180
      17:40:14: [<ffffffff810a5730>] ? hrtimer_wakeup+0x0/0x30
      17:40:14: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      17:40:14: [<ffffffff8153c785>] ? page_fault+0x25/0x30
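Every dump ends with "Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled". In other words, the MDS panics instead of letting the OOM killer choose a victim.

As a rough sketch, the Python function below mirrors how 2.6.32-era kernels pick between the two outcomes in check_panic_on_oom() (mm/oom_kill.c), based on the vm.panic_on_oom sysctl. oom_action is a hypothetical helper, and folding the cpuset/mempolicy/memcg constraint into a single boolean is a simplification. The "cpuset=/" line in the dumps suggests the OOM was not cpuset-constrained, which fits the "system-wide" wording. The current setting can be read with `sysctl vm.panic_on_oom`.

```python
def oom_action(panic_on_oom: int, constrained: bool) -> str:
    """Sketch of the panic-or-kill decision made by check_panic_on_oom().

    panic_on_oom: value of the vm.panic_on_oom sysctl (0, 1 or 2).
    constrained:  True if the OOM is limited by a cpuset, mempolicy or memory
                  cgroup, i.e. the kernel's constraint is not CONSTRAINT_NONE.
    """
    if panic_on_oom == 0:
        return "kill a task"               # OOM killer selects a victim
    if panic_on_oom != 2 and constrained:
        return "kill a task"               # mode 1 only panics on an unconstrained OOM
    mode = "compulsory" if panic_on_oom == 2 else "system-wide"
    return f"panic: {mode} panic_on_oom is enabled"


if __name__ == "__main__":
    # The MDS dumps show "cpuset=/" (root cpuset), i.e. no cpuset restriction.
    print(oom_action(panic_on_oom=1, constrained=False))
```

Setting vm.panic_on_oom to 0 would only change how the failure shows up: a killed process instead of a panic. It would not release the unreclaimable slab memory.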
      

      For https://testing.hpdd.intel.com/test_sets/aee14c08-769a-11e5-b71e-5254006e85c2, we see the following from the console on the MDS:

      13:02:02:Lustre: 20522:0:(llog_test.c:1443:llog_test_10()) 10c: write 131072 more log records
      13:02:02:sendmail invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
      13:02:02:sendmail cpuset=/ mems_allowed=0
      13:02:02:Pid: 1827, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      13:02:02:Call Trace:
      13:02:02: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      13:02:02: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0
      13:02:02: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80
      13:02:02: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      13:02:02: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      13:02:02: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110
      13:02:02: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      13:02:02: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      13:02:02: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      13:02:02: [<ffffffff811522f4>] ? __do_fault+0x54/0x530
      13:02:02: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20
      13:02:02: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40
      13:02:02: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60
      13:02:02: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20
      13:02:02: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      13:02:02: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      13:02:02: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      13:02:02: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
      13:02:02: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150
      13:02:02: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      13:02:02: [<ffffffff8153c785>] ? page_fault+0x25/0x30
      13:02:02:Mem-Info:
      13:02:02:Node 0 DMA per-cpu:
      13:02:02:CPU    0: hi:    0, btch:   1 usd:   0
      13:02:02:Node 0 DMA32 per-cpu:
      13:02:02:CPU    0: hi:  186, btch:  31 usd: 115
      13:02:02:active_anon:1685 inactive_anon:1752 isolated_anon:0
      13:02:02: active_file:1244 inactive_file:1252 isolated_file:0
      13:02:02: unevictable:0 dirty:0 writeback:0 unstable:0
      13:02:02: free:13240 slab_reclaimable:2343 slab_unreclaimable:436182
      13:02:02: mapped:1 shmem:1 pagetables:676 bounce:0
      13:02:02:Node 0 DMA free:8352kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7384kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3 all_unreclaimable? yes
      13:02:02:lowmem_reserve[]: 0 2004 2004 2004
      13:02:02:Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB active_anon:6740kB inactive_anon:7008kB active_file:4976kB inactive_file:5004kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:9368kB slab_unreclaimable:1737344kB kernel_stack:3200kB pagetables:2704kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:374310 all_unreclaimable? yes
      13:02:02:lowmem_reserve[]: 0 0 0 0
      13:02:02:Node 0 DMA: 2*4kB 3*8kB 2*16kB 3*32kB 2*64kB 3*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8352kB
      13:02:02:Node 0 DMA32: 2062*4kB 1079*8kB 497*16kB 232*32kB 79*64kB 11*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44608kB
      13:02:02:4249 total pagecache pages
      13:02:02:1752 pages in swap cache
      13:02:02:Swap cache stats: add 2908, delete 1156, find 0/0
      13:02:02:Free swap  = 4117132kB
      13:02:02:Total swap = 4128764kB
      13:02:02:524284 pages RAM
      13:02:02:43736 pages reserved
      13:02:02:2548 pages shared
      13:02:02:438391 pages non-shared
      13:02:02:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      13:02:02:[  482]     0   482     2716       23   0     -17         -1000 udevd
      13:02:02:[ 1235]     0  1235     6902       14   0     -17         -1000 auditd
      13:02:02:[ 1269]     0  1269    62273      201   0       0             0 rsyslogd
      13:02:02:[ 1320]    32  1320     4744        1   0       0             0 rpcbind
      13:02:02:[ 1342]    29  1342     5837        1   0       0             0 rpc.statd
      13:02:02:[ 1376]    81  1376     6444        1   0       0             0 dbus-daemon
      13:02:02:[ 1398]     0  1398    53919        1   0       0             0 ypbind
      13:02:02:[ 1430]     0  1430    47233        1   0       0             0 cupsd
      13:02:02:[ 1486]     0  1486     1020        0   0       0             0 acpid
      13:02:02:[ 1498]    68  1498    10523        1   0       0             0 hald
      13:02:02:[ 1499]     0  1499     5099        1   0       0             0 hald-runner
      13:02:02:[ 1531]     0  1531     5629        1   0       0             0 hald-addon-inpu
      13:02:02:[ 1541]    68  1541     4501        1   0       0             0 hald-addon-acpi
      13:02:02:[ 1587]     0  1587    26827        0   0       0             0 rpc.rquotad
      13:02:02:[ 1592]     0  1592     5417        0   0       0             0 rpc.mountd
      13:02:02:[ 1636]     0  1636     6292        1   0       0             0 rpc.idmapd
      13:02:02:[ 1673]   498  1673    57325      139   0       0             0 munged
      13:02:02:[ 1784]     0  1784    16555        0   0     -17         -1000 sshd
      13:02:02:[ 1795]     0  1795     5429       18   0       0             0 xinetd
      13:02:02:[ 1827]     0  1827    22210       23   0       0             0 sendmail
      13:02:02:[ 1836]    51  1836    20074       13   0       0             0 sendmail
      13:02:02:[ 1864]     0  1864    29216       10   0       0             0 crond
      13:02:02:[ 1879]     0  1879     5276        0   0       0             0 atd
      13:02:02:[ 1893]     0  1893     1020        2   0       0             0 agetty
      13:02:02:[ 1894]     0  1894     1016        2   0       0             0 mingetty
      13:02:02:[ 1896]     0  1896     1016       12   0       0             0 mingetty
      13:02:02:[ 1898]     0  1898     1016       21   0       0             0 mingetty
      13:02:02:[ 1900]     0  1900     1016       22   0       0             0 mingetty
      13:02:02:[ 1902]     0  1902     2715       24   0     -17         -1000 udevd
      13:02:02:[ 1903]     0  1903     2714       31   0     -17         -1000 udevd
      13:02:02:[ 1904]     0  1904     1016       21   0       0             0 mingetty
      13:02:02:[ 1906]     0  1906     1016       22   0       0             0 mingetty
      13:02:02:[ 2658]    38  2658     7689      154   0       0             0 ntpd
      13:02:02:[20420]     0 20420    14749      161   0       0             0 in.mrshd
      13:02:02:[20421]     0 20421    26524       59   0       0             0 bash
      13:02:02:[20446]     0 20446    26524       59   0       0             0 bash
      13:02:02:[20447]     0 20447    27603      617   0       0             0 sh
      13:02:02:[20522]     0 20522    28026       65   0       0             0 lctl
      13:02:02:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      13:02:02:
      13:02:02:Pid: 1827, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      13:02:02:Call Trace:
      13:02:02: [<ffffffff815386d4>] ? panic+0xa7/0x16f
      13:02:02: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0
      13:02:02: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80
      13:02:02: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      13:02:02: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      13:02:02: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110
      13:02:02: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      13:02:02: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      13:02:02: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      13:02:02: [<ffffffff811522f4>] ? __do_fault+0x54/0x530
      13:02:02: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20
      13:02:02: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40
      13:02:02: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60
      13:02:02: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20
      13:02:02: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      13:02:02: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      13:02:02: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      13:02:02: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
      13:02:02: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150
      13:02:02: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      13:02:02: [<ffffffff8153c785>] ? page_fault+0x25/0x30
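The zone lines show the same thing from the page allocator's side. In this third dump, the per-order counts on the DMA32 buddy line add up exactly to the 44608 kB reported as free. That figure is below the zone's min watermark of 44720 kB, and the zone is also flagged all_unreclaimable. A zone in that state generally cannot satisfy ordinary allocations, which is consistent with the page fault in sendmail ending in the OOM path.

The sketch below checks that arithmetic on excerpts of the two lines. ZONE is trimmed to the fields the sketch uses. buddy_free_kb and watermarks_kb are illustrative helpers, not existing tools.

```python
import re

# Excerpts of the DMA32 zone lines from the third MDS console dump.
ZONE = "Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB"
BUDDY = ("Node 0 DMA32: 2062*4kB 1079*8kB 497*16kB 232*32kB 79*64kB 11*128kB "
         "1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44608kB")


def buddy_free_kb(line: str) -> int:
    """Sum count * block size over the per-order free lists."""
    return sum(int(count) * int(size_kb)
               for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line))


def watermarks_kb(line: str) -> dict[str, int]:
    """Extract the free/min/low/high values (kB) from a zone line."""
    return {name: int(kb)
            for name, kb in re.findall(r"\b(free|min|low|high):(\d+)kB", line)}


if __name__ == "__main__":
    free_kb = buddy_free_kb(BUDDY)
    marks = watermarks_kb(ZONE)
    print(f"buddy total {free_kb} kB, reported free {marks['free']} kB, "
          f"min {marks['min']} kB, below min: {free_kb < marks['min']}")
```

The first dump shows the same pattern (44612 kB free against a 44720 kB minimum).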
      

      For https://testing.hpdd.intel.com/test_sets/4790f2ba-767c-11e5-86b7-5254006e85c2, we see the following from the console on the MDS:

      08:00:56:Lustre: 20445:0:(llog_test.c:1443:llog_test_10()) 10c: write 131072 more log records
      08:00:56:sendmail invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
      08:00:56:sendmail cpuset=/ mems_allowed=0
      08:00:56:Pid: 1875, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      08:00:56:Call Trace:
      08:00:56: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      08:00:56: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0
      08:00:56: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80
      08:00:56: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      08:00:56: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      08:00:56: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110
      08:00:56: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      08:00:56: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      08:00:56: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      08:00:56: [<ffffffff811522f4>] ? __do_fault+0x54/0x530
      08:00:56: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20
      08:00:56: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40
      08:00:56: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60
      08:00:56: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20
      08:00:56: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      08:00:56: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      08:00:56: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      08:00:56: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
      08:00:56: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150
      08:00:56: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      08:00:56: [<ffffffff8153c785>] ? page_fault+0x25/0x30
      08:00:56:Mem-Info:
      08:00:56:Node 0 DMA per-cpu:
      08:00:56:CPU    0: hi:    0, btch:   1 usd:   0
      08:00:56:CPU    1: hi:    0, btch:   1 usd:   0
      08:00:56:Node 0 DMA32 per-cpu:
      08:00:56:CPU    0: hi:  186, btch:  31 usd:  94
      08:00:56:CPU    1: hi:  186, btch:  31 usd:   0
      08:00:56:active_anon:2321 inactive_anon:2319 isolated_anon:0
      08:00:56: active_file:1224 inactive_file:1294 isolated_file:0
      08:00:56: unevictable:0 dirty:13 writeback:2015 unstable:0
      08:00:56: free:13240 slab_reclaimable:2534 slab_unreclaimable:434564
      08:00:56: mapped:18 shmem:14 pagetables:709 bounce:0
      08:00:56:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7328kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1 all_unreclaimable? yes
      08:00:56:lowmem_reserve[]: 0 2004 2004 2004
      08:00:56:Node 0 DMA32 free:44624kB min:44720kB low:55900kB high:67080kB active_anon:9284kB inactive_anon:9276kB active_file:4896kB inactive_file:5176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:52kB writeback:8060kB mapped:72kB shmem:56kB slab_reclaimable:10132kB slab_unreclaimable:1730928kB kernel_stack:3680kB pagetables:2836kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:300165 all_unreclaimable? yes
      08:00:56:lowmem_reserve[]: 0 0 0 0
      08:00:56:Node 0 DMA: 2*4kB 2*8kB 2*16kB 2*32kB 3*64kB 3*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8376kB
      08:00:56:Node 0 DMA32: 2124*4kB 1102*8kB 573*16kB 241*32kB 57*64kB 7*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44624kB
      08:00:56:4866 total pagecache pages
      08:00:56:2340 pages in swap cache
      08:00:56:Swap cache stats: add 2340, delete 0, find 0/0
      08:00:56:Free swap  = 4119404kB
      08:00:56:Total swap = 4128764kB
      08:00:56:524284 pages RAM
      08:00:56:43736 pages reserved
      08:00:56:2566 pages shared
      08:00:56:438172 pages non-shared
      08:00:56:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      08:00:56:[  506]     0   506     2708       25   0     -17         -1000 udevd
      08:00:56:[ 1281]     0  1281     6902       13   0     -17         -1000 auditd
      08:00:56:[ 1315]     0  1315    62272      176   1       0             0 rsyslogd
      08:00:56:[ 1349]     0  1349     4561        1   1       0             0 irqbalance
      08:00:56:[ 1367]    32  1367     4744        1   0       0             0 rpcbind
      08:00:56:[ 1389]    29  1389     5837        1   1       0             0 rpc.statd
      08:00:56:[ 1423]    81  1423     6418        1   0       0             0 dbus-daemon
      08:00:56:[ 1443]     0  1443    53919        1   1       0             0 ypbind
      08:00:56:[ 1477]     0  1477    47233        1   1       0             0 cupsd
      08:00:56:[ 1534]     0  1534     1020        0   1       0             0 acpid
      08:00:56:[ 1546]    68  1546    10517        1   1       0             0 hald
      08:00:56:[ 1547]     0  1547     5099        1   0       0             0 hald-runner
      08:00:56:[ 1579]     0  1579     5629        1   1       0             0 hald-addon-inpu
      08:00:56:[ 1589]    68  1589     4501        1   1       0             0 hald-addon-acpi
      08:00:56:[ 1635]     0  1635    26827        0   0       0             0 rpc.rquotad
      08:00:56:[ 1640]     0  1640     5417        0   0       0             0 rpc.mountd
      08:00:56:[ 1684]     0  1684     6292        1   0       0             0 rpc.idmapd
      08:00:56:[ 1721]   498  1721    57325      139   0       0             0 munged
      08:00:56:[ 1832]     0  1832    16555        0   0     -17         -1000 sshd
      08:00:56:[ 1843]     0  1843     5429       17   1       0             0 xinetd
      08:00:56:[ 1875]     0  1875    22211       31   0       0             0 sendmail
      08:00:56:[ 1884]    51  1884    20074      365   0       0             0 sendmail
      08:00:56:[ 1912]     0  1912    29215      151   1       0             0 crond
      08:00:56:[ 1927]     0  1927     5276       45   0       0             0 atd
      08:00:56:[ 1940]     0  1940     1020       22   1       0             0 agetty
      08:00:56:[ 1942]     0  1942     1016       21   1       0             0 mingetty
      08:00:56:[ 1944]     0  1944     1016       21   1       0             0 mingetty
      08:00:56:[ 1946]     0  1946     1016       21   1       0             0 mingetty
      08:00:56:[ 1948]     0  1948     1016       22   1       0             0 mingetty
      08:00:56:[ 1951]     0  1951     2710       35   0     -17         -1000 udevd
      08:00:56:[ 1952]     0  1952     2708       32   0     -17         -1000 udevd
      08:00:56:[ 1953]     0  1953     1016       21   0       0             0 mingetty
      08:00:56:[ 1955]     0  1955     1016       21   0       0             0 mingetty
      08:00:56:[ 2711]    38  2711     7689      156   1       0             0 ntpd
      08:00:56:[21148]     0 21148     4763       55   1       0             0 anacron
      08:00:56:[20343]     0 20343    14749      161   0       0             0 in.mrshd
      08:00:56:[20344]     0 20344    26524       58   0       0             0 bash
      08:00:56:[20369]     0 20369    26524       58   1       0             0 bash
      08:00:56:[20370]     0 20370    27603      619   0       0             0 sh
      08:00:56:[20445]     0 20445    28026       65   1       0             0 lctl
      08:00:56:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      08:00:56:
      08:00:56:Pid: 1875, comm: sendmail Not tainted 2.6.32-573.7.1.el6_lustre.gd359461.x86_64 #1
      08:00:56:Call Trace:
      08:00:56: [<ffffffff815386d4>] ? panic+0xa7/0x16f
      08:00:56: [<ffffffff8112aaa1>] ? dump_header+0x191/0x1b0
      08:00:56: [<ffffffff8112ab3c>] ? check_panic_on_oom+0x7c/0x80
      08:00:56: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      08:00:56: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      08:00:56: [<ffffffff8117064a>] ? alloc_pages_current+0xaa/0x110
      08:00:56: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      08:00:56: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      08:00:56: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      08:00:56: [<ffffffff811522f4>] ? __do_fault+0x54/0x530
      08:00:56: [<ffffffff811528c7>] ? handle_pte_fault+0xf7/0xb20
      08:00:56: [<ffffffff811b4901>] ? save_mount_options+0x21/0x40
      08:00:56: [<ffffffff811b7b22>] ? seq_vprintf+0x32/0x60
      08:00:56: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20
      08:00:56: [<ffffffff81153589>] ? handle_mm_fault+0x299/0x3d0
      08:00:56: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      08:00:56: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      08:00:56: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
      08:00:56: [<ffffffff811a9778>] ? poll_select_copy_remaining+0xf8/0x150
      08:00:56: [<ffffffff8153f3de>] ? do_page_fault+0x3e/0xa0
      08:00:56: [<ffffffff8153c785>] ? page_fault+0x25/0x30
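The Mem-Info summary above reports its counters in pages. The minimal Python sketch below converts them to sizes and ranks them against the 524284 pages of RAM. It assumes 4 kB pages, which the per-zone kB figures in the same log confirm: DMA 7328kB + DMA32 1730928kB = 434564 pages of slab_unreclaimable. The dictionary values are copied from the log. The helper name `share_of_ram` is hypothetical. The tally shows slab_unreclaimable at about 1697 MiB, roughly 83% of RAM, while anonymous and file-backed pages together come to under 30 MB.

```python
PAGE_KB = 4  # assumption: 4 kB pages on this x86_64 MDS

# Page counts copied from the Mem-Info summary in the console log above.
mem_info = {
    "active_anon": 2321,
    "inactive_anon": 2319,
    "active_file": 1224,
    "inactive_file": 1294,
    "free": 13240,
    "slab_reclaimable": 2534,
    "slab_unreclaimable": 434564,
    "pagetables": 709,
}
ram_pages = 524284  # from "524284 pages RAM"


def share_of_ram(pages: int) -> float:
    """Hypothetical helper: percentage of total RAM occupied by `pages`."""
    return 100.0 * pages / ram_pages


# Print each counter, largest first, as MiB and as a share of total RAM.
for name, pages in sorted(mem_info.items(), key=lambda kv: kv[1], reverse=True):
    mib = pages * PAGE_KB / 1024
    print(f"{name:20s} {mib:8.1f} MiB {share_of_ram(pages):5.1f}% of RAM")
```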
      

            People

              bfaccini Bruno Faccini (Inactive)
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 11