Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7883

sanity test_60a invokes oom-killer in subtest 10b and times out

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.9.0
    • Lustre 2.9.0
    • None
    • review-dne-part-1
    • 3
    • 9223372036854775807

    Description

      sanity test 60a invokes oom-killer and times out on subtest 10b. From a recent failure, the MDS console log has

      02:35:42:Lustre: 6822:0:(llog_test.c:1468:llog_test_10()) 10a: create a catalog log with name: 851d315f
      02:35:42:Lustre: 6822:0:(llog_test.c:1487:llog_test_10()) 10b: write 65536 log records
      02:35:42:ntpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
      02:35:42:ntpd cpuset=/ mems_allowed=0
      02:35:42:Pid: 10244, comm: ntpd Not tainted 2.6.32-573.18.1.el6_lustre.ge5f28dc.x86_64 #1
      02:35:42:Call Trace:
      02:35:42: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      02:35:42: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0
      02:35:42: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80
      02:35:42: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
      02:35:42: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
      02:35:42: [<ffffffff811709ba>] ? alloc_pages_current+0xaa/0x110
      02:35:42: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
      02:35:42: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      02:35:42: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
      02:35:42: [<ffffffff81152314>] ? __do_fault+0x54/0x530
      02:35:42: [<ffffffff811a9f30>] ? pollwake+0x0/0x60
      02:35:42: [<ffffffff811528e7>] ? handle_pte_fault+0xf7/0xb20
      02:35:42: [<ffffffff8153cd75>] ? _read_unlock_bh+0x15/0x20
      02:35:42: [<ffffffff8145b227>] ? sock_i_uid+0x47/0x60
      02:35:42: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20
      02:35:42: [<ffffffff811535a9>] ? handle_mm_fault+0x299/0x3d0
      02:35:42: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
      02:35:42: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
      02:35:42: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
      02:35:42: [<ffffffff811a9b78>] ? poll_select_copy_remaining+0xf8/0x150
      02:35:42: [<ffffffff8153fd1e>] ? do_page_fault+0x3e/0xa0
      02:35:42: [<ffffffff8153d0c5>] ? page_fault+0x25/0x30
      02:35:42:Mem-Info:
      02:35:42:Node 0 DMA per-cpu:
      02:35:42:CPU    0: hi:    0, btch:   1 usd:   0
      02:35:42:CPU    1: hi:    0, btch:   1 usd:   0
      02:35:42:Node 0 DMA32 per-cpu:
      02:35:42:CPU    0: hi:  186, btch:  31 usd:  75
      02:35:42:CPU    1: hi:  186, btch:  31 usd:   0
      02:35:42:active_anon:3065 inactive_anon:3126 isolated_anon:0
      02:35:42: active_file:1149 inactive_file:1114 isolated_file:0
      02:35:42: unevictable:0 dirty:0 writeback:3125 unstable:0
      02:35:42: free:13241 slab_reclaimable:2692 slab_unreclaimable:420617
      02:35:42: mapped:1 shmem:3 pagetables:1219 bounce:0
      02:35:42:Node 0 DMA free:8356kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1 all_unreclaimable? yes
      02:35:42:lowmem_reserve[]: 0 2004 2004 2004
      02:35:42:Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB active_anon:12260kB inactive_anon:12504kB active_file:4596kB inactive_file:4452kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:3324kB mapped:4kB shmem:12kB slab_reclaimable:10764kB slab_unreclaimable:1674856kB kernel_stack:4592kB pagetables:4876kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:279133 all_unreclaimable? yes
      02:35:42:lowmem_reserve[]: 0 0 0 0
      02:35:42:Node 0 DMA: 3*4kB 1*8kB 3*16kB 3*32kB 2*64kB 3*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8356kB
      02:35:42:Node 0 DMA32: 1325*4kB 753*8kB 501*16kB 426*32kB 90*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44876kB
      02:35:42:5204 total pagecache pages
      02:35:42:2638 pages in swap cache
      02:35:42:Swap cache stats: add 3431, delete 825, find 6/6
      02:35:42:Free swap  = 4115064kB
      02:35:42:Total swap = 4128764kB
      02:35:42:524284 pages RAM
      02:35:42:43737 pages reserved
      02:35:42:2420 pages shared
      02:35:42:437717 pages non-shared
      02:35:42:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      02:35:42:[  456]     0   456     2695        4   0     -17         -1000 udevd
      02:35:42:[  761]     0   761     2695        9   0     -17         -1000 udevd
      02:35:42:[  915]     0   915     2694       12   0     -17         -1000 udevd
      02:35:42:[ 1175]     0  1175     2280        1   0       0             0 dhclient
      02:35:42:[ 1238]     0  1238     6905       13   1     -17         -1000 auditd
      02:35:42:[ 1272]     0  1272    63854      276   0       0             0 rsyslogd
      02:35:42:[ 1306]     0  1306     4561        1   0       0             0 irqbalance
      02:35:42:[ 1324]    32  1324     4744        1   0       0             0 rpcbind
      02:35:42:[ 1340]     0  1340    52816        5   0       0             0 sssd
      02:35:42:[ 1341]     0  1341    74242       55   1       0             0 sssd_be
      02:35:42:[ 1342]     0  1342    54172        6   1       0             0 sssd_nss
      02:35:42:[ 1343]     0  1343    50533        2   1       0             0 sssd_pam
      02:35:42:[ 1344]     0  1344    50020       26   0       0             0 sssd_ssh
      02:35:42:[ 1345]     0  1345    55112        2   1       0             0 sssd_pac
      02:35:42:[ 1367]    29  1367     6357        1   1       0             0 rpc.statd
      02:35:42:[ 1401]    81  1401     5878        3   0       0             0 dbus-daemon
      02:35:42:[ 1422]     0  1422    47234        1   1       0             0 cupsd
      02:35:42:[ 1466]     0  1466     1020        0   0       0             0 acpid
      02:35:42:[ 1478]    68  1478    10493        1   0       0             0 hald
      02:35:42:[ 1479]     0  1479     5099        1   1       0             0 hald-runner
      02:35:42:[ 1511]     0  1511     5629        1   0       0             0 hald-addon-inpu
      02:35:42:[ 1523]    68  1523     4501        1   0       0             0 hald-addon-acpi
      02:35:42:[ 1547]     0  1547   169290      182   1       0             0 automount
      02:35:42:[ 1601]     0  1601    26827        0   0       0             0 rpc.rquotad
      02:35:42:[ 1606]     0  1606     5417        0   0       0             0 rpc.mountd
      02:35:42:[ 1651]     0  1651     5774        1   0       0             0 rpc.idmapd
      02:35:42:[ 1688]   496  1688    56786      138   0       0             0 munged
      02:35:42:[ 8272]     0  8272    16556        0   1     -17         -1000 sshd
      02:35:42:[ 8283]     0  8283     5429       36   1       0             0 xinetd
      02:35:42:[ 8373]     0  8373    20737      227   0       0             0 master
      02:35:42:[ 8387]    89  8387    20757      219   0       0             0 pickup
      02:35:42:[ 8388]    89  8388    20800      221   0       0             0 qmgr
      02:35:42:[ 8402]     0  8402    29215      151   0       0             0 crond
      02:35:42:[ 8421]     0  8421     5276       46   1       0             0 atd
      02:35:42:[ 8453]     0  8453    16112      187   1       0             0 certmonger
      02:35:42:[ 8482]     0  8482     1020       23   1       0             0 agetty
      02:35:42:[ 8486]     0  8486     1016       21   1       0             0 mingetty
      02:35:42:[ 8491]     0  8491     1016       22   1       0             0 mingetty
      02:35:42:[ 8494]     0  8494     1016       22   0       0             0 mingetty
      02:35:42:[ 8499]     0  8499     1016       21   1       0             0 mingetty
      02:35:42:[ 8504]     0  8504     1016       21   1       0             0 mingetty
      02:35:42:[ 8507]     0  8507     1016       21   0       0             0 mingetty
      02:35:42:[10244]    38 10244     8207      161   1       0             0 ntpd
      02:35:42:[30737]     0 30737     4237       52   1       0             0 anacron
      02:35:42:[ 6720]     0  6720    15806      178   0       0             0 in.mrshd
      02:35:42:[ 6721]     0  6721    26515       66   0       0             0 bash
      02:35:42:[ 6746]     0  6746    26515       66   1       0             0 bash
      02:35:42:[ 6747]     0  6747    27593      618   0       0             0 sh
      02:35:42:[ 6822]     0  6822    28024       73   0       0             0 lctl
      02:35:42:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      02:35:42:
      02:35:42:Pid: 10244, comm: ntpd Not tainted 2.6.32-573.18.1.el6_lustre.ge5f28dc.x86_64 #1
      

      Here are recent logs for this type of failure
      2016-03-10
      https://testing.hpdd.intel.com/test_sets/2e6d4b9a-e700-11e5-bdde-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/6c42ab98-e6d0-11e5-8590-5254006e85c2

      2016-03-14
      https://testing.hpdd.intel.com/test_sets/8f4538da-ea7d-11e5-8779-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/bc22e04c-ea4f-11e5-8606-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/7dc47e5e-ea41-11e5-8606-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/a332f46a-e9fe-11e5-8186-5254006e85c2

      Attachments

        Issue Links

          Activity

            People

              ys Yang Sheng
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: