Lustre / LU-4559

Failure on test suite ost-pools test_23b: client oom

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0
    • Labels: None
    • Environment: client and server: lustre-master build # 1837 RHEL6 ldiskfs
    • Severity: 3
    • Rank: 12449

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/2ce5d4d2-859f-11e3-a2cb-52540035b04c.

      The sub-test test_23b failed with the following error:

      test failed to respond and timed out

      Client 1 console log:

      05:01:48:Lustre: DEBUG MARKER: == ost-pools test 23b: OST pools and OOS == 05:00:49 (1390568449)
      05:01:48:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool 2>/dev/null || echo foo
      05:01:51:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool 2>/dev/null || echo foo
      05:01:52:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool | sort -u | tr '\n' ' ' 
      05:01:52:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool | sort -u | tr '\n' ' ' 
      05:01:52:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool | sort -u | tr '\n' ' ' 
      05:01:52:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool | sort -u | tr '\n' ' ' 
      05:01:53:munged invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
      05:01:53:munged cpuset=/ mems_allowed=0
      05:01:53:Pid: 5654, comm: munged Not tainted 2.6.32-358.23.2.el6.x86_64 #1
      05:01:54:Call Trace:
      05:01:54: [<ffffffff810cb641>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      05:01:54: [<ffffffff8111ce40>] ? dump_header+0x90/0x1b0
      05:01:55: [<ffffffff810e930e>] ? __delayacct_freepages_end+0x2e/0x30
      05:01:55: [<ffffffff8121d4ec>] ? security_real_capable_noaudit+0x3c/0x70
      05:01:55: [<ffffffff8111d2c2>] ? oom_kill_process+0x82/0x2a0
      05:01:56: [<ffffffff8111d201>] ? select_bad_process+0xe1/0x120
      05:01:56: [<ffffffff8111d700>] ? out_of_memory+0x220/0x3c0
      05:01:57: [<ffffffff8112c3dc>] ? __alloc_pages_nodemask+0x8ac/0x8d0
      05:01:57: [<ffffffff81160d6a>] ? alloc_pages_vma+0x9a/0x150
      05:01:57: [<ffffffff81154aa2>] ? read_swap_cache_async+0xf2/0x160
      05:01:57: [<ffffffff811555c9>] ? valid_swaphandles+0x69/0x150
      05:01:58: [<ffffffff81154b97>] ? swapin_readahead+0x87/0xc0
      05:01:58: [<ffffffff81143eab>] ? handle_pte_fault+0x70b/0xb50
      05:02:00: [<ffffffff8114452a>] ? handle_mm_fault+0x23a/0x310
      05:02:01: [<ffffffff810474e9>] ? __do_page_fault+0x139/0x480
      05:02:01: [<ffffffff81186b94>] ? cp_new_stat+0xe4/0x100
      05:02:01: [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
      05:02:01: [<ffffffff8103b8cc>] ? kvm_clock_read+0x1c/0x20
      05:02:02: [<ffffffff8103b8d9>] ? kvm_clock_get_cycles+0x9/0x10
      05:02:03: [<ffffffff810a1507>] ? getnstimeofday+0x57/0xe0
      05:02:04: [<ffffffff81513bfe>] ? do_page_fault+0x3e/0xa0
      05:02:05: [<ffffffff81510fb5>] ? page_fault+0x25/0x30
      05:02:06:Mem-Info:
      05:02:07:Node 0 DMA per-cpu:
      05:02:08:CPU    0: hi:    0, btch:   1 usd:   0
      05:02:10:CPU    1: hi:    0, btch:   1 usd:   0
      05:02:11:Node 0 DMA32 per-cpu:
      05:02:12:CPU    0: hi:  186, btch:  31 usd:   0
      05:02:12:CPU    1: hi:  186, btch:  31 usd:  51
      05:02:12:active_anon:0 inactive_anon:23 isolated_anon:0
      05:02:13: active_file:106823 inactive_file:314646 isolated_file:0
      05:02:14: unevictable:0 dirty:209 writeback:0 unstable:4096
      05:02:15: free:13257 slab_reclaimable:4006 slab_unreclaimable:28126
      05:02:15: mapped:3476 shmem:0 pagetables:1011 bounce:0
      05:02:15:Node 0 DMA free:8348kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:3000kB inactive_file:3624kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:760kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:10272 all_unreclaimable? yes
      05:02:18:lowmem_reserve[]: 0 2003 2003 2003
      05:02:18:Node 0 DMA32 free:44680kB min:44720kB low:55900kB high:67080kB active_anon:0kB inactive_anon:92kB active_file:424292kB inactive_file:1254876kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:836kB writeback:0kB mapped:13904kB shmem:0kB slab_reclaimable:16024kB slab_unreclaimable:111744kB kernel_stack:1376kB pagetables:4044kB unstable:16384kB bounce:0kB writeback_tmp:0kB pages_scanned:471584 all_unreclaimable? no
      05:02:18:lowmem_reserve[]: 0 0 0 0
      05:02:19:Node 0 DMA: 3*4kB 0*8kB 1*16kB 2*32kB 5*64kB 4*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 8348kB
      05:02:21:Node 0 DMA32: 973*4kB 411*8kB 152*16kB 364*32kB 131*64kB 30*128kB 14*256kB 5*512kB 1*1024kB 0*2048kB 1*4096kB = 44748kB
      05:02:22:107072 total pagecache pages
      05:02:23:7 pages in swap cache
      05:02:23:Swap cache stats: add 10191, delete 10184, find 1623/1924
      05:02:24:Free swap  = 4099492kB
      05:02:24:Total swap = 4128760kB
      05:02:25:524284 pages RAM
      05:02:26:43709 pages reserved
      05:02:27:15081820 pages shared
      05:02:29:105406 pages non-shared
      05:02:29:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      05:02:30:[  451]     0   451     2760       75   0     -17         -1000 udevd
      05:02:30:[ 1134]     0  1134    62320      248   1       0             0 rsyslogd
      05:02:30:[ 1163]     0  1163     2704      103   0       0             0 irqbalance
      05:02:30:[ 1177]    32  1177     4743      167   0       0             0 rpcbind
      05:02:31:[ 1189]     0  1189    49856      482   0       0             0 sssd
      05:02:31:[ 1191]     0  1191    56622     1236   0       0             0 sssd_be
      05:02:33:[ 1192]     0  1192    50900      804   0       0             0 sssd_nss
      05:02:34:[ 1193]     0  1193    48426      649   0       0             0 sssd_pam
      05:02:34:[ 1210]    29  1210     6355      222   0       0             0 rpc.statd
      05:02:35:[ 5408]    81  5408     5869      141   1       0             0 dbus-daemon
      05:02:35:[ 5446]     0  5446     1019      128   1       0             0 acpid
      05:02:35:[ 5455]    68  5455     6781      584   0       0             0 hald
      05:02:36:[ 5456]     0  5456     4525      269   0       0             0 hald-runner
      05:02:36:[ 5485]     0  5485     5054      264   0       0             0 hald-addon-inpu
      05:02:38:[ 5497]    68  5497     4450      240   0       0             0 hald-addon-acpi
      05:02:39:[ 5514]     0  5514   168291      710   0       0             0 automount
      05:02:39:[ 5555]     0  5555    26826       34   1       0             0 rpc.rquotad
      05:02:39:[ 5559]     0  5559     5413      100   1       0             0 rpc.mountd
      05:02:41:[ 5608]     0  5608     6290       94   1       0             0 rpc.idmapd
      05:02:41:[ 5653]   498  5653    58372      305   0       0             0 munged
      05:02:41:[ 5683]     0  5683    16562      126   0     -17         -1000 sshd
      05:02:42:[ 5691]     0  5691     5533      182   0       0             0 xinetd
      05:02:42:[ 5718]     0  5718    22729      169   1       0             0 sendmail
      05:02:43:[ 5726]    51  5726    20074      157   1       0             0 sendmail
      05:02:43:[ 5748]     0  5748    29313      154   1       0             0 crond
      05:02:44:[ 5759]     0  5759     5373       75   1       0             0 atd
      05:02:44:[ 5782]     0  5782     2759       81   1     -17         -1000 udevd
      05:02:44:[ 5783]     0  5783     2759       79   1     -17         -1000 udevd
      05:02:44:[ 5806]     0  5806    23299      141   0     -17         -1000 auditd
      05:02:44:[ 8195]     0  8195    27700      730   0       0             0 sshd
      05:02:45:[ 8197] 840000043  8197    27735      261   0       0             0 sshd
      05:02:46:[ 8198] 840000043  8198     7536      219   1       0             0 rsh
      05:02:46:[ 8200] 840000043  8200     7536       30   1       0             0 rsh
      05:02:46:[ 8201]     0  8201    26517      307   1       0             0 run_test.sh
      05:02:48:[ 8381]    38  8381     7005      275   0       0             0 ntpd
      05:02:49:[ 8415]     0  8415    27591      328   1       0             0 bash
      05:02:49:[26047]     0 26047    27591      151   0       0             0 bash
      05:02:49:[26048]     0 26048    25228      137   1       0             0 tee
      05:02:50:[26363]     0 26363    27669      293   0       0             0 bash
      05:02:51:[14540]     0 14540    27669      162   0       0             0 bash
      05:02:51:[14541]     0 14541    25228      137   1       0             0 tee
      05:02:53:[14861]     0 14861    27669      119   1       0             0 bash
      05:02:53:[14862]     0 14862    26555      140   1       0             0 dd
      05:02:53:Out of memory: Kill process 1134 (rsyslogd) score 1 or sacrifice child
      05:02:54:Killed process 1134, UID 0, (rsyslogd) total-vm:249280kB, anon-rss:0kB, file-rss:992kB
      05:02:54:rs:main Q:Reg invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
      05:02:55:rs:main Q:Reg cpuset=/ mems_allowed=0
      05:02:55:Pid: 1135, comm: rs:main Q:Reg Not tainted 2.6.32-358.23.2.el6.x86_64 #1
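
      To make the Mem-Info block above easier to read, the global page counters can be grouped and converted to kB. The following is a minimal sketch, not part of the original report: the helper names (parse_counters, summarize_kb) are hypothetical, and it assumes 4 kB pages, which is consistent with the log (free:13257 pages equals the 8348kB + 44680kB reported free in the two zones). It only re-adds numbers the kernel already printed and does not by itself identify the root cause.

```python
import re

# Global page counters copied verbatim from the Mem-Info block in the console log.
MEM_INFO = """\
05:02:12:active_anon:0 inactive_anon:23 isolated_anon:0
05:02:13: active_file:106823 inactive_file:314646 isolated_file:0
05:02:14: unevictable:0 dirty:209 writeback:0 unstable:4096
05:02:15: free:13257 slab_reclaimable:4006 slab_unreclaimable:28126
05:02:15: mapped:3476 shmem:0 pagetables:1011 bounce:0
"""

PAGE_KB = 4  # assumption: 4 kB pages (x86_64 default, matches the per-zone kB figures)


def parse_counters(text):
    """Return {counter_name: pages} for every name:value pair in the text.

    Timestamps such as 05:02:13 contain no letters, so they never match.
    """
    return {name: int(value) for name, value in re.findall(r"([a-z_]+):(\d+)", text)}


def summarize_kb(counters, page_kb=PAGE_KB):
    """Group the page counters into broad categories and convert them to kB."""
    pages = {
        "anon": counters["active_anon"] + counters["inactive_anon"],
        "file": counters["active_file"] + counters["inactive_file"],
        "slab": counters["slab_reclaimable"] + counters["slab_unreclaimable"],
        "free": counters["free"],
    }
    return {group: count * page_kb for group, count in pages.items()}


counters = parse_counters(MEM_INFO)
summary_kb = summarize_kb(counters)

for group, kb in summary_kb.items():
    print(f"{group:>5}: {kb:>9} kB")
```

      With the counters from this log, file-backed pages account for 1685876 kB, while anonymous memory is only 92 kB and swap is almost entirely free. That suggests page cache rather than process memory filled the node when the OOM killer ran, although the log alone does not show why those pages could not be reclaimed.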
      

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Maloo (maloo)