Lustre / LU-2014

Test failure on test suite performance-sanity

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0
    • Severity: 3
    • Rank: 4115

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b21b4ae6-049b-11e2-bfd4-52540035b04c.

      The OSS ran out of memory and the kernel invoked the OOM killer after mdsrate-create-large.sh started. From the OSS console:

      01:36:27:Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-large.sh ### 1 NODE CREATE ###
      01:36:27:Lustre: DEBUG MARKER: ===== mdsrate-create-large.sh
      01:43:20:Lustre: 3565:0:(client.c:1905:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for sent delay: [sent 1348216986/real 0]  req@ffff880011e35000 x1413704938685180/t0(0) o400->MGC10.10.4.186@tcp@10.10.4.186@tcp:26/25 lens 224/224 e 0 to 1 dl 1348216993 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      01:43:20:LustreError: 166-1: MGC10.10.4.186@tcp: Connection to MGS (at 10.10.4.186@tcp) was lost; in progress operations using this service will fail
      01:43:20:Lustre: 3563:0:(client.c:1905:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for sent delay: [sent 1348216993/real 0]  req@ffff88007ad55c00 x1413704938685182/t0(0) o250->MGC10.10.4.186@tcp@10.10.4.186@tcp:26/25 lens 400/544 e 0 to 1 dl 1348216999 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      01:43:31:auditd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=-17, oom_score_adj=-1000
      01:43:31:auditd cpuset=/ mems_allowed=0
      01:43:31:Pid: 1143, comm: auditd Not tainted 2.6.32-279.5.1.el6_lustre.g7f15218.x86_64 #1
      01:43:31:Call Trace:
      01:43:31: [<ffffffff810c4aa1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      01:43:31: [<ffffffff81117210>] ? dump_header+0x90/0x1b0
      01:43:31: [<ffffffff810e368e>] ? __delayacct_freepages_end+0x2e/0x30
      01:43:31: [<ffffffff8121489c>] ? security_real_capable_noaudit+0x3c/0x70
      01:43:31: [<ffffffff81117692>] ? oom_kill_process+0x82/0x2a0
      01:43:31: [<ffffffff8111758e>] ? select_bad_process+0x9e/0x120
      01:43:31: [<ffffffff81117ad0>] ? out_of_memory+0x220/0x3c0
      01:43:31: [<ffffffff81136a89>] ? zone_statistics+0x99/0xc0
      01:43:31: [<ffffffff811277ee>] ? __alloc_pages_nodemask+0x89e/0x940
      01:43:31: [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
      01:43:31: [<ffffffff81114617>] ? __page_cache_alloc+0x87/0x90
      01:43:31: [<ffffffff8112a23b>] ? __do_page_cache_readahead+0xdb/0x210
      01:43:31: [<ffffffff8112a391>] ? ra_submit+0x21/0x30
      01:43:31: [<ffffffff81115943>] ? filemap_fault+0x4c3/0x500
      01:43:31: [<ffffffff8113ed44>] ? __do_fault+0x54/0x510
      01:43:31: [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50
      01:43:31: [<ffffffff814fdca0>] ? thread_return+0x4e/0x76e
      01:43:31: [<ffffffff81096fa3>] ? __hrtimer_start_range_ns+0x1a3/0x460
      01:43:31: [<ffffffff81096481>] ? lock_hrtimer_base+0x31/0x60
      01:43:31: [<ffffffff810972df>] ? hrtimer_try_to_cancel+0x3f/0xd0
      01:43:31: [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0
      01:43:31: [<ffffffff81044479>] ? __do_page_fault+0x139/0x480
      01:43:31: [<ffffffff811beed6>] ? ep_poll+0x306/0x330
      01:43:31: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
      01:43:31: [<ffffffff815036de>] ? do_page_fault+0x3e/0xa0
      01:43:31: [<ffffffff81500a95>] ? page_fault+0x25/0x30
      01:43:31:Mem-Info:
      01:43:31:Node 0 DMA per-cpu:
      01:43:31:CPU    0: hi:    0, btch:   1 usd:   0
      01:43:31:Node 0 DMA32 per-cpu:
      01:43:31:CPU    0: hi:  186, btch:  31 usd:  95
      01:43:31:active_anon:0 inactive_anon:1 isolated_anon:0
      01:43:31: active_file:3080 inactive_file:2931 isolated_file:0
      01:43:31: unevictable:0 dirty:0 writeback:0 unstable:0
      01:43:31: free:13267 slab_reclaimable:410440 slab_unreclaimable:11251
      01:43:31: mapped:1 shmem:0 pagetables:726 bounce:0
      01:43:31:Node 0 DMA free:8352kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:160kB inactive_file:112kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:6964kB slab_unreclaimable:132kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:123126 all_unreclaimable? yes
      01:43:31:lowmem_reserve[]: 0 2003 2003 2003
      01:43:31:Node 0 DMA32 free:44716kB min:44720kB low:55900kB high:67080kB active_anon:0kB inactive_anon:4kB active_file:12160kB inactive_file:11612kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:1634796kB slab_unreclaimable:44872kB kernel_stack:1624kB pagetables:2904kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:663531 all_unreclaimable? yes
      01:43:31:lowmem_reserve[]: 0 0 0 0
      01:43:31:Node 0 DMA: 34*4kB 35*8kB 20*16kB 16*32kB 9*64kB 7*128kB 2*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8352kB
      01:43:31:Node 0 DMA32: 3943*4kB 1842*8kB 412*16kB 48*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44716kB
      01:43:31:6012 total pagecache pages
      01:43:31:1 pages in swap cache
      01:43:31:Swap cache stats: add 4103, delete 4102, find 33/55
      01:43:31:Free swap  = 4112960kB
      01:43:31:Total swap = 4128760kB
      01:43:31:524284 pages RAM
      01:43:31:43628 pages reserved
      01:43:31:6039 pages shared
      01:43:31:456143 pages non-shared
      01:43:31:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      01:43:31:[  469]     0   469     2754        0   0     -17         -1000 udevd
      01:43:31:[  796]     0   796     2664        0   0     -17         -1000 udevd
      01:43:31:[ 1143]     0  1143     6914        1   0     -17         -1000 auditd
      01:43:31:[ 1159]     0  1159    62270        1   0       0             0 rsyslogd
      01:43:31:[ 1201]    32  1201     4742        1   0       0             0 rpcbind
      01:43:31:[ 1213]     0  1213    45433        1   0       0             0 sssd
      01:43:31:[ 1217]     0  1217    50973        1   0       0             0 sssd_be
      01:43:31:[ 1225]     0  1225    42895        1   0       0             0 sssd_nss
      01:43:31:[ 1226]     0  1226    42844        1   0       0             0 sssd_pam
      01:43:31:[ 1233]    29  1233     6353        1   0       0             0 rpc.statd
      01:43:31:[ 1359]    81  1359     5867        1   0       0             0 dbus-daemon
      01:43:31:[ 1392]     0  1392     1018        0   0       0             0 acpid
      01:43:31:[ 1401]    68  1401     6785        1   0       0             0 hald
      01:43:31:[ 1402]     0  1402     4524        1   0       0             0 hald-runner
      01:43:31:[ 1430]     0  1430     5053        1   0       0             0 hald-addon-inpu
      01:43:31:[ 1440]    68  1440     4449        1   0       0             0 hald-addon-acpi
      01:43:31:[ 1461]     0  1461   150867        2   0       0             0 automount
      01:43:31:[ 1502]     0  1502    26825        0   0       0             0 rpc.rquotad
      01:43:31:[ 1506]     0  1506     5412        0   0       0             0 rpc.mountd
      01:43:31:[ 1555]     0  1555     6289        1   0       0             0 rpc.idmapd
      01:43:31:[ 1580]     0  1580    16016        0   0     -17         -1000 sshd
      01:43:31:[ 1588]     0  1588     5521        1   0       0             0 xinetd
      01:43:31:[ 1596]    38  1596     7003        1   0       0             0 ntpd
      01:43:31:[ 1612]     0  1612    22182        0   0       0             0 sendmail
      01:43:31:[ 1620]    51  1620    19528        0   0       0             0 sendmail
      01:43:31:[ 1642]     0  1642    27016        1   0       0             0 abrt-dump-oops
      01:43:31:[ 1650]     0  1650    29302        1   0       0             0 crond
      01:43:31:[ 1661]     0  1661     5362        0   0       0             0 atd
      01:43:31:[ 1686]     0  1686     1017        1   0       0             0 agetty
      01:43:31:[ 1688]     0  1688     1014        1   0       0             0 mingetty
      01:43:31:[ 1690]     0  1690     1014        1   0       0             0 mingetty
      01:43:31:[ 1692]     0  1692     1014        1   0       0             0 mingetty
      01:43:31:[ 1694]     0  1694     2753        0   0     -17         -1000 udevd
      01:43:31:[ 1695]     0  1695     1014        1   0       0             0 mingetty
      01:43:31:[ 1697]     0  1697     1014        1   0       0             0 mingetty
      01:43:31:[ 1699]     0  1699     1014        1   0       0             0 mingetty
      01:43:31:Out of memory: Kill process 1159 (rsyslogd) score 1 or sacrifice child
      01:43:31:Killed process 1159, UID 0, (rsyslogd) total-vm:249080kB, anon-rss:0kB, file-rss:4kB
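
The Mem-Info figures in the console log above can be turned into sizes to see where the memory went. The following is a minimal sketch, not part of the original report: the page counts and watermark values are copied from the log, a 4 kB page size is assumed (the usual x86_64 default), and the helper names `pages_to_mib` and `share_of_usable` are made up for illustration.

```python
# Rough accounting of the Mem-Info dump in the OOM report above.
PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default

# Global page counts copied from the "Mem-Info:" section of the log.
mem_info_pages = {
    "free": 13267,
    "slab_reclaimable": 410440,
    "slab_unreclaimable": 11251,
    "active_file": 3080,
    "inactive_file": 2931,
    "pagetables": 726,
}
total_ram_pages = 524284   # "524284 pages RAM"
reserved_pages = 43628     # "43628 pages reserved"
usable_pages = total_ram_pages - reserved_pages

# Per-zone values copied from the "Node 0 DMA" and "Node 0 DMA32" lines (kB).
zone_slab_reclaimable_kb = {"DMA": 6964, "DMA32": 1634796}
dma32_free_kb, dma32_min_kb = 44716, 44720


def pages_to_mib(pages):
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_KB / 1024


def share_of_usable(pages):
    """Fraction of non-reserved RAM taken by the given page count."""
    return pages / usable_pages


# Print the counters from largest to smallest.
for name, pages in sorted(mem_info_pages.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {pages_to_mib(pages):8.1f} MiB  {share_of_usable(pages):6.1%}")

# Cross-check: the per-zone slab figures should add up to the global counter.
zones_total_kb = sum(zone_slab_reclaimable_kb.values())
print("zone slab totals match global:",
      zones_total_kb == mem_info_pages["slab_reclaimable"] * PAGE_KB)

# DMA32 free memory compared with its "min" watermark.
print("DMA32 free below min watermark:", dma32_free_kb < dma32_min_kb)
```

Under these assumptions, reclaimable slab comes to about 1.6 GiB, roughly 85% of the non-reserved pages, while every listed user process has an rss of at most 2 pages and swap is almost entirely free. DMA32 free memory (44716kB) is also just below its min watermark (44720kB). That pattern suggests kernel slab usage on the OSS rather than a userspace process, although the log alone does not identify which slab cache grew.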
      

          People

            wc-triage WC Triage
            maloo Maloo