LU-5809

sanity-benchmark test pios_fpp: OOM on zfs OSS


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Severity: 3

    Description

      I hit 3 identical OOM panics during tests on eagle this weekend; all of them occurred on the zfs OSS during sanity-benchmark test pios_fpp:

      Lustre: DEBUG MARKER: == sanity-benchmark test pios_fpp: pios file per process == 06:54:04 (1414418044)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark \/usr\/bin\/pios  -t 1,8,40 -n 1024                          -c 1M -s 8M                             -o 16M -L fpp -p \/mnt\/lustre\/dpios_fpp.sanity-benchmark
      Lustre: DEBUG MARKER: /usr/bin/pios -t 1,8,40 -n 1024 -c 1M -s 8M -o 16M -L fpp -p /mnt/lustre/dpios_fpp.sanity-benchmark
      Lustre: lustre-OST0001: Slow creates, 128/256 objects created at a rate of 2/s
      LNet: Service thread pid 3372 completed after 91.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      LNet: Skipped 15 previous similar messages
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark \/usr\/bin\/pios  -t 1,8,40 -n 1024                          -c 1M -s 8M                             -o 16M -L fpp --verify -p \/mnt\/lustre\/dpios_fpp.sanity-benchmark
      Lustre: DEBUG MARKER: /usr/bin/pios -t 1,8,40 -n 1024 -c 1M -s 8M -o 16M -L fpp --verify -p /mnt/lustre/dpios_fpp.sanity-benchmark
      spl_kmem_cache/ invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
      spl_kmem_cache/ cpuset=/ mems_allowed=0
      Pid: 396, comm: spl_kmem_cache/ Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.g9835a2a.x86_64 #1
      Call Trace:
       [<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
       [<ffffffff81122b80>] ? dump_header+0x90/0x1b0
       [<ffffffff81122cee>] ? check_panic_on_oom+0x4e/0x80
       [<ffffffff811233db>] ? out_of_memory+0x1bb/0x3c0
       [<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
       [<ffffffff81167cea>] ? alloc_pages_current+0xaa/0x110
       [<ffffffff8112d15e>] ? __get_free_pages+0xe/0x50
       [<ffffffff8104ec85>] ? pte_alloc_one_kernel+0x15/0x20
       [<ffffffff8114650b>] ? __pte_alloc_kernel+0x1b/0xc0
       [<ffffffff81157769>] ? vmap_page_range_noflush+0x309/0x370
       [<ffffffff81157802>] ? map_vm_area+0x32/0x50
       [<ffffffff81159270>] ? __vmalloc_area_node+0x100/0x190
       [<ffffffffa0115a09>] ? kv_alloc+0x59/0x60 [spl]
       [<ffffffff811590fd>] ? __vmalloc_node+0xad/0x120
       [<ffffffffa0115a09>] ? kv_alloc+0x59/0x60 [spl]
       [<ffffffff811594e2>] ? __vmalloc+0x22/0x30
       [<ffffffffa0115a09>] ? kv_alloc+0x59/0x60 [spl]
       [<ffffffffa0115a49>] ? spl_cache_grow_work+0x39/0x2d0 [spl]
       [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
       [<ffffffffa01174a7>] ? taskq_thread+0x1e7/0x3f0 [spl]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa01172c0>] ? taskq_thread+0x0/0x3f0 [spl]
       [<ffffffff8109abf6>] ? kthread+0x96/0xa0
       [<ffffffff8100c20a>] ? child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Mem-Info:
      Node 0 DMA per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      Node 0 DMA32 per-cpu:
      CPU    0: hi:  186, btch:  31 usd:   0
      active_anon:0 inactive_anon:0 isolated_anon:0
       active_file:11 inactive_file:0 isolated_file:0
       unevictable:0 dirty:0 writeback:0 unstable:0
       free:8559 slab_reclaimable:1482 slab_unreclaimable:12252
       mapped:1 shmem:0 pagetables:1242 bounce:0
      Node 0 DMA free:8352kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      lowmem_reserve[]: 0 2004 2004 2004
      Node 0 DMA32 free:25884kB min:44720kB low:55900kB high:67080kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:5928kB slab_unreclaimable:48988kB kernel_stack:3416kB pagetables:4968kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:100 all_unreclaimable? yes
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA: 0*4kB 0*8kB 2*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8352kB
      Node 0 DMA32: 719*4kB 340*8kB 184*16kB 84*32kB 33*64kB 16*128kB 5*256kB 2*512kB 2*1024kB 1*2048kB 1*4096kB = 25884kB
      20 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 5121, delete 5121, find 16/25
      Free swap  = 4108600kB
      Total swap = 4128764kB
      524284 pages RAM
      43654 pages reserved
      54 pages shared
      465254 pages non-shared
      

      In all cases, the panicked OSS had 1.8G of memory and was running build lustre-b2_5/96. The Mem-Info dump above shows both zones flagged all_unreclaimable, with DMA32 free (25884kB) well below its min watermark (44720kB); the allocation that tripped the OOM killer came from spl_kmem_cache growing a slab via __vmalloc (kv_alloc -> spl_cache_grow_work in the stack).
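
      For scale, the pios parameters in the markers above work out to roughly 8 GiB of data per pass against an OSS with only 1.8G of RAM. A quick back-of-the-envelope in Python (variable names are mine; it assumes fpp mode writes each of the -n regions as its own -s sized file):

      # Values taken from the DEBUG MARKER lines above.
      MiB = 1 << 20
      GiB = 1 << 30
      regioncount = 1024        # -n 1024 regions
      regionsize = 8 * MiB      # -s 8M per region
      chunksize = 1 * MiB       # -c 1M write unit (shown for completeness)
      oss_ram = int(1.8 * GiB)  # memory on the OSS that panicked

      total_written = regioncount * regionsize
      print(f"data per pass: {total_written / GiB:.1f} GiB")   # 8.0 GiB
      print(f"OSS memory:    {oss_ram / GiB:.1f} GiB")         # 1.8 GiB
      print(f"ratio:         {total_written / oss_ram:.1f}x")  # ~4.4x

      With the --verify pass then re-reading all of that data, ZFS ARC plus SPL slab growth could plausibly pin most of the 1.8G before the shrinkers catch up, which fits the OOM firing from spl_kmem_cache itself.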


          People

            Assignee: Isaac Huang (Inactive)
            Reporter: Isaac Huang (Inactive)
            Votes: 0
            Watchers: 1
