  Lustre / LU-1261

oom-killer was invoked while running recovery-*-scale tests on VMs


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.2.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9770

    Description

      While running recovery-*-scale tests on VMs with RHEL5.7/x86_64 clients and RHEL6.2/x86_64 servers, the oom-killer was repeatedly invoked on one of the client nodes:

      init invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
      
      Call Trace:
       [<ffffffff800c962a>] out_of_memory+0x8e/0x2f3
       [<ffffffff8000f625>] __alloc_pages+0x27f/0x308
       [<ffffffff80032903>] read_swap_cache_async+0x45/0xd8
       [<ffffffff800cf3e3>] swapin_readahead+0x60/0xd3
       [<ffffffff800092cb>] __handle_mm_fault+0xb62/0x1039
       [<ffffffff8008e430>] default_wake_function+0x0/0xe
       [<ffffffff8006720b>] do_page_fault+0x4cb/0x874
       [<ffffffff800a4931>] ktime_get_ts+0x1a/0x4e
       [<ffffffff800bfe9c>] delayacct_end+0x5d/0x86
       [<ffffffff8005dde9>] error_exit+0x0/0x84
       [<ffffffff80061e0e>] copy_user_generic_unrolled+0x86/0xac
       [<ffffffff800eb7f9>] core_sys_select+0x1f9/0x265
       [<ffffffff8002cc16>] mntput_no_expire+0x19/0x89
       [<ffffffff8001b007>] cp_new_stat+0xe5/0xfd
       [<ffffffff80016a40>] sys_select+0x153/0x17c
       [<ffffffff8005d116>] system_call+0x7e/0x83
      
      Node 0 DMA per-cpu:
      cpu 0 hot: high 0, batch 1 used:0
      cpu 0 cold: high 0, batch 1 used:0
      Node 0 DMA32 per-cpu:
      cpu 0 hot: high 186, batch 31 used:48
      cpu 0 cold: high 62, batch 15 used:61
      Node 0 Normal per-cpu: empty
      Node 0 HighMem per-cpu: empty
      Free pages:        8656kB (0kB HighMem)
      Active:6 inactive:486961 dirty:0 writeback:675 unstable:0 free:2164 slab:12585 mapped-file:1064 mapped-anon:596 pagetables:1241
      Node 0 DMA free:3032kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:9736kB pages_scanned:0 all_unreclaimable? yes
      lowmem_reserve[]: 0 2003 2003 2003
      Node 0 DMA32 free:5624kB min:5712kB low:7140kB high:8568kB active:24kB inactive:1947844kB present:2052068kB pages_scanned:5192088 all_unreclaimable? yes
      lowmem_reserve[]: 0 0 0 0
      Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA: 4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB
      Node 0 DMA32: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5624kB
      Node 0 Normal: empty
      Node 0 HighMem: empty
      486967 pagecache pages
      Swap cache: add 9174, delete 8578, find 337/499, race 0+0
      Free swap  = 4072784kB
      Total swap = 4104596kB
      Out of memory: Killed process 2072, UID 51, (sendmail).
      

      Maloo report: https://maloo.whamcloud.com/test_sessions/b3c52910-77de-11e1-841d-5254004bbbd3
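
      For reference, the zone numbers above show why the kill happened even though swap was almost entirely free: the DMA32 free list (5624kB) had dropped below the zone's min watermark (5712kB), and both populated zones were marked all_unreclaimable, so the swap-in attempt in the first trace could not get a page either. The following Python sketch re-totals the buddy-allocator line as a sanity check (the values are copied verbatim from the first report; this is just illustration, not Lustre tooling):

      import re

      # Buddy free-list line and min watermark for the DMA32 zone, copied
      # verbatim from the first OOM report above.
      dma32_buddy = ("Node 0 DMA32: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB "
                     "1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5624kB")
      dma32_min_kb = 5712  # "min:5712kB" from the zone report

      # Each "N*SkB" term means N free blocks of S kB each; their sum is
      # the zone's total free memory.
      free_kb = sum(int(count) * int(size)
                    for count, size in re.findall(r"(\d+)\*(\d+)kB", dma32_buddy))

      print(f"DMA32 free: {free_kb}kB, min watermark: {dma32_min_kb}kB")
      print("below min watermark:", free_kb < dma32_min_kb)  # True -> OOM path

      The second instance below shows the same picture: 5592kB free against the same 5712kB watermark.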

      Another instance: https://maloo.whamcloud.com/test_sessions/1eaca93a-7800-11e1-841d-5254004bbbd3

      syslogd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
      
      Call Trace:
       [<ffffffff800c962a>] out_of_memory+0x8e/0x2f3
       [<ffffffff8000f625>] __alloc_pages+0x27f/0x308
       [<ffffffff8001300a>] __do_page_cache_readahead+0x96/0x179
       [<ffffffff80013945>] filemap_nopage+0x14c/0x360
       [<ffffffff80008964>] __handle_mm_fault+0x1fb/0x1039
       [<ffffffff800a28fb>] autoremove_wake_function+0x0/0x2e
       [<ffffffff8000ebd4>] find_get_pages_tag+0x34/0x89
       [<ffffffff8006720b>] do_page_fault+0x4cb/0x874
       [<ffffffff800f5a22>] sync_inode+0x24/0x33
       [<ffffffff8804c370>] :ext3:ext3_sync_file+0xcc/0xf8
       [<ffffffff8005dde9>] error_exit+0x0/0x84
      
      Node 0 DMA per-cpu:
      cpu 0 hot: high 0, batch 1 used:0
      cpu 0 cold: high 0, batch 1 used:0
      Node 0 DMA32 per-cpu:
      cpu 0 hot: high 186, batch 31 used:55
      cpu 0 cold: high 62, batch 15 used:47
      Node 0 Normal per-cpu: empty
      Node 0 HighMem per-cpu: empty
      Free pages:        8624kB (0kB HighMem)
      Active:6 inactive:486329 dirty:0 writeback:2652 unstable:0 free:2156 slab:13169 mapped-file:1064 mapped-anon:596 pagetables:1232
      Node 0 DMA free:3032kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:9736kB pages_scanned:0 all_unreclaimable? yes
      lowmem_reserve[]: 0 2003 2003 2003
      Node 0 DMA32 free:5592kB min:5712kB low:7140kB high:8568kB active:24kB inactive:1945316kB present:2052068kB pages_scanned:4943222 all_unreclaimable? yes
      lowmem_reserve[]: 0 0 0 0
      Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA: 4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB
      Node 0 DMA32: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5592kB
      Node 0 Normal: empty
      Node 0 HighMem: empty
      486367 pagecache pages
      Swap cache: add 8131, delete 7535, find 108/139, race 0+0
      Free swap  = 4072828kB
      Total swap = 4104596kB
      Out of memory: Killed process 2075, UID 51, (sendmail).
      

      The total memory size on each VM is about 2GB.
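
      That matches the zone reports: present:2052068kB in DMA32 plus 9736kB in DMA comes to about 2GB, and the inactive list alone (486961 pages, i.e. 1947844kB) accounts for almost all of it.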

      Note that the same tests passed on the same VMs with the RHEL6.2/x86_64 distro/arch on both clients and servers:
      https://maloo.whamcloud.com/test_sessions/f4dd044e-7708-11e1-a169-5254004bbbd3


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Jian Yu (yujian)
