[LU-1261] oom-killer was invoked while running recovery-*-scale tests on VMs
Created: 27/Mar/12 | Updated: 29/May/17 | Resolved: 29/May/17
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: |
Lustre Tag: v2_2_0_0_RC2
MGS/MDS nodes: client-32vm5 (active), client-32vm6 (passive)
OSS nodes: client-32vm7 (active), client-32vm8 (active)
Client nodes: client-32vm[1-4]
| Severity: | 3 |
| Rank (Obsolete): | 9770 |
| Description |
While running recovery-*-scale tests on VMs with RHEL5.7/x86_64 clients and RHEL6.2/x86_64 servers, the oom-killer kept firing on one of the client nodes:

init invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800c962a>] out_of_memory+0x8e/0x2f3
 [<ffffffff8000f625>] __alloc_pages+0x27f/0x308
 [<ffffffff80032903>] read_swap_cache_async+0x45/0xd8
 [<ffffffff800cf3e3>] swapin_readahead+0x60/0xd3
 [<ffffffff800092cb>] __handle_mm_fault+0xb62/0x1039
 [<ffffffff8008e430>] default_wake_function+0x0/0xe
 [<ffffffff8006720b>] do_page_fault+0x4cb/0x874
 [<ffffffff800a4931>] ktime_get_ts+0x1a/0x4e
 [<ffffffff800bfe9c>] delayacct_end+0x5d/0x86
 [<ffffffff8005dde9>] error_exit+0x0/0x84
 [<ffffffff80061e0e>] copy_user_generic_unrolled+0x86/0xac
 [<ffffffff800eb7f9>] core_sys_select+0x1f9/0x265
 [<ffffffff8002cc16>] mntput_no_expire+0x19/0x89
 [<ffffffff8001b007>] cp_new_stat+0xe5/0xfd
 [<ffffffff80016a40>] sys_select+0x153/0x17c
 [<ffffffff8005d116>] system_call+0x7e/0x83

Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:48
cpu 0 cold: high 62, batch 15 used:61
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages: 8656kB (0kB HighMem)
Active:6 inactive:486961 dirty:0 writeback:675 unstable:0 free:2164 slab:12585 mapped-file:1064 mapped-anon:596 pagetables:1241
Node 0 DMA free:3032kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:9736kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003 2003
Node 0 DMA32 free:5624kB min:5712kB low:7140kB high:8568kB active:24kB inactive:1947844kB present:2052068kB pages_scanned:5192088 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB
Node 0 DMA32: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5624kB
Node 0 Normal: empty
Node 0 HighMem: empty
486967 pagecache pages
Swap cache: add 9174, delete 8578, find 337/499, race 0+0
Free swap = 4072784kB
Total swap = 4104596kB
Out of memory: Killed process 2072, UID 51, (sendmail).
Maloo report: https://maloo.whamcloud.com/test_sessions/b3c52910-77de-11e1-841d-5254004bbbd3

Another instance: https://maloo.whamcloud.com/test_sessions/1eaca93a-7800-11e1-841d-5254004bbbd3

syslogd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800c962a>] out_of_memory+0x8e/0x2f3
 [<ffffffff8000f625>] __alloc_pages+0x27f/0x308
 [<ffffffff8001300a>] __do_page_cache_readahead+0x96/0x179
 [<ffffffff80013945>] filemap_nopage+0x14c/0x360
 [<ffffffff80008964>] __handle_mm_fault+0x1fb/0x1039
 [<ffffffff800a28fb>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000ebd4>] find_get_pages_tag+0x34/0x89
 [<ffffffff8006720b>] do_page_fault+0x4cb/0x874
 [<ffffffff800f5a22>] sync_inode+0x24/0x33
 [<ffffffff8804c370>] :ext3:ext3_sync_file+0xcc/0xf8
 [<ffffffff8005dde9>] error_exit+0x0/0x84

Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:55
cpu 0 cold: high 62, batch 15 used:47
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages: 8624kB (0kB HighMem)
Active:6 inactive:486329 dirty:0 writeback:2652 unstable:0 free:2156 slab:13169 mapped-file:1064 mapped-anon:596 pagetables:1232
Node 0 DMA free:3032kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:9736kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003 2003
Node 0 DMA32 free:5592kB min:5712kB low:7140kB high:8568kB active:24kB inactive:1945316kB present:2052068kB pages_scanned:4943222 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB
Node 0 DMA32: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5592kB
Node 0 Normal: empty
Node 0 HighMem: empty
486367 pagecache pages
Swap cache: add 8131, delete 7535, find 108/139, race 0+0
Free swap = 4072828kB
Total swap = 4104596kB
Out of memory: Killed process 2075, UID 51, (sendmail).

The total memory size on each VM is about 2GB.

Note that the same tests passed on the same VMs when both clients and servers ran the RHEL6.2/x86_64 distro/arch.
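In both traces the trigger is the DMA32 zone: free memory (5624 kB, then 5592 kB) has dropped below the min watermark (5712 kB) while almost the entire zone (~1.9 GB of the ~2 GB present) is inactive page cache flagged all_unreclaimable, leaving the allocator no option but the oom-killer. A minimal sketch for watching that pattern develop on a client during a run, using only standard procfs (nothing Lustre-specific; /proc/zoneinfo counts 4 kB pages, and strftime assumes gawk):

# Poll the DMA32 zone once a second; in the traces above free
# (1406 pages = 5624 kB) is below min (1428 pages = 5712 kB).
while sleep 1; do
    awk '/zone +DMA32/     { z = 1 }
         z && /pages free/ { free = $3 }
         z && $1 == "min"  { min = $2 }
         z && $1 == "low"  { printf "%s DMA32 free=%s min=%s (4kB pages)\n", strftime("%T"), free, min; z = 0 }' /proc/zoneinfo
done

Logging this alongside the test output would show whether the client creeps toward the watermark over the whole run or collapses only around a failover event.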
| Comments |
| Comment by Jian Yu [ 29/Mar/12 ] |
While running the tests with async journal commit disabled on the OSSs, the above issue did not occur:

lctl set_param obdfilter.${FSNAME}-*.sync_journal=1

Maloo reports:
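For reference, a sketch of pushing that workaround to both OSS nodes from the environment above and verifying it took effect; the pdsh fan-out and the fsname "lustre" are assumptions, not taken from the ticket:

# Assumption: the filesystem is named "lustre" and pdsh can reach both OSSs.
pdsh -w client-32vm[7-8] 'lctl set_param obdfilter.lustre-*.sync_journal=1'
# Verify: every OST on both OSSs should now report sync_journal=1.
pdsh -w client-32vm[7-8] 'lctl get_param obdfilter.lustre-*.sync_journal'

With sync_journal=1 the OST waits for its journal commit before replying, so clients can release pages sooner instead of pinning them for replay; that would be consistent with the unreclaimable page-cache buildup in the traces above.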
| Comment by Andreas Dilger [ 29/May/17 ] |
Close old ticket.