Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.2.0
-
None
-
Lustre Tag: v2_2_0_0_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_2/17/
Distro/Arch: RHEL5.7/x86_64(client), RHEL6.2/x86_64(server)
Network: TCP (1GigE)
ENABLE_QUOTA=yes
FAILURE_MODE=HARD
MGS/MDS Nodes: client-32vm5(active), client-32vm6(passive)
\ /
1 combined MGS/MDT
OSS Nodes: client-32vm7(active), client-32vm8(active)
\ /
OST1 (active in client-32vm7)
OST2 (active in client-32vm8)
OST3 (active in client-32vm7)
OST4 (active in client-32vm8)
OST5 (active in client-32vm7)
OST6 (active in client-32vm8)
Client Nodes: client-32vm[1-4]
-
3
-
9770
Description
While running recovery-*-scale tests on VMs with RHEL5.7/x86_64 clients and RHEL6.2/x86_64 servers, an OOM issue kept occurring on one of the client nodes:
init invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
Call Trace:
[<ffffffff800c962a>] out_of_memory+0x8e/0x2f3
[<ffffffff8000f625>] __alloc_pages+0x27f/0x308
[<ffffffff80032903>] read_swap_cache_async+0x45/0xd8
[<ffffffff800cf3e3>] swapin_readahead+0x60/0xd3
[<ffffffff800092cb>] __handle_mm_fault+0xb62/0x1039
[<ffffffff8008e430>] default_wake_function+0x0/0xe
[<ffffffff8006720b>] do_page_fault+0x4cb/0x874
[<ffffffff800a4931>] ktime_get_ts+0x1a/0x4e
[<ffffffff800bfe9c>] delayacct_end+0x5d/0x86
[<ffffffff8005dde9>] error_exit+0x0/0x84
[<ffffffff80061e0e>] copy_user_generic_unrolled+0x86/0xac
[<ffffffff800eb7f9>] core_sys_select+0x1f9/0x265
[<ffffffff8002cc16>] mntput_no_expire+0x19/0x89
[<ffffffff8001b007>] cp_new_stat+0xe5/0xfd
[<ffffffff80016a40>] sys_select+0x153/0x17c
[<ffffffff8005d116>] system_call+0x7e/0x83
Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:48
cpu 0 cold: high 62, batch 15 used:61
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages: 8656kB (0kB HighMem)
Active:6 inactive:486961 dirty:0 writeback:675 unstable:0 free:2164 slab:12585 mapped-file:1064 mapped-anon:596 pagetables:1241
Node 0 DMA free:3032kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:9736kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003 2003
Node 0 DMA32 free:5624kB min:5712kB low:7140kB high:8568kB active:24kB inactive:1947844kB present:2052068kB pages_scanned:5192088 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB
Node 0 DMA32: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5624kB
Node 0 Normal: empty
Node 0 HighMem: empty
486967 pagecache pages
Swap cache: add 9174, delete 8578, find 337/499, race 0+0
Free swap = 4072784kB
Total swap = 4104596kB
Out of memory: Killed process 2072, UID 51, (sendmail).
Maloo report: https://maloo.whamcloud.com/test_sessions/b3c52910-77de-11e1-841d-5254004bbbd3
Another instance: https://maloo.whamcloud.com/test_sessions/1eaca93a-7800-11e1-841d-5254004bbbd3
syslogd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Call Trace:
[<ffffffff800c962a>] out_of_memory+0x8e/0x2f3
[<ffffffff8000f625>] __alloc_pages+0x27f/0x308
[<ffffffff8001300a>] __do_page_cache_readahead+0x96/0x179
[<ffffffff80013945>] filemap_nopage+0x14c/0x360
[<ffffffff80008964>] __handle_mm_fault+0x1fb/0x1039
[<ffffffff800a28fb>] autoremove_wake_function+0x0/0x2e
[<ffffffff8000ebd4>] find_get_pages_tag+0x34/0x89
[<ffffffff8006720b>] do_page_fault+0x4cb/0x874
[<ffffffff800f5a22>] sync_inode+0x24/0x33
[<ffffffff8804c370>] :ext3:ext3_sync_file+0xcc/0xf8
[<ffffffff8005dde9>] error_exit+0x0/0x84
Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:55
cpu 0 cold: high 62, batch 15 used:47
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages: 8624kB (0kB HighMem)
Active:6 inactive:486329 dirty:0 writeback:2652 unstable:0 free:2156 slab:13169 mapped-file:1064 mapped-anon:596 pagetables:1232
Node 0 DMA free:3032kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:9736kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003 2003
Node 0 DMA32 free:5592kB min:5712kB low:7140kB high:8568kB active:24kB inactive:1945316kB present:2052068kB pages_scanned:4943222 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB
Node 0 DMA32: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5592kB
Node 0 Normal: empty
Node 0 HighMem: empty
486367 pagecache pages
Swap cache: add 8131, delete 7535, find 108/139, race 0+0
Free swap = 4072828kB
Total swap = 4104596kB
Out of memory: Killed process 2075, UID 51, (sendmail).
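For anyone re-checking the dumps: the per-zone free-list lines ("Node 0 DMA32: 0*4kB 1*8kB ...") can be summed to confirm the reported free totals and compared against the zone's min watermark. Below is a minimal Python sketch; the helper name buddy_free_kb is hypothetical, and the input strings and the min value (5712kB) are copied from the two logs above.

```python
import re

def buddy_free_kb(line):
    """Sum the 'count*sizekB' terms of one per-zone free-list line.

    The trailing '= NNNNkB' total has no '*' and is therefore not matched.
    """
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)

# Free-list lines copied from the two OOM dumps in this ticket.
dma = ("4*4kB 5*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB "
       "0*512kB 2*1024kB 0*2048kB 0*4096kB = 3032kB")
dma32_first = ("0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB "
               "0*512kB 1*1024kB 0*2048kB 1*4096kB = 5624kB")
dma32_second = ("0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB "
                "0*512kB 1*1024kB 0*2048kB 1*4096kB = 5592kB")

DMA32_MIN_KB = 5712  # "min:5712kB" from the DMA32 zone line in both dumps

for name, line in [("DMA", dma),
                   ("DMA32 (1st)", dma32_first),
                   ("DMA32 (2nd)", dma32_second)]:
    print(name, buddy_free_kb(line), "kB free")

# In both dumps DMA32 free memory is below the zone's min watermark.
print(buddy_free_kb(dma32_first) < DMA32_MIN_KB)
print(buddy_free_kb(dma32_second) < DMA32_MIN_KB)
```

The sums match the totals printed by the kernel (3032kB, 5624kB, 5592kB), and in both instances the DMA32 zone is below its min watermark with all_unreclaimable set.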
The total memory size on each VM is about 2GB.
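To put the pagecache numbers in context against that 2GB, here is a rough Python sketch that estimates the share of present memory held by page cache in each dump. It assumes 4kB pages, which is consistent with the first log (inactive:486961 pages vs. inactive:1947844kB in DMA32); the function name pagecache_share is hypothetical.

```python
PAGE_KB = 4  # assumption: 4 kB pages (486961 pages * 4 = 1947844 kB, as in the log)

def pagecache_share(pagecache_pages, present_kb):
    """Return the fraction of present memory occupied by page cache."""
    return pagecache_pages * PAGE_KB / present_kb

# "present" values of the DMA and DMA32 zones, identical in both dumps.
present_kb = 9736 + 2052068

first = pagecache_share(486967, present_kb)   # "486967 pagecache pages"
second = pagecache_share(486367, present_kb)  # "486367 pagecache pages"

print(f"first instance:  {first:.1%} of present memory in page cache")
print(f"second instance: {second:.1%} of present memory in page cache")
```

In both instances nearly all of the client's memory sits in inactive page cache while swap is almost entirely free, so the OOM looks like cached pages not being reclaimed rather than anonymous memory exhaustion.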
BTW, the same tests passed on the same VMs when both clients and servers ran RHEL6.2/x86_64:
https://maloo.whamcloud.com/test_sessions/f4dd044e-7708-11e1-a169-5254004bbbd3