[LU-851] Test failure on test suite parallel-scale, subtest test_iorssf Created: 15/Nov/11  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-1567 1.8.8<->2.3 Test failure on test suit... Closed
Severity: 3
Rank (Obsolete): 5426

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/08d44894-1008-11e1-8338-52540025f9af.

The sub-test test_iorssf failed with the following error:

ior failed! 1

Info required for matching: parallel-scale iorssf
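
For reference, a minimal sketch of how this subtest is typically rerun in isolation with the Lustre test framework (the ONLY environment variable is the standard test-framework mechanism; the checkout path and a mounted test filesystem are assumed, not taken from this run):

    # Rerun only the iorssf subtest of the parallel-scale suite;
    # assumes a lustre/tests checkout and a formatted/mounted test filesystem.
    cd lustre/tests
    ONLY=iorssf sh parallel-scale.sh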



 Comments   
Comment by Johann Lombardi (Inactive) [ 24/Nov/11 ]

OOM issue, ouch

Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf == 19:33:32 (1321414412)
automount invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
automount cpuset=/ mems_allowed=0
Pid: 1617, comm: automount Not tainted 2.6.32-131.6.1.el6.i686 #1
Call Trace:
 [<c04df510>] ? oom_kill_process+0xb0/0x2d0
 [<c04dfbba>] ? __out_of_memory+0x4a/0x90
 [<c04dfc55>] ? out_of_memory+0x55/0xb0
 [<c04eda4b>] ? __alloc_pages_nodemask+0x7fb/0x810
 [<c0519a5c>] ? cache_alloc_refill+0x2bc/0x510
 [<c0519734>] ? kmem_cache_alloc+0xa4/0x110
 [<c0574d6f>] ? proc_self_follow_link+0x5f/0x90
 [<c053d972>] ? touch_atime+0xf2/0x140
 [<c0534452>] ? do_follow_link+0xe2/0x3d0
 [<c053bfa8>] ? __d_instantiate+0x38/0xd0
 [<c0533e70>] ? __link_path_walk+0x1b0/0x6b0
 [<c05344be>] ? do_follow_link+0x14e/0x3d0
 [<c05342f7>] ? __link_path_walk+0x637/0x6b0
 [<c0534951>] ? path_walk+0x51/0xc0
 [<c0534ad9>] ? do_path_lookup+0x59/0x90
 [<c05355b4>] ? do_filp_open+0xc4/0xb30
 [<c0519982>] ? cache_alloc_refill+0x1e2/0x510
 [<c0525448>] ? do_sys_open+0x58/0x130
 [<c04adc5c>] ? audit_syscall_entry+0x21c/0x240
 [<c04ad970>] ? __audit_syscall_exit+0x220/0x250
 [<c052559c>] ? sys_open+0x2c/0x40
 [<c0409bdf>] ? sysenter_do_call+0x12/0x28
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
HighMem per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
active_anon:15644 inactive_anon:1589 isolated_anon:0
 active_file:3561 inactive_file:934696 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:1953182 slab_reclaimable:4012 slab_unreclaimable:127469
 mapped:4625 shmem:46 pagetables:569 bounce:0
DMA free:3464kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15792kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:148kB slab_unreclaimable:4560kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 863 12159 12159
Normal free:3808kB min:3724kB low:4652kB high:5584kB active_anon:0kB inactive_anon:0kB active_file:64kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:883912kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:15900kB slab_unreclaimable:505316kB kernel_stack:2256kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:168 all_unreclaimable? yes
lowmem_reserve[]: 0 0 90370 90370
HighMem free:7805456kB min:512kB low:12700kB high:24888kB active_anon:62576kB inactive_anon:6356kB active_file:14180kB inactive_file:3738784kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:11567412kB mlocked:0kB dirty:0kB writeback:0kB mapped:18496kB shmem:184kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:2276kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 2*4kB 0*8kB 2*16kB 3*32kB 2*64kB 9*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3464kB
Normal: 69*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3844kB
HighMem: 45*4kB 27*8kB 4*16kB 10*32kB 14*64kB 4*128kB 6*256kB 6*512kB 2*1024kB 3*2048kB 1902*4096kB = 7805580kB
803188 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 14565368kB
Total swap = 14565368kB
3145712 pages RAM
2918914 pages HighMem
62162 pages reserved
823317 pages shared
285915 pages non-shared
Out of memory: kill process 1536 (mpirun) score 5602 or a child
Killed process 1540 (IOR) vsz:118652kB, anon-rss:26332kB, file-rss:2704kB
IOR: page allocation failure. order:0, mode:0x50
Pid: 1540, comm: IOR Not tainted 2.6.32-131.6.1.el6.i686 #1
Call Trace:
 [<c04ed8e6>] ? __alloc_pages_nodemask+0x696/0x810
 [<c0519a5c>] ? cache_alloc_refill+0x2bc/0x510
 [<c0519734>] ? kmem_cache_alloc+0xa4/0x110
 [<fc8d0c2a>] ? osc_page_init+0x3a/0x3f0 [osc]
 [<fd139463>] ? lovsub_page_init+0x183/0x4e0 [lov]
 [<fc8d0bf0>] ? osc_page_init+0x0/0x3f0 [osc]
 [<fa0c3271>] ? cl_page_find0+0x251/0xd60 [obdclass]
 [<fd133016>] ? lov_sub_get+0xf6/0x900 [lov]
 [<fa0c3d9f>] ? cl_page_find_sub+0x1f/0x30 [obdclass]
 [<fd12a2f8>] ? lov_page_init_raid0+0x218/0xb30 [lov]
 [<fddea077>] ? vvp_page_init+0x187/0x380 [lustre]
 [<fd125d4f>] ? lov_page_init+0x4f/0xa0 [lov]
 [<fd125d00>] ? lov_page_init+0x0/0xa0 [lov]
 [<fa0c3271>] ? cl_page_find0+0x251/0xd60 [obdclass]
 [<c04dcd14>] ? add_to_page_cache_locked+0xa4/0x110
 [<fa0c3dd0>] ? cl_page_find+0x20/0x30 [obdclass]
 [<fddadc5c>] ? ll_readahead+0x10ec/0x1c10 [lustre]
 [<fddeb73d>] ? vvp_io_read_page+0x40d/0x5b0 [lustre]
 [<fa0d2d36>] ? cl_io_read_page+0xa6/0x2b0 [obdclass]
 [<fddaec99>] ? ll_readpage+0x99/0x2c0 [lustre]
 [<c04dc75d>] ? find_get_page+0x1d/0x90
 [<c04ddb86>] ? generic_file_aio_read+0x1e6/0x780
 [<c05f845a>] ? vsnprintf+0x2ea/0x3f0
 [<fddec09b>] ? vvp_io_read_start+0x1cb/0x5a0 [lustre]
 [<fa0ce862>] ? cl_io_start+0x82/0x270 [obdclass]
 [<fa0d6285>] ? cl_io_loop+0x135/0x2a0 [obdclass]
 [<fdd6656a>] ? ll_file_io_generic+0x41a/0x6c0 [lustre]
 [<fdd66931>] ? ll_file_aio_read+0x121/0x4d0 [lustre]
 [<fdd727f5>] ? ll_file_read+0x165/0x440 [lustre]
 [<c059d18c>] ? security_file_permission+0xc/0x10
 [<c0527c06>] ? rw_verify_area+0x66/0xe0
 [<fdd72690>] ? ll_file_read+0x0/0x440 [lustre]
 [<c05285fd>] ? vfs_read+0x9d/0x190
 [<c04adc5c>] ? audit_syscall_entry+0x21c/0x240
 [<c0528731>] ? sys_read+0x41/0x70
 [<c0409bdf>] ? sysenter_do_call+0x12/0x28
Comment by Johann Lombardi (Inactive) [ 24/Nov/11 ]

I wonder if this bug is due to the combination of 32-bit clients and the issue tracked in bugzilla 23529.
Sarah, could you please try to reproduce this issue? If it can be reproduced reliably, then it is worth disabling async journal commit to see whether the problem goes away.
Also, have we ever hit this OOM issue with x86_64 clients?
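
If a reproducer turns up, a minimal sketch of the async-journal check suggested above (this assumes the obdfilter.*.sync_journal tunable from the async journal commit feature and is run on the OSS nodes; verify the parameter name on the release under test):

    # Force synchronous journal commits on all OSTs, then rerun the test.
    # sync_journal=1 disables asynchronous journal commit.
    lctl set_param obdfilter.*.sync_journal=1
    # Confirm the current setting
    lctl get_param obdfilter.*.sync_journal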

Comment by Peter Jones [ 24/Nov/11 ]

We are deprecating i686 clients for 2.2, so if this problem is limited to i686 then we should not worry about it.

Comment by Oleg Drokin [ 03/Jan/12 ]

Is this another "async journal OOM" issue, I wonder?

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
