Support for larger than 1MB sequential I/O RPCs
(LU-1431)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Alexey Lyashkov | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
maloo |
||
| Issue Links: |
|
| Rank (Obsolete): | 6674 |
| Description |
05:34:20:ll_ost_io00_029: page allocation failure. order:8, mode:0x50
05:34:20:Pid: 13919, comm: ll_ost_io00_029 Not tainted 2.6.32-279.19.1.el6_lustre.gc7121e9.x86_64 #1
05:34:20:Call Trace:
05:34:20: [<ffffffff811231ff>] ? __alloc_pages_nodemask+0x77f/0x940
05:34:20: [<ffffffff8115d1a2>] ? kmem_getpages+0x62/0x170
05:34:20: [<ffffffff8115ddba>] ? fallback_alloc+0x1ba/0x270
05:34:20: [<ffffffff8115d80f>] ? cache_grow+0x2cf/0x320
05:34:20: [<ffffffff8115db39>] ? ____cache_alloc_node+0x99/0x160
05:34:20: [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
05:34:20: [<ffffffff8115e909>] ? __kmalloc+0x189/0x220
05:34:20: [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
05:34:20: [<ffffffffa0d1c8ae>] ? osd_key_init+0x1e/0x670 [osd_ldiskfs]
05:34:20: [<ffffffffa06643ef>] ? keys_fill+0x6f/0x190 [obdclass]
05:34:20: [<ffffffffa066843b>] ? lu_context_init+0xab/0x260 [obdclass]
05:34:20: [<ffffffffa07f78b3>] ? ptlrpc_main+0x203/0x1870 [ptlrpc]
05:34:20: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
05:34:20: [<ffffffffa07f76b0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
05:34:20: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
05:34:20: [<ffffffffa07f76b0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
05:34:20: [<ffffffffa07f76b0>] ? ptlrpc_main+0x0/0x1870

Instead of kmalloc, kmem_cache_alloc should be used to allocate the keys and handle them efficiently. |
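The kmem_cache_alloc suggestion rests on the slab idea of keeping fixed-size objects in a dedicated cache and recycling them instead of going back to the general allocator each time. Below is a minimal user-space analogue, not the kernel slab allocator: struct obj_cache and the obj_cache_* functions are hypothetical names, and malloc() stands in for fetching fresh pages.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of a kmem_cache: objects of one fixed
 * size are recycled through a free list instead of being released to the
 * general-purpose allocator on every free. */
struct obj_cache {
	size_t obj_size;  /* size of each object in bytes */
	void  *free_list; /* singly linked list of freed objects */
	size_t nr_alloc;  /* objects currently handed out */
	size_t nr_fresh;  /* objects that had to come from malloc() */
};

static void obj_cache_init(struct obj_cache *c, size_t obj_size)
{
	/* A free object stores the "next" pointer in its first bytes,
	 * so every object must be at least pointer-sized. */
	c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
	c->free_list = NULL;
	c->nr_alloc = 0;
	c->nr_fresh = 0;
}

static void *obj_cache_alloc(struct obj_cache *c)
{
	void *obj = c->free_list;

	if (obj != NULL) {
		/* Reuse a previously freed object: no new allocation. */
		c->free_list = *(void **)obj;
	} else {
		obj = malloc(c->obj_size);
		if (obj == NULL)
			return NULL;
		c->nr_fresh++;
	}
	c->nr_alloc++;
	return obj;
}

static void obj_cache_free(struct obj_cache *c, void *obj)
{
	assert(c->nr_alloc > 0);
	/* Push the object onto the free list for later reuse. */
	*(void **)obj = c->free_list;
	c->free_list = obj;
	c->nr_alloc--;
}

static void obj_cache_destroy(struct obj_cache *c)
{
	assert(c->nr_alloc == 0); /* every object must be returned first */
	while (c->free_list != NULL) {
		void *next = *(void **)c->free_list;

		free(c->free_list);
		c->free_list = next;
	}
}
```

A dedicated cache alone still needs contiguous memory for each large object; the saving comes from reusing an object once it exists rather than allocating a fresh one every time.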
| Comments |
| Comment by Alexey Lyashkov [ 04/Feb/13 ] |
|
One more bug in that area: a 1MB buffer is allocated via kmalloc, but for a buffer that large vmalloc (i.e. cfs_alloc_large) is preferred. |
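The order:8 in the failure message is consistent with a 1MB kmalloc: with 4kB pages, a physically contiguous 1MB buffer needs 2^8 = 256 adjacent pages, which is hard to find on a fragmented server, whereas vmalloc only needs individual pages. The sketch assumes 4kB pages; alloc_order() is a hypothetical helper modelled on what the kernel's get_order() computes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86_64. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size, i.e. the
 * number of doublings of one page needed to hold a physically contiguous
 * buffer of the given size. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

For example, alloc_order(1UL << 20) is 8, and a 32MB buffer would need order 13.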
| Comment by Andreas Dilger [ 05/Feb/13 ] |
|
There are a number of problems found with PTLRPC_MAX_BRW_SIZE of 32MB that can be fixed relatively easily:
There may be others, but these were obvious at first glance into the failed allocation paths. |
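A quick way to see why a 32MB PTLRPC_MAX_BRW_SIZE strains these allocation paths is to size any array that holds one element per page of a bulk RPC. The sketch assumes 4kB pages, and per_page_array_bytes() is a hypothetical helper, not a Lustre function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper: bytes needed for one array holding one element of
 * elem_size per page of a bulk RPC of brw_size bytes. */
static size_t per_page_array_bytes(size_t brw_size, size_t elem_size)
{
	/* Round up so a partial final page still gets an element. */
	size_t pages = (brw_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return pages * elem_size;
}
```

With 8-byte pointers, one such array grows from 2kB at 1MB to 64kB at 32MB, so a per-thread structure that embeds several of them quickly outgrows what kmalloc can reliably provide.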
| Comment by Andreas Dilger [ 11/Feb/13 ] |
|
Probably dr_blocks[] can be reduced in size as well: we will normally have a 4kB blocksize instead of a 1kB blocksize, so we don't need a buffer_head for every 1kB of space. |
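The saving from sizing per-block arrays for 4kB rather than 1kB blocks can be checked with simple arithmetic. This is a sketch only: block_entries() is a hypothetical helper, and treating each dr_blocks[] entry as an unsigned long is our assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of per-block entries needed to describe a
 * bulk I/O of brw_size bytes on a filesystem with the given blocksize. */
static size_t block_entries(size_t brw_size, size_t blocksize)
{
	assert(blocksize > 0);
	/* Round up so a partial final block still gets an entry. */
	return (brw_size + blocksize - 1) / blocksize;
}
```

If each entry is an 8-byte unsigned long, a 32MB RPC needs 256kB of block entries when sized for 1kB blocks but only 64kB when sized for 4kB blocks, a 4x reduction.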
| Comment by Oleg Drokin [ 11/Feb/13 ] |
|
A band-aid patch that just changes the allocation to OBD_ALLOC_LARGE is at http://review.whamcloud.com/5323; it seems to help greatly in my testing. |
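The idea behind switching to OBD_ALLOC_LARGE is a size-based choice between physically contiguous memory (kmalloc) and virtually contiguous memory (vmalloc). The sketch below models only that decision; the four-page cutoff and the names choose_allocator() and SKETCH_ALLOC_BIG are our assumptions, not the Lustre macro's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL
/* Assumed cutoff above which a virtually contiguous allocation is used. */
#define SKETCH_ALLOC_BIG (4 * SKETCH_PAGE_SIZE)

enum alloc_kind { ALLOC_KMALLOC, ALLOC_VMALLOC };

/* Hypothetical policy: small buffers use physically contiguous memory,
 * large ones use vmalloc so no high-order page allocation is needed. */
static enum alloc_kind choose_allocator(size_t size)
{
	return size > SKETCH_ALLOC_BIG ? ALLOC_VMALLOC : ALLOC_KMALLOC;
}
```

Because the free path must call the matching routine (kfree() or vfree()), a wrapper of this kind also needs the buffer size, or some other record of the choice, at free time.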
| Comment by Alex Zhuravlev [ 15/Feb/13 ] |
| Comment by Peter Jones [ 13/Mar/13 ] |
|
As per Oleg, dropping the priority for the remaining work. |
| Comment by James A Simmons [ 13/Mar/13 ] |
|
I found a bug in http://review.whamcloud.com/#change,5444. The RHEL 6.3 ldiskfs patch ext4-map_inode_page-2.6.18.patch defines the ext4_map_inode_page function as needing an integer array created for it, but fsfilt_ext3.c calls this function through a different declaration: extern int ext3_map_inode_page(struct inode *inode, struct page *page, This mismatch will crash in a very painful way. |
| Comment by James A Simmons [ 13/Mar/13 ] |
|
Patch to fix this is at http://review.whamcloud.com/#change,5708 |
| Comment by Alex Zhuravlev [ 22/Apr/13 ] |
|
Should this be resolved now? |
| Comment by Peter Jones [ 22/Apr/13 ] |
|
Oleg, can your http://review.whamcloud.com/#change,5323 "band aid" patch be abandoned? |
| Comment by Andreas Dilger [ 13/May/13 ] |
|
The http://review.whamcloud.com/5444 patch was landed for 2.4.0, and http://review.whamcloud.com/5323 is no longer needed. |