Support for larger than 1MB sequential I/O RPCs (LU-1431)

[LU-2748] OSD uses kmalloc with high order to allocate a keys Created: 04/Feb/13  Updated: 29/Jan/22  Resolved: 13/May/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Technical task Priority: Minor
Reporter: Alexey Lyashkov Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None
Environment:

maloo


Issue Links:
Duplicate
is duplicated by LU-2964 lustre build fails on sles11sp2 Resolved
Related
is related to LU-2791 Stuck client on server OOM/lost message Resolved
is related to LU-2790 Failure to allocated osd keys leads t... Resolved
is related to LU-1431 Support for larger than 1MB sequentia... Resolved
is related to LU-2818 Failure on test suite parallel-scale-... Resolved
Rank (Obsolete): 6674

 Description   
05:34:20:ll_ost_io00_029: page allocation failure. order:8, mode:0x50
05:34:20:Pid: 13919, comm: ll_ost_io00_029 Not tainted 2.6.32-279.19.1.el6_lustre.gc7121e9.x86_64 #1
05:34:20:Call Trace:
05:34:20: [<ffffffff811231ff>] ? __alloc_pages_nodemask+0x77f/0x940
05:34:20: [<ffffffff8115d1a2>] ? kmem_getpages+0x62/0x170
05:34:20: [<ffffffff8115ddba>] ? fallback_alloc+0x1ba/0x270
05:34:20: [<ffffffff8115d80f>] ? cache_grow+0x2cf/0x320
05:34:20: [<ffffffff8115db39>] ? ____cache_alloc_node+0x99/0x160
05:34:20: [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
05:34:20: [<ffffffff8115e909>] ? __kmalloc+0x189/0x220
05:34:20: [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
05:34:20: [<ffffffffa0d1c8ae>] ? osd_key_init+0x1e/0x670 [osd_ldiskfs]
05:34:20: [<ffffffffa06643ef>] ? keys_fill+0x6f/0x190 [obdclass]
05:34:20: [<ffffffffa066843b>] ? lu_context_init+0xab/0x260 [obdclass]
05:34:20: [<ffffffffa07f78b3>] ? ptlrpc_main+0x203/0x1870 [ptlrpc]
05:34:20: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
05:34:20: [<ffffffffa07f76b0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
05:34:20: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
05:34:20: [<ffffffffa07f76b0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
05:34:20: [<ffffffffa07f76b0>] ? ptlrpc_main+0x0/0x1870

Instead of kmalloc, kmem_cache_alloc should be used to allocate and handle the keys efficiently.



 Comments   
Comment by Alexey Lyashkov [ 04/Feb/13 ]

one more bug in that area.
static void *osd_key_init(const struct lu_context *ctx,
                          struct lu_context_key *key)
...
        OBD_ALLOC(info->oti_it_ea_buf, OSD_IT_EA_BUFSIZE);

So this is a 1MB allocation via kmalloc, but for a buffer of that size vmalloc (aka cfs_alloc_large) is preferred.
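
For reference, the order:8 in the page allocation failure in the description matches a 1MB request: an order-n allocation is 2^n physically contiguous pages, so with 4kB pages order 8 is 256 pages, or 1MB. A minimal userspace sketch of that arithmetic, assuming a 4kB page size (alloc_order is a hypothetical helper, not a kernel or Lustre function):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size: 4kB, as on the x86_64 kernel in the trace. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: smallest order n such that
 * (SKETCH_PAGE_SIZE << n) >= size, i.e. the power-of-two number of
 * contiguous pages needed to hold size bytes.
 */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Here alloc_order(1024 * 1024) evaluates to 8, while a request that fits in a single 4kB page is order 0.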

Comment by Andreas Dilger [ 05/Feb/13 ]

There are a number of problems found with PTLRPC_MAX_BRW_SIZE of 32MB that can be fixed relatively easily:

  • osd_thread_info.osd_iobuf.dr_blocks[] is 512kB
  • osd_thread_info.osd_iobuf.dr_pages[] is 64kB
  • osd_thread_info.osd_iobuf is only needed for OST_IO_PORTAL and does not need to be allocated for other threads
  • osd_thread_info.oti_created[] is 32kB and is unused and can be removed
  • oti_thread_info uses OBD_ALLOC() instead of OBD_ALLOC_LARGE()

There may be others, but these were apparent at first glance at the failed allocation paths.
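
The array sizes listed above scale with the number of pages in a maximum-size RPC. A rough sketch of the arithmetic, assuming 4kB pages, 8-byte pointers and longs, and 4-byte ints. The per-page entry counts and element sizes are assumptions chosen to be consistent with the figures in the list, not values taken from the source code, and brw_array_bytes is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes used by a per-RPC array that holds entries_per_page entries
 * of entry_size bytes for every page of a brw_size-byte RPC.
 */
static size_t brw_array_bytes(size_t brw_size, size_t page_size,
			      size_t entries_per_page, size_t entry_size)
{
	assert(page_size > 0 && brw_size % page_size == 0);
	return (brw_size / page_size) * entries_per_page * entry_size;
}
```

With brw_size = 32MB and page_size = 4kB there are 8192 pages per RPC. One 8-byte entry per page gives 64kB (the dr_pages[] figure), one 4-byte entry per page gives 32kB (the oti_created[] figure), and eight 8-byte entries per page gives 512kB (the dr_blocks[] figure).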

Comment by Andreas Dilger [ 11/Feb/13 ]

Probably dr_blocks[] can be reduced in size as well, since we will normally have 4kB blocksize instead of 1kB blocksize, so we don't need a buffer_head for every 1kB of space.

Comment by Oleg Drokin [ 11/Feb/13 ]

A band-aid patch that just changes the allocation to OBD_ALLOC_LARGE is at http://review.whamcloud.com/5323; it seems to help greatly in my testing.
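
The general idea behind such a change is to stop requesting large physically contiguous regions: small requests stay on kmalloc, while large ones go to vmalloc, which only needs virtually contiguous pages. A userspace sketch of that size-based choice follows. The threshold and all names in it are hypothetical, and it does not reproduce the actual OBD_ALLOC_LARGE macro:

```c
#include <assert.h>
#include <stddef.h>

/* Which allocator a request would be routed to in this sketch. */
enum sketch_allocator { SKETCH_KMALLOC, SKETCH_VMALLOC };

/* Hypothetical cut-off: four 4kB pages. */
#define SKETCH_BIG_THRESHOLD (4 * 4096UL)

/* Route requests above the cut-off to the vmalloc-style allocator. */
static enum sketch_allocator pick_allocator(size_t size)
{
	assert(size > 0);
	return size > SKETCH_BIG_THRESHOLD ? SKETCH_VMALLOC : SKETCH_KMALLOC;
}
```

Under these assumptions a 1MB request is routed to the vmalloc side, so it no longer depends on finding an order-8 run of free pages.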

Comment by Alex Zhuravlev [ 15/Feb/13 ]

http://review.whamcloud.com/#change,5444

Comment by Peter Jones [ 13/Mar/13 ]

As per Oleg, dropping priority for remaining work

Comment by James A Simmons [ 13/Mar/13 ]

I found a bug in http://review.whamcloud.com/#change,5444.

The rhel6.3 ldiskfs patch ext4-map_inode_page-2.6.18.patch defines the ext4_map_inode_page function as taking an integer array, created, which it needs in order to work, but fsfilt_ext3.c calls this function through the declaration

extern int ext3_map_inode_page(struct inode *inode, struct page *page,
unsigned long *blocks, int create);

This will crash in a very painful way.
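
The danger is that the caller and the definition disagree about the argument list, and because the caller relies on its own extern declaration, the compiler cannot notice. A small sketch of the mismatch, assuming the patched definition takes the created array as a separate argument. The typedef names, the macro, and the parameter order of the definition are hypothetical:

```c
#include <assert.h>

struct inode;
struct page;

/*
 * Assumed shape of the definition in the ldiskfs patch: it takes an
 * extra int array, created. The parameter order is a guess.
 */
typedef int (*map_page_def_t)(struct inode *inode, struct page *page,
			      unsigned long *blocks, int *created,
			      int create);

/* Shape of the extern declaration quoted from fsfilt_ext3.c. */
typedef int (*map_page_decl_t)(struct inode *inode, struct page *page,
			       unsigned long *blocks, int create);

/* 1 if the two prototypes are the same type, 0 otherwise (C11). */
#define PROTOTYPES_MATCH \
	_Generic((map_page_def_t)0, map_page_decl_t: 1, default: 0)
```

PROTOTYPES_MATCH evaluates to 0 here. Declaring the function once in a header included by both sides would let the compiler reject this kind of mismatch at build time.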

Comment by James A Simmons [ 13/Mar/13 ]

Patch to fix this is at http://review.whamcloud.com/#change,5708

Comment by Alex Zhuravlev [ 22/Apr/13 ]

should be resolved now?

Comment by Peter Jones [ 22/Apr/13 ]

Oleg, can your http://review.whamcloud.com/#change,5323 "band aid" patch be abandoned?

Comment by Andreas Dilger [ 13/May/13 ]

The http://review.whamcloud.com/5444 patch was landed for 2.4.0, and http://review.whamcloud.com/5323 is no longer needed.

Generated at Sat Feb 10 01:27:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.