[LU-16387] reduce negative effect from failed mem allocations in OBD_ALLOC_LARGE Created: 12/Dec/22 Updated: 07/Jan/23 Resolved: 07/Jan/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alexander Zarochentsev | Assignee: | Alexander Zarochentsev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
OBD_ALLOC_LARGE() falls back to vmalloc() if the kmalloc() allocation fails. In practice, however, the VM tries so hard to satisfy the kmalloc() that the system spends a significant amount of time in try_to_free_pages() allocation loops instead of falling back to vmalloc() quickly.

#define OBD_ALLOC_LARGE(ptr, size)                                            \
do {                                                                          \
        /* LU-8196 - force large allocations to use vmalloc, not kmalloc */   \
        if ((size) > KMALLOC_MAX_SIZE)                                        \
                ptr = NULL;                                                   \
        else                                                                  \
                OBD_ALLOC_GFP(ptr, size, GFP_NOFS | __GFP_NOWARN);            \
        if (ptr == NULL)                                                      \
                OBD_VMALLOC(ptr, size);                                       \
} while (0)

The in-kernel (linux-4.18) implementation of kvmalloc() is smarter:
/*
* We want to attempt a large physically contiguous block first because
* it is less likely to fragment multiple larger blocks and therefore
* contribute to a long term fragmentation less than vmalloc fallback.
* However make sure that larger requests are not too disruptive - no
* OOM killer and no allocation failure warnings as we have a fallback.
*/
if (size > PAGE_SIZE) {
        kmalloc_flags |= __GFP_NOWARN;
        if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
                kmalloc_flags |= __GFP_NORETRY;
}
ret = kmalloc_node(size, kmalloc_flags, node);
__GFP_NORETRY can be used in OBD_ALLOC_LARGE() in the same way, for the same purpose (see the sketch following the stack trace below).

Here is an example of failed memory allocations on a heavily loaded system (CAST-31591):

2022-11-17 17:32:44 [974463.901303] Pid: 31862, comm: ll_ost_io00_744 3.10.0-957.1.3.x3.5.46.x86_64 #1 SMP Thu Jan 20 13:08:08 CST 2022
2022-11-17 17:32:44 [974463.913928] Call Trace:
2022-11-17 17:32:44 [974463.918313] [<0>] __cond_resched+0x26/0x30
2022-11-17 17:32:44 [974463.924347] [<0>] shrink_page_list+0x97/0xc30
2022-11-17 17:32:44 [974463.930623] [<0>] shrink_inactive_list+0x1c6/0x5d0
2022-11-17 17:32:44 [974463.937296] [<0>] shrink_lruvec+0x385/0x730
2022-11-17 17:32:44 [974463.943307] [<0>] shrink_zone+0x76/0x1a0
2022-11-17 17:32:44 [974463.949020] [<0>] do_try_to_free_pages+0xf0/0x4e0
2022-11-17 17:32:44 [974463.955486] [<0>] try_to_free_pages+0xfc/0x180
2022-11-17 17:32:44 [974463.961645] [<0>] __alloc_pages_slowpath+0x457/0x724
2022-11-17 17:32:44 [974463.968328] [<0>] __alloc_pages_nodemask+0x405/0x420
2022-11-17 17:32:44 [974463.974996] [<0>] alloc_pages_current+0x98/0x110
2022-11-17 17:32:44 [974463.981328] [<0>] __get_free_pages+0xe/0x40
2022-11-17 17:32:44 [974463.987227] [<0>] kmalloc_order_trace+0x2e/0xa0
2022-11-17 17:32:44 [974463.993478] [<0>] __kmalloc+0x211/0x230
2022-11-17 17:32:44 [974463.999085] [<0>] ptlrpc_new_bulk+0x13a/0x870 [ptlrpc]
|
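A minimal sketch of how OBD_ALLOC_LARGE() could pick up __GFP_NORETRY, mirroring the kvmalloc() logic quoted above. The macro body is taken from the definition in the description with only the GFP flags extended; this is illustrative and not necessarily the exact change landed by patch 49380.

#define OBD_ALLOC_LARGE(ptr, size)                                            \
do {                                                                          \
        /* LU-8196 - force large allocations to use vmalloc, not kmalloc */   \
        if ((size) > KMALLOC_MAX_SIZE)                                        \
                ptr = NULL;                                                   \
        else                                                                  \
                /* __GFP_NORETRY: fail fast instead of looping in             \
                 * try_to_free_pages(), then fall back to vmalloc() */        \
                OBD_ALLOC_GFP(ptr, size,                                      \
                              GFP_NOFS | __GFP_NOWARN | __GFP_NORETRY);       \
        if (ptr == NULL)                                                      \
                OBD_VMALLOC(ptr, size);                                       \
} while (0)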
| Comments |
| Comment by Gerrit Updater [ 12/Dec/22 ] |
|
"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49380 |
| Comment by Gerrit Updater [ 07/Jan/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49380/ |
| Comment by Peter Jones [ 07/Jan/23 ] |
|
Landed for 2.16 |