[LU-16387] reduce negative effect from failed mem allocations in OBD_ALLOC_LARGE Created: 12/Dec/22  Updated: 07/Jan/23  Resolved: 07/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Alexander Zarochentsev Assignee: Alexander Zarochentsev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

OBD_ALLOC_LARGE falls back to vmalloc() if the kmalloc() allocation fails. In practice, the VM tries so hard to satisfy the kmalloc() request that the system spends a significant amount of time in try_to_free_pages() reclaim loops instead of falling back to vmalloc().

#define OBD_ALLOC_LARGE(ptr, size)                                            \
do {                                                                          \
        /* LU-8196 - force large allocations to use vmalloc, not kmalloc */   \
        if ((size) > KMALLOC_MAX_SIZE)                                          \
                ptr = NULL;                                                   \
        else                                                                  \
                OBD_ALLOC_GFP(ptr, size, GFP_NOFS | __GFP_NOWARN);            \
        if (ptr == NULL)                                                      \
                OBD_VMALLOC(ptr, size);                                       \
} while (0)

The in-kernel (linux-4.18) implementation of kvmalloc() is smarter:

        /*
         * We want to attempt a large physically contiguous block first because
         * it is less likely to fragment multiple larger blocks and therefore
         * contribute to a long term fragmentation less than vmalloc fallback.
         * However make sure that larger requests are not too disruptive - no
         * OOM killer and no allocation failure warnings as we have a fallback.
         */
        if (size > PAGE_SIZE) {
                kmalloc_flags |= __GFP_NOWARN;

                if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
                        kmalloc_flags |= __GFP_NORETRY;
        }

        ret = kmalloc_node(size, kmalloc_flags, node);

__GFP_NORETRY can be used in OBD_ALLOC_LARGE() in the same way and for the same purpose.
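
To illustrate the idea, here is a minimal userspace sketch. It is not Lustre or kernel code and is not the patch itself. All MODEL_* constants and model_* functions are hypothetical stand-ins: the flag values are arbitrary, malloc() stands in for both allocators, and the retry count of 16 is only a placeholder for "tries hard". The model assumes a fragmented system where every multi-page contiguous request fails, and it counts how many simulated reclaim passes are spent before the fallback is taken.

```c
/* Userspace model of the proposed flag choice; not Lustre or kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in values: the real flags live in the kernel headers and differ. */
#define MODEL_GFP_NOFS     0x1u
#define MODEL_GFP_NOWARN   0x2u
#define MODEL_GFP_NORETRY  0x4u
#define MODEL_PAGE_SIZE    4096u
#define MODEL_MAX_RETRIES  16u   /* arbitrary; stands in for "tries hard" */

struct model_stats {
	unsigned int reclaim_passes; /* simulated try_to_free_pages() calls */
	int used_vmalloc;            /* 1 if the fallback path was taken */
};

/* Pick flags for the contiguous attempt, following the kvmalloc() excerpt. */
static unsigned int model_large_alloc_flags(size_t size)
{
	unsigned int flags = MODEL_GFP_NOFS | MODEL_GFP_NOWARN;

	/* Multi-page requests have a fallback, so do not retry hard. */
	if (size > MODEL_PAGE_SIZE)
		flags |= MODEL_GFP_NORETRY;
	return flags;
}

/* Simulated kmalloc() on a fragmented system: multi-page requests fail. */
static void *model_kmalloc(size_t size, unsigned int flags,
			   struct model_stats *st)
{
	if (size <= MODEL_PAGE_SIZE)
		return malloc(size);

	/* With NORETRY a single light pass is modeled, otherwise many. */
	st->reclaim_passes += (flags & MODEL_GFP_NORETRY) ?
			      1u : MODEL_MAX_RETRIES;
	return NULL;
}

/* Same shape as OBD_ALLOC_LARGE: contiguous attempt first, then fallback. */
static void *model_alloc_large(size_t size, unsigned int flags,
			       struct model_stats *st)
{
	void *ptr;

	assert(st != NULL);
	ptr = model_kmalloc(size, flags, st);
	if (ptr == NULL) {
		st->used_vmalloc = 1;
		ptr = malloc(size); /* stands in for vmalloc() */
	}
	return ptr;
}
```

Calling model_alloc_large() for a 64 KiB request with MODEL_GFP_NOFS | MODEL_GFP_NOWARN charges 16 reclaim passes in this model before the fallback, while passing model_large_alloc_flags(size) charges one. Both calls end up on the fallback path; only the time spent getting there differs.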

Here is an example of failed memory allocations on a heavily loaded system (CAST-31591):

2022-11-17 17:32:44 [974463.901303] Pid: 31862, comm: ll_ost_io00_744 3.10.0-957.1.3957.1.3.x3.5.46.x86_64 #1 SMP Thu Jan 20 13:08:08 CST 2022
2022-11-17 17:32:44 [974463.913928] Call Trace:
2022-11-17 17:32:44 [974463.918313] [<0>] __cond_resched+0x26/0x30
2022-11-17 17:32:44 [974463.924347] [<0>] shrink_page_list+0x97/0xc30
2022-11-17 17:32:44 [974463.930623] [<0>] shrink_inactive_list+0x1c6/0x5d0
2022-11-17 17:32:44 [974463.937296] [<0>] shrink_lruvec+0x385/0x730
2022-11-17 17:32:44 [974463.943307] [<0>] shrink_zone+0x76/0x1a0
2022-11-17 17:32:44 [974463.949020] [<0>] do_try_to_free_pages+0xf0/0x4e0
2022-11-17 17:32:44 [974463.955486] [<0>] try_to_free_pages+0xfc/0x180
2022-11-17 17:32:44 [974463.961645] [<0>] __alloc_pages_slowpath+0x457/0x724
2022-11-17 17:32:44 [974463.968328] [<0>] __alloc_pages_nodemask+0x405/0x420
2022-11-17 17:32:44 [974463.974996] [<0>] alloc_pages_current+0x98/0x110
2022-11-17 17:32:44 [974463.981328] [<0>] __get_free_pages+0xe/0x40
2022-11-17 17:32:44 [974463.987227] [<0>] kmalloc_order_trace+0x2e/0xa0
2022-11-17 17:32:44 [974463.993478] [<0>] __kmalloc+0x211/0x230
2022-11-17 17:32:44 [974463.999085] [<0>] ptlrpc_new_bulk+0x13a/0x870 [ptlrpc] 


 Comments   
Comment by Gerrit Updater [ 12/Dec/22 ]

"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49380
Subject: LU-16387 lustre: switch OBD_ALLOC_LARGE to vmalloc faster
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 566a9aa05042971e20854106e6830a5062d7e583

Comment by Gerrit Updater [ 07/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49380/
Subject: LU-16387 lustre: switch OBD_ALLOC_LARGE to vmalloc faster
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 41bed753b3830543740d0f099695adde50f4c20e

Comment by Peter Jones [ 07/Jan/23 ]

Landed for 2.16
