Lustre / LU-16387

reduce negative effect from failed mem allocations in OBD_ALLOC_LARGE

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      OBD_ALLOC_LARGE falls back to vmalloc() if the kmalloc allocation fails. In practice, the VM tries so hard to satisfy the kmalloc request that the system spends a significant amount of time in try_to_free_pages() reclaim loops instead of falling back to vmalloc().

      #define OBD_ALLOC_LARGE(ptr, size)                                            \
      do {                                                                          \
              /* LU-8196 - force large allocations to use vmalloc, not kmalloc */   \
              if ((size) > KMALLOC_MAX_SIZE)                                          \
                      ptr = NULL;                                                   \
              else                                                                  \
                      OBD_ALLOC_GFP(ptr, size, GFP_NOFS | __GFP_NOWARN);            \
              if (ptr == NULL)                                                      \
                      OBD_VMALLOC(ptr, size);                                       \
      } while (0)
      

      The in-kernel (linux-4.18) implementation of kvmalloc() is smarter:

              /*
               * We want to attempt a large physically contiguous block first because
               * it is less likely to fragment multiple larger blocks and therefore
               * contribute to a long term fragmentation less than vmalloc fallback.
               * However make sure that larger requests are not too disruptive - no
               * OOM killer and no allocation failure warnings as we have a fallback.
               */
              if (size > PAGE_SIZE) {
                      kmalloc_flags |= __GFP_NOWARN;
      
                      if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
                              kmalloc_flags |= __GFP_NORETRY;
              }
      
              ret = kmalloc_node(size, kmalloc_flags, node);
      

      __GFP_NORETRY can be used in OBD_ALLOC_LARGE() in the same way and for the same purpose: to make the kmalloc attempt give up early so that the vmalloc fallback is reached quickly.
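
      The flag selection proposed above can be modelled in plain userspace C. This is a minimal sketch and not the actual Lustre patch. All names prefixed with sk_ or SK_ are hypothetical, the flag bit values are placeholders rather than the kernel's real GFP encodings, and the two size limits are assumed values chosen only for illustration. It assumes OBD_ALLOC_LARGE would add the no-retry flag only for requests larger than one page, mirroring the kvmalloc() excerpt.

```c
#include <stddef.h>

/* Placeholder bits for illustration only; NOT the kernel's GFP values. */
#define SK_GFP_NOFS     0x1u
#define SK_GFP_NOWARN   0x2u
#define SK_GFP_NORETRY  0x4u

/* Assumed stand-ins for PAGE_SIZE and KMALLOC_MAX_SIZE. */
#define SK_PAGE_SIZE        4096u
#define SK_KMALLOC_MAX_SIZE (4u * 1024u * 1024u)

/* Which allocator the modelled macro would try first. */
enum sk_path {
	SK_PATH_KMALLOC_FIRST,	/* try kmalloc, fall back to vmalloc on NULL */
	SK_PATH_VMALLOC_ONLY	/* request too large for kmalloc at all */
};

/*
 * Flags for the initial kmalloc attempt. Requests larger than one page
 * get the no-retry bit so that a failed attempt returns quickly instead
 * of looping in reclaim; the vmalloc fallback makes that failure cheap.
 */
static unsigned int sk_large_alloc_flags(size_t size)
{
	unsigned int flags = SK_GFP_NOFS | SK_GFP_NOWARN;

	if (size > SK_PAGE_SIZE)
		flags |= SK_GFP_NORETRY;
	return flags;
}

/* Mirrors the existing size check: oversized requests skip kmalloc. */
static enum sk_path sk_large_alloc_path(size_t size)
{
	if (size > SK_KMALLOC_MAX_SIZE)
		return SK_PATH_VMALLOC_ONLY;
	return SK_PATH_KMALLOC_FIRST;
}
```

      In this model a sub-page request keeps the existing flags, while a multi-page request such as a bulk descriptor additionally carries the no-retry bit. Whether the real macro should apply the flag unconditionally or only above PAGE_SIZE is a design choice this ticket does not settle.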

      Here is an example of failed memory allocations on a heavily loaded system (CAST-31591):

      2022-11-17 17:32:44 [974463.901303] Pid: 31862, comm: ll_ost_io00_744 3.10.0-957.1.3957.1.3.x3.5.46.x86_64 #1 SMP Thu Jan 20 13:08:08 CST 2022
      2022-11-17 17:32:44 [974463.913928] Call Trace:
      2022-11-17 17:32:44 [974463.918313] [<0>] __cond_resched+0x26/0x30
      2022-11-17 17:32:44 [974463.924347] [<0>] shrink_page_list+0x97/0xc30
      2022-11-17 17:32:44 [974463.930623] [<0>] shrink_inactive_list+0x1c6/0x5d0
      2022-11-17 17:32:44 [974463.937296] [<0>] shrink_lruvec+0x385/0x730
      2022-11-17 17:32:44 [974463.943307] [<0>] shrink_zone+0x76/0x1a0
      2022-11-17 17:32:44 [974463.949020] [<0>] do_try_to_free_pages+0xf0/0x4e0
      2022-11-17 17:32:44 [974463.955486] [<0>] try_to_free_pages+0xfc/0x180
      2022-11-17 17:32:44 [974463.961645] [<0>] __alloc_pages_slowpath+0x457/0x724
      2022-11-17 17:32:44 [974463.968328] [<0>] __alloc_pages_nodemask+0x405/0x420
      2022-11-17 17:32:44 [974463.974996] [<0>] alloc_pages_current+0x98/0x110
      2022-11-17 17:32:44 [974463.981328] [<0>] __get_free_pages+0xe/0x40
      2022-11-17 17:32:44 [974463.987227] [<0>] kmalloc_order_trace+0x2e/0xa0
      2022-11-17 17:32:44 [974463.993478] [<0>] __kmalloc+0x211/0x230
      2022-11-17 17:32:44 [974463.999085] [<0>] ptlrpc_new_bulk+0x13a/0x870 [ptlrpc] 

          People

            zam Alexander Zarochentsev