
Kernel freeze allocating more memory than there is RAM

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.15.0
    • Affects Versions: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.3, Lustre 1.8.8
    • Labels: None
    • Severity: 3
    • Rank: 4350

    Description

      While working with router buffers, I set the number of large buffers beyond the amount of memory assigned to the VM running Lustre (number of large buffers: 1024; memory: 1 GB). The VM froze with all 3 virtual CPUs running at 100%.

      Looking deeper into this, I found that the Linux memory allocation system keeps trying to free up memory to satisfy the request. However, even after waiting 15 minutes, the VM did not "unfreeze".

      I changed the default flags we use for memory allocation to include __GFP_NORETRY to stop the memory allocator from looping. When re-running the above test, I found the system no longer froze but returned -ENOMEM to the caller as expected.

      This bug is to track a discussion of whether we should start using __GFP_NORETRY and, if so, how widely.

      Attachments

        Activity

          [LU-2084] Kernel freeze allocating more memory than there is RAM
          pjones Peter Jones added a comment -

          Landed for 2.15


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45174/
          Subject: LU-2084 lnet: don't retry allocating router buffers
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 3038917f12a53b059473db172f5126136e20abc0

          gerrit Gerrit Updater added a comment -

          "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45174
          Subject: LU-2084 lnet: don't retry allocating router buffers
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: ebd97d4585eea1aa7717f555a52dc24bcfa1885e

          adilger Andreas Dilger added a comment -

          As much as we could wish everyone using Lustre understood it as well as the developers do, I don't think that is at all realistic. Users need to be told that something they are trying to do is unreasonable, rather than having it cause failures or hang/crash the node. Having a check like the following seems reasonable:

                  if (router_buffer_pages > cfs_num_physpages * 7 / 8) {
                          CERROR("too much router memory requested: max %u\n",
                                 cfs_num_physpages * 7 / 8);
                          RETURN(-EINVAL);
                  }

          with allowances for printing the message in proper units, etc. We still need to keep some memory free for other things, which a bare -ENOMEM failure would not guarantee.

          doug Doug Oucharek (Inactive) added a comment -

          Good point.

          Ok, I can add CFS_ALLOC_NORETRY to our own set of memory allocation flags and map it to __GFP_NORETRY when present. This way it can be added on a case-by-case basis. I will only add this flag when allocating router buffers.

          isaac Isaac Huang (Inactive) added a comment -

          I tend to think __GFP_NORETRY is sufficient. On dedicated routers, where could the VM free much memory from?

          doug Doug Oucharek (Inactive) added a comment -

          This becomes more complicated when looking ahead to the Dynamic LNet Config project, which will make the router buffer pools changeable at runtime. With the code as it is today, if a user tells a running router to increase the size of a pool beyond available memory, the router will lock up for potentially hours. That is unacceptable.

          If we use __GFP_NORETRY, it may return -ENOMEM in cases where memory could have been freed to satisfy the request. However, I would rather see that than a live router lockup.

          Checking ahead of time whether there is enough RAM available does not sound easy given how the Linux memory manager works. Also, I feel this would be doing the OS's job for it.

          I heard that work was done on the memory manager in the Linux 3.x series to address these sorts of issues. None of it was back-ported to 2.6.

          isaac Isaac Huang (Inactive) added a comment -

          1. I think __GFP_NORETRY is reasonable for router buffers. Routers should be dedicated nodes where nothing else is running - i.e. there is nothing like dirty pages to be flushed or idle process pages to be swapped out, so it makes little sense to make the VM retry.

          2. I don't think we should make this foolproof by limiting large_router_buffers. System administrators should understand what large_router_buffers does; if they ask for too much, they are asking for trouble and should end up in trouble. Such a failure happens only once, at router startup, and that router would be avoided by clients and servers via their router pingers, so the consequences should not be catastrophic. The admin should then notice it and learn the lesson.

          adilger Andreas Dilger added a comment -

          Doug, wouldn't it make sense to limit the number of router buffers to some amount less than the total amount of RAM? Using __GFP_NORETRY in a blanket fashion seems like it could cause gratuitous system failures in cases where memory is low but the allocation is not absurd like in your case.

          keith Keith Mannthey (Inactive) added a comment -

          Yes, working to keep the system out of OOM is a much better user experience.

          cfs_alloc_flags_to_gfp seems to be pretty low-level; I would think a huge amount of code would be affected. Are you using cfs_alloc(size_t nr_bytes, u_int32_t flags)? You could pass __GFP_NORETRY down to your specific allocation.

          What are you seeing as your -ENOMEM indication?

          People

            Assignee: adilger Andreas Dilger
            Reporter: doug Doug Oucharek (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: