
mdc_close() matched open debug msg causes memory fragmentation

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      A customer reported being unable to run a job because the job could not always get the contiguous memory it required: memory was too fragmented. The primary source of the fragmentation was traced to the Lustre cfs_trace_data pages, which are allocated and freed dynamically. Over 99% of the debug messages were the "matched open" messages issued by mdc_close():

      DEBUG_REQ(D_HA, mod->mod_open_req, "matched open; tag %d", tag);

      The customer was able to work around the problem by removing HA from the default set of debug trace flags. This is a reasonable workaround but not a good solution because the HA tracing is often useful for diagnosing connection problems, particularly at mount time.

      The matched open debug message, however, is not nearly as useful as the other HA messages. So moving the message under a different debug flag, one that must be set explicitly, reduces the amount of default tracing and thereby helps reduce fragmentation at a fairly low cost. With HA eliminated as a possible debug type, the only other available flag that makes much sense is OTHER. Thus, let's change D_HA in the above DEBUG_REQ statement to D_OTHER.
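The effect of moving a message to a flag outside the default set can be sketched as a simple mask test. This is a minimal illustration, not Lustre's implementation: the names DBG_HA, DBG_OTHER, DBG_RPCTRACE, default_mask, msg_enabled() and count_logged() and the bit values are all hypothetical, and the assumed default mask contains only the HA bit.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration only; not Lustre's real values. */
#define DBG_HA       0x1u
#define DBG_OTHER    0x2u
#define DBG_RPCTRACE 0x4u

/* Assumed default mask: HA tracing enabled, OTHER and RPCTRACE disabled. */
const unsigned int default_mask = DBG_HA;

/* A message is recorded only when its flag bit is set in the active mask. */
int msg_enabled(unsigned int active_mask, unsigned int msg_flag)
{
	assert(msg_flag != 0);
	return (active_mask & msg_flag) != 0;
}

/* Count how many of n identical messages would reach the trace buffer. */
unsigned long count_logged(unsigned int active_mask, unsigned int msg_flag,
			   unsigned long n)
{
	unsigned long logged = 0;

	for (unsigned long i = 0; i < n; i++)
		if (msg_enabled(active_mask, msg_flag))
			logged++;
	return logged;
}
```

Under these assumptions, a message tagged with DBG_HA is recorded on every call with the default mask, while the same message tagged with DBG_OTHER is recorded only after that bit is enabled explicitly. The customer's workaround corresponds to clearing the HA bit from the mask, which also silences every other HA message.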

        Activity

          [LU-12524] mdc_close() matched open debug msg causes memory fragmentation
          pjones Peter Jones added a comment -

          Sure! This work meanwhile has landed for 2.13


          simmonsja James A Simmons added a comment -

          Peter, should the migration of the debug buffer to the kernel ring buffer implementation be done under a different ticket?

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35449/
          Subject: LU-12524 libcfs: Reduce memory frag due to HA debug msg
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 076a5961f20b4c55347e8968be2fb2504de6f8dd

          green Oleg Drokin added a comment -

          I'd vote for an option to preallocate debug buffer pages. That way people who need it can enable it, and those who don't can keep their RAM.

          simmonsja James A Simmons added a comment -

          If we do move to a static buffer then this code ends up looking a lot like the ring buffer implementation used by oprofile and trace events. The default ring_buffer in the kernel might handle these fragmentation issues better. Something to explore perhaps?

          amk Ann Koehler (Inactive) added a comment -

          I agree that removing this message from the default set does not eliminate the possibility of fragmentation. Certainly a large number of any of the default debug messages could have the same effect. It's just that in practice, the mdc_close() message is the one that occurs in sufficiently large numbers to cause churn in trace page allocation. Evidence that this message was the problem is the fact that disabling HA solved the customer issue.

          Setting debug_mb doesn't resolve the fragmentation problem. The trace pages are not a circular buffer. Pages are allocated on demand and freed back to the kernel memory pool when debug_mb is exceeded. My understanding is that fragmentation caused by the trace buffer pages interferes primarily with hugepage allocations.

          I've looked at pre-allocating trace pages. Certainly doable, but it has the drawback of over-allocating memory that may never be needed. Mostly I was looking for an easy solution to a specific problem. Also I find those mdc_close messages annoying and worthless in default debug logs, so I'd like to get rid of them. I will change the message to D_RPCTRACE as suggested.
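The point that a size cap bounds memory but not allocator churn can be illustrated with a small counting model. This is a sketch under stated assumptions, not the libcfs code: PAGE_BYTES, struct trace_stats and simulate_on_demand() are hypothetical names, messages are assumed to be fixed-size, and cap_pages stands in for the limit that debug_mb imposes.

```c
#include <assert.h>

#define PAGE_BYTES 4096u	/* assumed page size for the model */

struct trace_stats {
	unsigned long allocs;	/* pages requested from the allocator */
	unsigned long frees;	/* pages handed back to the allocator */
	unsigned long live;	/* pages currently held */
};

/*
 * Model of on-demand trace pages: messages fill the current page, a full
 * page triggers a new allocation, and once the cap is exceeded one old
 * page is freed for every new page allocated.
 */
struct trace_stats simulate_on_demand(unsigned long n_msgs,
				      unsigned int msg_bytes,
				      unsigned long cap_pages)
{
	struct trace_stats s = { 0, 0, 0 };
	unsigned int used = PAGE_BYTES;	/* forces an allocation on first use */

	assert(msg_bytes > 0 && msg_bytes <= PAGE_BYTES && cap_pages > 0);

	for (unsigned long i = 0; i < n_msgs; i++) {
		if (used + msg_bytes > PAGE_BYTES) {
			s.allocs++;
			s.live++;
			used = 0;
			if (s.live > cap_pages) {
				s.frees++;
				s.live--;
			}
		}
		used += msg_bytes;
	}
	return s;
}
```

In this model the number of allocations depends only on message volume and size; raising or lowering the cap changes how many pages stay resident, not how often the allocator is called. That is consistent with the observation that reducing the message count, rather than tuning debug_mb, is what relieved the churn.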

          adilger Andreas Dilger added a comment -

          While I'm happy to clean up this debug message (I've seen it in the logs as well), it surprises me that this one message would be enough to cause memory allocation problems on the client. The size of the kernel debug logs is limited by the "debug_mb" parameter, with a default of a few MB of memory per core, so even if this message is removed, there will be other messages generated over time that will just stay in the kernel logs longer until they consume an equal amount of space.

          I guess it might be possible to age out the debug log pages after they reach a certain age, but it isn't clear whether repeatedly freeing old pages and allocating new pages will help or make the problem worse. It may be best to just allocate the full debug buffer size at startup (or have an option to do so), so that the pages are allocated at the same time and location instead of during runtime.
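The "allocate the full debug buffer at startup" idea could look roughly like the following user-space sketch. It is not the kernel ring_buffer API and not a proposed patch: struct trace_ring, ring_init(), ring_put(), ring_free() and SLOT_BYTES are hypothetical names, and calloc() stands in for whatever allocation the kernel code would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SLOT_BYTES 4096u	/* assumed size of one trace slot */

struct trace_ring {
	unsigned char *buf;	/* single allocation made at setup time */
	size_t nslots;		/* number of slots in the buffer */
	size_t head;		/* next slot to overwrite */
	unsigned long wraps;	/* times the write position returned to slot 0 */
};

/* Allocate every slot once, up front; returns 0 on success, -1 on failure. */
int ring_init(struct trace_ring *r, size_t nslots)
{
	assert(r != NULL && nslots > 0);
	r->buf = calloc(nslots, SLOT_BYTES);
	if (r->buf == NULL)
		return -1;
	r->nslots = nslots;
	r->head = 0;
	r->wraps = 0;
	return 0;
}

/* Copy trace data into the next slot, overwriting the oldest data. */
void ring_put(struct trace_ring *r, const void *data, size_t len)
{
	if (len > SLOT_BYTES)
		len = SLOT_BYTES;	/* truncate oversized records */
	memcpy(r->buf + r->head * SLOT_BYTES, data, len);
	r->head++;
	if (r->head == r->nslots) {
		r->head = 0;
		r->wraps++;
	}
}

/* Release the buffer; the only free in the ring's lifetime. */
void ring_free(struct trace_ring *r)
{
	free(r->buf);
	r->buf = NULL;
}
```

After ring_init() the writer never calls the allocator again, so logging volume cannot cause allocation churn. The cost is the one raised in the thread: the full buffer is committed even when little tracing occurs, which is why making it optional was suggested.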

          gerrit Gerrit Updater added a comment -

          Ann Koehler (amk@cray.com) uploaded a new patch: https://review.whamcloud.com/35449
          Subject: LU-12524 libcfs: Reduce memory frag due to HA debug msg
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: f47c9ef23eab67da5e3a4f19ba304d7247735098

          People

            Assignee: amk Ann Koehler (Inactive)
            Reporter: amk Ann Koehler (Inactive)