<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:53:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12524] mdc_close() matched open debug msg causes memory fragmentation</title>
                <link>https://jira.whamcloud.com/browse/LU-12524</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A customer reported being unable to run a job because the job could not always get the contiguous memory it required; memory was too fragmented. The primary source of the fragmentation was traced to the Lustre cfs_trace_data pages, which are allocated and freed dynamically. Over 99% of the debug messages were the matched open messages issued by mdc_close():&lt;/p&gt;

&lt;p&gt; DEBUG_REQ(D_HA, mod-&amp;gt;mod_open_req, &quot;matched open; tag %d&quot;, tag); &lt;/p&gt;

&lt;p&gt;The customer was able to work around the problem by removing HA from the default set of debug trace flags. This is a reasonable workaround but not a good solution because the HA tracing is often useful for diagnosing connection problems, particularly at mount time.&lt;/p&gt;

&lt;p&gt;The matched open debug message, however, is not nearly as useful as the other HA messages. So moving the message under a different debug flag, one that must be set explicitly, reduces the amount of default tracing and thereby helps reduce fragmentation at a fairly low cost. With HA eliminated as a possible debug type, the only other available flag that makes much sense is OTHER. Thus, let&apos;s change D_HA in the above DEBUG_REQ statement to D_OTHER.&lt;/p&gt;</description>
                <environment></environment>
        <key id="56295">LU-12524</key>
            <summary>mdc_close() matched open debug msg causes memory fragmentation</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="amk">Ann Koehler</assignee>
                                    <reporter username="amk">Ann Koehler</reporter>
                        <labels>
                    </labels>
                <created>Mon, 8 Jul 2019 21:33:58 +0000</created>
                <updated>Wed, 17 Jul 2019 21:29:44 +0000</updated>
                            <resolved>Wed, 17 Jul 2019 21:29:44 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.13.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="250863" author="gerrit" created="Mon, 8 Jul 2019 21:44:41 +0000"  >&lt;p&gt;Ann Koehler (amk@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/35449&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/35449&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12524&quot; title=&quot;mdc_close() matched open debug msg causes memory fragmentation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12524&quot;&gt;&lt;del&gt;LU-12524&lt;/del&gt;&lt;/a&gt; libcfs: Reduce memory frag due to HA debug msg&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f47c9ef23eab67da5e3a4f19ba304d7247735098&lt;/p&gt;</comment>
                            <comment id="250870" author="adilger" created="Tue, 9 Jul 2019 00:35:58 +0000"  >&lt;p&gt;While I&apos;m happy to clean up this debug message (I&apos;ve seen it in the logs as well), it surprises me that this one message would be enough to cause memory allocation problems on the client.  The size of the kernel debug logs is limited by the &quot;&lt;tt&gt;debug_mb&lt;/tt&gt;&quot; parameter, with a default of a few MB of memory per core, so even if this message is removed, there will be other messages generated over time that will just stay in the kernel logs longer until they consume an equal amount of space.&lt;/p&gt;

&lt;p&gt;I guess it might be possible to age out the debug log pages after they reach a certain age, but it isn&apos;t clear whether repeatedly freeing old pages and allocating new pages would help or make the problem worse.  It may be best to just allocate the full debug buffer size at startup (or have an option to do so), so that the pages are allocated at the same time and location instead of during runtime.&lt;/p&gt;</comment>
                            <comment id="250903" author="amk" created="Tue, 9 Jul 2019 15:45:22 +0000"  >&lt;p&gt;I agree that removing this message from the default set does not eliminate the possibility of fragmentation. Certainly a large number of any of the default debug messages could have the same effect. It&apos;s just that in practice, the mdc_close() message is the one that occurs in sufficiently large numbers to cause churn in trace page allocation. Evidence that this message was the problem is the fact that disabling HA solved the customer issue.&lt;/p&gt;

&lt;p&gt;Setting debug_mb doesn&apos;t resolve the fragmentation problem. The trace pages are not a circular buffer. Pages are allocated on demand and freed back to the kernel memory pool when debug_mb is exceeded. My understanding is that fragmentation caused by the trace buffer pages interferes primarily with hugepage allocations.&lt;/p&gt;

&lt;p&gt;I&apos;ve looked at pre-allocating trace pages. It is certainly doable, but it has the drawback of over-allocating memory that may never be needed. Mostly I was looking for an easy solution to a specific problem. Also, I find those mdc_close messages annoying and worthless in default debug logs, so I&apos;d like to get rid of them. I will change the message to D_RPCTRACE as suggested.&lt;/p&gt;</comment>
                            <comment id="250904" author="simmonsja" created="Tue, 9 Jul 2019 16:29:58 +0000"  >&lt;p&gt;If we do move to a static buffer then this code ends up looking a lot like the ring buffer implementation used by oprofile and trace events. The default ring_buffer in the kernel might handle these fragmentation issues better. Something to explore perhaps?&lt;/p&gt;</comment>
                            <comment id="251526" author="green" created="Wed, 17 Jul 2019 06:13:03 +0000"  >&lt;p&gt;I&apos;d vote for an option to preallocate debug buffer pages. That way people who need it can enable it, and those who don&apos;t can keep their RAM.&lt;/p&gt;</comment>
                            <comment id="251539" author="gerrit" created="Wed, 17 Jul 2019 06:22:06 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/35449/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/35449/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12524&quot; title=&quot;mdc_close() matched open debug msg causes memory fragmentation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12524&quot;&gt;&lt;del&gt;LU-12524&lt;/del&gt;&lt;/a&gt; libcfs: Reduce memory frag due to HA debug msg&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 076a5961f20b4c55347e8968be2fb2504de6f8dd&lt;/p&gt;</comment>
                            <comment id="251563" author="simmonsja" created="Wed, 17 Jul 2019 13:54:55 +0000"  >&lt;p&gt;Peter, should the migration of the debug buffer to the kernel ring buffer implementation be done under a different ticket?&lt;/p&gt;</comment>
                            <comment id="251585" author="pjones" created="Wed, 17 Jul 2019 21:29:44 +0000"  >&lt;p&gt;Sure! Meanwhile, this work has landed for 2.13.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00jcv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>