<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:22:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2084] Kernel freeze allocating more memory than there is RAM</title>
                <link>https://jira.whamcloud.com/browse/LU-2084</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While working with router buffers, I set the number of large buffers to a number beyond the amount of memory I had assigned to the VM running Lustre.  Number of large buffer: 1024, amount of memory: 1G.  The VM froze with all 3 virtual cpu&apos;s running at 100%.&lt;/p&gt;

&lt;p&gt;Looking deeper into this, I found that the Linux memory allocation system will keep trying to free up memory to satisfy the request.  However, even after waiting 15 minutes, the VM did not &quot;unfreeze&quot;.&lt;/p&gt;

&lt;p&gt;I changed the default flags we use for memory allocation to include __GFP_NORETRY to stop the memory allocator from looping.  When re-running the above test, I found the system no longer froze but returned -ENOMEM to the caller as expected.&lt;/p&gt;

&lt;p&gt;This bug is to track a discussion as to whether we should start using __GFP_NORETRY and if so, how widespread.&lt;/p&gt;</description>
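The contrast described here (the allocator retrying indefinitely so the node appears frozen, versus failing fast with -ENOMEM when __GFP_NORETRY is set) can be sketched in userspace. This is not Lustre or kernel code: the flag value is a placeholder, and `alloc_buffers`/`try_alloc_once` are illustrative stand-ins for the real allocation path.

```c
/* Userspace sketch of the two behaviours, assuming a request that can
 * never be satisfied.  Flag value is a placeholder, not the kernel's. */
#include <stddef.h>

#define ENOMEM        12
#define __GFP_NORETRY 0x1   /* placeholder value */

/* Mock allocator backend: pretend the request can never be satisfied. */
static void *try_alloc_once(size_t size)
{
        (void)size;
        return NULL;
}

/* Returns 0 on success, -ENOMEM on failure.  Without __GFP_NORETRY the
 * loop keeps retrying (capped here so the sketch terminates); with it,
 * the caller gets -ENOMEM on the first failure instead of a "freeze". */
static int alloc_buffers(size_t size, unsigned int flags, void **out)
{
        unsigned long attempts = 0;

        for (;;) {
                *out = try_alloc_once(size);
                if (*out != NULL)
                        return 0;
                if (flags & __GFP_NORETRY)
                        return -ENOMEM;          /* fail fast */
                if (++attempts >= 1000000UL)     /* stand-in for the hang */
                        return -ENOMEM;
        }
}
```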
                <environment></environment>
        <key id="16235">LU-2084</key>
            <summary>Kernel freeze allocating more memory than there is RAM</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="doug">Doug Oucharek</reporter>
                        <labels>
                    </labels>
                <created>Wed, 3 Oct 2012 19:20:24 +0000</created>
                <updated>Wed, 27 Oct 2021 03:48:00 +0000</updated>
                            <resolved>Wed, 27 Oct 2021 03:48:00 +0000</resolved>
                                    <version>Lustre 2.2.0</version>
                    <version>Lustre 2.3.0</version>
                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.3</version>
                    <version>Lustre 1.8.8</version>
                                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="45964" author="keith" created="Wed, 3 Oct 2012 19:49:38 +0000"  >&lt;p&gt;In general:&lt;/p&gt;

&lt;p&gt;I have seen near-OOM situations take hours (think overnight to days) to work themselves out, if that is even possible; over-commitment by kernel-side code fares badly. 15 minutes is just getting started with these things.  In your &quot;frozen&quot; state, your kernel was likely not broken in any way, just busy trying to accomplish what you asked it to do. &lt;/p&gt;

&lt;p&gt;There are going to be some critical sections of code that cannot fail and should block until fulfilled. It is definitely a per-allocation-class question whether it should block or not. If you are allocating a whole system&apos;s worth of memory, it should use __GFP_NORETRY and check for -ENOMEM; I would also hope that resource users this large would grow carefully as needed.&lt;/p&gt;

&lt;p&gt;Where did you add the  __GFP_NORETRY flag? &lt;/p&gt;</comment>
                            <comment id="45965" author="doug" created="Wed, 3 Oct 2012 20:03:59 +0000"  >&lt;p&gt;Given how there are many layers of &quot;abstraction&quot; in libcfs for OS portability, I ended up just hardcoding the addition of __GFP_NORETRY in routine cfs_alloc_flags_to_gfp() in linux-mem.c.&lt;/p&gt;

&lt;p&gt;Since we don&apos;t preallocate everything and do have some level of dynamic allocation, the possibility will exist that a happily running server could suddenly appear to freeze.  I myself thought I had crashed the kernel, since the terminal was no longer responsive.  Only by watching the CPU meter on the host system did I notice that the VM was still running at 100%.&lt;/p&gt;

&lt;p&gt;From a user&apos;s perspective, I would rather have an error message saying &quot;task X could not be done because of a lack of memory&quot; rather than a freeze.  &lt;/p&gt;</comment>
                            <comment id="45966" author="keith" created="Wed, 3 Oct 2012 20:29:50 +0000"  >&lt;p&gt;Yes, working to keep the system out of OOM is a much better user experience.&lt;/p&gt;

&lt;p&gt;cfs_alloc_flags_to_gfp seems to be pretty low-level; I would think that affects a huge amount of code. &lt;br/&gt;
Are you using cfs_alloc(size_t nr_bytes, u_int32_t flags)?  You can pass down __GFP_NORETRY for your specific allocation. &lt;/p&gt;

&lt;p&gt;What are you seeing as your -ENOMEM indication? &lt;/p&gt;</comment>
                            <comment id="45973" author="adilger" created="Thu, 4 Oct 2012 04:01:02 +0000"  >&lt;p&gt;Doug, wouldn&apos;t it make sense to limit the number of router buffers to some amount less than the total amount of RAM?  Using __GFP_NORETRY in a blanket fashion seems like it could cause gratuitous system failures for cases where there is low memory, but the allocation is not absurd like in your case.&lt;/p&gt;</comment>
                            <comment id="46016" author="isaac" created="Thu, 4 Oct 2012 15:49:09 +0000"  >&lt;p&gt;1. I think __GFP_NORETRY is reasonable for router buffers. Routers should be dedicated nodes, where there&apos;s nothing else running - i.e. there&apos;s nothing like dirty pages to be flushed or idle process pages to be swapped out, so it makes little sense to make the VM retry.&lt;/p&gt;

&lt;p&gt;2. I don&apos;t think we should make it foolproof by limiting large_router_buffers. System administrators should understand what large_router_buffers does; if they ask for too much, they are asking for trouble and should end up with trouble. Such failures happen only once, at router startup, and such routers would be avoided by clients and servers via their router pingers, so the consequences should not be catastrophic. The admin should then notice it and learn the lesson.&lt;/p&gt;</comment>
                            <comment id="46023" author="doug" created="Thu, 4 Oct 2012 17:48:58 +0000"  >&lt;p&gt;This becomes more complicated when looking forward to the Dynamic LNet Config project which will be making the router buffer pools changeable.  With the code as is today, if a user tells a running router to increase the size of a pool beyond available memory, we will see the router lock up for potentially hours.  That is unacceptable.&lt;/p&gt;

&lt;p&gt;If we use __GFP_NORETRY, it may return ENOMEM in cases where memory could have been freed to satisfy the request.  However, I would rather see this than a live router lockup.&lt;/p&gt;

&lt;p&gt;Checking ahead of time to see if there is RAM available does not sound easy given how the Linux memory manager works.  Also, I feel this would be doing the OS&apos;s job for it.&lt;/p&gt;

&lt;p&gt;I heard somewhere that work was done on the memory manager in the Linux 3.x streams to address these sorts of issues.  None of that was back-ported to 2.6. &lt;/p&gt;</comment>
                            <comment id="46032" author="isaac" created="Thu, 4 Oct 2012 22:36:43 +0000"  >&lt;p&gt;I tend to think __GFP_NORETRY is sufficient. On dedicated routers, where could the VM free much memory from?&lt;/p&gt;</comment>
                            <comment id="46035" author="doug" created="Thu, 4 Oct 2012 23:43:00 +0000"  >&lt;p&gt;Good point.&lt;/p&gt;

&lt;p&gt;Ok, I can add CFS_ALLOC_NORETRY to our own set of memory allocation flags and map this to __GFP_NORETRY when present.  This way it can be added on a case by case basis.  I will only add this flag when allocating router buffers.&lt;/p&gt;</comment>
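The mapping Doug describes (a libcfs-level opt-in flag translated to __GFP_NORETRY only for callers that request it, such as router buffer allocation) can be sketched roughly as follows. This is a userspace sketch: the flag constants and the body of `cfs_alloc_flags_to_gfp` here are illustrative placeholders, not the actual libcfs or kernel values.

```c
/* Sketch of per-caller opt-in flag translation.  All constants below
 * are placeholders for illustration, not real kernel/libcfs values. */

#define CFS_ALLOC_IO      0x1    /* hypothetical libcfs flags */
#define CFS_ALLOC_NORETRY 0x2

#define __GFP_IO          0x40   /* placeholder gfp values */
#define __GFP_NORETRY     0x1000

static unsigned int cfs_alloc_flags_to_gfp(unsigned int cfs_flags)
{
        unsigned int gfp = 0;

        if (cfs_flags & CFS_ALLOC_IO)
                gfp |= __GFP_IO;
        /* Only callers that opt in (e.g. router buffer allocation)
         * get the fail-fast behaviour; everyone else is unchanged. */
        if (cfs_flags & CFS_ALLOC_NORETRY)
                gfp |= __GFP_NORETRY;
        return gfp;
}
```

The point of the extra CFS_ALLOC_NORETRY flag is that fail-fast semantics stay case-by-case rather than being hardcoded for every allocation going through the translation layer.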
                            <comment id="46037" author="adilger" created="Fri, 5 Oct 2012 00:06:18 +0000"  >&lt;p&gt;As much as we could wish everyone using Lustre understood it as well as the developers, I don&apos;t think this is at all realistic.  Users need to be told that something they are trying to do is unrealistic, rather than causing failures or hanging/crashing the node.  Having a check like the following seems reasonable:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (router_buffer_pages &amp;gt; cfs_num_physpages * 7 / 8) {
                CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;too much router memory requested: max %u\n&quot;&lt;/span&gt;,
                       cfs_num_physpages * 7 / 8);
                RETURN(-EINVAL);
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;with allowances for printing messages with proper units, etc.  We still need to keep some memory for other things as well, which we may not get with a simple -ENOMEM case.&lt;/p&gt;</comment>
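The suggested 7/8-of-physical-memory bound can be sketched in userspace as follows. The function name and the use of `fprintf`/a literal -22 are illustrative stand-ins (the real code would use CERROR and RETURN(-EINVAL)); the page counts in the example assume 4 KiB pages.

```c
/* Userspace sketch of the sanity check suggested above: reject router
 * buffer requests exceeding 7/8 of physical memory.  -22 stands in for
 * -EINVAL; the name check_router_buffer_pages() is hypothetical. */
#include <stdio.h>

static int check_router_buffer_pages(unsigned long requested_pages,
                                     unsigned long num_physpages)
{
        unsigned long max_pages = num_physpages * 7 / 8;

        if (requested_pages > max_pages) {
                fprintf(stderr,
                        "too much router memory requested: max %lu pages\n",
                        max_pages);
                return -22; /* -EINVAL */
        }
        return 0;
}
```

For example, a 1 GiB node has 262144 4-KiB physical pages, so any request above 229376 pages would be rejected up front instead of driving the allocator into a retry loop.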
                            <comment id="315092" author="gerrit" created="Sat, 9 Oct 2021 01:22:19 +0000"  >&lt;p&gt;&quot;Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45174&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45174&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2084&quot; title=&quot;Kernel freeze allocating more memory than there is RAM&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2084&quot;&gt;&lt;del&gt;LU-2084&lt;/del&gt;&lt;/a&gt; lnet: don&apos;t retry allocating router buffers&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: ebd97d4585eea1aa7717f555a52dc24bcfa1885e&lt;/p&gt;</comment>
                            <comment id="316622" author="gerrit" created="Wed, 27 Oct 2021 00:35:21 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/45174/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45174/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2084&quot; title=&quot;Kernel freeze allocating more memory than there is RAM&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2084&quot;&gt;&lt;del&gt;LU-2084&lt;/del&gt;&lt;/a&gt; lnet: don&apos;t retry allocating router buffers&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 3038917f12a53b059473db172f5126136e20abc0&lt;/p&gt;</comment>
                            <comment id="316650" author="pjones" created="Wed, 27 Oct 2021 03:48:00 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv51j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4350</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>