<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:28:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9728] out of memory on OSS causing allocation failures or hung threads</title>
                <link>https://jira.whamcloud.com/browse/LU-9728</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In several recent cases, the OSS has hit memory allocation failures or hung service threads because the Lustre read cache was using a large amount of RAM:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LNet: Service thread pid 4950 was inactive for 200.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:

schedule+0x29/0x70
schedule_timeout+0x209/0x2d0
io_schedule_timeout+0xae/0x130
io_schedule+0x18/0x20
sleep_on_page+0xe/0x20
__wait_on_bit_lock+0x5b/0xc0
__lock_page+0x78/0xa0
__find_lock_page+0x54/0x70
find_or_create_page+0x34/0xa0
osd_bufs_get+0x20f/0x410 [osd_ldiskfs]
ofd_preprw+0x647/0x11a0 [ofd]
tgt_brw_read+0x9a1/0x14c0 [ptlrpc]
tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
ptlrpc_main+0xc00/0x1f60 [ptlrpc]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the page allocation path from &lt;tt&gt;osd_bufs_get()&lt;/tt&gt; to &lt;tt&gt;osd_get_page()&lt;/tt&gt;, allocations appear to use only &lt;tt&gt;GFP_NOFS&lt;/tt&gt;, to avoid recursing into the filesystem:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; struct page *osd_get_page(struct dt_object *dt, loff_t offset, &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; rw)
{
        page = find_or_create_page(inode-&amp;gt;i_mapping, offset &amp;gt;&amp;gt; PAGE_SHIFT,
                                   GFP_NOFS | __GFP_HIGHMEM);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, the equivalent pre-OSD code used &lt;tt&gt;GFP_HIGHUSER&lt;/tt&gt; for requests from non-local clients, so that OSS threads could generate memory pressure and perform direct reclaim when memory was short:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;/*
 * the routine is used to request pages from pagecache
 *
 * use GFP_NOFS &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; requests from a local client not allowing to enter FS
 * as we might end up waiting on a page he sent in the request we&apos;re serving.
 * use __GFP_HIGHMEM so that the pages can use all of the available memory
 * on 32-bit machines
 * use more aggressive GFP_HIGHUSER flags from non-local clients to be able to
 * generate more memory pressure.
 *
 * See Bug 19529 and Bug 19917 &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; details.
 */
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; struct page *filter_get_page(struct obd_device *obd, struct inode *inode,
                                    obd_off offset, &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; localreq)
{
        page = find_or_create_page(inode-&amp;gt;i_mapping, offset &amp;gt;&amp;gt; CFS_PAGE_SHIFT,
                                   (localreq ? (GFP_NOFS | __GFP_HIGHMEM) :
                                             GFP_HIGHUSER));
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Something similar looks feasible in the OSD code, at least for ldiskfs. It is less clear what is possible for ZFS, since its buffer allocation is handled quite differently.&lt;/p&gt;</description>
                <environment></environment>
        <key id="47084">LU-9728</key>
            <summary>out of memory on OSS causing allocation failures or hung threads</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Fri, 30 Jun 2017 22:02:40 +0000</created>
                <updated>Sat, 16 Oct 2021 16:38:41 +0000</updated>
                            <resolved>Sat, 29 Jul 2017 13:37:57 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                    <version>Lustre 2.5.3</version>
                    <version>Lustre 2.10.0</version>
                                    <fixVersion>Lustre 2.10.1</fixVersion>
                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="200815" author="gerrit" created="Sat, 1 Jul 2017 02:12:42 +0000"  >&lt;p&gt;Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/27908&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27908&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9728&quot; title=&quot;out of memory on OSS causing allocation failures or hung threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9728&quot;&gt;&lt;del&gt;LU-9728&lt;/del&gt;&lt;/a&gt; osd: use GFP_HIGHUSER for non-local IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 2c79106a72d42070768c887f8a1b85a508d4f9b3&lt;/p&gt;</comment>
                            <comment id="203850" author="gerrit" created="Sat, 29 Jul 2017 00:02:30 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/27908/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27908/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9728&quot; title=&quot;out of memory on OSS causing allocation failures or hung threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9728&quot;&gt;&lt;del&gt;LU-9728&lt;/del&gt;&lt;/a&gt; osd: use GFP_HIGHUSER for non-local IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: b0ab95d6133e783acacc6329c025d17fb282775e&lt;/p&gt;</comment>
                            <comment id="203883" author="pjones" created="Sat, 29 Jul 2017 13:37:57 +0000"  >&lt;p&gt;Landed for 2.11&lt;/p&gt;</comment>
                            <comment id="204208" author="gerrit" created="Wed, 2 Aug 2017 15:56:55 +0000"  >&lt;p&gt;Minh Diep (minh.diep@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/28318&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28318&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9728&quot; title=&quot;out of memory on OSS causing allocation failures or hung threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9728&quot;&gt;&lt;del&gt;LU-9728&lt;/del&gt;&lt;/a&gt; osd: use GFP_HIGHUSER for non-local IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 06e40d2220cf9895a7fac74a2f86582d3fc38c1f&lt;/p&gt;</comment>
                            <comment id="205036" author="gerrit" created="Thu, 10 Aug 2017 16:25:58 +0000"  >&lt;p&gt;John L. Hammond (john.hammond@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/28318/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28318/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9728&quot; title=&quot;out of memory on OSS causing allocation failures or hung threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9728&quot;&gt;&lt;del&gt;LU-9728&lt;/del&gt;&lt;/a&gt; osd: use GFP_HIGHUSER for non-local IO&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: a4c7545f6e77229a3eabe537eb9ed161ff3c88ee&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="66697">LU-15117</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzfyf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>