<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5722] memory allocation deadlock under lu_cache_shrink()</title>
                <link>https://jira.whamcloud.com/browse/LU-5722</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While running sanity-benchmark.sh dbench, I hit the following memory allocation deadlock under mdc_read_page_remote():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;dbench D 0000000000000001 0 14532 1 0x00000004
Call Trace:
resched_task+0x68/0x80
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x203/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x478/0x8d0
alloc_pages_current+0xaa/0x110
__page_cache_alloc+0x87/0x90
mdc_read_page_remote+0x13c/0xd90 [mdc]
do_read_cache_page+0x7b/0x180
read_cache_page_async+0x19/0x20
read_cache_page+0xe/0x20
mdc_read_page+0x192/0x950 [mdc]
lmv_read_page+0x1e0/0x1210 [lmv]
ll_get_dir_page+0xbc/0x370 [lustre]
ll_dir_read+0x9e/0x300 [lustre]
ll_readdir+0x12a/0x4d0 [lustre]
vfs_readdir+0xc0/0xe0
sys_getdents+0x89/0xf0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The page allocation recurses into the Lustre and DLM slab shrinkers, which then block on a lock that is already held. Presumably the allocation needs to use GFP_NOFS? I didn&apos;t actually check which locks were held, since the machine hung while I was trying to gather more information.&lt;/p&gt;</description>
                <environment>single-node testing on master (5c4f68be57 + &lt;a href=&quot;http://review.whamcloud.com/11258&quot;&gt;http://review.whamcloud.com/11258&lt;/a&gt; )&lt;br/&gt;
kernel: 2.6.32-358.23.2.el6_lustre.gc9be53c.x86_64&lt;br/&gt;
combined MDS+MGS+OSS, 2x MDT, 3xOST on LVM</environment>
        <key id="26937">LU-5722</key>
            <summary>memory allocation deadlock under lu_cache_shrink()</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="cliffw">Cliff White</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                            <label>mq414</label>
                            <label>patch</label>
                    </labels>
                <created>Thu, 9 Oct 2014 17:49:04 +0000</created>
                <updated>Sun, 18 Sep 2016 17:13:50 +0000</updated>
                            <resolved>Sun, 8 Feb 2015 04:52:27 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="96098" author="adilger" created="Thu, 9 Oct 2014 23:15:32 +0000"  >&lt;p&gt;This may be a larger problem than just mdc_read_page_remote().  Running sanity.sh again, during test_49() I see multiple threads stuck in lu_cache_shrink(), even threads unrelated to Lustre:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;kswapd0       D 0000000000000001     0    38      2 0x00000000
Call Trace:
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x203/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
balance_pgdat+0x59a/0x820
kswapd+0x134/0x3c0
kthread+0x96/0xa0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This will impact all threads trying to allocate memory.  Some threads also get stuck in direct reclaim if memory is low (this is just one of many threads stuck at &lt;tt&gt;lu&amp;#95;cache&amp;#95;shrink+0x203/0x310&lt;/tt&gt;):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sendmail      D 0000000000000001     0 15730  2260 0x00000000
Call Trace:
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x203/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x478/0x8d0
kmem_getpages+0x62/0x170
fallback_alloc+0x1ba/0x270
____cache_alloc_node+0x99/0x160
kmem_cache_alloc_node+0x89/0x1d0
__alloc_skb+0x4f/0x190
sk_stream_alloc_skb+0x41/0x110
tcp_sendmsg+0x350/0xa20
sock_aio_write+0x19b/0x1c0
do_sync_write+0xfa/0x140
vfs_write+0x184/0x1a0
sys_write+0x51/0x90
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Some threads are stuck at &lt;tt&gt;lu&amp;#95;cache&amp;#95;shrink+0x144/0x310&lt;/tt&gt; instead:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Oct  9 15:43:18 sookie-gig kernel: irqbalance    D 0000000000000000     0  1616     1 0x00000000
Oct  9 15:43:18 sookie-gig kernel: Call Trace:
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x144/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x478/0x8d0
alloc_pages_vma+0x9a/0x150
handle_pte_fault+0x76b/0xb50
handle_mm_fault+0x23a/0x310
__do_page_fault+0x139/0x480
do_page_fault+0x3e/0xa0
page_fault+0x25/0x30
proc_reg_read+0x7e/0xc0
vfs_read+0xb5/0x1a0
sys_read+0x51/0x90
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It seems some of the code has been inlined, but all callers are blocked on getting lu_sites_guard.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;(gdb) list *(lu_cache_shrink+0x203)
0x51c33 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1961).
1956	
1957		&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!(sc-&amp;gt;gfp_mask &amp;amp; __GFP_FS))
1958			&lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;
1959	
1960		mutex_lock(&amp;amp;lu_sites_guard);
1961		list_for_each_entry_safe(s, tmp, &amp;amp;lu_sites, ls_linkage) {
1962			memset(&amp;amp;stats, 0, sizeof(stats));
1963			lu_site_stats_get(s-&amp;gt;ls_obj_hash, &amp;amp;stats, 0);
1964			cached += stats.lss_total - stats.lss_busy;
1965		}
(gdb) list *(lu_cache_shrink+0x144)
0x51b74 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1996).
1991                     * anyways.
1992                     */
1993                    &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; SHRINK_STOP;
1994
1995            mutex_lock(&amp;amp;lu_sites_guard);
1996            list_for_each_entry_safe(s, tmp, &amp;amp;lu_sites, ls_linkage) {
1997                    remain = lu_site_purge(&amp;amp;lu_shrink_env, s, remain);
1998                    /*
1999                     * Move just shrunk site to the tail of site list to
2000                     * assure shrinking fairness.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It isn&apos;t clear which thread is holding the lu_sites_guard mutex. In both hangs so far, there is a running thread that &lt;em&gt;may&lt;/em&gt; be holding it. Its stack trace is ambiguous because every frame is marked with &quot;?&quot;; that may simply be how running threads are always reported, or the entries may be stale leftovers on the stack while the process is actually running in userspace (though I don&apos;t see any &quot;cleanup&quot; routines on the stack).&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;cpuspeed      R  running task        0  1606      1 0x00000000
Call Trace:
? thread_return+0x4e/0x76e
? apic_timer_interrupt+0xe/0x20
? mutex_lock+0x1e/0x50
? cfs_hash_spin_lock+0xe/0x10 [libcfs]
? lu_site_purge+0x134/0x4e0 [obdclass]
? _spin_lock+0x12/0x30
? cfs_hash_spin_lock+0xe/0x10 [libcfs]
? lu_site_stats_get+0x98/0x170 [obdclass]
? lu_cache_shrink+0x242/0x310 [obdclass]
? shrink_slab+0x12a/0x1a0
? do_try_to_free_pages+0x3f7/0x610
? try_to_free_pages+0x92/0x120
? __alloc_pages_nodemask+0x478/0x8d0
? alloc_pages_vma+0x9a/0x150
? handle_pte_fault+0x76b/0xb50
? handle_mm_fault+0x23a/0x310
? __do_page_fault+0x139/0x480
? do_page_fault+0x3e/0xa0
? page_fault+0x25/0x30
? proc_reg_read+0x7e/0xc0
? vfs_read+0xb5/0x1a0
? sys_read+0x51/0x90
runnable tasks:
           task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R       cpuspeed  1606   2735264.028890   1596454   120   2735264.028890   1068807.826562  11067012.338451 /
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;(gdb) list *(lu_cache_shrink+0x242)
0x51c72 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1964).
1959
1960            mutex_lock(&amp;amp;lu_sites_guard);
1961            list_for_each_entry_safe(s, tmp, &amp;amp;lu_sites, ls_linkage) {
1962                    memset(&amp;amp;stats, 0, sizeof(stats));
1963                    lu_site_stats_get(s-&amp;gt;ls_obj_hash, &amp;amp;stats, 0);
1964                    cached += stats.lss_total - stats.lss_busy;
1965            }
1966            mutex_unlock(&amp;amp;lu_sites_guard);
1967
1968            cached = (cached / 100) * sysctl_vfs_cache_pressure;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="97843" author="fzago" created="Wed, 29 Oct 2014 15:57:36 +0000"  >&lt;p&gt;I don&apos;t know whether it&apos;s the same bug or not, but I&apos;ve seen something that looks similar under 2.5. A lot of processes are stuck in or around lu_cache_shrink. The machine is not really hung, but not usable either, and needs rebooting. I made a patch for it.&lt;/p&gt;

&lt;p&gt;Here&apos;s a forward ported patch for head of tree, but untested: &lt;a href=&quot;http://review.whamcloud.com/#/c/12468/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/12468/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="106168" author="gerrit" created="Sun, 8 Feb 2015 02:26:10 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/12468/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12468/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5722&quot; title=&quot;memory allocation deadlock under lu_cache_shrink()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5722&quot;&gt;&lt;del&gt;LU-5722&lt;/del&gt;&lt;/a&gt; obdclass: reorganize busy object accounting&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ff0b34274d4f8754ebba0a5a812bd117cbec37b1&lt;/p&gt;</comment>
                            <comment id="106185" author="pjones" created="Sun, 8 Feb 2015 04:52:27 +0000"  >&lt;p&gt;Landed for 2.7&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="16899">LU-2468</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10110">LU-14</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwy7r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>16062</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>