<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:38 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2487] 2.2 Client deadlock between ll_md_blocking_ast, sys_close, and sys_open</title>
                <link>https://jira.whamcloud.com/browse/LU-2487</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Spinlock Usage&lt;/p&gt;

&lt;p&gt;Walking through the code and digging through pointers in the stack frames of the 3 threads leads to the following 3 suspect structures and their related spinlocks:&lt;/p&gt;

&lt;p&gt;inode         = 0xffff880581a26638  (i_lock)      &lt;br/&gt;
dentry        = 0xffff88051f7242c0  (d_lock)&lt;br/&gt;
ldlm_resource = 0xffff880308f916c0  (lr_lock)&lt;/p&gt;

&lt;p&gt;Looks like there is a 3-way deadlock between ll_md_blocking_ast(), sys_open(), and sys_close() using the above spinlocks.&lt;/p&gt;

&lt;p&gt;CPU 10: ll_md_blocking_ast()&lt;br/&gt;
   + ll_inode_from_resource() gets lock-&amp;gt;l_resource-&amp;gt;lr_lock&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;igrab() wants lock-&amp;gt;l_resource-&amp;gt;lr_lvb_inode-&amp;gt;i_lock&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;CPU 7: sys_close()&lt;br/&gt;
   + dput() gets dentry-&amp;gt;d_lock &lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;ldlm_resource_foreach() wants res-&amp;gt;lr_lock&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;CPU 15: sys_open()&lt;br/&gt;
   + ll_splice_alias()/ll_find_alias() gets inode-&amp;gt;i_lock &lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;ll_splice_alias()/ll_find_alias() wants dentry-&amp;gt;d_lock&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The lr_lock, d_lock, and i_lock are the same in all cases. So ll_md_blocking_ast() waits for sys_open(), sys_open() waits for sys_close(), and sys_close() waits for ll_md_blocking_ast(): a 3-way deadlock.&lt;/p&gt;
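
&lt;p&gt;The circular wait described above can be checked mechanically. The following is a minimal Python sketch, not Lustre code: it models each thread as a (thread, held lock, wanted lock) tuple taken from the three call paths listed earlier, and searches the resulting held-to-wanted graph for a cycle. The names WAITS and find_lock_cycle() are hypothetical and exist only for this illustration.&lt;/p&gt;

```python
# Hypothetical model of the three call paths described in this issue.
# Each entry: (thread, lock already held, lock being waited for).
WAITS = [
    ("ll_md_blocking_ast", "lr_lock", "i_lock"),
    ("sys_close", "d_lock", "lr_lock"),
    ("sys_open", "i_lock", "d_lock"),
]


def find_lock_cycle(waits):
    """Return a list of locks forming a held-to-wanted cycle, or None."""
    # Build the wait-for graph: held lock maps to the locks wanted while holding it.
    wanted_by_holder = {}
    for _thread, held, wanted in waits:
        wanted_by_holder.setdefault(held, []).append(wanted)

    def visit(lock, path):
        # Reaching a lock already on the current path closes a cycle.
        if lock in path:
            return path[path.index(lock):] + [lock]
        for nxt in wanted_by_holder.get(lock, []):
            cycle = visit(nxt, path + [lock])
            if cycle:
                return cycle
        return None

    for start in wanted_by_holder:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_lock_cycle(WAITS))
```

&lt;p&gt;Run as a script, this prints the cycle lr_lock, i_lock, d_lock, lr_lock. Removing any one of the three entries leaves no cycle, and the function returns None.&lt;/p&gt;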

&lt;p&gt;This deadlock is possible because of two patches: &lt;br/&gt;
1) &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-903&quot; title=&quot;Race condition while get_attr after cancel_lru_locks and sysctl drop_caches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-903&quot;&gt;&lt;del&gt;LU-903&lt;/del&gt;&lt;/a&gt; (not landed yet) (bz24555), which takes the inode lock. If we take the ldlm_lock lock, then no deadlock is possible.&lt;br/&gt;
resource dentry inode&lt;br/&gt;
ldlm_lock resource dentry&lt;br/&gt;
2) dcache_lock removal (&lt;a href=&quot;http://review.whamcloud.com/#change,1865&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,1865&lt;/a&gt;). Before this patch (with bz24555 enabled), no deadlock was possible either.&lt;br/&gt;
resource dentry inode&lt;br/&gt;
inode resource dcache_lock&lt;/p&gt;</description>
                <environment></environment>
        <key id="16922">LU-2487</key>
            <summary>2.2 Client deadlock between ll_md_blocking_ast, sys_close, and sys_open</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="artem_blagodarenko">Artem Blagodarenko</reporter>
                        <labels>
                            <label>client</label>
                            <label>patch</label>
                    </labels>
                <created>Thu, 13 Dec 2012 00:07:25 +0000</created>
                <updated>Thu, 3 Oct 2013 16:35:00 +0000</updated>
                            <resolved>Wed, 2 Jan 2013 15:47:05 +0000</resolved>
                                    <version>Lustre 2.2.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="49180" author="artem_blagodarenko" created="Thu, 13 Dec 2012 01:28:22 +0000"  >&lt;p&gt;I set the priority to minor because the race is possible only with the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-903&quot; title=&quot;Race condition while get_attr after cancel_lru_locks and sysctl drop_caches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-903&quot;&gt;&lt;del&gt;LU-903&lt;/del&gt;&lt;/a&gt;. I already have a patch, but need to rebase it onto master.&lt;/p&gt;</comment>
                            <comment id="49238" author="artem_blagodarenko" created="Fri, 14 Dec 2012 05:05:45 +0000"  >&lt;p&gt;Xyratex MRP-675&lt;/p&gt;</comment>
                            <comment id="49240" author="artem_blagodarenko" created="Fri, 14 Dec 2012 06:48:31 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/4833&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4833&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49532" author="adilger" created="Fri, 21 Dec 2012 03:34:47 +0000"  >&lt;p&gt;Which client kernel is this?  The new dcache_lock removal is only in use for kernels &amp;gt; 2.6.37, so I guess this is SLES11 SP2 (3.0)?  I&apos;m trying to see where there is a deadlock in the code, because your original comment is not showing the callpath to the function getting the second lock.  I&apos;m guessing something like:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;CPU 10: ll_md_blocking_ast()
   + ll_inode_from_resource()/ll_inode_from_lock() gets lock-&amp;gt;l_lock and lock-&amp;gt;l_resource-&amp;gt;lr_lock
   - igrab() wants lock-&amp;gt;l_resource-&amp;gt;lr_lvb_inode-&amp;gt;i_lock

CPU 7: sys_close()
   + dput() gets dentry-&amp;gt;d_lock
   - ll_ddelete-&amp;gt;find_cbdata-&amp;gt;ldlm_resource_foreach() wants res-&amp;gt;lr_lock

CPU 15: sys_open
   + ll_splice_alias()/ll_find_alias() gets inode-&amp;gt;i_lock 
   - ll_splice_alias()/ll_find_alias() wants dentry-&amp;gt;d_lock
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The ll_ddelete-&amp;gt;find_cbdata() path has been disabled in 9f3469f1:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        /* Disable &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; piece of code temporarily because &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; is called
         * inside dcache_lock so it&apos;s not appropriate to &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; lots of work
         * here. */
#&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; 0
        /* &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; not ldlm lock &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; inode, set i_nlink to 0 so that
         * &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; inode can be recycled later b=20433 */
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (de-&amp;gt;d_inode &amp;amp;&amp;amp; !find_cbdata(de-&amp;gt;d_inode))
                clear_nlink(de-&amp;gt;d_inode);
#endif
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So it seems there is no need for this complex patch?&lt;/p&gt;</comment>
                            <comment id="49780" author="artem_blagodarenko" created="Sat, 29 Dec 2012 09:39:21 +0000"  >&lt;p&gt;Yes, we do not need this patch, because one of the deadlock&apos;s branches is disabled in current master. However, I have added the comment &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2487&quot; title=&quot;2.2 Client deadlock between ll_md_blocking_ast, sys_close, and sys_open&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2487&quot;&gt;&lt;del&gt;LU-2487&lt;/del&gt;&lt;/a&gt; need to be resolved before branch can be enabled again&quot; to the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-903&quot; title=&quot;Race condition while get_attr after cancel_lru_locks and sysctl drop_caches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-903&quot;&gt;&lt;del&gt;LU-903&lt;/del&gt;&lt;/a&gt; patch to prevent a possible deadlock if somebody enables that branch.&lt;/p&gt;</comment>
                            <comment id="49781" author="artem_blagodarenko" created="Sat, 29 Dec 2012 09:41:54 +0000"  >&lt;p&gt;Can we close this issue as &quot;Won&apos;t Fix&quot;?&lt;/p&gt;</comment>
                            <comment id="49844" author="adilger" created="Wed, 2 Jan 2013 15:47:05 +0000"  >&lt;p&gt;I think the comment you added in the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-903&quot; title=&quot;Race condition while get_attr after cancel_lru_locks and sysctl drop_caches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-903&quot;&gt;&lt;del&gt;LU-903&lt;/del&gt;&lt;/a&gt; patch should be enough for now. Thanks for the update.  &lt;/p&gt;

&lt;p&gt;I&apos;m going to close this as Cannot Reproduce, since the problem never existed in any of the public Lustre releases.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="21245">LU-4053</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvdrj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5837</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>