<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:15:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8249] Potential deadlock in lnet</title>
                <link>https://jira.whamcloud.com/browse/LU-8249</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing a patch I was able to deadlock all the CPUs of my machine.&lt;br/&gt;
After investigating the log reports I found that &quot;LNetMDAttach&quot; (implemented in lnet/lnet/lib-md.c) calls the &quot;vfree&quot; function in interrupt context, which is illegal in Linux kernel versions prior to 3.10.&lt;/p&gt;

&lt;p&gt;I cannot be sure whether this was actually the cause of my problem, though, because I have not been able to reproduce it since then.&lt;/p&gt;</description>
                <environment></environment>
        <key id="37454">LU-8249</key>
            <summary>Potential deadlock in lnet</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="cealustre">CEA</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Tue, 7 Jun 2016 23:11:49 +0000</created>
                <updated>Wed, 5 Aug 2020 13:50:23 +0000</updated>
                            <resolved>Mon, 29 Aug 2016 13:04:09 +0000</resolved>
                                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="155005" author="gerrit" created="Tue, 7 Jun 2016 23:13:19 +0000"  >&lt;p&gt;Quentin Bouget (quentin.bouget.ocre@cea.fr) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/20676&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/20676&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8249&quot; title=&quot;Potential deadlock in lnet&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8249&quot;&gt;&lt;del&gt;LU-8249&lt;/del&gt;&lt;/a&gt; lnet: potential deadlock in lnet&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: ef1052386265b73c129c2dc5d20bd86e175a8c0d&lt;/p&gt;</comment>
                            <comment id="155050" author="bfaccini" created="Wed, 8 Jun 2016 08:36:32 +0000"  >&lt;p&gt;Hello Quentin, could you provide additional information to help us understand the problem? For example, the log reports you refer to, and perhaps a crash dump if one is available?&lt;/p&gt;</comment>
                            <comment id="155138" author="jgmitter" created="Wed, 8 Jun 2016 18:37:03 +0000"  >&lt;p&gt;Hi Doug,&lt;/p&gt;

&lt;p&gt;Can you please investigate this patch?&lt;/p&gt;

&lt;p&gt;Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="155210" author="bougetq" created="Thu, 9 Jun 2016 07:00:08 +0000"  >&lt;p&gt;Hello Bruno, I actually do not have much. I did not realize what this was at the time; I thought it was related to my patch and would be easy to reproduce. The best I have is a partial call stack implicating &quot;cfs_percpt_lock&quot; (see the attached file). I reviewed the code that uses it and found that &quot;vfree&quot; and &quot;cfs_percpt_unlock&quot; are called in inverted order in &quot;LNetMDAttach&quot;.&lt;/p&gt;</comment>
                            <comment id="157793" author="bfaccini" created="Wed, 6 Jul 2016 12:57:16 +0000"  >&lt;p&gt;Hello Quentin, even though I don&apos;t think it will fix the specific deadlock you encountered, I am not against your patch, which looks fine as a way to comply with the rule against calling vfree() while holding a spinlock. However, I think you should at least modify LNetMDBind() the same way, and there, too, avoid calling lnet_res_unlock() upon lnet_res_lock_current() failure.&lt;/p&gt;

&lt;p&gt;Also, you may have triggered an issue similar to the one tracked in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8334&quot; title=&quot;OSS lockup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8334&quot;&gt;&lt;del&gt;LU-8334&lt;/del&gt;&lt;/a&gt;. In fact, we will need a crash dump taken at the next occurrence to determine the real root cause.&lt;/p&gt;

&lt;p&gt;What do you think?&lt;/p&gt;</comment>
                            <comment id="157903" author="bougetq" created="Wed, 6 Jul 2016 21:48:44 +0000"  >&lt;p&gt;Indeed, I missed the one in LNetMDBind().&lt;br/&gt;
I&apos;m not sure lnet_res_lock_current() can actually fail, though: the return code tested after it is set by lnet_md_build(), and I don&apos;t think there is any reason to take the spinlock upon failure only to release it right afterwards.&lt;/p&gt;

&lt;p&gt;If this ever occurs again, I will make sure to get a crash dump.&lt;/p&gt;

&lt;p&gt;(New patch is available)&lt;/p&gt;</comment>
                            <comment id="157940" author="bfaccini" created="Thu, 7 Jul 2016 09:09:41 +0000"  >&lt;p&gt;Sorry, I meant for failures before lnet_res_lock_current(), but I see that this is what you also implemented in the new patch set #4!&lt;/p&gt;</comment>
                            <comment id="163367" author="gerrit" created="Mon, 29 Aug 2016 03:52:17 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/20676/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/20676/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8249&quot; title=&quot;Potential deadlock in lnet&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8249&quot;&gt;&lt;del&gt;LU-8249&lt;/del&gt;&lt;/a&gt; lnet: potential deadlock in lnet&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: cc025e667464672edd25da819e106854c220e668&lt;/p&gt;</comment>
                            <comment id="163392" author="pjones" created="Mon, 29 Aug 2016 13:04:09 +0000"  >&lt;p&gt;Landed for 2.9&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="21837" name="partial_stack_lnet_deadlock" size="658" author="bougetq" created="Thu, 9 Jun 2016 07:00:08 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzydz3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>