<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:23:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2258] races between evict and umount</title>
                <link>https://jira.whamcloud.com/browse/LU-2258</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I believe I&apos;ve found two possible race conditions between evict and umount on the server side. &lt;/p&gt;

&lt;p&gt;&#12539;About The Cases&lt;br/&gt;
1) Touch an Already-Finalized Lustre Hash&lt;br/&gt;
After class_manual_cleanup() with mds or obdfilter, obd-&amp;gt;obd_uuid_hash has been finalized, so I believe the hash shouldn&apos;t be touched by anyone. However, the evict process can still touch the hash right after class_cleanup(), because there is no exclusion between the two paths in this case. That is why this case ends up in a kernel panic.&lt;/p&gt;
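
&lt;p&gt;To make case 1 concrete, here is a minimal user-space sketch of the guarded-lookup pattern; ToyObd and its methods are illustrative names standing in for the real Lustre symbols (obd_uuid_hash, class_manual_cleanup(), obd_export_evict_by_uuid()), not actual Lustre code:&lt;/p&gt;

```python
import threading

# Toy model of case 1 (ToyObd is an illustrative name, not a real
# Lustre symbol): the cleanup path finalizes the uuid hash, and an
# unguarded evict would then touch memory that is already gone.
class ToyObd:
    def __init__(self):
        self.lock = threading.Lock()
        self.stopping = False
        self.uuid_hash = {"client-uuid": "export"}  # stands in for obd_uuid_hash

    def cleanup(self):
        # class_manual_cleanup(): mark the device stopping and
        # finalize the hash under the lock
        with self.lock:
            self.stopping = True
            self.uuid_hash = None

    def evict_by_uuid(self, uuid):
        # the guard: check the stopping state before touching the
        # hash, instead of dereferencing it unconditionally
        with self.lock:
            if self.stopping or self.uuid_hash is None:
                return None  # refuse: hash already finalized
            return self.uuid_hash.get(uuid)

obd = ToyObd()
assert obd.evict_by_uuid("client-uuid") == "export"  # normal lookup works
obd.cleanup()
assert obd.evict_by_uuid("client-uuid") is None      # refused after cleanup
print("ok")
```

&lt;p&gt;Without the stopping check, the lookup would dereference the finalized hash, which corresponds to the panic below.&lt;/p&gt;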

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PID: 3361   TASK: ffff810621f21080  CPU: 13  COMMAND: &quot;lctl&quot;
 #0 [ffff8105e1817a30] crash_kexec at ffffffff800aeb6b
 #1 [ffff8105e1817af0] __die at ffffffff80066157
 #2 [ffff8105e1817b30] die at ffffffff8006cce5
 #3 [ffff8105e1817b60] do_general_protection at ffffffff8006659f
 #4 [ffff8105e1817ba0] error_exit at ffffffff8005ede9
    [exception RIP: lustre_hash_lookup+249]
    RIP: ffffffff887400f9  RSP: ffff8105e1817c58  RFLAGS: 00010206
    RAX: ffff810115f337c0  RBX: 00000000ffff8106  RCX: 0000000000000001
    RDX: 00000000ffff8106  RSI: ffff8105e1817ce8  RDI: ffff810628834140
    RBP: ffff810628834140   R8: 0000000000000032   R9: 0000000000000020
    R10: 000000000000001b  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8105e1817ce8  R14: ffff8102de86e078  R15: 00007fff6a0adc10
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff8105e1817cb0] obd_export_evict_by_uuid at ffffffff8874426f
 #6 [ffff8105e1817d40] lprocfs_wr_evict_client at ffffffff88869bfd
 #7 [ffff8105e1817e00] lprocfs_mds_wr_evict_client at ffffffff889a6a39
 #8 [ffff8105e1817ee0] lprocfs_fops_write at ffffffff8875349b
 #9 [ffff8105e1817f10] vfs_write at ffffffff80016a49
#10 [ffff8105e1817f40] sys_write at ffffffff80017316
#11 [ffff8105e1817f80] system_call at ffffffff8005e116
    RIP: 000000394bac6070  RSP: 00007fff6a0ad3c0  RFLAGS: 00010287
    RAX: 0000000000000001  RBX: ffffffff8005e116  RCX: 00000000fbad2a84
    RDX: 0000000000000024  RSI: 00007fff6a0b0bc7  RDI: 0000000000000003
    RBP: 0000000000000001   R8: fefefefefefefeff   R9: 632d303038362d63
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 00007fff6a0b0bb4  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2) Deadlock at _lprocfs_lock&lt;br/&gt;
lprocfs_wr_evict_client() calls class_incref() before LPROCFS_EXIT() and class_decref() after LPROCFS_ENTRY(). So in the case where class_decref() calls lprocfs_remove() via osc_cleanup() while already holding _lprocfs_lock, the thread ends up in a deadlock. &lt;/p&gt;
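
&lt;p&gt;The self-deadlock can be sketched in user space like this; the fake_* functions only model the call chain, and a non-blocking acquire stands in for the rw-semaphore so the re-acquisition is observable instead of hanging:&lt;/p&gt;

```python
import threading

# Minimal sketch of case 2: _lprocfs_lock is not recursive, so a
# thread that re-enters proc teardown while still inside
# LPROCFS_ENTRY() deadlocks on itself.  A non-blocking acquire is
# used here only so the self-deadlock can be reported rather than
# hanging the interpreter.
lprocfs_lock = threading.Lock()  # stands in for _lprocfs_lock

def fake_lprocfs_remove():
    # needs the lock, as the real lprocfs_remove() does
    got = lprocfs_lock.acquire(blocking=False)
    if not got:
        return "deadlock"  # already held by this very thread
    lprocfs_lock.release()
    return "removed"

def fake_wr_evict_client():
    # models: LPROCFS_ENTRY() ... class_decref() -> osc_cleanup() ->
    # lprocfs_remove() ... LPROCFS_EXIT()
    with lprocfs_lock:
        return fake_lprocfs_remove()

assert fake_wr_evict_client() == "deadlock"
print("self-deadlock detected")
```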

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PID: 4495   TASK: ffff810638e25820  CPU: 5   COMMAND: &quot;lctl&quot;
 #0 [ffff8106184df728] schedule at ffffffff80063f96
 #1 [ffff8106184df800] __down_write_nested at ffffffff80065613
 #2 [ffff8106184df840] lprocfs_remove at ffffffff88751fc1
 #3 [ffff8106184df8a0] lprocfs_obd_cleanup at ffffffff887524a8
 #4 [ffff8106184df8b0] osc_cleanup at ffffffff88b25e70
 #5 [ffff8106184df900] class_decref at ffffffff8875c46c
 #6 [ffff8106184df950] obd_zombie_impexp_cull at ffffffff88743ad2
 #7 [ffff8106184df970] class_detach at ffffffff8875d9dd
 #8 [ffff8106184df9b0] class_process_config at ffffffff8876183e
 #9 [ffff8106184dfa30] class_manual_cleanup at ffffffff88763747
#10 [ffff8106184dfb20] lov_putref at ffffffff88aceb60
#11 [ffff8106184dfbe0] lov_disconnect at ffffffff88ad1f2b
#12 [ffff8106184dfc40] mds_lov_clean at ffffffff88975fd4
#13 [ffff8106184dfca0] mds_precleanup at ffffffff889851a9
#14 [ffff8106184dfcf0] class_decref at ffffffff8875c07e
#15 [ffff8106184dfd40] lprocfs_wr_evict_client at ffffffff88869c14
#16 [ffff8106184dfe00] lprocfs_mds_wr_evict_client at ffffffff889a6a39
#17 [ffff8106184dfee0] lprocfs_fops_write at ffffffff887534cb
#18 [ffff8106184dff10] vfs_write at ffffffff80016a49
#19 [ffff8106184dff40] sys_write at ffffffff80017316
#20 [ffff8106184dff80] system_call at ffffffff8005e116
    RIP: 000000394bac6070  RSP: 00007fff3dfad010  RFLAGS: 00010287
    RAX: 0000000000000001  RBX: ffffffff8005e116  RCX: 00000000fbad2a84
    RDX: 0000000000000024  RSI: 00007fff3dfb1bc3  RDI: 0000000000000003
    RBP: 0000000000000001   R8: fefefefefefefeff   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 00007fff3dfb1bb0  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Both cases happened with FEFS, which is based on Lustre-1.8.5. But after reading the source code, I came to think that both cases could happen with Lustre-1.8.8 and Lustre-2.3.x too. I&apos;m not completely sure, though, so I would be glad if someone could confirm that. &lt;/p&gt;

&lt;p&gt;&#12539;About The Patches&apos; Features&lt;br/&gt;
1) While someone is touching evict_nid on procfs, the umount thread waits for that access to finish in a new function, server_wait_for_evict_client().&lt;br/&gt;
2) After server_wait_for_evict_client() returns, a new flag, obd_evict_client_frozen, is set; this prohibits &quot;evict_client&quot; from evicting export objects.&lt;br/&gt;
3) An if-statement is added to the evict functions that checks obd_stopping and obd_evict_client_frozen, so that they do not touch a Lustre hash which has already been finalized by class_manual_cleanup().&lt;/p&gt;
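
&lt;p&gt;A rough user-space sketch of how features 1) and 2) fit together; the names follow the patch description (server_wait_for_evict_client, obd_evict_client_frozen), but the bodies are illustrative, not the actual patch code:&lt;/p&gt;

```python
import threading

# Illustrative sketch of the proposed fix, not the real patch code:
# umount waits for in-flight eviction writers, then sets a frozen
# flag that refuses any new eviction.
lock = threading.Lock()
cond = threading.Condition(lock)
evict_in_progress = 0          # writers currently touching evict_client
obd_evict_client_frozen = False

def evict_client_enter():
    # evict_client writer: refuse to start once umount froze eviction
    global evict_in_progress
    with cond:
        if obd_evict_client_frozen:
            return False
        evict_in_progress += 1
        return True

def evict_client_exit():
    global evict_in_progress
    with cond:
        evict_in_progress -= 1
        if evict_in_progress == 0:
            cond.notify_all()

def server_wait_for_evict_client():
    # umount side: wait for in-flight evictions, then freeze new ones
    global obd_evict_client_frozen
    with cond:
        while evict_in_progress > 0:
            cond.wait()
        obd_evict_client_frozen = True

assert evict_client_enter()       # eviction allowed before umount
evict_client_exit()
server_wait_for_evict_client()    # umount begins
assert not evict_client_enter()   # now prohibited
print("frozen")
```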

&lt;p&gt;I would be happy if someone could review my patches, and I hope the patches for both cases help you fix the problem. &lt;/p&gt;

&lt;p&gt;Thank you.&lt;/p&gt;</description>
                <environment>FEFS based on Lustre-1.8.5&lt;br/&gt;
MDSx1, OSSx1(OSTx3), Clientx1</environment>
        <key id="16543">LU-2258</key>
            <summary>races between evict and umount</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="nozaki">Hiroya Nozaki</reporter>
                        <labels>
                            <label>mn8</label>
                            <label>patch</label>
                    </labels>
                <created>Fri, 2 Nov 2012 02:25:20 +0000</created>
                <updated>Mon, 28 Oct 2013 13:19:50 +0000</updated>
                            <resolved>Mon, 28 Oct 2013 13:19:50 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                    <version>Lustre 2.4.0</version>
                    <version>Lustre 1.8.8</version>
                    <version>Lustre 1.8.x (1.8.0 - 1.8.5)</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="47280" author="nozaki" created="Fri, 2 Nov 2012 02:29:40 +0000"  >&lt;p&gt;Here is the patch for b1_8:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4442&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4442&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&apos;ll prepare the patch for the master branch soon.&lt;br/&gt;
Please wait a little while.&lt;/p&gt;</comment>
                            <comment id="47283" author="nozaki" created="Fri, 2 Nov 2012 05:20:29 +0000"  >&lt;p&gt;Here is the patch for the master branch:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4444&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4444&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="47563" author="nozaki" created="Thu, 8 Nov 2012 00:21:56 +0000"  >&lt;p&gt;Regarding the issue we&apos;ve been discussing in the patch-set-1 comments, I&apos;ve taken the following approach to fix it.&lt;/p&gt;


&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;class_decref()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;void class_decref(struct obd_device *obd)
{
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; err;
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; refs;

        mutex_down(&amp;amp;obd-&amp;gt;obd_dev_cleanup_sem); &amp;lt;----- adds a &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; mutex to obd_device
        spin_lock(&amp;amp;obd-&amp;gt;obd_dev_lock);
        atomic_dec(&amp;amp;obd-&amp;gt;obd_refcount);
        refs = atomic_read(&amp;amp;obd-&amp;gt;obd_refcount);
        spin_unlock(&amp;amp;obd-&amp;gt;obd_dev_lock);

        CDEBUG(D_CONFIG, &lt;span class=&quot;code-quote&quot;&gt;&quot;Decref %s (%p) now %d\n&quot;&lt;/span&gt;, obd-&amp;gt;obd_name, obd, refs);

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; ((refs == 1) &amp;amp;&amp;amp; obd-&amp;gt;obd_stopping &amp;amp;&amp;amp;
            !obd-&amp;gt;obd_precleaned_up) {      &amp;lt;----- adds a &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; flag to obd_device and checks it here
                obd-&amp;gt;obd_precleaned_up = 1; &amp;lt;----- set obd_precleaned_up so this path is not entered again

                ...

                mutex_up(&amp;amp;obd-&amp;gt;obd_dev_cleanup_sem);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt;;
        }
        mutex_up(&amp;amp;obd-&amp;gt;obd_dev_cleanup_sem);

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (refs == 0) {

                ...

        }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;I believe that class_decref() isn&apos;t called very often and that this exclusive section doesn&apos;t take much time, especially in 2.x.&lt;br/&gt;
So I don&apos;t think this approach should be a problem, although I would prefer to refrain from adding a new member to obd_device.&lt;/p&gt;</comment>
                            <comment id="70002" author="pjones" created="Mon, 28 Oct 2013 13:19:50 +0000"  >&lt;p&gt;The patch was landed for 2.4.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvbif:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5407</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>