<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:01 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4471] Failed &apos;rmdir&apos; on remote directories still removes directory on MDT0</title>
                <link>https://jira.whamcloud.com/browse/LU-4471</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When doing an rmdir on a &apos;first level&apos; remote directory, i.e., a directory on MDT1 whose parent directory is on MDT0, the directory entry on MDT0 is removed before the sanity checking is done.&lt;/p&gt;

&lt;p&gt;To reproduce (/lus/TEMP is a directory on MDT0):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;/lus/TEMP # mkdir mdt0
/lus/TEMP # lfs mkdir -i 1 mdt1
/lus/TEMP # touch mdt1/file
/lus/TEMP # ls
mdt0  mdt1
/lus/TEMP # ls mdt1
1
/lus/TEMP # rmdir mdt1
rmdir: failed to remove `mdt1&apos;: Directory not empty
/lus/TEMP #  ls
mdt0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, rmdir returns with an error saying it failed to remove the directory mdt1, but the directory entry no longer exists on MDT0.&lt;/p&gt;

&lt;p&gt;Looking at mdt_reint_unlink (which is executing on MDT1), it&apos;s easy to see why.&lt;br/&gt;
When a first level remote directory is found, the delete RPC is sent to MDT0 before the sanity checking on MDT1 is done.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (mdt_object_remote(mc)) {
                struct mdt_body  *repbody;

                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!fid_is_zero(rr-&amp;gt;rr_fid2)) {
                        CDEBUG(D_INFO, &lt;span class=&quot;code-quote&quot;&gt;&quot;%s: name &quot;&lt;/span&gt;DNAME&lt;span class=&quot;code-quote&quot;&gt;&quot; cannot find &quot;&lt;/span&gt;DFID&lt;span class=&quot;code-quote&quot;&gt;&quot;\n&quot;&lt;/span&gt;,
                               mdt_obd_name(info-&amp;gt;mti_mdt),
                               PNAME(&amp;amp;rr-&amp;gt;rr_name), PFID(mdt_object_fid(mc)));
                        GOTO(put_child, rc = -ENOENT);
                }
                CDEBUG(D_INFO, &lt;span class=&quot;code-quote&quot;&gt;&quot;%s: name &quot;&lt;/span&gt;DNAME&lt;span class=&quot;code-quote&quot;&gt;&quot;: &quot;&lt;/span&gt;DFID&lt;span class=&quot;code-quote&quot;&gt;&quot; is on another MDT\n&quot;&lt;/span&gt;,
                       mdt_obd_name(info-&amp;gt;mti_mdt),
                       PNAME(&amp;amp;rr-&amp;gt;rr_name), PFID(mdt_object_fid(mc)));

                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!mdt_is_dne_client(req-&amp;gt;rq_export))
                        &lt;span class=&quot;code-comment&quot;&gt;/* Return -EIO &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; old client */&lt;/span&gt;
                        GOTO(put_child, rc = -EIO);

                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (info-&amp;gt;mti_spec.sp_rm_entry) {
                        struct lu_ucred *uc  = mdt_ucred(info);

                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!md_capable(uc, CFS_CAP_SYS_ADMIN)) {
                                CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;%s: unlink remote entry is only &quot;&lt;/span&gt;
                                       &lt;span class=&quot;code-quote&quot;&gt;&quot;permitted &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; administrator: rc = %d\n&quot;&lt;/span&gt;,
                                        mdt_obd_name(info-&amp;gt;mti_mdt),
                                        -EPERM);
                                GOTO(put_child, rc = -EPERM);
                        }

                        ma-&amp;gt;ma_need = MA_INODE;
                        ma-&amp;gt;ma_valid = 0;
                        mdt_set_capainfo(info, 1, child_fid, BYPASS_CAPA);
                        rc = mdo_unlink(info-&amp;gt;mti_env, mdt_object_child(mp),
                                        NULL, &amp;amp;rr-&amp;gt;rr_name, ma, no_name);
                        GOTO(put_child, rc);
                }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Followed shortly after by this:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        mutex_lock(&amp;amp;mc-&amp;gt;mot_lov_mutex);

        rc = mdo_unlink(info-&amp;gt;mti_env, mdt_object_child(mp),
                        mdt_object_child(mc), &amp;amp;rr-&amp;gt;rr_name, ma, no_name);

        mutex_unlock(&amp;amp;mc-&amp;gt;mot_lov_mutex);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is mdo_unlink that returns the -39 (ENOTEMPTY) back to the client, because it calls mdd_unlink_sanity_check (which calls mdd_dir_is_empty).&lt;/p&gt;
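
&lt;p&gt;To make the ordering problem concrete, here is a toy Python model. It is not Lustre code: the class and method names are hypothetical, and it only assumes that the parent entry lives on one MDT while the directory contents live on another. It compares removing the entry before the emptiness check with checking first.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import errno


class ToyNamespace:
    """Toy stand-in: parent entries on MDT0, directory contents on MDT1."""

    def __init__(self):
        # Names visible in the parent directory (hypothetical MDT0 side).
        self.parent_entries = {"mdt1"}
        # Files held inside each remote directory (hypothetical MDT1 side).
        self.child_files = {"mdt1": ["file"]}

    def rmdir_buggy(self, name):
        # The parent entry is dropped first; emptiness is checked afterwards.
        self.parent_entries.discard(name)
        if self.child_files.get(name):
            return -errno.ENOTEMPTY
        self.child_files.pop(name, None)
        return 0

    def rmdir_checked(self, name):
        # Emptiness is checked before anything in the namespace is touched.
        if self.child_files.get(name):
            return -errno.ENOTEMPTY
        self.parent_entries.discard(name)
        self.child_files.pop(name, None)
        return 0
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this model rmdir_buggy fails with ENOTEMPTY yet the name is already gone from the parent, which matches the reproduction above. rmdir_checked fails the same way but leaves the entry in place.&lt;/p&gt;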

&lt;p&gt;I have logs from both MDTs and the client of an rmdir on MDT0 failing as expected, and an rmdir on MDT1 showing the unusual behavior described.  I&apos;ll attach those shortly.&lt;/p&gt;</description>
                <environment></environment>
        <key id="22704">LU-4471</key>
            <summary>Failed &apos;rmdir&apos; on remote directories still removes directory on MDT0</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="cliffw">Cliff White</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                            <label>MB</label>
                            <label>mn4</label>
                            <label>patch</label>
                    </labels>
                <created>Fri, 10 Jan 2014 17:48:00 +0000</created>
                <updated>Sat, 1 Mar 2014 00:10:34 +0000</updated>
                            <resolved>Tue, 25 Feb 2014 21:15:18 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="74745" author="paf" created="Fri, 10 Jan 2014 19:36:17 +0000"  >&lt;p&gt;Attached are logs from the client, MDS0, and MDS1 covering two cases:&lt;/p&gt;

&lt;p&gt;The &apos;normal&apos; rmdir in these logs is the attempted removal of a directory on MDT0 with files in it, which fails as expected.&lt;/p&gt;

&lt;p&gt;The &apos;broken&apos; rmdir in these logs is the attempted removal of a directory on MDT1, which returns a failure but also removes the directory entry for the directory on MDT0.&lt;/p&gt;

&lt;p&gt;Logs are from CentOS, Lustre is master from early January 2014.&lt;/p&gt;</comment>
                            <comment id="74857" author="paf" created="Mon, 13 Jan 2014 21:05:05 +0000"  >&lt;p&gt;Ugh, my mistake here.  The logs are still valid, but my analysis above was mistaken.&lt;/p&gt;

&lt;p&gt;The problem seems to be specifically in mdd_unlink.&lt;/p&gt;

&lt;p&gt;mdd_declare_unlink is called from mdd_unlink, and then when mdd_trans_start is called, the remote_sync it does causes the remote MDT to delete the directory.&lt;/p&gt;

&lt;p&gt;Further down, mdd_unlink_sanity_check is called on the non-remote MDT, and it errors because the directory is non-empty.&lt;/p&gt;</comment>
                            <comment id="74862" author="paf" created="Mon, 13 Jan 2014 22:44:38 +0000"  >&lt;p&gt;Patch here:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/8827&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8827&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Confirmed it fixes the issue in question.&lt;/p&gt;</comment>
                            <comment id="74960" author="cliffw" created="Tue, 14 Jan 2014 21:31:10 +0000"  >&lt;p&gt;Thanks, will monitor.&lt;/p&gt;</comment>
                            <comment id="75283" author="jlevi" created="Mon, 20 Jan 2014 16:47:14 +0000"  >&lt;p&gt;Cliff,&lt;br/&gt;
Are you able to comment on your monitoring of this ticket yet?&lt;br/&gt;
Thank you!&lt;/p&gt;</comment>
                            <comment id="75290" author="cliffw" created="Mon, 20 Jan 2014 17:39:49 +0000"  >&lt;p&gt;Patrick, it appears that your patch has failed some reviews. Are you able to address the comments?&lt;/p&gt;</comment>
                            <comment id="75291" author="paf" created="Mon, 20 Jan 2014 17:48:55 +0000"  >&lt;p&gt;Cliff - Most definitely.  Barring the unexpected, I&apos;ll get to fixing it up this afternoon.&lt;/p&gt;</comment>
                            <comment id="76356" author="cliffw" created="Thu, 6 Feb 2014 16:54:20 +0000"  >&lt;p&gt;Patrick, any updates on this issue?&lt;/p&gt;</comment>
                            <comment id="76360" author="paf" created="Thu, 6 Feb 2014 17:06:26 +0000"  >&lt;p&gt;Cliff - Not really.  This one&apos;s in Intel&apos;s hands now.  My patch, I think, satisfies Andreas and Fan Yong, and Cray has been using it internally for a bit now with great success.  The Maloo tests failed for a reason unrelated to my patch, and need to be re-run.&lt;/p&gt;

&lt;p&gt;I can&apos;t really do anything more, as the patch is good - Someone at Intel can restart the Maloo testing, and hopefully it won&apos;t hit any other issues.&lt;/p&gt;</comment>
                            <comment id="76640" author="cliffw" created="Mon, 10 Feb 2014 20:00:22 +0000"  >&lt;p&gt;I have restarted testing, sorry about that. Will monitor.&lt;/p&gt;</comment>
                            <comment id="77808" author="bogl" created="Tue, 25 Feb 2014 15:18:06 +0000"  >&lt;p&gt;backport to b2_5:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/9379&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9379&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="77858" author="pjones" created="Tue, 25 Feb 2014 21:15:18 +0000"  >&lt;p&gt;Landed for 2.6&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="23395">LU-4690</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13968" name="lu-4471.tar.gz" size="305567" author="paf" created="Fri, 10 Jan 2014 19:36:17 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwcmf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12248</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>