<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:33:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3381] lock enqueue fails for open(&quot;.lustre/fid&quot;, ...)</title>
                <link>https://jira.whamcloud.com/browse/LU-3381</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While watching the performance reported by Robinhood while it was scanning a test filesystem with 24 million files I noticed something strange.  Performance was gradually decreasing over time because the STAGE_GET_INFO_FS stage was taking longer and longer.  That&apos;s not good.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;May 22 13:31:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   1 | Done:   0 | Total:  20898 | ms/op: 1.00
May 22 13:32:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total:  54724 | ms/op: 0.49
May 22 13:33:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total:  95518 | ms/op: 0.36
May 22 13:34:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total: 133057 | ms/op: 0.31
May 22 13:35:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total: 141859 | ms/op: 0.31
May 22 13:36:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total: 173883 | ms/op: 0.29
May 22 13:37:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 210361 | ms/op: 0.58
May 22 13:38:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   3 | Done:   0 | Total: 262009 | ms/op: 1.49
May 22 13:39:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 316537 | ms/op: 2.07
May 22 13:40:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   4 | Done:   1 | Total: 373078 | ms/op: 2.47
May 22 13:41:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 432204 | ms/op: 2.74
May 22 13:42:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 492933 | ms/op: 2.93
May 22 13:43:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   2 | Done:   0 | Total: 554324 | ms/op: 3.10
May 22 13:44:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   7 | Done:   0 | Total: 612107 | ms/op: 3.25
May 22 13:45:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 665969 | ms/op: 3.40
May 22 13:46:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 720286 | ms/op: 3.52
May 22 13:47:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 776099 | ms/op: 3.62
May 22 13:48:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   1 | Total: 834779 | ms/op: 3.72
May 22 13:49:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 879068 | ms/op: 3.90
May 22 13:50:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   4 | Done:   0 | Total: 932166 | ms/op: 4.01
May 22 13:51:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 986457 | ms/op: 4.11
May 22 13:52:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   1 | Total: 1014814 | ms/op: 4.31
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After a little investigation it was clear that the open(&quot;.lustre/fid&quot;, O_RDONLY) call was, on average, getting slower and slower.  From what I observed it could take as little as 0.1ms or as long as 15ms.  The subsequent GETSTRIPE ioctl() was always fast.&lt;/p&gt;

&lt;p&gt;The Lustre debug logs show that all the time is being spent in do_lookup() because there is never a valid dentry on the client.  That means for every open we attempt to enqueue a lock for this special file.  That lock enqueue fails on the MDT with ELDLM_LOCK_ABORTED but the open still seems to succeed.  To make matters worse, the client serializes the enqueues because they are all IT_OPEN.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00010000:00010000:0.0:1369264019.850032:0:11631:0:(ldlm_lockd.c:1426:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=301, rc=0) ns: mdt-zeno-MDT0000_UUID lock: ffff8802f011a000/0xed969b9b44bc7d27 lrc: 1/0,0 mode: --/CR res: 8589934594/1 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: 172.19.1.68@o2ib199 remote: 0xe4f855be29d7308d expref: 12 pid: 11631 timeout: 0 lvb_type: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To get a handle on what was going on I wrote a trivial reproducer which just opens and closes a specified file repeatedly.  If you run just a few iterations and grab the Lustre debug logs you can easily see the failing enqueues.&lt;/p&gt;

&lt;p&gt;I suspect if the enqueues were allowed to succeed this would be a non-issue since we would have a valid read lock on the client.  However, I haven&apos;t tested that and it&apos;s not at all clear to me what you guys are planning to do with that .lustre directory.  Perhaps you can propose a fix.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;sys/stat.h&amp;gt;
#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;time.h&amp;gt;
#include &amp;lt;errno.h&amp;gt;

int
main(int argc, char **argv)
{
        long int start, last;
        int error, fd, i, j, i_max, j_max;
        char *path;

        if (argc != 4) {
                fprintf(stderr, &quot;usage: %s PATH I_MAX J_MAX\n&quot;, argv[0]);
                return (1);
        }

        start = last = time(0);
        path = argv[1];
        i_max = strtol(argv[2], NULL, 10);
        j_max = strtol(argv[3], NULL, 10);

        printf(&quot;file:  %s\n&quot;, path);
        printf(&quot;i_max: %d\n&quot;, i_max);
        printf(&quot;j_max: %d\n&quot;, j_max);

        for (i = 0; i &amp;lt; i_max; i++) {
                printf(&quot;%4d / %4d - %ld sec total, %ld sec delta\n&quot;, i, i_max,
                    time(0) - start, time(0) - last);
                last = time(0);

                for (j = 0; j &amp;lt; j_max; j++) {
                        fd = open(path, O_RDONLY);
                        if (fd == -1) {
                                printf(&quot;open() = %d\n&quot;, errno);
                                return (1);
                        }

                        error = close(fd);
                        if (error) {
                                printf(&quot;close() = %d\n&quot;, errno);
                                return (1);
                        }
                }
        }

        return (0);
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>Lustre 2.4 client</environment>
        <key id="19114">LU-3381</key>
            <summary>lock enqueue fails for open(&quot;.lustre/fid&quot;, ...)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="jhammond">John Hammond</assignee>
                                    <reporter username="behlendorf">Brian Behlendorf</reporter>
                        <labels>
                            <label>fid</label>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 22 May 2013 23:56:46 +0000</created>
                <updated>Wed, 23 Jun 2021 22:38:27 +0000</updated>
                            <resolved>Wed, 23 Jun 2021 22:38:21 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="59184" author="pjones" created="Thu, 23 May 2013 17:32:14 +0000"  >&lt;p&gt;John&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="59186" author="behlendorf" created="Thu, 23 May 2013 18:42:42 +0000"  >&lt;p&gt;I&apos;m attaching a section from the Lustre server debug log showing how the enqueue for FID_SEQ_DOT_LUSTRE is handled.  According to the log the request is properly handled, the object is successfully looked up, locks are created, the handles are hashed, everything looks great.  Then at the very end of mdt_reint_open() we call mdt_object_open_unlock(), which drops the only (and last) reference on the lock, causing it to be destroyed.  Therefore, when we return to mdt_intent_reint() there is no longer a valid lock handle and we return success to the client with an ELDLM_LOCK_ABORTED status (lustre/mdt/mdt_handler.c:3834).&lt;/p&gt;

&lt;p&gt;Unfortunately for me, since there&apos;s no documented protocol I&apos;m aware of describing how the Lustre server is supposed to behave in this case, it&apos;s hard for me to say how this should be handled.  It&apos;s certainly inefficient, since it forces a round-trip enqueue RPC for every open of FID_SEQ_DOT_LUSTRE, which I&apos;m sure was not intended.  Can we get whoever is familiar with this area of the code to weigh in with the expected behavior (and better yet a fix)?&lt;/p&gt;</comment>
                            <comment id="59196" author="jhammond" created="Thu, 23 May 2013 21:08:06 +0000"  >&lt;p&gt;Brian, to clarify, how is this specific to .lustre/fid? Or are you suggesting that .lustre/fid be handled specially in this case?&lt;/p&gt;

&lt;p&gt;Do you know if there is any reason that robinhood does not just hold .lustre/fid open? (Re your example, I think you&apos;ll see much better performance if you hoist the open out of the two loops...)&lt;/p&gt;</comment>
                            <comment id="59207" author="behlendorf" created="Thu, 23 May 2013 22:30:40 +0000"  >&lt;p&gt;&amp;gt; to clarify, how is this specific to .lustre/fid?&lt;/p&gt;

&lt;p&gt;It may not be.  My concern is that what I expected to happen when opening .lustre/fid was for the first open to enqueue a CR lock on the directory.  Then on all subsequent opens, because we&apos;ve cached a lock on the directory, we shouldn&apos;t need to revalidate the dentry and issue another enqueue to the server.  What I&apos;m seeing is that for every open on that directory (perhaps all directories?) we&apos;re forced to resend an enqueue to revalidate the dentry.  Presumably this is because we were never properly granted the lock.  But the end result is that opening the directory can be slow and we&apos;re generating a LOT of extra RPCs.  Perhaps this is expected behavior for directories, but why that would be isn&apos;t immediately clear to me.&lt;/p&gt;

&lt;p&gt;&amp;gt; Do you know if there is any reason that robinhood does not just hold .lustre/fid open?&lt;/p&gt;

&lt;p&gt;I&apos;ve no idea, I haven&apos;t looked at their code yet.  Perhaps because this is during the scan which might run for many days (or weeks) and they didn&apos;t want to keep it open.  I&apos;m sure they assumed (as I did) that open/closing the directory would be very cheap once we&apos;d initially acquired the lock.  That is the point of the ldlm after all...&lt;/p&gt;</comment>
                            <comment id="59490" author="adilger" created="Tue, 28 May 2013 23:05:02 +0000"  >&lt;p&gt;I think there is potentially a red herring here with the locking.  It is normal Lustre protocol that trying to lock a parent directory for lookup will not return a directory lock, but instead return a lock on the file.  If the client was returned a directory lock during lookup, it would cause a large amount of contention among all of the clients, since the directory lock would be revoked each time another client wants to create a file therein.&lt;/p&gt;

&lt;p&gt;It looks like this problem is actually related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt;, which is caused by the client caching the open/close RPCs in memory if it is not doing any other updates to the filesystem.  A simple workaround listed in that bug is to periodically &quot;touch&quot; any file on the node running the scanner, which will cause a new transaction commit and flush the cache.  This might take some time if it hasn&apos;t been done previously, since there will be two saved RPCs in memory for every file that was opened/closed.&lt;/p&gt;</comment>
                            <comment id="84688" author="leibovici-cea" created="Thu, 22 May 2014 09:30:44 +0000"  >&lt;p&gt;&amp;gt; Do you know if there is any reason that robinhood does not just hold .lustre/fid open?&lt;br/&gt;
robinhood does not explicitly open .lustre/fid, but it uses liblustreapi calls that do open it (e.g. llapi_fid2path).&lt;/p&gt;

&lt;p&gt;I don&apos;t like performing the ioctl directly from robinhood code, as this shortcuts the abstraction liblustreapi provides.&lt;/p&gt;

&lt;p&gt;Optimizing .lustre/fid opening would require adding calls to llapi that take the root fd as argument.&lt;br/&gt;
But I don&apos;t like it that much, as the need for a root fd in the llapi code is implementation dependent and may change.&lt;/p&gt;</comment>
                            <comment id="305364" author="jhammond" created="Wed, 23 Jun 2021 22:38:21 +0000"  >&lt;p&gt;This is a bug in RobinHood.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="17161">LU-2613</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="12927" name="lu-3381-server.log" size="56794" author="behlendorf" created="Thu, 23 May 2013 19:00:59 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 26 Jun 2014 23:56:46 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>client</label>
            <label>metadata</label>
            <label>performance</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvrnj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8372</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 22 May 2013 23:56:46 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>