<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:33:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3443] performance impact of mdc_rpc_lock serialization</title>
                <link>https://jira.whamcloud.com/browse/LU-3443</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>
&lt;p&gt;Serialization of in-flight RPCs on mdc_rpc_lock makes non-modifying operations, such as calling &lt;tt&gt;open()&lt;/tt&gt; on a directory, vulnerable to blocking due to a slow backend MDS filesystem.  In particular, users may see long delays running &lt;tt&gt;ls&lt;/tt&gt; when &lt;tt&gt;LDLM_ENQUEUE&lt;/tt&gt; requests get blocked behind long-lived metadata-modifying &lt;tt&gt;MDS_REINT&lt;/tt&gt; requests. &lt;/p&gt;

&lt;p&gt;For example, in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3442&quot; title=&quot;MDS performance degraded by reading of ZFS spacemaps&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3442&quot;&gt;&lt;del&gt;LU-3442&lt;/del&gt;&lt;/a&gt;, RPCs involving writing llog records experienced long service times due to a misbehaving backend filesystem.  Therefore RPCs for operations like &lt;tt&gt;create()&lt;/tt&gt;, &lt;tt&gt;unlink()&lt;/tt&gt; and &lt;tt&gt;rename()&lt;/tt&gt; would stay in flight for many seconds on the client.  Unfortunately, these long-lived in-flight RPCs prevent &lt;tt&gt;LDLM_ENQUEUE&lt;/tt&gt; requests for &lt;tt&gt;open()&lt;/tt&gt; on a directory from being sent, due to the &lt;tt&gt;mdc_get_rpc_lock()&lt;/tt&gt;  call in &lt;tt&gt;mdc_enqueue()&lt;/tt&gt;.  Once issued, the LDLM_ENQUEUE request completes almost immediately since it doesn&apos;t involve synchronous I/O on the backend.  It would be desirable if such non-modifying operations could be shielded from the effects of slow synchronous operations. &lt;/p&gt;
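
&lt;p&gt;The blocking described above can be illustrated with a toy model. This is only a sketch, not Lustre code: the function names and the service times (in milliseconds) are invented for illustration, and all three requests are assumed to be issued at the same moment.&lt;/p&gt;

```python
def completion_times_serialized(service_times):
    """One lock is held across each whole RPC, so every request waits for all earlier ones."""
    finished = []
    clock = 0
    for service in service_times:
        clock += service
        finished.append(clock)
    return finished


def completion_times_independent(service_times):
    """Requests that share no lock each finish after their own service time."""
    return list(service_times)


# Hypothetical workload: two slow MDS_REINT requests, then one fast LDLM_ENQUEUE.
requests_ms = [5000, 5000, 10]
serialized = completion_times_serialized(requests_ms)
independent = completion_times_independent(requests_ms)
print("enqueue latency when serialized (ms): ", serialized[-1])
print("enqueue latency when independent (ms):", independent[-1])
```

&lt;p&gt;In this model the enqueue pays for both slow requests queued ahead of it, even though its own service time is tiny.&lt;/p&gt;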

&lt;p&gt;To that end, it would be helpful to clarify what the mdc_rpc_lock is protecting.  &lt;tt&gt;mdc_enqueue()&lt;/tt&gt; has this to say: &lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;/* It is important to obtain rpc_lock first (&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; applicable), so that
 * threads that are serialised with rpc_lock are not polluting our
 * rpcs in flight counter. We &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; not &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; flock request limiting, though*/
&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (it) {
        mdc_get_rpc_lock(obddev-&amp;gt;u.cli.cl_rpc_lock, it);
        rc = mdc_enter_request(&amp;amp;obddev-&amp;gt;u.cli);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;but it&apos;s not clear to me what is meant by &quot;polluting&quot;, and why the counter can&apos;t be protected by a separate lock that need not be held across the entire network request.&lt;br/&gt;
I also observe that the in-flight RPC counter for an OBD import rarely exceeds 1 or 2, and never approaches the upper limit of 8.  So it seems we are not doing a good job of keeping a full pipeline of in-flight RPCs. &lt;/p&gt;

&lt;p&gt;LLNL-bug-id: TOSS-2084&lt;/p&gt;</description>
                <environment></environment>
        <key id="19312">LU-3443</key>
            <summary>performance impact of mdc_rpc_lock serialization</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                            <label>performance</label>
                    </labels>
                <created>Thu, 6 Jun 2013 23:00:07 +0000</created>
                <updated>Wed, 16 Oct 2013 01:37:23 +0000</updated>
                            <resolved>Wed, 16 Oct 2013 01:37:23 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.4</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="60147" author="bzzz" created="Fri, 7 Jun 2013 06:13:11 +0000"  >&lt;p&gt;This semaphore is needed to implement &quot;execute-once&quot; semantics on the MDS. The MDS remembers the last result/XID for every client in the last_rcvd file, so if the client hasn&apos;t got a reply, it can just resend the request and the MDS will reconstruct the reply, being aware that the request has already been executed.&lt;br/&gt;
If we don&apos;t have this semaphore, then concurrent requests will overwrite that information in last_rcvd and the MDS won&apos;t be able to detect execution status, which is wrong in a number of cases (as many metadata requests are not idempotent).&lt;/p&gt;
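
&lt;p&gt;A minimal sketch of the single-slot behaviour described above, assuming one saved (XID, result) pair per client. &lt;tt&gt;SingleSlotServer&lt;/tt&gt;, &lt;tt&gt;handle()&lt;/tt&gt; and &lt;tt&gt;create_link()&lt;/tt&gt; are hypothetical names for illustration, not Lustre interfaces, and the in-memory dict stands in for the on-disk last_rcvd file.&lt;/p&gt;

```python
class SingleSlotServer:
    """Toy MDS that remembers only the last (xid, result) per client. Not Lustre code."""

    def __init__(self):
        self.last_rcvd = {}   # client id -> (xid, result); stands in for last_rcvd
        self.executions = 0   # number of times a request was really executed

    def handle(self, client, xid, op):
        saved = self.last_rcvd.get(client)
        if saved is not None and saved[0] == xid:
            return saved[1]   # resend: reconstruct the reply, do not re-execute
        self.executions += 1
        result = op()
        self.last_rcvd[client] = (xid, result)
        return result


counter = {"links": 0}

def create_link():
    """A non-idempotent operation: every execution changes state."""
    counter["links"] += 1
    return counter["links"]

server = SingleSlotServer()

# One request in flight: the reply to XID 1 is lost and the client resends it.
server.handle("client-a", 1, create_link)
server.handle("client-a", 1, create_link)
one_in_flight = counter["links"]

# Two requests in flight: XID 3 overwrites the slot before XID 2 is resent.
server.handle("client-a", 2, create_link)
server.handle("client-a", 3, create_link)
server.handle("client-a", 2, create_link)
two_in_flight = counter["links"]
print(one_in_flight, two_in_flight, server.executions)
```

&lt;p&gt;With one request in flight the resend is answered from the saved slot. With two in flight, the resend of XID 2 finds XID 3 in the slot and the non-idempotent operation runs a second time.&lt;/p&gt;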

&lt;p&gt;This is a known issue we&apos;re going to address with multislot last_rcvd at some point. It&apos;ll likely require small changes to the protocol so that the client can say which specific slot to use, like NFSv4 does (if I remember correctly).&lt;/p&gt;</comment>
                            <comment id="60180" author="adilger" created="Fri, 7 Jun 2013 16:22:01 +0000"  >&lt;p&gt;Ned, in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; there is work being done to avoid allocating a new transno to the open request from the client.  It might also be possible to send multiple open requests from the client in parallel if they do not have the O_CREAT flag set. However, Alex&apos;s point is also valid - the open requests still change the state of the MDS in memory. If the client resends the open request and the MDS has already opened it and does not detect this correctly, then it will leak an open refcount and the file will not be closed or deleted when it is unlinked. &lt;/p&gt;

&lt;p&gt;It might be possible to detect this based on the handle used for the open, but that would further increase the complexity of the recovery process, and only fix the non-O_CREAT open() case. The generic solution for allowing multiple modifying RPCs in flight is to allow multiple slots on the last_rcvd file for a single client.&lt;/p&gt;
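
&lt;p&gt;A rough sketch of the multi-slot idea, under the assumption that the client names the slot it is using for each request. No such mechanism has been designed, so &lt;tt&gt;MultiSlotServer&lt;/tt&gt;, the &lt;tt&gt;slot&lt;/tt&gt; argument and the dict standing in for last_rcvd are all hypothetical.&lt;/p&gt;

```python
class MultiSlotServer:
    """Toy server keeping one saved (xid, result) per (client, slot). Not Lustre code."""

    def __init__(self, slots_per_client):
        self.slots_per_client = slots_per_client
        self.last_rcvd = {}   # (client, slot) -> (xid, result); stands in for last_rcvd
        self.executions = 0   # number of times a request was really executed

    def handle(self, client, slot, xid, op):
        if slot not in range(self.slots_per_client):
            raise ValueError("slot out of range")
        saved = self.last_rcvd.get((client, slot))
        if saved is not None and saved[0] == xid:
            return saved[1]   # resend: replay the saved reply
        self.executions += 1
        result = op()
        self.last_rcvd[(client, slot)] = (xid, result)
        return result


created = []

def create_entry():
    """A non-idempotent operation: every execution adds an entry."""
    created.append(len(created) + 1)
    return created[-1]

server = MultiSlotServer(slots_per_client=2)
first = server.handle("client-a", 0, 2, create_entry)    # XID 2 uses slot 0
second = server.handle("client-a", 1, 3, create_entry)   # XID 3 uses slot 1
resent = server.handle("client-a", 0, 2, create_entry)   # resend of XID 2
print(first, second, resent, server.executions)
```

&lt;p&gt;Because each in-flight request has its own slot, the resend is recognised and the operation is not executed again.&lt;/p&gt;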

&lt;p&gt;The exact mechanism for this has not been designed at this point, and is not on any development roadmap.  To be honest, given the glacial speed of the OpenSFS RFP process, it is unclear how any feature development projects will be started in the foreseeable future. &lt;/p&gt;</comment>
                            <comment id="60190" author="nedbass" created="Fri, 7 Jun 2013 18:29:03 +0000"  >&lt;p&gt;Alex, Andreas, thank you for filling in the details here.  It sounds like there&apos;s nothing to do in the short term for this issue.  Though if you are amenable, I&apos;ll submit a patch adding a comment block above the definition of &lt;tt&gt;struct mdc_rpc_lock&lt;/tt&gt; to capture some of the useful information here.&lt;/p&gt;</comment>
                            <comment id="60192" author="nedbass" created="Fri, 7 Jun 2013 19:36:11 +0000"  >&lt;p&gt;Since it&apos;s just a comment, I&apos;ll post it for review here first, to avoid wasting time for hudson/maloo.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;--- a/lustre/include/lustre_mdc.h
+++ b/lustre/include/lustre_mdc.h
@@ -69,9 +69,27 @@ struct obd_export;
 struct ptlrpc_request;
 struct obd_device;
 
+/**
+ * Serializes in-flight MDT-modifying RPC requests to preserve idempotency.
+ *
+ * This mutex is used to implement execute-once semantics on the MDT.
+ * The MDT stores the last transaction ID and result &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; every client in
+ * its last_rcvd file. If the client doesn&apos;t get a reply, it can safely
+ * resend the request and the MDT will reconstruct the reply being aware
+ * that the request has already been executed. Without &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; lock,
+ * execution status of concurrent in-flight requests would be
+ * overwritten.
+ *
+ * This design limits the extent to which we can keep a full pipeline of
+ * in-flight requests from a single client.  This limitation could be
+ * overcome by allowing multiple slots per client in the last_rcvd file.
+ */
 struct mdc_rpc_lock {
+       &lt;span class=&quot;code-comment&quot;&gt;/** Lock protecting in-flight RPC concurrency. */&lt;/span&gt;
        struct mutex            rpcl_mutex;
+       &lt;span class=&quot;code-comment&quot;&gt;/** Intent associated with currently executing request. */&lt;/span&gt;
        struct lookup_intent    *rpcl_it;
+       &lt;span class=&quot;code-comment&quot;&gt;/** Used &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; MDS/RPC load testing purposes. */&lt;/span&gt;
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;                     rpcl_fakes;
 };

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="60193" author="prakash" created="Fri, 7 Jun 2013 19:36:26 +0000"  >&lt;blockquote&gt;
&lt;p&gt;To be honest, given the glacial speed of the OpenSFS RFP process, it is unclear how any feature development projects will be started in the foreseeable future.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Should I open a new ticket for this? &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="60205" author="nedbass" created="Fri, 7 Jun 2013 23:25:13 +0000"  >&lt;p&gt;Patch to add above comment:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/6593&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6593&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="69062" author="jlevi" created="Wed, 16 Oct 2013 01:37:23 +0000"  >&lt;p&gt;Patch landed to Master. Please let me know if more work is needed in this ticket and I will reopen.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="19309">LU-3442</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvsvb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8584</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>