<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:27:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2713] limit HSM RPC count from client</title>
                <link>https://jira.whamcloud.com/browse/LU-2713</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The client-side HSM coordinator patches in &lt;a href=&quot;http://review.whamcloud.com/5029&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5029&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/5030&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5030&lt;/a&gt; have landed, but Oleg realized that there is no client-side limit on the number of concurrent HSM RPCs that a client can send.&lt;/p&gt;

&lt;p&gt;Without such a limit, clients could overwhelm the MDS service threads. If those threads block while handling HSM requests, or simply process them slowly, all other requests to the MDS are stalled as well.&lt;/p&gt;

&lt;p&gt;Please institute a client-side limit on in-flight HSM RPCs, analogous to cl_max_rpcs_in_flight, with some reasonable value.&lt;/p&gt;

&lt;p&gt;The ticket is assigned to Jinshan, but only because we cannot currently assign it to someone external.&lt;/p&gt;</description>
                <environment></environment>
        <key id="17373">LU-2713</key>
            <summary>limit HSM RPC count from client</summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="16195">LU-2061</parent>
                                    <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jhammond">John Hammond</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                            <label>MB</label>
                    </labels>
                <created>Wed, 30 Jan 2013 18:00:08 +0000</created>
                <updated>Wed, 13 Mar 2013 08:57:16 +0000</updated>
                            <resolved>Wed, 13 Mar 2013 08:57:16 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                    <comments>
                            <comment id="52181" author="jcl" created="Mon, 11 Feb 2013 18:46:09 +0000"  >&lt;p&gt;I will work on a patch&lt;/p&gt;</comment>
                            <comment id="53356" author="jhammond" created="Tue, 5 Mar 2013 14:16:19 +0000"  >&lt;p&gt;Can the client-side maximum be set to 0 (perhaps even as the default for non-root users)? Otherwise, a malicious or accident-prone user can bypass a per-client limit simply by issuing HSM RPCs from multiple clients: &quot;Hmm, login1 seems wedged. I think I&apos;ll kill my ssh session and try this again on login2.&quot;&lt;/p&gt;

&lt;p&gt;Would an MDT-side limit be better?&lt;/p&gt;</comment>
                            <comment id="53385" author="jcl" created="Tue, 5 Mar 2013 15:23:04 +0000"  >&lt;p&gt;As I understand it, the limit comes from the MDT&apos;s capacity to receive RPC requests, so an MDT-side limit would be better in principle. However, by the time the MDT counts the requests it has already received them, so it is too late to throttle them. A client-side limit is a simple way to bound the load.&lt;/p&gt;

&lt;p&gt;Can you confirm that you are working on a patch (so that I do not prepare one)?&lt;/p&gt;</comment>
                            <comment id="53392" author="jhammond" created="Tue, 5 Mar 2013 16:51:59 +0000"  >&lt;p&gt;I was proposing that the MDT keep a semaphore (as with cl_max_rpcs_in_flight) but do a non-blocking down on it. If the down would block, the MDT returns -EAGAIN to the client, which must then wait and retry.&lt;/p&gt;

&lt;p&gt;I understood that processing some HSM requests would put the MDT thread to sleep until the coordinator responded. Is that correct? I have only seen the stubbed-out version of mdt_hsm.c. Will any of these handlers ever have to wait for tape?&lt;/p&gt;

&lt;p&gt;In either case (waiting on the coordinator or waiting on tape), I think Lustre must treat it as an unbounded wait.&lt;/p&gt;

&lt;p&gt;I confirm that I will work on a patch.&lt;/p&gt;</comment>
                            <comment id="53403" author="jcl" created="Tue, 5 Mar 2013 20:30:59 +0000"  >&lt;p&gt;HSM requests are not blocking: they just record an action to perform on the MDT, and the restore/archive is done asynchronously by the coordinator. With -EAGAIN, the only risk is that slow clients are never served because fast ones always take the free slots. We need a way to ensure that all clients make progress through their call lists.&lt;/p&gt;</comment>
                            <comment id="53441" author="adilger" created="Wed, 6 Mar 2013 10:21:01 +0000"  >&lt;p&gt;John, the current RPC throttling mechanism for OSC and MDC RPCs is on the client. While this is not ideal, the problem is indeed that once the server has seen a request, it is too late to throttle it.&lt;/p&gt;

&lt;p&gt;At this stage, we&apos;re just looking for an equivalent to max_rpcs_in_flight for the HSM requests, so they do not overwhelm the server.&lt;/p&gt;</comment>
                            <comment id="53457" author="jhammond" created="Wed, 6 Mar 2013 13:26:18 +0000"  >&lt;p&gt;OK, thanks for the clarification.&lt;/p&gt;

&lt;p&gt;Please see &lt;a href=&quot;http://review.whamcloud.com/5616&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5616&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="53903" author="pjones" created="Wed, 13 Mar 2013 08:57:16 +0000"  >&lt;p&gt;Landed for 2.4&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="17836">LU-2949</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvi9j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6608</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>