<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:49:22 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12064] Adaptive timeout at_min adjustment &amp; granularity</title>
                <link>https://jira.whamcloud.com/browse/LU-12064</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The adaptive timeout code currently works at a granularity of whole seconds and ignores timeouts of &quot;0&quot;.  Since many MDS requests complete in under a second, their service times are recorded as 0 and therefore ignored, so the adaptive timeout code on the MDS does not really adjust its timeouts.&lt;/p&gt;

&lt;p&gt;As a result, the blocking AST (bl_ast) timeout, for example, stays at its default of 100 seconds * 1.5 (from &lt;tt&gt;ldlm_bl_timeout()&lt;/tt&gt;), i.e. 150 seconds.&lt;/p&gt;

&lt;p&gt;This is a &lt;b&gt;very&lt;/b&gt; long time to wait, and the AT code is supposed to shorten this.&lt;/p&gt;

&lt;p&gt;There are two obvious approaches here.&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Stop ignoring &quot;0&quot; values in the adaptive timeout code, and set a default non-zero at_min (setting it to 1 second should mean no behavioral change, as that is the current minimum real value).  This solution should be simple and should not affect existing installs much, since configuring at_min is common anyway.&lt;/li&gt;
	&lt;li&gt;Update the adaptive timeout code to use more precise time intervals than 1 second.&lt;/li&gt;
&lt;/ol&gt;
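&lt;p&gt;A minimal sketch of approach #1, for illustration only.  It assumes a simplified estimator that reports the maximum of the last few whole-second service times, clamped to [at_min, at_max].  It is not the actual Lustre adaptive timeout code, and all names (at_sketch, AT_SKETCH_BINS and the at_sketch_*() functions) are hypothetical.  The starting estimate of 100 seconds is also an assumption, chosen to mirror the default timeout discussed here.&lt;/p&gt;

```c
#define AT_SKETCH_BINS 4

/* Hypothetical, simplified adaptive-timeout estimator (not Lustre code).
 * All times are whole seconds. */
struct at_sketch {
	unsigned int samples[AT_SKETCH_BINS]; /* most recent service times */
	unsigned int next;        /* slot the next sample overwrites */
	unsigned int at_min;      /* floor applied to the estimate */
	unsigned int at_max;      /* ceiling applied to the estimate */
	int ignore_zero;          /* 1: current behavior, 0: approach #1 */
};

void at_sketch_init(struct at_sketch *at, unsigned int initial,
		    unsigned int at_min, unsigned int at_max, int ignore_zero)
{
	unsigned int i;

	for (i = 0; i != AT_SKETCH_BINS; i++)
		at->samples[i] = initial;
	at->next = 0;
	at->at_min = at_min;
	at->at_max = at_max;
	at->ignore_zero = ignore_zero;
}

void at_sketch_measure(struct at_sketch *at, unsigned int secs)
{
	/* A reply that took under one second is measured as 0.  The current
	 * behavior drops it, so fast replies never lower the estimate. */
	if (secs == 0) {
		if (at->ignore_zero)
			return;
	}
	at->samples[at->next] = secs;
	at->next = (at->next + 1) % AT_SKETCH_BINS;
}

unsigned int at_sketch_estimate(const struct at_sketch *at)
{
	unsigned int i, est = 0;

	for (i = 0; i != AT_SKETCH_BINS; i++)
		if (at->samples[i] > est)
			est = at->samples[i];
	if (at->at_min > est)     /* approach #1: at_min keeps it non-zero */
		est = at->at_min;
	if (est > at->at_max)
		est = at->at_max;
	return est;
}
```

&lt;p&gt;Starting both estimators at 100 seconds and feeding each four 0-second samples, the current-behavior variant (&lt;tt&gt;ignore_zero=1&lt;/tt&gt;) still reports 100 seconds, while the approach #1 variant (zeros kept, &lt;tt&gt;at_min=1&lt;/tt&gt;) reports 1 second.&lt;/p&gt;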



&lt;p&gt;I am inclined toward #1.  However, in real configurations at_min is generally recommended to be around 40 seconds, so perhaps we should default to that instead.&lt;/p&gt;


&lt;p&gt;Note specifically that &lt;tt&gt;ldlm_bl_timeout()&lt;/tt&gt; takes the max() of this value and ldlm_enqueue_min (default OBD_TIMEOUT_DEFAULT, 100 seconds), so however small the adaptive estimate gets, the blocking AST timeout can only be lowered to 100 seconds there.&lt;/p&gt;
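&lt;p&gt;The interaction between the 1.5x factor and the ldlm_enqueue_min floor can be modeled in a few lines.  This is a hedged sketch of the behavior described in this issue, not the Lustre source: &lt;tt&gt;bl_timeout_sketch()&lt;/tt&gt; is a hypothetical name, and its inputs are whole seconds.&lt;/p&gt;

```c
/* Hypothetical model of the blocking-AST timeout described above:
 * 150% of the adaptive estimate, but never below ldlm_enqueue_min. */
unsigned int bl_timeout_sketch(unsigned int at_estimate,
			       unsigned int ldlm_enqueue_min)
{
	unsigned int t = at_estimate + (at_estimate >> 1); /* 1.5x */

	return t > ldlm_enqueue_min ? t : ldlm_enqueue_min;
}
```

&lt;p&gt;With ldlm_enqueue_min at 100 seconds, an estimate of 100 seconds gives 150 seconds, while any estimate of 67 seconds or less gives exactly 100 seconds (67 + 33 = 100), so shrinking the estimate alone cannot push this timeout below 100 seconds.&lt;/p&gt;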


&lt;p&gt;A few open questions here.&lt;/p&gt;</description>
                <environment></environment>
        <key id="55145">LU-12064</key>
            <summary>Adaptive timeout at_min adjustment &amp; granularity</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="4" iconUrl="https://jira.whamcloud.com/images/icons/statuses/reopened.png" description="This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.">Reopened</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="pfarrell">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 Mar 2019 21:23:20 +0000</created>
                <updated>Wed, 7 Feb 2024 22:28:46 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="243777" author="adilger" created="Tue, 12 Mar 2019 22:37:55 +0000"  >&lt;p&gt;It seems patch &lt;a href=&quot;http://review.whamcloud.com/9336&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9336&lt;/a&gt; &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4942&quot; title=&quot;lock callback timeout is not per-export&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4942&quot;&gt;&lt;del&gt;LU-4942&lt;/del&gt;&lt;/a&gt; at: per-export lock callback timeout&quot; changed &lt;tt&gt;prolong_timeout()&lt;/tt&gt; significantly:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
-       &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; max(at_est2timeout(at_get(&amp;amp;svcpt-&amp;gt;scp_at_estimate)), ldlm_timeout);
+       /* We are in the middle of the process - BL AST is sent, CANCEL
+        * is ahead. Take half of AT + IO process time. */
+       &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; at_est2timeout(at_get(&amp;amp;svcpt-&amp;gt;scp_at_estimate)) +
+               (ldlm_bl_timeout(lock) &amp;gt;&amp;gt; 1);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which I am not sure I agree with.  A client that is responsive and sending I/O should certainly be allowed to complete it, but there should be a shorter timeout for the initial AST in case the client is not responsive.&lt;/p&gt;

&lt;p&gt;It looks like the logic was incrementally smeared across several patches.  Initially, &lt;tt&gt;ldlm_get_rq_timeout()&lt;/tt&gt; returned &lt;tt&gt;min(ldlm_timeout, obd_timeout / 3)&lt;/tt&gt;, which seems reasonable: we want to allow a timeout and a retry before evicting a client.  Then AT came in and limited this to the &lt;tt&gt;AT_OFF&lt;/tt&gt; case, in favor of &lt;tt&gt;ptlrpc_at_set_req_timeout()&lt;/tt&gt; being set when the request is allocated, which uses only &lt;tt&gt;obd_timeout&lt;/tt&gt; internally.  As a result, we currently do not use &lt;tt&gt;ldlm_timeout&lt;/tt&gt; anywhere in the code when AT is enabled.&lt;/p&gt;
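&lt;p&gt;The original, non-AT rule fits in one function.  The sketch below is illustrative only: &lt;tt&gt;rq_timeout_sketch()&lt;/tt&gt; is a hypothetical name, the inputs are example values in whole seconds, and the body is not copied from the Lustre tree.&lt;/p&gt;

```c
/* Hypothetical model of the original ldlm_get_rq_timeout() rule:
 * wait at most obd_timeout / 3 for the AST reply, so that a timeout
 * and a resend both fit before the client is evicted at obd_timeout. */
unsigned int rq_timeout_sketch(unsigned int ldlm_timeout,
			       unsigned int obd_timeout)
{
	unsigned int third = obd_timeout / 3;

	return ldlm_timeout > third ? third : ldlm_timeout;
}
```

&lt;p&gt;For example, with obd_timeout=100 the AST reply wait is capped at 33 seconds however large ldlm_timeout is, while a smaller ldlm_timeout such as 20 is used as-is.&lt;/p&gt;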

&lt;p&gt;The &lt;tt&gt;ldlm_server_blocking_ast()&lt;/tt&gt; code should not use &lt;tt&gt;ldlm_bl_timeout()&lt;/tt&gt; for the initial BL AST reply timeout, since at that point we do not know whether the client is responsive, but rather something like &lt;tt&gt;max(ldlm_timeout, at_est2timeout(at_get(&amp;amp;lock-&amp;gt;l_export-&amp;gt;exp_bl_lock_at)))&lt;/tt&gt;.  &lt;tt&gt;ldlm_bl_timeout()&lt;/tt&gt; is fine for later bulk I/O timeouts in &lt;tt&gt;prolong_timeout()&lt;/tt&gt;, once we know the client has replied to the initial blocking AST and is busy doing writes under the lock.&lt;/p&gt;</comment>
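&lt;p&gt;A hedged comparison of the two timeouts discussed in the comment above, in whole seconds.  &lt;tt&gt;prolong_sketch()&lt;/tt&gt; follows the diff (AT timeout plus half of the blocking-AST timeout); &lt;tt&gt;initial_ast_sketch()&lt;/tt&gt; follows the proposal, with the per-export AT timeout (the at_est2timeout() result in the proposed expression) passed in directly rather than computed.  Both names are hypothetical.&lt;/p&gt;

```c
/* Later bulk-I/O extension, per the diff: the service AT timeout plus
 * half of the blocking-AST timeout. */
unsigned int prolong_sketch(unsigned int svc_at_timeout,
			    unsigned int bl_timeout)
{
	return svc_at_timeout + (bl_timeout >> 1);
}

/* Proposed initial blocking-AST reply timeout: the larger of
 * ldlm_timeout and the per-export AT timeout, with no bulk-I/O
 * allowance because the client has not yet shown it is alive. */
unsigned int initial_ast_sketch(unsigned int ldlm_timeout,
				unsigned int export_at_timeout)
{
	return ldlm_timeout > export_at_timeout ? ldlm_timeout
						: export_at_timeout;
}
```

&lt;p&gt;With example values of a 10-second AT timeout, ldlm_timeout=20 and a 150-second blocking-AST timeout, an unresponsive client would be detected after 20 seconds instead of 150, while a client that has replied and is doing I/O would still get 10 + 75 = 85 seconds per extension.&lt;/p&gt;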
                            <comment id="243944" author="adilger" created="Thu, 14 Mar 2019 18:11:31 +0000"  >&lt;p&gt;In any case, Patrick, I agree that setting at_min = 1 by default and allowing zero-elapsed-time replies makes sense and is relatively easy and low-risk to implement.&lt;/p&gt;</comment>
                            <comment id="308836" author="adilger" created="Thu, 29 Jul 2021 19:40:13 +0000"  >&lt;p&gt;It would also be useful here to tune &lt;tt&gt;at_min&lt;/tt&gt; as a function of the number of clients connected to the servers.  For systems with ~200 clients, &lt;tt&gt;at_min=15&lt;/tt&gt; is typical, and with ~1500 clients &lt;tt&gt;at_min=30&lt;/tt&gt; is typical, so a function like the following seems reasonable:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
        at_min = ilog(num_clients) * 3;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;though any explicitly-specified &lt;tt&gt;at_min&lt;/tt&gt; value should take precedence.  That probably means having a separate flag that indicates whether at_min has been explicitly set or not, and otherwise recalculating it when clients connect and disconnect.&lt;/p&gt;</comment>
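&lt;p&gt;A minimal sketch of that auto-tuning, assuming &lt;tt&gt;ilog()&lt;/tt&gt; above means the integer base-2 logarithm (which matches ~1500 clients giving 30).  The struct, flag and function names are hypothetical; the recalculation would be triggered on each client connect and disconnect.&lt;/p&gt;

```c
/* Integer base-2 logarithm, rounded down; 0 for n == 0 or n == 1. */
static unsigned int ilog2_floor(unsigned long n)
{
	unsigned int r = 0;

	while (n > 1) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Hypothetical tuning state: the at_min in effect and whether an
 * administrator has set it explicitly. */
struct at_min_tuning {
	unsigned int at_min;   /* seconds */
	int explicitly_set;    /* nonzero once set by the administrator */
};

/* Recompute at_min = ilog2(num_clients) * 3 on connect/disconnect.
 * An explicit setting is never overridden. */
void at_min_recalc(struct at_min_tuning *t, unsigned long num_clients)
{
	if (t->explicitly_set)
		return;
	t->at_min = ilog2_floor(num_clients) * 3;
}

/* An explicit setting takes precedence over the computed value. */
void at_min_set_explicit(struct at_min_tuning *t, unsigned int secs)
{
	t->at_min = secs;
	t->explicitly_set = 1;
}
```

&lt;p&gt;For ~200, ~1500 and 20,000 clients this yields 21, 30 and 42 seconds.  With zero or one connected client the formula gives 0, so a real implementation would presumably still need a non-zero floor.&lt;/p&gt;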
                            <comment id="348182" author="adilger" created="Wed, 28 Sep 2022 14:07:44 +0000"  >&lt;p&gt;A presentation from CEA used &lt;tt&gt;at_min=55&lt;/tt&gt; for a cluster with 20,000 clients, which roughly matches the &lt;tt&gt;ilog(num_clients) * 3&lt;/tt&gt; formula (&lt;tt&gt;ilog(20000) = 14&lt;/tt&gt;, and &lt;tt&gt;14 * 3 = 42&lt;/tt&gt;).&lt;/p&gt;</comment>
                            <comment id="348190" author="delbaryg" created="Wed, 28 Sep 2022 15:37:23 +0000"  >&lt;p&gt;The main concern about at_min is that if you set it lower than your LNet transaction timeout (taking into account the number of LNet retries, if configured), the probability of flooding your LNet networks increases drastically, because an RPC not acknowledged by the server will be retransmitted by the client (not directly, but through a high-priority RPC) based on the at_min value. I do not remember the client-side code by heart, but this is what we have observed. In any case, other Lustre timeouts such as ldlm_timeout and obd_timeout are generally initialized in the code with constant values higher than the default transaction timeout. Based on what we have seen at large scale, we could set the default at_min value to &lt;tt&gt;lnet_transaction_timeout+1&lt;/tt&gt; (for non-routed configurations) or &lt;tt&gt;max((lnet_transaction_timeout+1),ilog(num_clients) * 3)&lt;/tt&gt;. I think we have to rely on the lower layers. Does that make sense?&lt;/p&gt;</comment>
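&lt;p&gt;A sketch of the second formula from the comment above, for a non-routed configuration.  The LNet transaction timeout is passed in as a plain number (its actual default is not assumed here), &lt;tt&gt;ilog()&lt;/tt&gt; is assumed to be the integer base-2 logarithm, and &lt;tt&gt;default_at_min()&lt;/tt&gt; is a hypothetical name.&lt;/p&gt;

```c
/* Integer base-2 logarithm, rounded down; 0 for n == 0 or n == 1. */
static unsigned int ilog2_floor(unsigned long n)
{
	unsigned int r = 0;

	while (n > 1) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Hypothetical default:
 * max(lnet_transaction_timeout + 1, ilog2(num_clients) * 3). */
unsigned int default_at_min(unsigned int lnet_transaction_timeout,
			    unsigned long num_clients)
{
	/* Stay above the LNet transaction timeout so the client does not
	 * resend while LNet may still be delivering the RPC. */
	unsigned int lnet_floor = lnet_transaction_timeout + 1;
	unsigned int scaled = ilog2_floor(num_clients) * 3;

	return scaled > lnet_floor ? scaled : lnet_floor;
}
```

&lt;p&gt;With an example transaction timeout of 50 seconds and 200 clients, the LNet floor wins at 51 seconds; with 10 seconds and 20,000 clients, the client-count term wins at 42 seconds.&lt;/p&gt;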
                            <comment id="348426" author="adilger" created="Fri, 30 Sep 2022 21:33:53 +0000"  >&lt;p&gt;I think one of the current issues is that &lt;tt&gt;at_min=0&lt;/tt&gt; allows the ping flood to happen.  Setting &lt;tt&gt;at_min &amp;gt; 0&lt;/tt&gt; would significantly reduce the flood, and having a reasonable &lt;tt&gt;at_min&lt;/tt&gt; for the cluster size would help further.&lt;/p&gt;</comment>
                            <comment id="359814" author="adilger" created="Fri, 20 Jan 2023 05:41:14 +0000"  >&lt;p&gt;With ever-increasing core counts on the client, it makes sense to scale the number of &quot;clients&quot; by &lt;tt&gt;max_mod_rpcs_in_flight&lt;/tt&gt; when computing &lt;tt&gt;at_min&lt;/tt&gt;, so that a multi-threaded workload on a smaller number of clients is handled similarly to a larger number of clients with &lt;tt&gt;max_mod_rpcs_in_flight=1&lt;/tt&gt;.  With the default &lt;tt&gt;max_mod_rpcs_in_flight=8&lt;/tt&gt;, this would add &lt;tt&gt;ilog(8) = 3&lt;/tt&gt; to the logarithm, i.e. 3 * 3 = 9 seconds to the calculated &lt;tt&gt;at_min&lt;/tt&gt; value.&lt;/p&gt;</comment>
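&lt;p&gt;A sketch of that scaling, assuming &lt;tt&gt;ilog()&lt;/tt&gt; is the integer base-2 logarithm and that the effective client count is simply num_clients * max_mod_rpcs_in_flight.  &lt;tt&gt;at_min_scaled()&lt;/tt&gt; is a hypothetical name.&lt;/p&gt;

```c
/* Integer base-2 logarithm, rounded down; 0 for n == 0 or n == 1. */
static unsigned int ilog2_floor(unsigned long n)
{
	unsigned int r = 0;

	while (n > 1) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Hypothetical: count each client as max_mod_rpcs_in_flight "clients"
 * before applying at_min = ilog2(num_clients) * 3. */
unsigned int at_min_scaled(unsigned long num_clients,
			   unsigned int max_mod_rpcs_in_flight)
{
	return ilog2_floor(num_clients * max_mod_rpcs_in_flight) * 3;
}
```

&lt;p&gt;Multiplying the count by 8 adds exactly 3 to the rounded-down base-2 logarithm, so 1024 clients give 30 seconds unscaled and 39 seconds with max_mod_rpcs_in_flight=8.&lt;/p&gt;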
                            <comment id="369189" author="gerrit" created="Wed, 12 Apr 2023 04:31:44 +0000"  >&lt;p&gt;&quot;Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50609&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50609&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12064&quot; title=&quot;Adaptive timeout at_min adjustment &amp;amp; granularity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12064&quot;&gt;LU-12064&lt;/a&gt; ptlrpc: set at_min=5 by default&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1805e4dfa8fcba712b5fa3868f26988e7635dbcb&lt;/p&gt;</comment>
                            <comment id="384889" author="gerrit" created="Wed, 6 Sep 2023 06:15:13 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50609/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50609/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12064&quot; title=&quot;Adaptive timeout at_min adjustment &amp;amp; granularity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12064&quot;&gt;LU-12064&lt;/a&gt; ptlrpc: set at_min=5 by default&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 46804c2230cc0f72d4472cddd5a37456e1f2fb00&lt;/p&gt;</comment>
                            <comment id="384967" author="pjones" created="Wed, 6 Sep 2023 13:06:59 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                            <comment id="385835" author="adilger" created="Wed, 13 Sep 2023 17:29:20 +0000"  >&lt;p&gt;The patch that landed only raises the default at_min to a reasonable minimum value (5 seconds).  The work to implement a dynamic at_min/at_max based on the number of connected clients has not yet been done.&lt;/p&gt;</comment>
                            <comment id="385849" author="adilger" created="Wed, 13 Sep 2023 18:15:57 +0000"  >&lt;p&gt;One proposal that might help here (and in other places) is for the servers to persistently track the maximum number of connected clients, so that the MDS/OSS knows after a restart how many clients might connect and can set at_min to an appropriate value right from the start.&lt;/p&gt;</comment>
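&lt;p&gt;A minimal sketch of that proposal, assuming the high-water mark lives in a small persistent record (modeled here as a plain struct; how a real server would store it is left open) and that at_min is derived from it with the &lt;tt&gt;ilog(num_clients) * 3&lt;/tt&gt; formula discussed in this issue, with ilog() taken as the integer base-2 logarithm.  All names are hypothetical.&lt;/p&gt;

```c
/* Integer base-2 logarithm, rounded down; 0 for n == 0 or n == 1. */
static unsigned int ilog2_floor(unsigned long n)
{
	unsigned int r = 0;

	while (n > 1) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Hypothetical persistent record of the most clients ever connected. */
struct client_count_record {
	unsigned long max_clients;   /* high-water mark */
};

/* Update the high-water mark; returns 1 if the record changed and
 * would need to be written back to stable storage. */
int record_client_count(struct client_count_record *rec,
			unsigned long connected)
{
	if (connected > rec->max_clients) {
		rec->max_clients = connected;
		return 1;
	}
	return 0;
}

/* at_min to use right after a restart, before clients reconnect. */
unsigned int startup_at_min(const struct client_count_record *rec)
{
	return ilog2_floor(rec->max_clients) * 3;
}
```

&lt;p&gt;Only increases are recorded, so the on-disk write happens rarely, and a server that once saw 1500 clients would start at 30 seconds after a restart instead of waiting for clients to reconnect.&lt;/p&gt;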
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="24327">LU-4942</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="67226">LU-15246</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="54949">LU-11989</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="80710">LU-17514</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="47510" name="LAD2022-Scaling_Up_to_20k_Client-Cedeyn_Delbary.pdf" size="736851" author="adilger" created="Sat, 24 Dec 2022 02:04:13 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00d8v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>