<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:07:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7224] Tuning ko2iblnd for Large Cluster</title>
                <link>https://jira.whamcloud.com/browse/LU-7224</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have a large cluster:&lt;br/&gt;
  10K nodes on the main subnet&lt;br/&gt;
  2K nodes behind 12 routers on a second subnet&lt;br/&gt;
  100 OSSes with 5 filesystems&lt;/p&gt;

&lt;p&gt;We need your best recommendations for tuning ko2iblnd parameters such as peer_credits, credits, ntx, etc.&lt;/p&gt;
</description>
                <environment>lustre 2.5.3</environment>
        <key id="32366">LU-7224</key>
            <summary>Tuning ko2iblnd for Large Cluster</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="mhanafi">Mahmoud Hanafi</reporter>
                        <labels>
                    </labels>
                <created>Mon, 28 Sep 2015 22:25:09 +0000</created>
                <updated>Sat, 16 Apr 2016 01:59:52 +0000</updated>
                            <resolved>Sat, 16 Apr 2016 01:59:52 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="128759" author="pjones" created="Tue, 29 Sep 2015 17:17:59 +0000"  >&lt;p&gt;Hi Mahmoud&lt;/p&gt;

&lt;p&gt;There is definitely an art to getting these values set appropriately. The basic effects of the settings are detailed in the ops manual, but I understand that the settings can interact in surprising ways. I will ask around to see what practical experience people can share that might help you in this process.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="132413" author="doug" created="Tue, 3 Nov 2015 00:26:12 +0000"  >&lt;p&gt;As I understand it, this ticket is related to the problems being found in ticket &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7054&quot; title=&quot;ib_cm scalling issue when lustre clients connect to OSS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7054&quot;&gt;&lt;del&gt;LU-7054&lt;/del&gt;&lt;/a&gt; and others.  My interpretation is this:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;a large number of clients are &quot;aggressively&quot; firing RPCs at the OSSs.  By aggressive, I mean there is a lot of parallelism from each client potentially maximizing the number of peer credits being used from the client side.&lt;/li&gt;
	&lt;li&gt;the OSSs are struggling with the load specifically in the area of TX buffer management/allocation.&lt;/li&gt;
	&lt;li&gt;when running low on memory, or having fragmented memory, the OSSs can freeze during memory allocation calls for TX buffers.&lt;/li&gt;
	&lt;li&gt;these freezes create evictions.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Please let me know if my understanding is incorrect.&lt;/p&gt;

&lt;p&gt;So, two things need to be tuned for:&lt;/p&gt;

&lt;p&gt;1- Resources on the OSSs to make it easier to accommodate the load.&lt;br/&gt;
2- Traffic shaping from the clients to create back pressure on the application should it get too aggressive in accessing the file system.&lt;/p&gt;

&lt;p&gt;The traffic shaping is managed by lowering the peer_credits value on the clients.  This reduces the number of outstanding messages to any given OSS from any given client.  However, you can also change the max_rpcs_in_flight parameter (a Lustre parameter, not an LNet one) to manage how many operations can be outstanding at any given time.  This is the better parameter to change because, unlike peer_credits, it does not have to be the same on two communicating peers.&lt;/p&gt;

&lt;p&gt;Lowering max_rpcs_in_flight keeps the door open for increasing peer_credits.  You may want to increase peer_credits on the OSSs so they are not held back when sending out responses.  But, as peer_credits currently needs to be the same on all nodes, you would need to increase it on the clients as well as the OSSs.  max_rpcs_in_flight will make sure the clients don&apos;t make use of the higher peer_credits value while the OSSs do.&lt;/p&gt;
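
&lt;p&gt;To illustrate the back-pressure argument, here is a rough Python sketch of the worst-case number of RPCs that can be outstanding at one OSS. It assumes the limit applies per client and per OST (the Lustre osc tunable max_rpcs_in_flight is set per OSC device), and it ignores that a single RPC may involve more than one LNet message. The function name and the figures in the loop are illustrative assumptions, not measurements from this cluster.&lt;/p&gt;

```python
def worst_case_rpcs_per_oss(num_clients, osts_per_oss, max_rpcs_in_flight):
    """Upper bound on RPCs outstanding at one OSS at any instant.

    Assumes every client talks to every OST on the OSS and keeps
    max_rpcs_in_flight RPCs outstanding to each of them.
    """
    return num_clients * osts_per_oss * max_rpcs_in_flight


# Hypothetical figures: 12000 clients and 4 OSTs per OSS.
for rpcs in (8, 4, 2):
    bound = worst_case_rpcs_per_oss(12000, 4, rpcs)
    print(f"max_rpcs_in_flight={rpcs}: at most {bound} RPCs per OSS")
```

&lt;p&gt;Under these assumptions, halving max_rpcs_in_flight halves the bound without touching peer_credits, so it can be changed on the clients alone.&lt;/p&gt;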

&lt;p&gt;FMR will probably not help much here.  I looked at the code for FMR and do not see it using any less memory than regular buffers.  In fact, it may use a little more.  FMR helps out when using Truescale IB cards and when dealing with high latency networks (like a WAN).  If using Mellanox over a LAN, you should not see much benefit from FMR.&lt;/p&gt;

&lt;p&gt;With regard to the resources on the OSSs, memory seems to be the key one here.  The TX pool allocation system returns the pools back to the system after 300 seconds.  To avoid this, it is good to allocate a very large initial TX pool by setting a very high ntx value.  This initial pool is never returned to the system, so having a large pool means we don&apos;t need to spend any time in memory allocation/deallocation routines.  Of course, having a large TX pool also means having a lot of physical memory in the OSSs so they can accommodate that many pre-allocated buffers.&lt;/p&gt;
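
&lt;p&gt;As a rough illustration of this memory trade-off, the following Python sketch estimates how much memory a pre-allocated TX pool would hold. The per-descriptor size is a placeholder assumption (BYTES_PER_TX is hypothetical and should be replaced with a figure measured on your build), and the ntx values are examples only.&lt;/p&gt;

```python
def tx_pool_bytes(ntx, bytes_per_tx):
    """Memory held by a pool of ntx pre-allocated TX descriptors."""
    return ntx * bytes_per_tx


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# Placeholder assumption: 4 KiB per TX descriptor. Not taken from this ticket.
BYTES_PER_TX = 4096

for ntx in (512, 4096, 32768):
    mib = to_mib(tx_pool_bytes(ntx, BYTES_PER_TX))
    print(f"ntx={ntx}: about {mib:.0f} MiB pre-allocated")
```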

&lt;p&gt;So, in summary, I am recommending:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Increase the ntx value on the servers&lt;/li&gt;
	&lt;li&gt;Increase peer_credits on all systems&lt;/li&gt;
	&lt;li&gt;Reduce max_rpcs_in_flight on the clients to ensure they do not get &quot;too aggressive&quot;&lt;/li&gt;
&lt;/ul&gt;
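
&lt;p&gt;These recommendations could be turned into configuration along the following lines. This Python sketch only formats the text; it does not apply anything. The numeric values are placeholders rather than recommended settings, and the helper names are hypothetical. The output uses the standard modprobe &quot;options&quot; syntax for the ko2iblnd module and the lctl set_param form for the osc max_rpcs_in_flight tunable.&lt;/p&gt;

```python
def modprobe_options(module, params):
    """Format a modprobe.d 'options' line from a dict of module parameters."""
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


def lctl_set_param(name, value):
    """Format an lctl set_param command for a Lustre tunable."""
    return f"lctl set_param {name}={value}"


# Placeholder values for illustration only.
server_lnd = {"peer_credits": 16, "ntx": 4096}
client_lnd = {"peer_credits": 16}  # kept equal to the server-side peer_credits

print("OSS:   ", modprobe_options("ko2iblnd", server_lnd))
print("client:", modprobe_options("ko2iblnd", client_lnd))
print("client:", lctl_set_param("osc.*.max_rpcs_in_flight", 4))
```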
</comment>
                            <comment id="148498" author="mhanafi" created="Mon, 11 Apr 2016 20:37:32 +0000"  >&lt;p&gt;We can close this issue.&lt;/p&gt;</comment>
                            <comment id="149175" author="jfc" created="Sat, 16 Apr 2016 01:59:52 +0000"  >&lt;p&gt;Thanks Mahmoud.&lt;br/&gt;
~ jfc.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxoy7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>