<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:11:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-947] ptlrpc dynamic service thread count handling</title>
                <link>https://jira.whamcloud.com/browse/LU-947</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It should be possible to dynamically tune the number of ptlrpc threads at runtime for testing purposes.  Currently it is possible to increase the maximum thread count, but it is not possible to stop threads that are already running.&lt;/p&gt;

&lt;p&gt;This was being worked on in bug 22417:&lt;br/&gt;
&lt;a href=&quot;https://bugzilla.lustre.org/attachment.cgi?id=32351&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.lustre.org/attachment.cgi?id=32351&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and later enhanced in bug 22516:&lt;br/&gt;
&lt;a href=&quot;https://bugzilla.lustre.org/attachment.cgi?id=32510&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.lustre.org/attachment.cgi?id=32510&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The later patch includes code to dynamically tune the thread counts based on the threads in use over the past several minutes.&lt;/p&gt;</description>
                <environment></environment>
        <key id="12730">LU-947</key>
            <summary>ptlrpc dynamic service thread count handling</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Tue, 20 Dec 2011 12:50:33 +0000</created>
                <updated>Fri, 24 Nov 2023 07:35:45 +0000</updated>
                            <resolved>Tue, 17 Mar 2020 03:34:37 +0000</resolved>
                                                    <fixVersion>Lustre 2.13.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="243514" author="adilger" created="Thu, 7 Mar 2019 23:30:12 +0000"  >&lt;p&gt;We also want to reduce the default maximum number of service threads, as this is typically too high for most systems.&lt;/p&gt;

&lt;p&gt;Chris, can you please provide details about which threads should be reduced, and what the preferred thread count is?&lt;/p&gt;</comment>
                            <comment id="243676" author="chunteraa" created="Mon, 11 Mar 2019 18:52:52 +0000"  >&lt;p&gt;For systems with a large number of clients (i.e. ~1000) we find the maximum number of OSS &amp;amp; MDS threads too high. This causes high system load &amp;amp; lost network connections. Too many service threads particularly impact Ethernet &amp;amp; OPA, since they have higher CPU utilization/system load.&lt;/p&gt;

&lt;p&gt;AFAIK the current default is a maximum of 512 threads. We usually set fixed values &lt;em&gt;mds_num_threads=256&lt;/em&gt; and &lt;em&gt;oss_num_threads=256&lt;/em&gt; via module options (i.e. half the default). To set kernel module options we have to stop Lustre &amp;amp; reload the Lustre modules. For virtual machines with limited CPU cores we often use smaller values.&lt;/p&gt;

&lt;p&gt;We also tested the tunable &lt;em&gt;ost.OSS.ost_io.threads_max&lt;/em&gt;. I believe we also have to reload the Lustre modules to set this parameter.&lt;/p&gt;</comment>
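<!--
The module-option workflow described in the comment above can be captured as a
modprobe configuration fragment. A minimal sketch, using the thread counts
quoted in the comment; the file path is illustrative, and the "mds"/"ost"
module names are assumptions that should be checked against the Lustre manual
for the installed version:

```
# /etc/modprobe.d/lustre.conf (illustrative path)
# Fixed service thread counts, read at module load time; changing them
# requires stopping Lustre and reloading the modules, as noted above.
options mds mds_num_threads=256
options ost oss_num_threads=256
```
-->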
                            <comment id="243711" author="adilger" created="Tue, 12 Mar 2019 01:07:16 +0000"  >&lt;p&gt;Could you comment on the core count for 256 threads vs. smaller systems?  I&apos;m wondering if that could be made automatic?  &lt;/p&gt;

&lt;p&gt;The &lt;tt&gt;threads_max&lt;/tt&gt; parameter can currently be increased to allow more threads to be started, if needed, but decreasing it does not stop the threads. I&apos;m just looking at the code to determine if this is practical to change (it was previously not with the &quot;obdfilter&quot; code, but it seems the &quot;ofd&quot; code does not suffer the same limitations).&lt;/p&gt;</comment>
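<!--
The runtime tuning being discussed maps onto lctl parameter operations like the
following sketch; the service path is the one named elsewhere in this ticket,
the value is illustrative, and the commands assume an OSS node:

```
# Read the current upper limit on ost_io service threads.
lctl get_param ost.OSS.ost_io.threads_max
# Raising the limit lets more threads start on demand; before the patch on
# this ticket, lowering it did not stop threads that were already running.
lctl set_param ost.OSS.ost_io.threads_max=256
```
-->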
                            <comment id="243754" author="adilger" created="Tue, 12 Mar 2019 19:21:14 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34400&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34400&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-947&quot; title=&quot;ptlrpc dynamic service thread count handling&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-947&quot;&gt;&lt;del&gt;LU-947&lt;/del&gt;&lt;/a&gt; ptlrpc: stop threads if more than threads_max&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 039e8c72f5ace8b7d9b6b1d3ec543f1dbf95335b&lt;/p&gt;</comment>
                            <comment id="243761" author="adilger" created="Tue, 12 Mar 2019 19:46:57 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=chunteraa&quot; class=&quot;user-hover&quot; rel=&quot;chunteraa&quot;&gt;chunteraa&lt;/a&gt; it looks like &lt;tt&gt;oss&amp;#95;max&amp;#95;threads=512&lt;/tt&gt; and &lt;tt&gt;MDS&amp;#95;NTHRS&amp;#95;MAX=1024&lt;/tt&gt; in the code. The &lt;tt&gt;oss&amp;#95;max&amp;#95;threads&lt;/tt&gt; upper limit is tunable since commit &lt;tt&gt;v2&amp;#95;8&amp;#95;50&amp;#95;0&amp;#45;44&amp;#45;gaa84d18864&lt;/tt&gt;, but &lt;tt&gt;MDS&amp;#95;NTHRS&amp;#95;MAX=1024&lt;/tt&gt; is fixed. It seems that rather than setting &lt;tt&gt;oss&amp;#95;&lt;b&gt;num&lt;/b&gt;&amp;#95;threads&lt;/tt&gt; and &lt;tt&gt;mds&amp;#95;&lt;b&gt;num&lt;/b&gt;&amp;#95;threads&lt;/tt&gt; (which sets the minimum, maximum, and number of threads started) it might be better to set &lt;tt&gt;oss&amp;#95;&lt;b&gt;max&lt;/b&gt;&amp;#95;threads=256&lt;/tt&gt;, which sets the upper limit of threads (and add a tunable &lt;tt&gt;mds&amp;#95;&lt;b&gt;max&lt;/b&gt;&amp;#95;threads&lt;/tt&gt; also), as this allows a system to start fewer threads if more are not needed (e.g. only a few clients).&lt;/p&gt;

&lt;p&gt;It looks like &lt;tt&gt;LDLM&amp;#95;NTHRS&amp;#95;MAX&lt;/tt&gt; is already somewhat dependent on the number of cores (&lt;tt&gt;num&amp;#95;online&amp;#95;cpus() == 1 ? 64 : 128&lt;/tt&gt;), but this is probably a holdover from days gone by, or maybe single-core VMs?&#160; It does show that it is possible to auto-tune based on the core count, however.&lt;/p&gt;</comment>
                            <comment id="243767" author="chunteraa" created="Tue, 12 Mar 2019 20:56:11 +0000"  >&lt;blockquote&gt;&lt;p&gt;Could you comment on the core count for 256 threads vs. smaller systems? I&apos;m wondering if that could be made automatic?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;example VM environments:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;6 CPU cores, 32 GB memory: oss_num_threads=192 or mds_num_threads=128; we also used ost.OSS.ost_io.threads_max=150 when there are many disks installed.&lt;/li&gt;
	&lt;li&gt;16 CPU cores, 90 GB memory: oss_num_threads=256 or mds_num_threads=192&lt;/li&gt;
&lt;/ul&gt;</comment>
                            <comment id="243906" author="gerrit" created="Thu, 14 Mar 2019 08:21:21 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34418&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34418&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-947&quot; title=&quot;ptlrpc dynamic service thread count handling&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-947&quot;&gt;&lt;del&gt;LU-947&lt;/del&gt;&lt;/a&gt; ptlrpc: reduce default MDS/OSS thread count&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 52ebfd503a17fac1a02cb63a8807d3d289a1c853&lt;/p&gt;</comment>
                            <comment id="243923" author="chunteraa" created="Thu, 14 Mar 2019 14:03:11 +0000"  >&lt;p&gt;A general rule of mdt_threads_max=16*num_cpus has been a good starting point, with caveats if processor hyperthreading is enabled.&lt;/p&gt;

&lt;p&gt;For ost &amp;amp; ost_io threads, the &lt;a href=&quot;http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.50438272_55226&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LOM states&lt;/a&gt; &quot;You may want to start with a number of OST threads equal to the number of actual disk spindles on the node.&quot; That makes sense for spinning media, but is perhaps not relevant for flash storage. With flash storage, I suspect ost_io_threads_max=N*num_cpus, with N in the range 10-20, is a good starting point. However, I am not sure whether storage blk_mq support means more or fewer ost_io threads.&lt;/p&gt;

&lt;p&gt;Of course, the ability to reduce the active thread count would help tuning.&lt;/p&gt;</comment>
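<!--
The sizing rules of thumb above can be sketched as small shell helpers; the
function names are illustrative, and the default N=16 is simply a value near
the middle of the suggested 10-20 range for flash storage:

```shell
#!/bin/sh
# Heuristics from this comment: mdt_threads_max = 16*num_cpus, and for flash
# storage ost_io threads_max = N*num_cpus with N in the 10-20 range.
suggest_mdt_threads_max() {        # $1 = CPU core count
    echo $((16 * $1))
}
suggest_ost_io_threads_max() {     # $1 = CPU core count, $2 = N (default 16)
    echo $(( ${2:-16} * $1 ))
}
suggest_mdt_threads_max 16         # 16-core MDS
suggest_ost_io_threads_max 16 10   # conservative flash OSS
```
-->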
                            <comment id="243990" author="sihara" created="Fri, 15 Mar 2019 11:46:43 +0000"  >&lt;p&gt;I agree that reducing the number of threads might help if a large number of clients send messages simultaneously, but we need to keep maximum performance with a small number of clients with large network bandwidth, e.g. 8 clients with EDR need to get 80 GB/sec too.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;For systems with a large number of clients (i.e. ~1000) we find the maximum number of OSS &amp;amp; MDS threads too high. This causes high system load &amp;amp; lost network connections. Too many service threads particularly impact Ethernet &amp;amp; OPA, since they have higher CPU utilization/system load.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Did you disable the 16 MB RPC size here? In most large installations, memory pressure comes from large RPCs, and we have been reducing the RPC size to 8 MB or 4 MB.&lt;/p&gt;
</comment>
                            <comment id="245852" author="chunteraa" created="Tue, 16 Apr 2019 17:01:35 +0000"  >&lt;blockquote&gt;&lt;p&gt;Did you disable the 16 MB RPC size here? In most large installations, memory pressure comes from large RPCs, and we have been reducing the RPC size to 8 MB or 4 MB.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Reducing &lt;tt&gt;brw_size&lt;/tt&gt; doesn&apos;t help much with many clients.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;but we need to keep maximum performance with a small number of clients with large network bandwidth, e.g. 8 clients with EDR need to get 80 GB/sec too.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I suspect the main factors are:&lt;br/&gt;
 1. the number of clients (i.e. the number of simultaneous messages to an OST target)&lt;br/&gt;
 2. the number of OSTs per server&lt;br/&gt;
 3. the performance difference between flash and spinning-disk storage&lt;/p&gt;

&lt;p&gt;Ideally we would have the ability to set &lt;tt&gt;max_threads&lt;/tt&gt; based on different environments and change the value if the workload changes.&lt;/p&gt;</comment>
                            <comment id="245871" author="adilger" created="Tue, 16 Apr 2019 19:19:45 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Ideally we would have the ability to set &lt;tt&gt;max_threads&lt;/tt&gt; based on different environments and change the value if the workload changes.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;That ability &lt;em&gt;already&lt;/em&gt; exists: it is a module parameter, but it can also be increased at runtime if needed.  What is new in the first patch is the ability to reduce it at runtime.  That said, changing it based on workload seems impractical, since there may be many different jobs running at the same time.  What I&apos;d like is to have a reasonable out-of-the-box value, possibly a function of some well-known parameters (RAM, core count, possibly OST count, &lt;em&gt;maybe&lt;/em&gt; client count, though I&apos;m not sure that is right).&lt;/p&gt;</comment>
                            <comment id="245989" author="chunteraa" created="Thu, 18 Apr 2019 14:08:33 +0000"  >&lt;blockquote&gt;
&lt;p&gt; What is new in the first patch is the ability to reduce it at runtime. &lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This would help for managing larger systems.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;reasonable out-of-the box value, possibly a function of some well-known parameters (RAM, core count, possibly OST count, maybe client count though I&apos;m not sure that is right).&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Perhaps we could provide guidelines in the documentation?&lt;/p&gt;</comment>
                            <comment id="246115" author="gerrit" created="Sun, 21 Apr 2019 05:47:18 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/34400/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34400/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-947&quot; title=&quot;ptlrpc dynamic service thread count handling&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-947&quot;&gt;&lt;del&gt;LU-947&lt;/del&gt;&lt;/a&gt; ptlrpc: allow stopping threads above threads_max&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 183cb1e3cdd2de93aca5dff79b3d56bbadc00178&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="79153">LU-17312</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10020" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Bugzilla ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>22516.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw41j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10749</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>