<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:44:01 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11454] Allow switching off CPT binding for PTLRPC threads</title>
                <link>https://jira.whamcloud.com/browse/LU-11454</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It is not always advantageous to bind the various server service threads to specific CPU partitions (CPTs). If all or most traffic arrives from one node (likely a router), server-side activity ends up confined to the CPT associated with that router. This is advantageous if NUMA latencies are relatively high, but can be disadvantageous if CPTs are small and latencies are low.&lt;/p&gt;

&lt;p&gt;Specifically, the work is limited to the CPUs in that CPT, so a workload that needs more CPU than the CPT provides is unable to get it.&lt;/p&gt;

&lt;p&gt;In short, the default strict binding is fine in many cases but not always preferable, so add an option to disable it.&lt;/p&gt;</description>
                <environment></environment>
        <key id="53464">LU-11454</key>
            <summary>Allow switching off CPT binding for PTLRPC threads</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="paf">Patrick Farrell</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 1 Oct 2018 19:25:16 +0000</created>
                <updated>Sun, 13 Jun 2021 06:55:39 +0000</updated>
                            <resolved>Sat, 13 Oct 2018 05:00:37 +0000</resolved>
                                                    <fixVersion>Lustre 2.12.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>12</watches>
                                                                            <comments>
                            <comment id="234187" author="gerrit" created="Mon, 1 Oct 2018 19:35:06 +0000"  >&lt;p&gt;Patrick Farrell (paf@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33262&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33262&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11454&quot; title=&quot;Allow switching off CPT binding for PTLRPC threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11454&quot;&gt;&lt;del&gt;LU-11454&lt;/del&gt;&lt;/a&gt; ptlrpc: Make binding switchable&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: fea65e1c476f634196cf87b3597f2accda7f69a6&lt;/p&gt;</comment>
                            <comment id="234362" author="paf" created="Thu, 4 Oct 2018 15:46:15 +0000"  >&lt;p&gt;By the way, I&apos;d like to sound people out, Olaf and Andreas in particular, on changing the default to &quot;CPT aware but no binding&quot;, at least for the MDT. I think it will be better for almost all real-world server configurations (this is about servers, not clients).&lt;/p&gt;

&lt;p&gt;The &quot;bind to just this CPT&quot; behavior seems designed for systems with high NUMA distances/latencies, such as SGI/HPE big iron, but my understanding is that those systems are used as clients, not servers. Servers are usually much smaller systems.&lt;/p&gt;

&lt;p&gt;We&apos;ve seen significant performance improvements in routed configurations (where there is effectively a router-to-CPT binding) by disabling the binding of worker threads, and no performance loss (even a slight gain) in other configurations. We still want CPT awareness, so that there are multiple queues on which the ptlrpc threads sleep and get work; we just don&apos;t want to limit the CPUs they can use.&lt;/p&gt;

&lt;p&gt;The patch as it stands does &lt;b&gt;not&lt;/b&gt; change the default behavior, but I think we should consider it. So far we only have solid numbers for the MDT. We don&apos;t see any major performance problems related to this on our OSTs, so I&apos;m not (yet) suggesting changing the default there.&lt;/p&gt;

&lt;p&gt;If there&apos;s interest in changing the default, I can work to get those numbers.&lt;/p&gt;</comment>
                            <comment id="234368" author="dougo" created="Thu, 4 Oct 2018 16:24:47 +0000"  >&lt;p&gt;Patrick, how about this approach:&lt;/p&gt;

&lt;p&gt;If the configuration binds an NI (LNet network interface) to one or more CPTs, we keep the worker threads bound to those CPTs. This should continue to support the SGI big iron. If the NIs are not bound, we are free to use all cores and can have a single pool of worker threads without any CPT binding.&lt;/p&gt;

&lt;p&gt;Locks and other &quot;limited&quot; resources remain CPT-based to reduce contention. If we round-robin properly over the worker threads, we should get a good distribution of work across the CPT resources.&lt;/p&gt;

&lt;p&gt;Thoughts?&lt;/p&gt;</comment>
                            <comment id="234371" author="adilger" created="Thu, 4 Oct 2018 16:42:34 +0000"  >&lt;p&gt;There are definitely reasons to have CPT bindings on clients as well; avoiding contention/jitter between Lustre and application threads comes to mind.&lt;/p&gt;

&lt;p&gt;I&apos;ve added Ihara to the ticket, as he is better positioned to report whether this change improves performance (through more efficient CPU usage) or hurts it (through cross-core contention).&lt;/p&gt;</comment>
                            <comment id="234423" author="sihara" created="Fri, 5 Oct 2018 11:01:18 +0000"  >&lt;p&gt;I saw a large IOPS drop when oss_cpu_bind and oss_create_cpu_bind are disabled in conjunction with oss_max_threads, oss_num_threads, and oss_num_create_threads, using settings like these:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;options libcfs cpu_npartitions=16 cpu_pattern=&quot;&quot;
options ost oss_max_threads=128 oss_num_threads=128 oss_num_create_threads=128 oss_cpu_bind=0 oss_create_cpu_bind=0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here are the test results from my test box:&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Parameters&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;IOPS (4K random read)&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;oss_max_threads=128,&lt;br/&gt;
 oss_num_threads=128,&lt;br/&gt;
 oss_num_create_threads=128&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;833K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;oss_max_threads=128,&lt;br/&gt;
 oss_num_threads=128,&lt;br/&gt;
 oss_num_create_threads=128,&lt;br/&gt;
 oss_cpu_bind=0,&lt;br/&gt;
 oss_create_cpu_bind=0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;621K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;The parameters are tunable and binding remains enabled by default, so there is no impact by default. However, is there any particular reason for this performance drop, or was it expected? By the way, the OSS has a single NUMA domain.&lt;/p&gt;</comment>
                            <comment id="234870" author="gerrit" created="Fri, 12 Oct 2018 23:50:29 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33262/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33262/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11454&quot; title=&quot;Allow switching off CPT binding for PTLRPC threads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11454&quot;&gt;&lt;del&gt;LU-11454&lt;/del&gt;&lt;/a&gt; ptlrpc: Make CPU binding switchable&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 3eb7a1dfc3e7401ebcc45ccb116ed673607fd27f&lt;/p&gt;</comment>
                            <comment id="234883" author="pjones" created="Sat, 13 Oct 2018 05:00:37 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="64088">LU-14676</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i003d3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>