<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:42:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4454] &quot;Lustre: can&apos;t support CPU hotplug well now&quot;</title>
                <link>https://jira.whamcloud.com/browse/LU-4454</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have some Lustre clients where hyperthreading is enabled and disabled, possibly on a per-job basis.  The admins are noting streams of scary messages on the console from Lustre:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-12-02 09:58:29 LNet: 5546:0:(linux-cpu.c:1035:cfs_cpu_notify()) Lustre: can&apos;t support CPU hotplug well now, performance and stability could be impacted[CPU 40 notify: 3]
2013-12-02 09:58:29 LNet: 5546:0:(linux-cpu.c:1035:cfs_cpu_notify()) Skipped 30 previous similar messages
2013-12-02 09:58:29 Booting Node 0 Processor 40 APIC 0x1
2013-12-02 09:58:30 microcode: CPU40 sig=0x206f2, pf=0x4, revision=0x37
2013-12-02 09:58:30 platform microcode: firmware: requesting intel-ucode/06-2f-02
2013-12-02 09:58:30 Booting Node 0 Processor 41 APIC 0x3
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The above message is not acceptable.  Please fix.&lt;/p&gt;

&lt;p&gt;Further, when I went to look into how this CPU partitions code worked, I wound up mighty confused.  For instance, on a node with 4 sockets and 10 cores per socket, I see this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/proc/sys/lnet$ cat cpu_partition_table
0       : 0 1 2 3 4
1       : 5 6 7 8 9
2       : 10 11 12 13 14
3       : 15 16 17 18 19
4       : 20 21 22 23 24
5       : 25 26 27 28 29
6       : 30 31 32 33 34
7       : 35 36 37 38 39
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Why are there two partitions per socket?  Is this by design, or a bug?&lt;/p&gt;

&lt;p&gt;What is going to happen when hyperthreading is enabled, and there are 80 &quot;cpus&quot; suddenly available?&lt;/p&gt;</description>
                <environment></environment>
        <key id="22663">LU-4454</key>
            <summary>&quot;Lustre: can&apos;t support CPU hotplug well now&quot;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="liang">Liang Zhen</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                    </labels>
                <created>Wed, 8 Jan 2014 02:21:24 +0000</created>
                <updated>Fri, 14 Feb 2014 17:16:49 +0000</updated>
                            <resolved>Tue, 21 Jan 2014 22:20:59 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                    <version>Lustre 2.4.2</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="74543" author="pjones" created="Wed, 8 Jan 2014 04:57:44 +0000"  >&lt;p&gt;Liang&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="74544" author="liang" created="Wed, 8 Jan 2014 05:51:22 +0000"  >&lt;p&gt;Hi Chris, it is by design to have multiple partitions per socket; two different partitions (and thread pools) per socket should give better performance when there are many cores (HTs) on each socket, and the layout can also be set/changed by configuration.&lt;/p&gt;

&lt;p&gt;The CPU partition code is not well designed for hot-unplugging CPUs: e.g. if all CPUs (or cores) in a specific CPU partition go offline, then threads on that partition just lose affinity, and we can do nothing about this so far. Hot-plugging in a new CPU is OK, but the newly added CPU will never be used by CPU-affinity threads.&lt;/p&gt;

&lt;p&gt;Enabling/disabling HT should be fine, because HTs of the same core are put in the same CPU partition:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;if HT is disabled and then enabled, Lustre threads with CPU affinity will never run on those newly appeared &quot;CPUs&quot;, that&apos;s it.&lt;/li&gt;
	&lt;li&gt;if HT is enabled and then disabled, Lustre threads with CPU affinity will still run on the same cores.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I can work out a patch which only prints warnings when we lose a physical core (all HTs in a core are gone).&lt;/p&gt;</comment>
                            <comment id="74547" author="liang" created="Wed, 8 Jan 2014 06:59:41 +0000"  >&lt;p&gt;patch is here: &lt;a href=&quot;http://review.whamcloud.com/#/c/8770/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8770/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="75112" author="jlevi" created="Thu, 16 Jan 2014 18:38:10 +0000"  >&lt;p&gt;What else needs to be completed in this ticket and what is the priority of that work (if any)?&lt;/p&gt;</comment>
                            <comment id="75392" author="pjones" created="Tue, 21 Jan 2014 22:20:59 +0000"  >&lt;p&gt;As per LLNL this ticket can be resolved.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwcdz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12210</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>