<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:07:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7245] Improve SMP scaling support for LND drivers</title>
                <link>https://jira.whamcloud.com/browse/LU-7245</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While enhancing the lnetctl utility, it was discovered that further SMP scaling improvements can be made to the currently supported LND drivers.&lt;/p&gt;</description>
                <environment>Any TCP, infiniband or Gemini network system</environment>
        <key id="32454">LU-7245</key>
            <summary>Improve SMP scaling support for LND drivers</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="simmonsja">James A Simmons</assignee>
                                    <reporter username="simmonsja">James A Simmons</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Fri, 2 Oct 2015 17:56:41 +0000</created>
                <updated>Thu, 13 May 2021 18:56:06 +0000</updated>
                            <resolved>Sat, 22 Apr 2017 18:22:40 +0000</resolved>
                                    <version>Lustre 2.9.0</version>
                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="129175" author="gerrit" created="Fri, 2 Oct 2015 18:48:53 +0000"  >&lt;p&gt;James Simmons (uja.ornl@yahoo.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16710&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16710&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7245&quot; title=&quot;Improve SMP scaling support for LND drivers&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7245&quot;&gt;&lt;del&gt;LU-7245&lt;/del&gt;&lt;/a&gt; socklnd: Bind peers to a specific CPT&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 05f09aec7aecbd1d57eb321da2f4a65056d9a483&lt;/p&gt;</comment>
                            <comment id="129514" author="pjones" created="Tue, 6 Oct 2015 17:05:50 +0000"  >&lt;p&gt;James&lt;/p&gt;

&lt;p&gt;Are you thinking of targeting this work for 2.9?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="129676" author="olaf" created="Wed, 7 Oct 2015 09:21:00 +0000"  >&lt;p&gt;James, since you asked, a few notes about NUMA.&lt;/p&gt;

&lt;p&gt;Reasoning about NUMA is a lot like thinking about a cluster, except you spend a lot of time worrying about cache lines instead of files. (Much of the terminology is also similar, which becomes confusing when discussing NUMA issues for systems that are part of a cluster.) It is worth noting that NUMA considerations already apply once a system has more than one socket. Ideally, the process driving I/O, the memory involved, and the interface involved all live on the same socket.&lt;/p&gt;

&lt;p&gt;In practice, the memory placement may have been done by some user space process outside our control, and the same goes for the process that initiates the I/O. Selection of an interface that is close in the topology of the system is useful, but that already assumes a multi-rail type configuration. Much of the time there will be no choice because there is only one interface. (LNet routers are an exception: there we do have full control over the location of all buffers and threads relative to the interfaces used.)&lt;/p&gt;

&lt;p&gt;So the main concern becomes doing the best we can in areas we do control, in particular avoiding cache line bouncing. Placing a data structure like &lt;tt&gt;ksock_peer&lt;/tt&gt; in the same CPT as the interface helps here, but only a little. The layout of the &lt;tt&gt;ksock_peer&lt;/tt&gt; structure is actually a good example of what not to do. Take a look at the first few members, which will likely all be in the same cache line:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;typedef struct ksock_peer
{
        struct list_head        ksnp_list;      &lt;span class=&quot;code-comment&quot;&gt;/* stash on global peer list */&lt;/span&gt;
        cfs_time_t            ksnp_last_alive;  &lt;span class=&quot;code-comment&quot;&gt;/* when (in jiffies) I was last alive */&lt;/span&gt;
        lnet_process_id_t     ksnp_id;       &lt;span class=&quot;code-comment&quot;&gt;/* who&apos;s on the other end(s) */&lt;/span&gt;
        atomic_t              ksnp_refcount; &lt;span class=&quot;code-comment&quot;&gt;/* # users */&lt;/span&gt;
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;                   ksnp_sharecount;  &lt;span class=&quot;code-comment&quot;&gt;/* lconf usage counter */&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;tt&gt;ksnp_list&lt;/tt&gt; and &lt;tt&gt;ksnp_id&lt;/tt&gt; are semi-constant, and read by any thread that does a lookup of some peer in the hash table (shared/read access). In contrast, &lt;tt&gt;ksnp_refcount&lt;/tt&gt; and &lt;tt&gt;ksnp_last_alive&lt;/tt&gt; are updated by threads doing work for this particular peer (exclusive/write access). So a lookup of some unrelated peer causes a cache line bounce between the CPU doing the lookup and the CPU managing the I/O. This particular case can be mitigated by being very careful with the layout of a data structure, and by making sure that threads that modify the structure run on the same socket, even if that socket is not where the data structure lives.&lt;/p&gt;</comment>
                            <comment id="129850" author="simmonsja" created="Thu, 8 Oct 2015 17:24:41 +0000"  >&lt;p&gt;The patch for the Gemini SMP work under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2544&quot; title=&quot;Add SMP scaling to Cray Gemini driver&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2544&quot;&gt;LU-2544&lt;/a&gt; reworked a lot of the data structures to deal with what you described. Never looked closely at the other LND drivers but I can see the problem there. Will require a lot of data structure reworking :-/&lt;/p&gt;</comment>
                            <comment id="131812" author="gerrit" created="Wed, 28 Oct 2015 13:49:47 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/16710/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16710/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7245&quot; title=&quot;Improve SMP scaling support for LND drivers&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7245&quot;&gt;&lt;del&gt;LU-7245&lt;/del&gt;&lt;/a&gt; socklnd: Bind peers to a specific CPT&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 68eb6e841f49d41a289bd0b3f559973b6cb31738&lt;/p&gt;</comment>
                            <comment id="131827" author="simmonsja" created="Wed, 28 Oct 2015 14:32:14 +0000"  >&lt;p&gt;I suspect more work will be coming from this ticket for 2.9.&lt;/p&gt;</comment>
                            <comment id="193112" author="simmonsja" created="Sat, 22 Apr 2017 18:22:40 +0000"  >&lt;p&gt;The patches for this work have already landed, and the multi-rail work filled in the rest of the gaps.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="17046">LU-2544</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="27731">LU-5960</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="64088">LU-14676</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 28 Oct 2015 17:56:41 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>lnet</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxpfj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 2 Oct 2015 17:56:41 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>