<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:43:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11413] Large performance degradation in routing environment. </title>
                <link>https://jira.whamcloud.com/browse/LU-11413</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A large performance degradation was found during MD testing in a routed environment. While simplifying the test case, I found that asymmetric routes are the root cause. Reproduction is quite simple:&lt;br/&gt;
put the LST server and client in different logical networks, and configure the server to route its traffic via &apos;router1&apos; while the client routes its traffic via &apos;router2&apos;.&lt;br/&gt;
Example results:&lt;br/&gt;
the server routes via 73@o2ib; LST results when a different router is used on the other side:&lt;br/&gt;
[LNet Rates of 172.18.1.4@o2ib1]&lt;br/&gt;
[R] Avg: 8615     RPC/s Min: 8615     RPC/s Max: 8615     RPC/s&lt;br/&gt;
[W] Avg: 8615     RPC/s Min: 8615     RPC/s Max: 8615     RPC/s&lt;br/&gt;
Once the route on c-lmo069 is switched to the .73 gateway, the results are much better:&lt;br/&gt;
[root@c-lmo069 ~]# lnetctl route del --net o2ib1 --gateway 172.18.2.76@o2ib10&lt;br/&gt;
[root@c-lmo069 ~]# lnetctl route add --net o2ib1 --gateway 172.18.2.73@o2ib10&lt;/p&gt;

&lt;p&gt;[root@c-lmo069 ~]# lctl dk &amp;gt; log; bash /root/lnet.sh write 4k 1; lctl dk &amp;gt; log-r4&lt;br/&gt;
Performing write&lt;br/&gt;
SESSION: read/write FEATURES: 1 TIMEOUT: 300 FORCE: No&lt;br/&gt;
172.18.1.4@o2ib1 are added to session&lt;br/&gt;
172.18.2.69@o2ib10 are added to session&lt;br/&gt;
Test was added successfully&lt;br/&gt;
bulk_rw is running now&lt;br/&gt;
[LNet Rates of 172.18.1.4@o2ib1]&lt;br/&gt;
[R] Avg: 11349    RPC/s Min: 11349    RPC/s Max: 11349    RPC/s&lt;br/&gt;
[W] Avg: 11349    RPC/s Min: 11349    RPC/s Max: 11349    RPC/s&lt;/p&gt;

&lt;p&gt;lnet.sh is a simple script that sends 4k RPCs with a single RPC in flight at a time (concurrency == 1).&lt;/p&gt;
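&lt;p&gt;Because lnet.sh keeps a single RPC in flight, each rate above is roughly the inverse of one request/reply round trip. The hypothetical helpers below (they are not part of LST or lnet.sh) convert the two measurements into time per RPC. 8615 RPC/s is about 116 microseconds per RPC and 11349 RPC/s is about 88 microseconds, so the mismatched route adds roughly 28 microseconds to every round trip.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/*
 * Hypothetical helpers, not part of LST or lnet.sh. With concurrency 1
 * only one RPC is in flight, so the measured rate is approximately
 * 1 / (round-trip time of a single RPC).
 */
static double rpc_time_us(double rpc_per_sec)
{
	return 1e6 / rpc_per_sec;	/* microseconds per RPC */
}

/* Extra time each RPC spends on the slow path versus the fast path. */
static double rpc_penalty_us(double slow_rate, double fast_rate)
{
	return rpc_time_us(slow_rate) - rpc_time_us(fast_rate);
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the numbers in this ticket, rpc_penalty_us(8615, 11349) evaluates to about 28.&lt;/p&gt;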

&lt;p&gt;This issue can be reproduced with socklnd as well.&lt;/p&gt;

&lt;p&gt;With socklnd, the problem is that the reply is scheduled on a different thread, so it is not sent as quickly as possible.&lt;/p&gt;

&lt;p&gt;With o2ib, it looks like additional network messages are needed to distribute credits between nodes, because the o2ib protocol sends additional credits with a request that expects a reply.&lt;/p&gt;
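&lt;p&gt;One way to picture the o2ib effect is the toy model below. It is a hypothetical sketch and not o2iblnd code. The names conn_model and CREDITS_HIGHWATER and the threshold value 4 are assumptions. The model assumes that a receiver owes the sender one credit per received message and returns those credits piggybacked on its next message to that peer. If it has nothing to send, it must send a standalone NOOP once enough credits have piled up. With symmetric routes, the reply goes back over the same connection and carries the credits. With asymmetric routes, nothing is sent back on that connection in this model, so every credit return costs an extra message.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/*
 * Toy model of credit return on one connection (hypothetical, not
 * o2iblnd code). The threshold value is an arbitrary assumption.
 */
#define CREDITS_HIGHWATER 4

struct conn_model {
	int owed_credits;	/* credits we still have to return to the peer */
	int noops_sent;		/* standalone credit-return messages */
};

/* A message arrived from the peer: we owe it one more credit. */
static struct conn_model conn_model_recv(struct conn_model c)
{
	c.owed_credits++;
	/* No outgoing message carried them back: flush with a NOOP. */
	if (c.owed_credits == CREDITS_HIGHWATER) {
		c.noops_sent++;
		c.owed_credits = 0;
	}
	return c;
}

/* We sent a normal message to the peer: owed credits ride along. */
static struct conn_model conn_model_send(struct conn_model c)
{
	c.owed_credits = 0;
	return c;
}

/*
 * Receive n requests from the router that delivered them. If the
 * replies go back through that same router, each reply carries the
 * credits; otherwise only NOOPs can return them.
 */
static int conn_model_noops(int n, int reply_via_same_router)
{
	struct conn_model conn = { 0, 0 };

	for (; n > 0; n--) {
		conn = conn_model_recv(conn);
		if (reply_via_same_router)
			conn = conn_model_send(conn);
	}
	return conn.noops_sent;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this model, 100 requests cost 25 extra NOOP messages with asymmetric routes and none with symmetric routes. The real overhead depends on the actual o2iblnd credit thresholds.&lt;/p&gt;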

&lt;p&gt;The first part of these issues looks easy. It is caused by an incomplete implementation of the lnet_send() call, which never passes the right router NID (it passes LNET_NID_ANY), so the preferred path cannot be chosen.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
/* NB: we probably want to use NID of msg::msg_from as 3rd
 * parameter (router NID) if it&apos;s routed message */
rc = lnet_send(msg-&amp;gt;msg_ev.target.nid, msg, LNET_NID_ANY);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
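&lt;p&gt;The change suggested by the code comment above can be sketched as follows. This is a hypothetical, simplified model and not LNet code. The names model_msg, model_reply_rtr_nid() and MODEL_NID_ANY are illustrative. The sketch assumes that a message counts as routed when msg_from differs from the initiator NID. For a routed message, it returns the router the message arrived through, so the reply could pass that NID as the third lnet_send() argument instead of LNET_NID_ANY.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/* Hypothetical, simplified model: these types and fields are
 * illustrative and are not the real struct lnet_msg definitions. */
typedef unsigned long long model_nid_t;

#define MODEL_NID_ANY ((model_nid_t)-1)	/* stands in for LNET_NID_ANY */

struct model_msg {
	model_nid_t target_nid;	/* our NID, i.e. msg_ev.target.nid */
	model_nid_t from_nid;	/* peer we received it from (msg_from) */
	model_nid_t initiator;	/* node that originated the request */
};

/*
 * Router NID to pass as the third lnet_send() argument when replying.
 * If the immediate sender is not the originator, the message was
 * routed and from_nid is the router that delivered it; replying via
 * the same router keeps the path symmetric.
 */
static model_nid_t model_reply_rtr_nid(const struct model_msg *m)
{
	if (m->from_nid != m->initiator)
		return m->from_nid;
	return MODEL_NID_ANY;	/* direct peer: let LNet pick the path */
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;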

&lt;p&gt;The second part of the problem is more complex; there are two problems in this area.&lt;br/&gt;
1) The server can set just a single router, and we need to route all traffic through this router to avoid the performance penalty.&lt;br/&gt;
This can be done by adding an incoming message counter, similar to lpni_seq, which counts outgoing events.&lt;/p&gt;

&lt;p&gt;2) The server&apos;s router is not in the router list. Do we need to add this router so that lnet_find_route_locked() is able to choose it?&lt;/p&gt;</description>
                <environment>any LST server/client and two routers</environment>
        <key id="53382">LU-11413</key>
            <summary>Large performance degradation in routing environment. </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="shadow">Alexey Lyashkov</assignee>
                                    <reporter username="shadow">Alexey Lyashkov</reporter>
                        <labels>
                    </labels>
                <created>Fri, 21 Sep 2018 15:55:56 +0000</created>
                <updated>Sun, 3 Mar 2019 14:36:16 +0000</updated>
                            <resolved>Sun, 3 Mar 2019 14:36:16 +0000</resolved>
                                                    <fixVersion>Lustre 2.13.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="239974" author="shadow" created="Tue, 15 Jan 2019 10:16:49 +0000"  >&lt;p&gt;In fact, one more problem was found: LNet incorrectly hashes messages based on the router NID rather than the message initiator NID.&lt;/p&gt;</comment>
                            <comment id="239980" author="gerrit" created="Tue, 15 Jan 2019 12:52:49 +0000"  >&lt;p&gt;Alexey Lyashkov (c17817@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34031&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34031&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11413&quot; title=&quot;Large performance degradation in routing environment. &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11413&quot;&gt;&lt;del&gt;LU-11413&lt;/del&gt;&lt;/a&gt; lnet: use right rtr address&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7cf61dc30972d5a1aba89d402f0d50b2b68a2bb9&lt;/p&gt;</comment>
                            <comment id="239981" author="gerrit" created="Tue, 15 Jan 2019 12:52:50 +0000"  >&lt;p&gt;Alexey Lyashkov (c17817@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34032&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34032&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11413&quot; title=&quot;Large performance degradation in routing environment. &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11413&quot;&gt;&lt;del&gt;LU-11413&lt;/del&gt;&lt;/a&gt; lnet: use right address for routing message&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 99ce2bfbcf058314810b22fef3ea648fa748334f&lt;/p&gt;</comment>
                            <comment id="240439" author="shadow" created="Mon, 21 Jan 2019 09:45:33 +0000"  >&lt;p&gt;The second patch fixes a regression introduced by the Multi-Rail landing.&lt;/p&gt;</comment>
                            <comment id="243235" author="gerrit" created="Sun, 3 Mar 2019 00:20:06 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/34031/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34031/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11413&quot; title=&quot;Large performance degradation in routing environment. &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11413&quot;&gt;&lt;del&gt;LU-11413&lt;/del&gt;&lt;/a&gt; lnet: use right rtr address&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 3f45206081301508ce55b51c1c57027247bb0c1d&lt;/p&gt;</comment>
                            <comment id="243236" author="gerrit" created="Sun, 3 Mar 2019 00:20:13 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/34032/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34032/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11413&quot; title=&quot;Large performance degradation in routing environment. &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11413&quot;&gt;&lt;del&gt;LU-11413&lt;/del&gt;&lt;/a&gt; lnet: use right address for routing message&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ad263e5d6e93e3951f3066ddec653205d6d08eae&lt;/p&gt;</comment>
                            <comment id="243258" author="pjones" created="Sun, 3 Mar 2019 14:36:16 +0000"  >&lt;p&gt;Landed for 2.13&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i002uv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>