<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:25:01 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16213] kfilnd: Optimize issuing of hello messages to a peer</title>
                <link>https://jira.whamcloud.com/browse/LU-16213</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The kfilnd &amp;lt;-&amp;gt; kfabric &amp;lt;-&amp;gt; kcxi_prov &amp;lt;-&amp;gt; Cassini stack has the following issue. If LNet tries to send to a kfilnd peer that is down, the Cassini retry handler can take up to 60 seconds to cancel the corresponding message. While retrying, the Cassini retry handler takes control of hardware resources and does not release them until the retries are complete. If enough of these messages are sent to down peers, the Cassini retry handler can take control of all available hardware resources. Once this happens, Cassini cannot process new RDMA commands, and back pressure starts occurring in kcxi_prov, which is propagated to kfilnd as -EAGAIN. As seen on JT, this can result in single RDMA operations taking minutes to complete.&lt;/p&gt;

&lt;p&gt;To help prevent this issue, kfilnd should only send to peers it knows are up.&lt;/p&gt;

&lt;p&gt;Looking at the kfilnd code today, I believe we can have multiple hello messages in flight to a single peer. If the peer is down, this can result in a buildup of hello messages, each of which takes the CXI retry handler 60 seconds to complete. There should only be a single hello message in flight per peer.&lt;/p&gt;
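
&lt;p&gt;As an illustration only, the single-hello rule, together with the queuing described in the next paragraph, could be sketched in Python as follows. This is not the kfilnd C code: the Peer class, its state names, and the issue_hello, transmit, and finalize callbacks are all hypothetical names invented for this sketch.&lt;/p&gt;

```python
from collections import deque


class Peer:
    """Hypothetical per-peer state; not the real struct kfilnd_peer."""

    def __init__(self, name):
        self.name = name
        self.state = "unknown"   # "unknown", "hello_pending", or "up"
        self.queued = deque()    # transactions waiting on the hello
        self.hellos_sent = 0     # number of hellos actually issued

    def send(self, txn, issue_hello, transmit):
        """Send txn, issuing at most one hello while the peer is unverified."""
        if self.state == "up":
            transmit(txn)
            return
        self.queued.append(txn)
        if self.state == "unknown":
            # The first sender issues the hello; later senders only queue.
            self.state = "hello_pending"
            self.hellos_sent += 1
            issue_hello(self)

    def hello_done(self, ok, transmit, finalize):
        """Hello completion: flush the queue on success, fail it otherwise."""
        self.state = "up" if ok else "unknown"
        while self.queued:
            txn = self.queued.popleft()
            if ok:
                transmit(txn)
            else:
                finalize(txn, "peer down")


# Usage: three sends to an unverified peer produce exactly one hello.
sent, failed = [], []
peer = Peer("nid0")
for txn in ("t1", "t2", "t3"):
    peer.send(txn, issue_hello=lambda p: None, transmit=sent.append)
# The hello fails, so every queued transaction is finalized with an error.
peer.hello_done(False, transmit=sent.append,
                finalize=lambda t, why: failed.append(t))
print(peer.hellos_sent, sent, failed)
```

&lt;p&gt;In this sketch a failed hello returns the peer to the "unknown" state, so the next send issues a fresh hello. Whether kfilnd should retry that way is an assumption of the sketch, not something this ticket specifies.&lt;/p&gt;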

&lt;p&gt;As part of this work, kfilnd transactions may have to be queued until a hello message comes back. If the hello message results in a failure, all queued transactions should be finalized.&lt;/p&gt;</description>
                <environment></environment>
        <key id="72667">LU-16213</key>
            <summary>kfilnd: Optimize issuing of hello messages to a peer</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hornc">Chris Horn</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                    </labels>
                <created>Wed, 5 Oct 2022 15:50:31 +0000</created>
                <updated>Thu, 19 Jan 2023 20:38:57 +0000</updated>
                            <resolved>Thu, 19 Jan 2023 20:38:57 +0000</resolved>
                                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="348793" author="gerrit" created="Wed, 5 Oct 2022 15:52:52 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48780&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48780&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Rename struct kfilnd_peer members&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 44087d1f7a2110d12cba7e19fe34c3b4ecadf9d9&lt;/p&gt;</comment>
                            <comment id="348794" author="gerrit" created="Wed, 5 Oct 2022 15:52:53 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48781&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48781&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Add peer info to some debug statements&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: c37a74c68a47c830ba662d223229acf2c88e8ae0&lt;/p&gt;</comment>
                            <comment id="348795" author="gerrit" created="Wed, 5 Oct 2022 15:52:54 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48782&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48782&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Fail sends of particular message type&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 0c7af25572427b7b6c46e0e8548ef6856fb69e09&lt;/p&gt;</comment>
                            <comment id="348796" author="gerrit" created="Wed, 5 Oct 2022 15:52:55 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48783&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48783&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Allow one HELLO in-flight per peer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 4071f4148546da4d4496d7d929600c90b32aab46&lt;/p&gt;</comment>
                            <comment id="348797" author="gerrit" created="Wed, 5 Oct 2022 15:52:55 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48784&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48784&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Finalize replay TNs with deleted peer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 78d58c06f3f90cc81f9327b7fd614056ad2a4fea&lt;/p&gt;</comment>
                            <comment id="359669" author="gerrit" created="Thu, 19 Jan 2023 15:29:45 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48780/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48780/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Rename struct kfilnd_peer members&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 679e73db770d188f43aa4d50592d65e337ad135e&lt;/p&gt;</comment>
                            <comment id="359670" author="gerrit" created="Thu, 19 Jan 2023 15:29:55 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48781/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48781/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Add peer info to some debug statements&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ba0e08cfdc5cfc1b7f1fc368916ff14e229e0b29&lt;/p&gt;</comment>
                            <comment id="359671" author="gerrit" created="Thu, 19 Jan 2023 15:30:08 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48782/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48782/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Fail sends of particular message type&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 35747c871df1c2e97d415cb7c3601e045a58c8e6&lt;/p&gt;</comment>
                            <comment id="359672" author="gerrit" created="Thu, 19 Jan 2023 15:30:18 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48783/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48783/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Allow one HELLO in-flight per peer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 11a32d886b3c9b7c3c9a6ec5a6ebdc2786ef1c71&lt;/p&gt;</comment>
                            <comment id="359673" author="gerrit" created="Thu, 19 Jan 2023 15:30:29 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48784/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48784/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16213&quot; title=&quot;kfilnd: Optimize issuing of hello messages to a peer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16213&quot;&gt;&lt;del&gt;LU-16213&lt;/del&gt;&lt;/a&gt; kfilnd: Finalize replay TNs with deleted peer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 08bbe9e562c403f247a74e99101d238398df6351&lt;/p&gt;</comment>
                            <comment id="359744" author="pjones" created="Thu, 19 Jan 2023 20:38:57 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0323j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>