<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:02:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13571] Refine which network errors result in LNet Health activity</title>
                <link>https://jira.whamcloud.com/browse/LU-13571</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;&lt;del&gt;There is a category of errors, such as being unable to resolve an address or route, which shouldn&apos;t result in the health of the remote or the local NI being decremented or recovered. This category of errors indicates that the remote address does not exist or is unreachable.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;Rather than ignore these errors, we decided that, with the enhancement in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13569&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.whamcloud.com/browse/LU-13569&lt;/a&gt;, the LND should instead return LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet so that the health of both the local NI and the remote NI is ding&apos;d. This way, if the problem really is with the remote NI, that is reflected in the remote NI&apos;s health value and can be accounted for on future sends. With &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13569&quot; title=&quot;LNet Health should not recover interfaces indefinitely&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13569&quot;&gt;&lt;del&gt;LU-13569&lt;/del&gt;&lt;/a&gt; in place, we don&apos;t run the risk of forever recovering a remote NI that will never be returned to service.&lt;/p&gt;

&lt;p&gt;Related to this, we decided that the LOCAL_TIMEOUT status returned in the kiblnd_check_conns() path should also become NETWORK_TIMEOUT:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;kiblnd_check_conns()
...
                /* Check tx_deadline */
                list_for_each_entry_safe(tx, tx_tmp, &amp;amp;peer_ni-&amp;gt;ibp_tx_queue, tx_list) {
                        if (ktime_compare(ktime_get(), tx-&amp;gt;tx_deadline) &amp;gt;= 0) {
                                CWARN(&quot;Timed out tx for %s: %lld seconds\n&quot;,
                                      libcfs_nid2str(peer_ni-&amp;gt;ibp_nid),
                                      ktime_ms_delta(ktime_get(),
                                                     tx-&amp;gt;tx_deadline) / MSEC_PER_SEC);
                                list_move(&amp;amp;tx-&amp;gt;tx_list, &amp;amp;timedout_txs);
                        }
                }
...
        if (!list_empty(&amp;amp;timedout_txs))
                kiblnd_txlist_done(&amp;amp;timedout_txs, -ETIMEDOUT,
                                   LNET_MSG_STATUS_LOCAL_TIMEOUT);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So for this ticket I plan to push three patches:&lt;br/&gt;
1. Modify lnet_health_check() so that NETWORK_TIMEOUT dings both local and remote NI health (this was the original design intent).&lt;br/&gt;
2. Modify kiblnd_check_conns() so that it returns NETWORK_TIMEOUT rather than LOCAL_TIMEOUT.&lt;br/&gt;
3. Modify the status for unresolvable address or route to return NETWORK_TIMEOUT.&lt;/p&gt;

&lt;p&gt;Patch 3 probably needs to be based on top of the patches for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13569&quot; title=&quot;LNet Health should not recover interfaces indefinitely&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13569&quot;&gt;&lt;del&gt;LU-13569&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</description>
                <environment></environment>
        <key id="59213">LU-13571</key>
            <summary>Refine which network errors result in LNet Health activity</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hornc">Chris Horn</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                    </labels>
                <created>Fri, 15 May 2020 20:31:53 +0000</created>
                <updated>Tue, 23 Feb 2021 13:16:02 +0000</updated>
                            <resolved>Thu, 3 Dec 2020 14:42:47 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="279528" author="gerrit" created="Mon, 14 Sep 2020 16:07:26 +0000"  >&lt;p&gt;Chris Horn (chris.horn@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39898&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39898&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; lnet: Correct handling of NETWORK_TIMEOUT status&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: d4a6d8ea328c7b112f0f99027f8acac0c0cf78d5&lt;/p&gt;</comment>
                            <comment id="279529" author="gerrit" created="Mon, 14 Sep 2020 16:07:26 +0000"  >&lt;p&gt;Chris Horn (chris.horn@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39901&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39901&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; tests: Test health and resends for network timeout&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7e6c5ec549010f3e7afa72cfdf690395b5d76e32&lt;/p&gt;</comment>
                            <comment id="279530" author="gerrit" created="Mon, 14 Sep 2020 16:07:27 +0000"  >&lt;p&gt;Chris Horn (chris.horn@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39899&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39899&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 86905008c043bdbdaadde393d68af1906c93ae22&lt;/p&gt;</comment>
                            <comment id="279531" author="gerrit" created="Mon, 14 Sep 2020 16:07:28 +0000"  >&lt;p&gt;Chris Horn (chris.horn@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39900&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39900&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; lnd: Use NETWORK_TIMEOUT for some conn failures&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: ff01a4476feb652757fcccda036b52581bad15dd&lt;/p&gt;</comment>
                            <comment id="279867" author="gerrit" created="Thu, 17 Sep 2020 18:40:22 +0000"  >&lt;p&gt;Chris Horn (chris.horn@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39965&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39965&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; tests: Debug sanity-lnet test 210&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 36ebf465fb51cdc4fe461a091290fb5dd836c688&lt;/p&gt;</comment>
                            <comment id="286060" author="gerrit" created="Thu, 26 Nov 2020 09:25:40 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39898/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39898/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; lnet: Correct handling of NETWORK_TIMEOUT status&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ffd4523f2d50ef952112f44ffd524af991b4baed&lt;/p&gt;</comment>
                            <comment id="286570" author="gerrit" created="Thu, 3 Dec 2020 07:26:27 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39899/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39899/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 7af63191370fd2337d0bc9045d211b918c61fdd1&lt;/p&gt;</comment>
                            <comment id="286571" author="gerrit" created="Thu, 3 Dec 2020 07:26:31 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39900/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39900/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13571&quot; title=&quot;Refine which network errors result in LNet Health activity&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13571&quot;&gt;&lt;del&gt;LU-13571&lt;/del&gt;&lt;/a&gt; lnd: Use NETWORK_TIMEOUT for some conn failures&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 12333c1fecc00ed67597f189715a68cbfea7b287&lt;/p&gt;</comment>
                            <comment id="286598" author="pjones" created="Thu, 3 Dec 2020 14:42:49 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10092" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>LU-13422</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i010jj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>