<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:35:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17435] improved reliability in the face of intermittent network errors</title>
                <link>https://jira.whamcloud.com/browse/LU-17435</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When client network interfaces are unstable, the reliability of the overall filesystem could be improved by tracking the history of RPC timeouts and resends to each peer NID.  That history could be used to determine how long a node should wait before resending (or not resending) an RPC to that peer, and when to give up completely even if the peer is partially responsive.&lt;/p&gt;

&lt;p&gt;On the server side, if there are repeated RPC timeouts to a client that succeed with a resend (e.g. blocking AST), we might consider reducing the RPC timeout duration for that client and resending more often, and eventually evicting the client if it is repeatedly unresponsive to lock callbacks (while other clients are &lt;b&gt;not&lt;/b&gt; unresponsive during the same time period), even if the client eventually replies.  While this would be &quot;unfair&quot; to that client, it would put the burden of bad behavior on that client instead of on other well-behaved clients also accessing the filesystem.  That makes it more obvious that there is a problem with a specific node, instead of a hard-to-debug timeout issue distributed across all nodes in the cluster.&lt;/p&gt;

&lt;p&gt;We might also consider implementing a &quot;deny list&quot; to block specific client NIDs from connecting to the filesystem.  This could be used by the peer history mechanism to semi-permanently (at least until reboot) block client NIDs from reconnecting to the filesystem after eviction, so that they are not flapping their Lustre mountpoint, but are &quot;hard down&quot;.  This might be implemented as part of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17217&quot; title=&quot;Allow server to control/deny client connections&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17217&quot;&gt;LU-17217&lt;/a&gt; &quot;&lt;tt&gt;Allow server to control client connections&lt;/tt&gt;&quot;.&lt;/p&gt;

&lt;p&gt;On the client side, this &quot;deny&quot; should produce a clear &quot;refused connection&quot; error message, similar to the case when a very old client connects to a newer server.&lt;/p&gt;</description>
                <environment></environment>
        <key id="80064">LU-17435</key>
            <summary>improved reliability in the face of intermittent network errors</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Wed, 17 Jan 2024 06:10:36 +0000</created>
                <updated>Wed, 17 Jan 2024 06:34:57 +0000</updated>
                                            <version>Lustre 2.14.0</version>
                    <version>Lustre 2.16.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="78525">LU-17217</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i047vr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>