<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:51:18 UTC 2024
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12292] Decrement Health Value even if recovery processing fault</title>
                <link>https://jira.whamcloud.com/browse/LU-12292</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The health value is used to select the route.&lt;br/&gt;
When a device fails, the recovery process periodically decrements its health value, and&lt;br/&gt;
after the device is restored, the value is incremented periodically.&lt;br/&gt;
However, the normal route is not selected again until the value has been restored.&lt;br/&gt;
We think the recovery process should not decrement the value,&lt;br/&gt;
because it takes a long time for the value to return to its original level.&lt;/p&gt;

&lt;p&gt;We stopped decrementing the health value during recovery processing after a device failure is detected.&lt;/p&gt;</description>
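<!--
A toy model of the behaviour described above may help when reviewing the change. It is a hypothetical Python sketch, not LNet code: the class ToyInterface, its sensitivity parameter and the helper successes_to_recover are illustrative names. It assumes a health ceiling of 1000, an increment of 1 per successful recovery ping, and a decrement of sensitivity per failure. The flag keep_on_recovery_failure switches between the old behaviour (failed recovery pings also lower health) and the requested one (they leave it unchanged).

```python
MAX_HEALTH = 1000  # assumed health ceiling, as discussed in this issue


class ToyInterface:
    """Hypothetical model of one interface's health value (not LNet code)."""

    def __init__(self, sensitivity=1, keep_on_recovery_failure=True):
        self.health = MAX_HEALTH
        self.sensitivity = sensitivity
        # False: old behaviour, a failed recovery ping also lowers health.
        # True: requested behaviour, a failed recovery ping leaves it unchanged.
        self.keep_on_recovery_failure = keep_on_recovery_failure

    def on_send_failure(self):
        # A failure on real traffic always lowers health.
        self.health = max(0, self.health - self.sensitivity)

    def on_recovery_ping(self, ok):
        if ok:
            # Each successful recovery ping restores one unit of health.
            self.health = min(MAX_HEALTH, self.health + 1)
        elif not self.keep_on_recovery_failure:
            self.health = max(0, self.health - self.sensitivity)


def successes_to_recover(iface, failed_pings):
    """Apply failed_pings failed recovery pings (the outage), then count
    the successful pings needed to get back to MAX_HEALTH."""
    for _ in range(failed_pings):
        iface.on_recovery_ping(ok=False)
    count = 0
    while iface.health < MAX_HEALTH:
        iface.on_recovery_ping(ok=True)
        count += 1
    return count


old = ToyInterface(keep_on_recovery_failure=False)
old.on_send_failure()
print(successes_to_recover(old, 300))  # 301

new = ToyInterface(keep_on_recovery_failure=True)
new.on_send_failure()
print(successes_to_recover(new, 300))  # 1
```

Assuming one recovery ping per second, the old behaviour needs roughly five minutes of successful pings after a 300-second outage, while the requested behaviour restores full health on the first success.
-->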
                <environment></environment>
        <key id="55620">LU-12292</key>
            <summary>Decrement Health Value even if recovery processing fault</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="takamura">Tatsushi Takamura</assignee>
                                    <reporter username="takamura">Tatsushi Takamura</reporter>
                        <labels>
                            <label>LTS12</label>
                    </labels>
                <created>Mon, 13 May 2019 09:44:45 +0000</created>
                <updated>Fri, 16 Oct 2020 06:21:54 +0000</updated>
                            <resolved>Tue, 31 Mar 2020 11:38:36 +0000</resolved>
                                    <version>Lustre 2.12.1</version>
                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                    <comments>
                            <comment id="247263" author="ashehata" created="Thu, 16 May 2019 14:42:40 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=simmonsja&quot; class=&quot;user-hover&quot; rel=&quot;simmonsja&quot;&gt;simmonsja&lt;/a&gt; had a good suggestion for this case: we should increment the health value by the same amount we decrement it by. That way the interface recovers faster. For example, if you want to fail the interface on the first failure and recover it on the first success, you can set the health sensitivity (the amount the health value is decremented by on a failure) to 1000. This will bring the interface down on the first failure, and after one successful recovery the interface will be back to full health.&lt;/p&gt;

&lt;p&gt;Thoughts?&lt;/p&gt;</comment>
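<!--
The symmetric scheme proposed in this comment can be sketched as a single update rule. This is a hypothetical illustration, not the LNet implementation: symmetric_update and run are invented helpers, and the 0 to 1000 health range is assumed from the discussion. Health moves by sensitivity in both directions, so with a sensitivity of 1000 one failure drops the interface to 0 and one success restores it to full health.

```python
MAX_HEALTH = 1000  # assumed health ceiling


def symmetric_update(health, ok, sensitivity):
    """Hypothetical rule: move health by sensitivity, up on success, down on failure."""
    if ok:
        return min(MAX_HEALTH, health + sensitivity)
    return max(0, health - sensitivity)


def run(events, sensitivity, health=MAX_HEALTH):
    """Apply a sequence of outcomes (True means success) and return the final health."""
    for ok in events:
        health = symmetric_update(health, ok, sensitivity)
    return health


# Sensitivity 1000: down on the first failure, back to full on the first success.
print(symmetric_update(MAX_HEALTH, False, 1000))  # 0
print(symmetric_update(0, True, 1000))            # 1000

# Sensitivity 100: three failures cost 300, three successes repay it.
print(run([False, False, False], 100))                    # 700
print(run([False, False, False, True, True, True], 100))  # 1000
```

The drawback, raised for flapping hardware in LU-12303, is that a link alternating between failure and success swings between 0 and full health on every alternation.
-->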
                            <comment id="247265" author="hornc" created="Thu, 16 May 2019 14:46:25 +0000"  >&lt;p&gt;I like that idea.&lt;/p&gt;</comment>
                            <comment id="247334" author="curtispb" created="Fri, 17 May 2019 15:15:39 +0000"  >&lt;p&gt;I think Andreas made a valid point in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12303&quot; title=&quot;Use lnet_health_sensitivity for restoring health for each lnet_recovery_internal&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12303&quot;&gt;&lt;del&gt;LU-12303&lt;/del&gt;&lt;/a&gt; that this behavior could cause issues with flapping hardware. Would a better approach be a weighted recovery that starts with an increment of 1 per success but increases the recovery amount on consecutive successful pings? This would allow faster recovery once the hardware issue is resolved, without simply trusting that it is resolved right away.&lt;/p&gt;</comment>
                            <comment id="247489" author="ashehata" created="Tue, 21 May 2019 20:45:49 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=curtispb&quot; class=&quot;user-hover&quot; rel=&quot;curtispb&quot;&gt;curtispb&lt;/a&gt;, yes, that sounds like a good idea. Do you have a suggestion for how much the increment should grow on consecutive successes?&lt;/p&gt;</comment>
                            <comment id="247518" author="curtispb" created="Wed, 22 May 2019 11:15:45 +0000"  >&lt;p&gt;My first pass at this would be a quadratic growth pattern such as health value + (1 + consecutive successes)^2, with a max bound of 1000, which would recover in ~15 consecutive successes if the health value hit the floor. If there is a failure during this recovery period, decrement normally and reset the consecutive counter. Thoughts?&lt;/p&gt;</comment>
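<!--
The weighted recovery proposed above can be sketched as follows. This is a hypothetical Python illustration, not LNet code: RampedHealth and successes_to_full are invented names, the streak counter is assumed to start at 0 (so the first success adds 1, as suggested in the earlier comment), and a failure is assumed to subtract sensitivity and reset the streak. The increment (1 + n)^2 grows quadratically with the streak length n, so after k consecutive successes from 0 the health is k(k+1)(2k+1)/6, capped at 1000.

```python
MAX_HEALTH = 1000  # upper bound proposed in the comment


class RampedHealth:
    """Hypothetical sketch of a ramped health recovery (not LNet code)."""

    def __init__(self, health=0, sensitivity=1):
        self.health = health
        self.sensitivity = sensitivity
        self.streak = 0  # consecutive successes before the current one

    def on_success(self):
        # Increment grows as (1 + streak)^2: 1, 4, 9, 16, ...
        self.health = min(MAX_HEALTH, self.health + (1 + self.streak) ** 2)
        self.streak += 1

    def on_failure(self):
        # Decrement normally and reset the consecutive counter.
        self.health = max(0, self.health - self.sensitivity)
        self.streak = 0


def successes_to_full(start=0):
    """Count consecutive successes needed to climb from start to MAX_HEALTH."""
    h = RampedHealth(health=start)
    count = 0
    while h.health < MAX_HEALTH:
        h.on_success()
        count += 1
    return count


print(successes_to_full())  # 14

h = RampedHealth()
for _ in range(5):
    h.on_success()
print(h.health)  # 55 (1 + 4 + 9 + 16 + 25)
h.on_failure()
h.on_success()
print(h.health)  # 55: 54 after the failure, then +1 because the streak was reset
```

Under these assumptions a full recovery from 0 takes 14 consecutive successes, in line with the "~15" estimate. At one recovery ping per second (also an assumption) that is about 14 seconds, compared with roughly 1000 seconds for a flat increment of 1.
-->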
                            <comment id="255113" author="takamura" created="Fri, 20 Sep 2019 09:18:01 +0000"  >&lt;p&gt;Recent IB hardware is stable and high quality, so we thought it would be enough for us to stop decrementing the health value during recovery processing (taking 1000 seconds to recover is too long).&lt;br/&gt;
I think your idea is good, because it can handle flapping hardware and recovers in about 15 seconds in the normal case.&lt;/p&gt;</comment>
                            <comment id="259115" author="gerrit" created="Wed, 4 Dec 2019 01:00:43 +0000"  >&lt;p&gt;Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36921&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36921&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12292&quot; title=&quot;Decrement Health Value even if recovery processing fault&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12292&quot;&gt;&lt;del&gt;LU-12292&lt;/del&gt;&lt;/a&gt; lnet: keep health even if recovery failed&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f29e84d3ea2f7a16c98619489b63b42571774003&lt;/p&gt;</comment>
                            <comment id="266387" author="gerrit" created="Tue, 31 Mar 2020 07:00:18 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/36921/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36921/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12292&quot; title=&quot;Decrement Health Value even if recovery processing fault&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12292&quot;&gt;&lt;del&gt;LU-12292&lt;/del&gt;&lt;/a&gt; lnet: keep health even if recovery failed&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 610a7542107d5a8ab0a12dc8bda7a4f44f9f0b60&lt;/p&gt;</comment>
                            <comment id="266411" author="pjones" created="Tue, 31 Mar 2020 11:38:36 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="55649">LU-12303</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00g5z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>