<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:48:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11981] lnet_is_health_check() Msg is in inconsistent state, don&apos;t perform health checking (0, 2)</title>
                <link>https://jira.whamcloud.com/browse/LU-11981</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Over the span of about 20 minutes, routers reported the following in their console logs:&lt;br/&gt;
2019-02-19 10:05:02 &lt;span class=&quot;error&quot;&gt;&amp;#91;330235.278414&amp;#93;&lt;/span&gt; LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (0, 2)&lt;br/&gt;
2019-02-19 10:05:02 &lt;span class=&quot;error&quot;&gt;&amp;#91;330235.294305&amp;#93;&lt;/span&gt; LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1646 previous similar messages&lt;/p&gt;

&lt;p&gt;These messages appeared while the Lustre servers were being rebooted.&lt;br/&gt;
(0, 2) corresponds to:&lt;br/&gt;
msg-&amp;gt;msg_ev.status == 0 (success)&lt;br/&gt;
msg-&amp;gt;msg_health_status == 2 (LNET_MSG_STATUS_LOCAL_DROPPED)&lt;/p&gt;

&lt;p&gt;See &lt;a href=&quot;https://github.com/LLNL/lustre/releases&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/LLNL/lustre/releases&lt;/a&gt; for contents of 2.12.0_1.chaos.&lt;/p&gt;</description>
                <environment>clients and routers: Lustre 2.12.0_1.chaos&lt;br/&gt;
lustre servers: Lustre 2.10.6_2.chaos&lt;br/&gt;
&lt;br/&gt;
Linux version 3.10.0-957.1.3.1chaos.ch6.x86_64&lt;br/&gt;
Clients OmniPath &amp;lt;-&amp;gt; routers &amp;lt;-&amp;gt; Servers mlx5</environment>
        <key id="54931">LU-11981</key>
            <summary>lnet_is_health_check() Msg is in inconsistent state, don&apos;t perform health checking (0, 2)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="ofaaland">Olaf Faaland</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 20 Feb 2019 18:23:30 +0000</created>
                <updated>Thu, 30 Jan 2020 20:54:16 +0000</updated>
                            <resolved>Fri, 20 Dec 2019 13:52:15 +0000</resolved>
                                    <version>Lustre 2.12.0</version>
                                    <fixVersion>Lustre 2.12.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="242367" author="ashehata" created="Wed, 20 Feb 2019 18:47:58 +0000"  >&lt;p&gt;Would you be able to turn on net logging with&lt;/p&gt;

&lt;p&gt;lctl set_param debug=+&quot;net neterror&quot;&lt;/p&gt;

&lt;p&gt;and capture the logs when you reproduce this message? I have made some changes in this area as part of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11477&quot; title=&quot;handle health for both incoming and outgoing messages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11477&quot;&gt;&lt;del&gt;LU-11477&lt;/del&gt;&lt;/a&gt;, and I want to see whether they resolve this particular problem. If so, I can port them to 2.12.&lt;/p&gt;</comment>
                            <comment id="242368" author="ofaaland" created="Wed, 20 Feb 2019 19:00:11 +0000"  >&lt;p&gt;Thanks, Amir. See the attached dk.opal190.1550688817.txt.gz. Look towards the end of the file; the beginning was captured before I turned on net logging.&lt;/p&gt;</comment>
                            <comment id="243384" author="ashehata" created="Wed, 6 Mar 2019 01:37:21 +0000"  >&lt;p&gt;Sorry for the delay. It looks like there is a code path where the message is dropped and the health status is updated, but the message error status is not. That leads to the inconsistent-state message you see. I&apos;ll update that path to set the error status in the message correctly.&lt;/p&gt;</comment>
                            <comment id="258796" author="ofaaland" created="Mon, 25 Nov 2019 22:43:29 +0000"  >&lt;p&gt;&amp;lt;poke&amp;gt; Thanks&lt;/p&gt;</comment>
                            <comment id="259112" author="ashehata" created="Wed, 4 Dec 2019 00:41:40 +0000"  >&lt;p&gt;A series of patches was backported to b2_12 that resolves several issues, including the one reported in this ticket.&lt;/p&gt;

&lt;p&gt;The particular patch which resolves this issue is:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12199&quot; title=&quot;md&amp;#39;s are not detached from uncommitted messages that have health check performed on them&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12199&quot;&gt;&lt;del&gt;LU-12199&lt;/del&gt;&lt;/a&gt; lnet: Ensure md is detached when msg is not committed&lt;/p&gt;

&lt;p&gt;However, I would suggest moving to 2.12.3, which includes this fix and others.&lt;/p&gt;</comment>
                            <comment id="259654" author="ofaaland" created="Wed, 11 Dec 2019 23:23:42 +0000"  >&lt;p&gt;Hmm. We&apos;re seeing&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-12-11 10:01:08 [  972.859958] LNetError: 28880:0:(lib-msg.c:820:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (-125, 0)&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;with Lustre 2.12.3.&lt;/p&gt;

&lt;p&gt;I see now that the values at the end of the message are (-125, 0), which differ from the originally reported (0, 2). Also, the system where I see this is Mellanox IB. Should I open a new ticket for that?&lt;/p&gt;
</comment>
                            <comment id="259709" author="ashehata" created="Thu, 12 Dec 2019 17:54:03 +0000"  >&lt;p&gt;No need to create a new ticket. I would say this scenario is expected: -125 is ECANCELED, and in that case we do not bother to adjust the health. The problem is that this message is logged at error level. We had already changed it to debug level, but that change was part of a bigger patch:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11477&quot; title=&quot;handle health for both incoming and outgoing messages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11477&quot;&gt;&lt;del&gt;LU-11477&lt;/del&gt;&lt;/a&gt; lnet: handle health for incoming messages&lt;/p&gt;

&lt;p&gt;I&apos;ll create a patch just to change this log level to debug and push it to b2_12.&lt;/p&gt;</comment>
                            <comment id="259711" author="gerrit" created="Thu, 12 Dec 2019 18:01:56 +0000"  >&lt;p&gt;Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/37001&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37001&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11981&quot; title=&quot;lnet_is_health_check() Msg is in inconsistent state, don&amp;#39;t perform health checking (0, 2)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11981&quot;&gt;&lt;del&gt;LU-11981&lt;/del&gt;&lt;/a&gt; lnet: clean up error message&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 96a278829b7a375e05f23b538f8db876a68caa71&lt;/p&gt;</comment>
                            <comment id="260213" author="gerrit" created="Fri, 20 Dec 2019 06:44:31 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/37001/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37001/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11981&quot; title=&quot;lnet_is_health_check() Msg is in inconsistent state, don&amp;#39;t perform health checking (0, 2)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11981&quot;&gt;&lt;del&gt;LU-11981&lt;/del&gt;&lt;/a&gt; lnet: clean up error message&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f549927ea633b910a8c788fa970af742b3bf10c1&lt;/p&gt;</comment>
                            <comment id="260243" author="pjones" created="Fri, 20 Dec 2019 13:52:15 +0000"  >&lt;p&gt;Landed for 2.12.4. Not needed on master.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="32040" name="dk.opal190.1550688817.txt.gz" size="3635611" author="ofaaland" created="Wed, 20 Feb 2019 18:59:11 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00bxj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>