<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:56:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12886] A lot of LNetError: lnet_peer_ni_add_to_recoveryq_locked() messages</title>
                <link>https://jira.whamcloud.com/browse/LU-12886</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After upgrading our servers on Fir (Sherlock&apos;s /scratch) to Lustre 2.12.3 RC1, we are noticing a lot of these messages on all Lustre servers:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LNetError: 49537:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.201@o2ib7 added to recovery queue. Health = 900
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The NIDs reported are our Lustre routers, which are still running 2.12.0+patches (they are on the client clusters).&lt;/p&gt;

&lt;p&gt;Attaching logs from all servers as &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/33681/33681_lnet_recoveryq.log&quot; title=&quot;lnet_recoveryq.log attached to LU-12886&quot;&gt;lnet_recoveryq.log&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;This doesn&apos;t seem to have an impact on production, and so far 2.12.3 RC1 has been just great for us (and we run it without additional patches now!). Thanks!&lt;/p&gt;

&lt;p&gt;St&#233;phane&lt;/p&gt;</description>
                <environment>2.12.3 RC1 (vanilla) on servers, CentOS 7.6, patched kernel; 2.12.0 + patches on clients</environment>
        <key id="57199">LU-12886</key>
            <summary>A lot of LNetError: lnet_peer_ni_add_to_recoveryq_locked() messages</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Sat, 19 Oct 2019 18:50:18 +0000</created>
                <updated>Fri, 10 Apr 2020 21:57:49 +0000</updated>
                            <resolved>Fri, 10 Apr 2020 21:57:49 +0000</resolved>
                                    <version>Lustre 2.12.3</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="256697" author="sthiell" created="Sat, 19 Oct 2019 18:57:44 +0000"  >&lt;p&gt;I forgot to mention that we&apos;re now using Mellanox OFED 4.7 on IB HDR100.&lt;/p&gt;</comment>
                            <comment id="256702" author="pjones" created="Sun, 20 Oct 2019 13:30:21 +0000"  >&lt;p&gt;Amir&lt;/p&gt;

&lt;p&gt;Could you please advise&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="256706" author="ashehata" created="Sun, 20 Oct 2019 16:38:57 +0000"  >&lt;p&gt;That&apos;s the health feature. When sending to a peer fails the health of the interface is adjusted. If there are multiple interfaces available the healthiest one is selected. The interface is added on a recovery queue and pinged every 1 second. Until it&apos;s fully healthy.&lt;/p&gt;

&lt;p&gt;Are these particular routers down? Or did you see a burst of these messages early on and then they stopped?&lt;/p&gt;</comment>
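<!-- 
A minimal sketch of how to observe the health state described in the comment
above, assuming a 2.12.x node; the exact subcommands and output fields vary
by version, so treat this as illustrative rather than authoritative:

    # show the global health tunables (health_sensitivity, retry_count,
    # transaction_timeout) that govern the recovery behaviour
    lnetctl global show

    # show peers verbosely; recent 2.12.x builds report per NI health
    # statistics in the verbose output
    lnetctl peer show -v
-->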
                            <comment id="256707" author="sthiell" created="Sun, 20 Oct 2019 17:19:55 +0000"  >&lt;p&gt;Thanks for checking Amir! No, I just see a constant (slow) flow of these messages on the server side, but all our routers are up and running. Attached kernel logs of an OSS as &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/33682/33682_fir-io1-s1.log&quot; title=&quot;fir-io1-s1.log attached to LU-12886&quot;&gt;fir-io1-s1.log&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;. The trafic on the routers can be pretty high though (close to 8-10 GB/s per EDR/HDR100 router right now). We&apos;re not using multi-rail lnet. According to lnet/peers, we might have ran out of rtr credits but at the moment it seems OK. BTW, is there a way to reset the counters of &lt;tt&gt;/sys/kernel/debug/lnet/peers&lt;/tt&gt;? that would be helpful &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# clush -w @rtr_fir_[1-2] -b &quot;cat /sys/kernel/debug/lnet/peers  | egrep &apos;^nid|o2ib7&apos;&quot;
---------------
sh-rtr-fir-1-1
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           3    up    36     8     6   -63     8 -1210 0
10.0.10.110@o2ib7           1    up    26     8     8     8     8     6 0
10.0.10.102@o2ib7           4    up    32     8     5   -70     8 -1157 0
10.0.10.107@o2ib7           1    up    94     8     8   -45     8 -1249 0
10.0.10.52@o2ib7            1    up   168     8     8   -16     8 -1645 0
10.0.10.104@o2ib7           2    up   170     8     7   -51     8 -1195 0
10.0.10.54@o2ib7            1    up    57     8     8    -8     8   -39 0
10.0.10.101@o2ib7           3    up    52     8     6   -54     8 -1150 0
10.0.10.106@o2ib7           1    up     7     8     8   -43     8 -1221 0
10.0.10.51@o2ib7            1    up    13     8     8   -16     8 -3546 0
10.0.10.103@o2ib7           4    up   152     8     5   -70     8 -1244 0
10.0.10.116@o2ib7           1    up    33     8     8     8     8     6 0
10.0.10.108@o2ib7           1    up    12     8     8   -43     8 -1151 0
10.0.10.53@o2ib7            1    up   165     8     8    -8     8   -46 0
---------------
sh-rtr-fir-1-2
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           1    up   166     8     8    -8     8   -39 0
10.0.10.110@o2ib7           1    up    82     8     8     8     8     6 0
10.0.10.102@o2ib7           1    up    83     8     8    -8     8   -73 0
10.0.10.107@o2ib7           1    up    32     8     8    -8     8   -30 0
10.0.10.52@o2ib7            1    up    61     8     8    -8     8  -741 0
10.0.10.104@o2ib7           3    up    45     8     6    -8     8   -33 0
10.0.10.54@o2ib7            1    up    26     8     8    -8     8   -18 0
10.0.10.101@o2ib7           1    up   120     8     8    -8     8   -19 0
10.0.10.106@o2ib7           1    up    35     8     8    -8     8   -34 0
10.0.10.51@o2ib7            1    up   100     8     8    -8     8  -123 0
10.0.10.103@o2ib7           2    up    24     8     7    -8     8   -28 0
10.0.10.116@o2ib7           1    up    82     8     8     8     8     6 0
10.0.10.108@o2ib7           1    up    18     8     8    -8     8   -72 0
10.0.10.53@o2ib7            2    up   111     8     7    -8     8   -51 0
---------------
sh-rtr-fir-1-3
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           2    up   179     8     7  -549     8 -1016 0
10.0.10.110@o2ib7           1    up   177     8     8     8     8     6 0
10.0.10.102@o2ib7           2    up     6     8     7  -793     8  -807 0
10.0.10.107@o2ib7           3    up    60     8     6  -664     8  -808 0
10.0.10.52@o2ib7            1    up    96     8     8 -1151     8 -1394 0
10.0.10.104@o2ib7           4    up   138     8     6  -789     8 -1020 0
10.0.10.54@o2ib7            1    up    58     8     8    -8     8   -36 0
10.0.10.101@o2ib7           2    up    17     8     7  -826     8  -827 0
10.0.10.106@o2ib7           1    up   176     8     8  -773     8  -782 0
10.0.10.51@o2ib7            1    up    33     8     8    -8     8 -3907 0
10.0.10.103@o2ib7           1    up   170     8     8 -1383     8 -1052 0
10.0.10.116@o2ib7           1    up     3     8     8     8     8     6 0
10.0.10.108@o2ib7           2    up   154     8     7  -688     8  -787 0
10.0.10.53@o2ib7            1    up   144     8     8    -8     8   -23 0
---------------
sh-rtr-fir-1-4
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   175     8     8     8     8     6 0
10.0.10.105@o2ib7           4    up   127     8     5    -8     8   -59 0
10.0.10.110@o2ib7           1    up    49     8     8     8     8     6 0
10.0.10.102@o2ib7           3    up   153     8     6    -8     8   -91 0
10.0.10.107@o2ib7           1    up    98     8     8    -8     8   -28 0
10.0.10.52@o2ib7            1    up    83     8     8    -8     8 -1007 0
10.0.10.104@o2ib7           3    up   122     8     6    -8     8   -42 0
10.0.10.54@o2ib7            1    up    43     8     8    -8     8   -39 0
10.0.10.101@o2ib7           2    up   160     8     7    -8     8   -22 0
10.0.10.114@o2ib7           1  down  9999     8     8     8     8     7 0
10.0.10.106@o2ib7           2    up   115     8     7    -8     8   -50 0
10.0.10.51@o2ib7            1    up     8     8     8    -8     8  -252 0
10.0.10.103@o2ib7           3    up   120     8     6    -8     8   -59 0
10.0.10.116@o2ib7           1    up     0     8     8     8     8     6 0
10.0.10.108@o2ib7           2    up    80     8     7    -8     8   -89 0
10.0.10.53@o2ib7            1    up   126     8     8    -8     8   -42 0
---------------
sh-rtr-fir-2-1
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           3    up    43     8     6   -16     8  -243 0
10.0.10.110@o2ib7           1    up   157     8     8     8     8     6 0
10.0.10.102@o2ib7           1    up     6     8     8   -16     8  -289 0
10.0.10.107@o2ib7           2    up    62     8     8   -16     7  -261 560
10.0.10.52@o2ib7            1    up    81     8     8    -8     8   -23 0
10.0.10.104@o2ib7           1    up     9     8     8    -8     8  -197 0
10.0.10.54@o2ib7            1    up    73     8     8    -8     8   -75 0
10.0.10.101@o2ib7           1    up    48     8     8   -16     8  -228 0
10.0.10.106@o2ib7           1    up    34     8     8   -16     8  -284 0
10.0.10.51@o2ib7            1    up   157     8     8    -8     8  -685 0
0.0.10.51@o2ib7             1  down  9999     8     8     8     8     7 0
10.0.10.103@o2ib7           2    up    38     8     7   -16     8  -259 0
10.0.10.116@o2ib7           1    up   158     8     8     8     8     6 0
10.0.10.108@o2ib7           1    up    37     8     8   -16     8  -376 0
10.0.10.53@o2ib7            1    up    87     8     8    -8     8  -410 0
---------------
sh-rtr-fir-2-2
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           3    up    58     8     6   -16     8  -237 0
10.0.10.110@o2ib7           1    up    12     8     8     8     8     6 0
10.0.10.102@o2ib7           1    up    56     8     8   -16     8  -297 0
10.0.10.107@o2ib7           3    up    84     8     6   -16     8  -229 0
10.0.10.52@o2ib7            1    up   132     8     8    -8     8   -14 0
10.0.10.104@o2ib7           4    up    95     8     6    -8     7  -217 72
10.0.10.54@o2ib7            1    up   130     8     8    -8     8   -72 0
10.0.10.101@o2ib7           1    up    99     8     8   -16     8  -250 0
10.0.10.106@o2ib7           2    up    58     8     7   -16     8  -308 0
10.0.10.51@o2ib7            1    up   141     8     8    -8     8  -353 0
10.0.10.103@o2ib7           2    up    35     8     7   -16     8  -313 0
10.0.10.116@o2ib7           1    up    15     8     8     8     8     6 0
10.0.10.108@o2ib7           2    up    37     8     7   -16     8  -267 0
10.0.10.53@o2ib7            1    up   117     8     8    -8     8  -323 0
---------------
sh-rtr-fir-2-3
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           2    up    42     8     7   -16     8  -212 0
10.0.10.110@o2ib7           1    up   107     8     8     8     8     6 0
10.0.10.102@o2ib7           2    up    59     8     7   -16     8  -338 0
10.0.10.107@o2ib7           2    up    91     8     7   -16     8  -215 0
10.0.10.52@o2ib7            1    up     1     8     8    -8     8   -17 0
10.0.10.104@o2ib7           2    up    42     8     7    -8     8  -226 0
10.0.10.54@o2ib7            1    up    76     8     8    -8     8   -86 0
10.0.10.101@o2ib7           1    up    55     8     8   -16     8  -248 0
10.0.10.106@o2ib7           1    up    21     8     8   -16     8  -358 0
10.0.10.51@o2ib7            1    up   136     8     8    -8     8  -424 0
10.0.10.103@o2ib7           2    up    13     8     7   -16     8  -259 0
10.0.10.116@o2ib7           1    up   107     8     8     8     8     6 0
10.0.10.108@o2ib7           1    up    24     8     8   -16     8  -396 0
10.0.10.53@o2ib7            1    up   157     8     8    -8     8  -549 0
---------------
sh-rtr-fir-2-4
---------------
nid                      refs state  last   max   rtr   min    tx   min queue
10.0.10.3@o2ib7             1    up   176     8     8     8     8     6 0
10.0.10.105@o2ib7           3    up    44     8     6   -16     8  -292 0
10.0.10.110@o2ib7           1    up    52     8     8     8     8     6 0
10.0.10.102@o2ib7           1    up    28     8     8   -16     8  -318 0
10.0.10.107@o2ib7           1    up    29     8     8   -16     8  -271 0
10.0.10.52@o2ib7            1    up     1     8     8    -8     8   -19 0
10.0.10.104@o2ib7           3    up    48     8     6   -16     8  -294 0
10.0.10.54@o2ib7            1    up    28     8     8    -8     8   -53 0
10.0.10.101@o2ib7           1    up    19     8     8   -16     8  -254 0
10.0.10.106@o2ib7           1    up    37     8     8   -16     8  -342 0
10.0.10.51@o2ib7            1    up    89     8     8    -8     8  -470 0
10.0.10.103@o2ib7           1    up    31     8     8   -16     8  -311 0
10.0.10.116@o2ib7           1    up    53     8     8     8     8     6 0
10.0.10.108@o2ib7           2    up     9     8     7   -16     8  -261 0
10.0.10.53@o2ib7            1    up    20     8     8    -8     8  -332 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Example of traffic on sh-rtr-fir-2-1 (10.0.10.209@o2ib7) at 10:03 AM this morning:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@sh-rtr-fir-2-1 ~]# collectl -sx
waiting for 1 second sample...
#&amp;lt;-----------InfiniBand-----------&amp;gt;
#   KBIn  PktIn   KBOut PktOut Errs 
  377711  2493K 9336463  2488K    0 
  397428  2070K 7594350  2066K    0 
  545513  2403K 8775817  2395K    0 
  521123  2781K  10089K  2775K    0 
  527162  2831K  10251K  2824K    0 
  416422  2516K 9368186  2511K    0 
  691311  2339K 8384500  2330K    0 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At about the same time:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;fir-io3-s2: Oct 20 10:00:17 fir-io3-s2 kernel: LNetError: 61521:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.209@o2ib7 added to recovery queue. Health = 900
fir-io3-s2: Oct 20 10:00:17 fir-io3-s2 kernel: LNetError: 61521:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 79 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On the router &lt;tt&gt;sh-rtr-fir-2-1&lt;/tt&gt; itself, no new messages today (these messages are from yesterday). Occasional client timeouts on the cluster fabric (o2ib4 here) are normal, I think:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Oct 19 13:26:59 sh-rtr-fir-2-1.int kernel: LNetError: 10492:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.9.106.54@o2ib4 (103): c: 7, oc: 0, rc: 8
Oct 19 13:27:49 sh-rtr-fir-2-1.int kernel: LNetError: 10498:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (0, 2)
Oct 19 13:27:56 sh-rtr-fir-2-1.int kernel: LNetError: 10501:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (0, 2)
Oct 19 17:34:37 sh-rtr-fir-2-1.int kernel: LNetError: 10492:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Oct 19 17:34:37 sh-rtr-fir-2-1.int kernel: LNetError: 10492:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.9.105.12@o2ib4 (107): c: 7, oc: 0, rc: 8
Oct 19 17:34:53 sh-rtr-fir-2-1.int kernel: LNetError: 10496:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (0, 2)
Oct 19 17:35:07 sh-rtr-fir-2-1.int kernel: LNetError: 10495:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don&apos;t perform health checking (0, 2)
Oct 19 21:28:39 sh-rtr-fir-2-1.int kernel: LNetError: 10492:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds
Oct 19 21:28:39 sh-rtr-fir-2-1.int kernel: LNetError: 10492:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.9.107.3@o2ib4 (107): c: 8, oc: 0, rc: 8
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Our setup:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;tt&gt;sh-rtr-fir-1-[1-4]&lt;/tt&gt; are our o2ib6 (Sherlock 1 cluster FDR fabric) / o2ib7 (scratch storage, HDR100 fabric) routers (&lt;tt&gt;10.0.10.[201-204]@o2ib7&lt;/tt&gt;)&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;sh-rtr-fir-2-[1-4]&lt;/tt&gt; are our o2ib4 (Sherlock 2 cluster EDR fabric) / o2ib7 (scratch storage, HDR100 fabric) routers (&lt;tt&gt;10.0.10.[209-212]@o2ib7&lt;/tt&gt;)&lt;/li&gt;
&lt;/ul&gt;
</comment>
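<!-- 
A hedged sketch for reading the peers table pasted above: the columns are
nid, refs, state, last, max, rtr, min, tx, min, queue, where the two min
columns record the lowest value the rtr and tx credit counters ever reached.
Negative minimums mean messages had to queue waiting for credits at some
point. Assuming the same 2.12.x column layout, the rows that have ever run
short of credits can be surfaced with:

    # keep the header line plus any peer whose rtr min (field 7) or
    # tx min (field 9) credit counter has gone negative
    awk 'NR==1 || $7+0 < 0 || $9+0 < 0' /sys/kernel/debug/lnet/peers
-->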
                            <comment id="260565" author="luis_silva" created="Fri, 3 Jan 2020 17:51:06 +0000"  >&lt;p&gt;We are having the same errors on our clients. Our setup only uses a single IB interface everywhere. We are not using multi-rail anywhere. My understanding is that this health check is designed for multi-rail setups? Is that correct? If so, is there any way I can tell Lustre to skip these health checks? Thanks in advance. -Luis&lt;/p&gt;</comment>
                            <comment id="260579" author="ashehata" created="Fri, 3 Jan 2020 20:16:30 +0000"  >&lt;p&gt;Hi Luis,&lt;/p&gt;

&lt;p&gt;You can turn off health on your setup:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
lnetctl set health_sensitivity 0
lnetctl set transaction_timeout 50 # or some value you&apos;d like 
lnetctl set retry_count 0&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We&apos;ve also committed a patch which reduces the severity of these messages so they will not be displayed on the console.&lt;/p&gt;</comment>
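<!-- 
A follow-up sketch on persistence, assuming the stock lnet.service shipped
with the Lustre packages (which imports /etc/lnet.conf at startup via
lnetctl import): the runtime settings above only last until the LNet module
is reloaded, so once they behave as intended they can be captured like this:

    # dump the running LNet configuration, including the global section
    # that holds health_sensitivity, retry_count and transaction_timeout,
    # into the file imported at boot
    lnetctl export > /etc/lnet.conf
-->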
                            <comment id="262235" author="ofaaland" created="Thu, 30 Jan 2020 22:04:32 +0000"  >&lt;blockquote&gt;&lt;p&gt;We&apos;ve also committed a patch which reduces the severity of these messages so they will not be displayed on the console.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Hi Amir,&lt;br/&gt;
Which patch?&lt;br/&gt;
Thanks&lt;/p&gt;</comment>
                            <comment id="262237" author="pjones" created="Thu, 30 Jan 2020 22:25:31 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11981&quot; title=&quot;lnet_is_health_check() Msg is in inconsistent state, don&amp;#39;t perform health checking (0, 2)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11981&quot;&gt;&lt;del&gt;LU-11981&lt;/del&gt;&lt;/a&gt; I assume. It will be in 2.12.4&lt;/p&gt;</comment>
                            <comment id="262239" author="ofaaland" created="Thu, 30 Jan 2020 22:35:41 +0000"  >&lt;p&gt;It looks to me like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11981&quot; title=&quot;lnet_is_health_check() Msg is in inconsistent state, don&amp;#39;t perform health checking (0, 2)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11981&quot;&gt;&lt;del&gt;LU-11981&lt;/del&gt;&lt;/a&gt; patch addresses only:&lt;/p&gt;

&lt;p&gt;&quot;Msg is in inconsistent state, don&apos;t perform health checking&quot;&lt;/p&gt;

&lt;p&gt;not&lt;/p&gt;

&lt;p&gt;&quot;added to recovery queue&quot;&lt;/p&gt;</comment>
                            <comment id="263536" author="ofaaland" created="Wed, 19 Feb 2020 04:28:54 +0000"  >&lt;p&gt;Hi Amir,&lt;/p&gt;

&lt;p&gt;What 2.12.4 patch(es) reduce the severity of the &quot;added to recovery queue&quot; messages?&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="263616" author="ashehata" created="Thu, 20 Feb 2020 00:16:43 +0000"  >&lt;p&gt;Hi Olaf,&lt;/p&gt;

&lt;p&gt;It doesn&apos;t look like this patch made it into 2.12.4. It still hasn&apos;t landed and needs an extra review:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://review.whamcloud.com/#/c/37002/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/37002/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &quot;inconsistent&quot; message was reduced here:&lt;/p&gt;

&lt;p&gt;f549927ea633b910a8c788fa970af742b3bf10c1 &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11981&quot; title=&quot;lnet_is_health_check() Msg is in inconsistent state, don&amp;#39;t perform health checking (0, 2)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11981&quot;&gt;&lt;del&gt;LU-11981&lt;/del&gt;&lt;/a&gt; lnet: clean up error message&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;

&lt;p&gt;amir&lt;/p&gt;</comment>
                            <comment id="264119" author="sthiell" created="Thu, 27 Feb 2020 00:43:50 +0000"  >&lt;p&gt;Hello, we would also appreciate a backport of this patch (that just landed into master) to b2_12 as it is very verbose. Thanks!!&lt;/p&gt;</comment>
                            <comment id="264120" author="ofaaland" created="Thu, 27 Feb 2020 00:51:06 +0000"  >&lt;p&gt;Hi Stephane,&lt;/p&gt;

&lt;p&gt;They&apos;ve pushed a backport for b2_12 to Gerrit; it&apos;s been reviewed and has passed tests, and just needs to land:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://review.whamcloud.com/#/c/37718/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/37718/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="266533" author="pjones" created="Wed, 1 Apr 2020 12:07:45 +0000"  >&lt;p&gt;The &lt;a href=&quot;https://review.whamcloud.com/#/c/37718/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/37718/&lt;/a&gt; fix landed to b2_12 some weeks back so I think that it should be ok to close this ticket. Any objections?&lt;/p&gt;</comment>
                            <comment id="267400" author="ofaaland" created="Fri, 10 Apr 2020 21:31:16 +0000"  >&lt;p&gt;Agreed&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="57624">LU-13071</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="33682" name="fir-io1-s1.log" size="954750" author="sthiell" created="Sun, 20 Oct 2019 17:15:21 +0000"/>
                            <attachment id="33681" name="lnet_recoveryq.log" size="20556" author="sthiell" created="Sat, 19 Oct 2019 18:48:55 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00obb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>