<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:42:19 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11257] RHEL/CentOS 3.10.0-862.11.6.el7.x86_64 kernel breaks LNet</title>
                <link>https://jira.whamcloud.com/browse/LU-11257</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It looks like the latest kernel update from CentOS/Red Hat prevents LNet from working on InfiniBand interfaces (mlx5).&lt;/p&gt;
&lt;h3&gt;&lt;a name=&quot;Symptoms&quot;&gt;&lt;/a&gt;Symptoms&lt;/h3&gt;

&lt;p&gt;No LNet communication, self-ping doesn&apos;t work:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
# lctl list_nids
10.9.101.60@o2ib4
# lctl ping 10.9.101.60@o2ib4
failed to ping 10.9.101.60@o2ib4: Input/output error&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Communicating with other nodes is impossible, as is mounting filesystems.&lt;br/&gt;
The exact same node with the exact same configuration works flawlessly with kernel&#160;&lt;tt&gt;3.10.0-862.9.1.el7.x86_64&lt;/tt&gt;&lt;/p&gt;
&lt;h3&gt;&lt;a name=&quot;Versions&quot;&gt;&lt;/a&gt;Versions&lt;/h3&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
# uname -r
3.10.0-862.11.6.el7.x86_64
# cat /sys/fs/lustre/version
2.10.4&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;h3&gt;&lt;a name=&quot;HW&quot;&gt;&lt;/a&gt;HW&lt;/h3&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# ibstat
CA &apos;mlx5_0&apos;
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.21.3012
        Hardware version: 0
        Node GUID: 0x7cfe900300268c04
        System image GUID: 0x7cfe900300268c04
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 72
                LMC: 0
                SM lid: 6
                Capability mask: 0x2651e848
                Port GUID: 0x7cfe900300268c04
                Link layer: InfiniBand&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;h3&gt;&lt;a name=&quot;Kernellogs&quot;&gt;&lt;/a&gt;Kernel logs&lt;/h3&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[ 1185.337098] LNetError: 22109:0:(o2iblnd_cb.c:2513:kiblnd_passive_connect()) Can&apos;t accept 10.9.101.60@o2ib4: -22 
[ 1185.348376] LNet: 22109:0:(o2iblnd_cb.c:2212:kiblnd_reject()) Error -22 sending reject 
[ 1185.357473] LNetError: 22109:0:(o2iblnd_cb.c:2721:kiblnd_rejected()) 10.9.101.60@o2ib4 rejected: consumer defined fatal error&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>CentOS 7.5, x86_64</environment>
        <key id="52987">LU-11257</key>
            <summary>RHEL/CentOS 3.10.0-862.11.6.el7.x86_64 kernel breaks LNet</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="srcc">Stanford Research Computing Center</reporter>
                        <labels>
                    </labels>
                <created>Thu, 16 Aug 2018 03:23:53 +0000</created>
                <updated>Fri, 23 Aug 2019 13:28:02 +0000</updated>
                            <resolved>Fri, 23 Aug 2019 13:28:01 +0000</resolved>
                                    <version>Lustre 2.10.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="232040" author="srcc" created="Thu, 16 Aug 2018 15:02:21 +0000"  >&lt;p&gt;A more detailed changelog for that kernel is at &lt;a href=&quot;https://access.redhat.com/downloads/content/rhel---7/x86_64/2456/kernel/3.10.0-862.11.6.el7/x86_64/fd431d51/package-changelog&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://access.redhat.com/downloads/content/rhel---7/x86_64/2456/kernel/3.10.0-862.11.6.el7/x86_64/fd431d51/package-changelog&lt;/a&gt;, if that&apos;s of any help.&lt;/p&gt;</comment>
                            <comment id="232049" author="scadmin" created="Thu, 16 Aug 2018 16:22:00 +0000"  >&lt;p&gt;we see the same thing on our OPA network.&lt;/p&gt;

&lt;p&gt;ksocklnd reportedly seems ok with this kernel on our TCP networks (in VMs mostly), so I suspect it&apos;s ko2iblnd related.&lt;/p&gt;

&lt;p&gt;below is syslog from&lt;br/&gt;
john98 # lctl ping warble@o2ib44&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Aug 17 01:58:10 john98 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 36, npartitions: 2
Aug 17 01:58:10 john98 kernel: alg: No test for adler32 (adler32-zlib)
Aug 17 01:58:11 john98 kernel: Lustre: Lustre: Build Version: 2.10.4
Aug 17 01:58:11 john98 kernel: LNet: Using FMR for registration
Aug 17 01:58:11 john98 kernel: LNet: Added LNI 192.168.44.198@o2ib44 [128/2048/0/180]
Aug 17 01:58:36 warble1 kernel: LNetError: 103:0:(o2iblnd_cb.c:3061:kiblnd_cm_callback()) 192.168.44.198@o2ib44: REJECTED 28
Aug 17 01:58:36 warble1 kernel: LNetError: 103:0:(o2iblnd_cb.c:3061:kiblnd_cm_callback()) Skipped 3 previous similar messages
Aug 17 02:06:08 john98 kernel: LNetError: 204:0:(o2iblnd_cb.c:2513:kiblnd_passive_connect()) Can&apos;t accept 192.168.44.198@o2ib44: -22
Aug 17 02:06:08 john98 kernel: LNet: 204:0:(o2iblnd_cb.c:2212:kiblnd_reject()) Error -22 sending reject
Aug 17 02:06:08 john98 kernel: LNetError: 204:0:(o2iblnd_cb.c:2721:kiblnd_rejected()) 192.168.44.198@o2ib44 rejected: consumer defined fatal error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2.10.4 was dkms rebuilt for this kernel.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="232056" author="scadmin" created="Thu, 16 Aug 2018 16:36:23 +0000"  >&lt;p&gt;actually, ib_send_bw and ibv_rc_pingpong don&apos;t seem to work on this new kernel, so I suspect RHEL has broken all IB RDMA?&lt;/p&gt;

&lt;p&gt;do they work for you?&lt;/p&gt;

&lt;p&gt;IPoIB works ok.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="232057" author="srcc" created="Thu, 16 Aug 2018 16:46:58 +0000"  >&lt;p&gt;Good observation, indeed: perf tests such as ib_{read,send}_bw and ibv_rc_pingpong fail with errors like:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Failed to modify QP to RTR
Couldn&apos;t connect to remote QP &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;or&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Failed to modify QP 386 to RTR
Unable to Connect the HCA&apos;s through the link &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="232059" author="srcc" created="Thu, 16 Aug 2018 17:03:52 +0000"  >&lt;p&gt;I submitted RHEL bug #1618452 to report the issue:&lt;br/&gt;
 &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1618452&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1618452&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which seems to be marked &quot;private&quot; by the Red Hat Bugzilla, with no way to mark it &quot;public&quot; on my end &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="232061" author="pjones" created="Thu, 16 Aug 2018 17:09:42 +0000"  >&lt;p&gt;I think you have to ask them to open it up.&lt;/p&gt;</comment>
                            <comment id="232064" author="srcc" created="Thu, 16 Aug 2018 17:52:43 +0000"  >&lt;p&gt;RHEL&apos;s reply:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1618452&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1618452&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&#8212; Comment #3 from Don Dutile &amp;lt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;mailto:ddutile@redhat.com&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;ddutile@redhat.com&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/mail_small.gif&quot; height=&quot;12&quot; width=&quot;13&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&amp;gt; &#8212; &lt;br/&gt;
 Already reported and being actively fixed.&lt;/p&gt;

&lt;p&gt;Cannot make this public, as the patch that caused it was due to embargo&apos;d &lt;br/&gt;
 security fix.&lt;/p&gt;

&lt;p&gt;This issue has highest priority for resolution.&lt;br/&gt;
 Revert to 3.10.0-862.11.5.el7 in the mean time.&lt;/p&gt;

&lt;p&gt;This bug has been marked as a duplicate of bug 1616346&lt;/p&gt;&lt;/blockquote&gt;
</comment>
                            <comment id="232074" author="pjones" created="Thu, 16 Aug 2018 18:31:18 +0000"  >&lt;p&gt;Thanks for the info!&lt;/p&gt;</comment>
                            <comment id="232429" author="srcc" created="Wed, 22 Aug 2018 15:05:15 +0000"  >&lt;p&gt;Still no update from Red Hat.&#160;&lt;/p&gt;

&lt;p&gt;We&apos;re getting more info via The Register:&lt;br/&gt;
&lt;a href=&quot;https://www.theregister.co.uk/2018/08/21/fix_for_julys_spectrelike_bug_is_breaking_some_supers/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.theregister.co.uk/2018/08/21/fix_for_julys_spectrelike_bug_is_breaking_some_supers/&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&#8220;The problem will be fixed in kernel-3.10.0-862.13.1 which is currently being reviewed by Red Hat Enterprise Linux Engineering.&#8221;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;But no ETA yet.&lt;/p&gt;</comment>
                            <comment id="232436" author="scadmin" created="Wed, 22 Aug 2018 15:47:12 +0000"  >&lt;p&gt;hmm. comments attached to that article point to a fix in centos - potentially just a misplaced semi-colon. but OTOH the centos bug seems to be talking about IPoIB and that works fine. perhaps the fix is right and the bug report is wrong?&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="233046" author="jfilizetti" created="Wed, 5 Sep 2018 11:36:56 +0000"  >&lt;p&gt;I wish I had looked here first when digging into the same thing instead of wasting a day trying to figure out the culprit. For now I&apos;ve opened another Red Hat bug, since I didn&apos;t come across anything when searching their Bugzilla:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1625620&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1625620&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="233173" author="mdiep" created="Fri, 7 Sep 2018 13:13:19 +0000"  >&lt;p&gt;FYI &lt;a href=&quot;https://bugs.centos.org/view.php?id=15193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugs.centos.org/view.php?id=15193&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="233452" author="scadmin" created="Thu, 13 Sep 2018 09:13:07 +0000"  >&lt;p&gt;yeah, we gave up waiting and just built our own ib_core.ko module with the 1-character patch from centos.&lt;br/&gt;
works fine now.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="234020" author="srcc" created="Wed, 26 Sep 2018 15:16:45 +0000"  >&lt;p&gt;Kernel 3.10.0-862.14&#160;has been released, which fixes the issue:&lt;/p&gt;

&lt;p&gt;&#160;&lt;a href=&quot;https://access.redhat.com/downloads/content/rhel---7/x86_64/2456/kernel/3.10.0-862.14.4.el7/x86_64/fd431d51/package&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://access.redhat.com/downloads/content/rhel---7/x86_64/2456/kernel/3.10.0-862.14.4.el7/x86_64/fd431d51/package&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="234026" author="boggl" created="Wed, 26 Sep 2018 17:07:03 +0000"  >&lt;p&gt;The fix has also been noted in the CentOS bug report: &lt;a href=&quot;https://bugs.centos.org/view.php?id=15193&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugs.centos.org/view.php?id=15193&lt;/a&gt;. The update .rpm isn&apos;t available on CentOS mirrors yet, though.&lt;/p&gt;

</comment>
                            <comment id="234110" author="boggl" created="Fri, 28 Sep 2018 18:29:02 +0000"  >&lt;p&gt;the kernel update to 3.10.0-862.14.4 is now available on CentOS mirrors&lt;/p&gt;

</comment>
                            <comment id="234742" author="yujian" created="Wed, 10 Oct 2018 18:25:59 +0000"  >&lt;p&gt;RHEL 7.5 kernel update to 3.10.0-862.14.4.el7 is tracked in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11448&quot; title=&quot;kernel update [RHEL7.5 3.10.0-862.14.4.el7]&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11448&quot;&gt;&lt;del&gt;LU-11448&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="253498" author="pjones" created="Fri, 23 Aug 2019 13:28:02 +0000"  >&lt;p&gt;It seems like this was fixed in the next RHEL/CentOS update.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="52991">LU-11261</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="52981">LU-11253</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i000tj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>