<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:11:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14675] LNet not working over IB (RHEL8.3 MOFED 5.2 ppc64le)</title>
                <link>https://jira.whamcloud.com/browse/LU-14675</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I&apos;m trying to get the Lustre client working with RHEL 8.3 and MOFED 5.2 or later on the ppc64le architecture, and have run into trouble.&lt;/p&gt;

&lt;p&gt;With the commit for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13783&quot; title=&quot;Support for linux kernel version 5.8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13783&quot;&gt;&lt;del&gt;LU-13783&lt;/del&gt;&lt;/a&gt; cherry-picked, Lustre 2.12.6 builds. Once it is installed I can configure LNet, but the box cannot lnetctl ping itself over InfiniBand:&lt;/p&gt;

&lt;pre&gt;
[root@infer004 ~]# systemctl start lnet
[root@infer004 ~]# lnetctl ping 172.16.44.4@tcp
ping:
    - primary nid: 172.16.44.4@tcp
      Multi-Rail: False
      peer ni:
        - nid: 172.16.50.204@o2ib
        - nid: 172.16.44.4@tcp
[root@infer004 ~]# lnetctl ping 172.16.50.204@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 172.16.50.204@o2ib: Input/output error
&lt;/pre&gt;


&lt;p&gt;Syslog contains:&lt;/p&gt;

&lt;pre&gt;
May 7 12:51:17 infer004 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 160, npartitions: 2
May 7 12:51:17 infer004 kernel: alg: No test for adler32 (adler32-zlib)
May 7 12:51:17 infer004 kernel: alg: hash: digest failed on test 1 for crc32-table: ret=126
May 7 12:51:17 infer004 kernel: LNet: Using FastReg for registration
May 7 12:51:19 infer004 kernel: LNet: Added LNI 172.16.50.204@o2ib [32/1024/0/180]
May 7 12:51:19 infer004 kernel: LNet: Added LNI 172.16.44.4@tcp [8/256/0/180]
May 7 12:51:19 infer004 kernel: LNet: Accept secure, port 988
May 7 12:51:17 infer004 systemd[1]: Starting lnet management...
May 7 12:51:19 infer004 systemd[1]: Started lnet management.
May 7 12:51:41 infer004 kernel: LNet: 9655:0:(o2iblnd_cb.c:3420:kiblnd_check_conns()) Timed out tx for 172.16.50.204@o2ib: 217 seconds
May 7 12:51:42 infer004 kernel: LNetError: 9649:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-172.16.50.204@o2ib: -125
May 7 12:51:42 infer004 kernel: LNet: 9655:0:(o2iblnd_cb.c:3420:kiblnd_check_conns()) Timed out tx for 172.16.50.204@o2ib: 218 seconds
&lt;/pre&gt;
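&lt;p&gt;The bracketed tuples on the &quot;Added LNI&quot; lines can be cross-checked against the ko2iblnd options listed in the environment field. A minimal Python sketch, assuming the four fields are peer_credits/credits/peer_buffer_credits/peer_timeout (my reading of the LNet startup message, not confirmed in this ticket); the helper names are hypothetical:&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Lines copied from this ticket: the o2ib "Added LNI" syslog line and the
# ko2iblnd modprobe options from the environment field.
LNI_LINE = ("May 7 12:51:19 infer004 kernel: LNet: Added LNI "
            "172.16.50.204@o2ib [32/1024/0/180]")
OPTIONS_LINE = ("options ko2iblnd peer_credits=32 peer_credits_hiw=16 "
                "credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=16 "
                "fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 "
                "conns_per_peer=4")

# ASSUMED meaning of the four bracketed fields; check against the Lustre source.
LNI_FIELDS = ("peer_credits", "credits", "peer_buffer_credits", "peer_timeout")


def parse_lni(line):
    """Return (nid, {field: value}) from an 'Added LNI' syslog line."""
    m = re.search(r"Added LNI (\S+) \[(\d+)/(\d+)/(\d+)/(\d+)\]", line)
    if m is None:
        raise ValueError("not an 'Added LNI' line: " + line)
    values = [int(v) for v in m.groups()[1:]]
    return m.group(1), dict(zip(LNI_FIELDS, values))


def parse_options(line):
    """Return the key=value pairs of an 'options MODULE key=value ...' line."""
    words = line.split()
    return {k: int(v) for k, v in (w.split("=", 1) for w in words[2:])}


def mismatches(lni, opts):
    """List fields present in both dicts whose values differ."""
    return [f for f in lni if f in opts and lni[f] != opts[f]]


if __name__ == "__main__":
    nid, lni = parse_lni(LNI_LINE)
    opts = parse_options(OPTIONS_LINE)
    print(nid, lni)
    print("mismatched:", mismatches(lni, opts) or "none")
```
&lt;/pre&gt;
&lt;p&gt;For the o2ib NI this reports no mismatch (32 peer credits, 1024 credits), which suggests the module options were picked up as configured.&lt;/p&gt;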

&lt;p&gt;After the attempted ping over InfiniBand, the idle system&apos;s load average rises from ~0.00 to 1.00, &quot;systemctl stop lnet&quot; hangs, and the following is added to syslog:&lt;/p&gt;

&lt;pre&gt;
May  7 12:57:01 infer004 systemd[1]: Stopping lnet management...
May  7 12:57:04 infer004 kernel: LNet: Removed LNI 172.16.44.4@tcp
May  7 12:57:05 infer004 kernel: LNet: 9702:0:(o2iblnd.c:3012:kiblnd_shutdown()) 172.16.50.204@o2ib: waiting for 1 peers to disconnect
May  7 12:57:09 infer004 kernel: LNet: 9702:0:(o2iblnd.c:3012:kiblnd_shutdown()) 172.16.50.204@o2ib: waiting for 1 peers to disconnect
May  7 12:57:17 infer004 kernel: LNet: 9702:0:(o2iblnd.c:3012:kiblnd_shutdown()) 172.16.50.204@o2ib: waiting for 1 peers to disconnect
May  7 12:57:34 infer004 kernel: LNet: 9702:0:(o2iblnd.c:3012:kiblnd_shutdown()) 172.16.50.204@o2ib: waiting for 1 peers to disconnect
May  7 12:58:07 infer004 kernel: LNet: 9702:0:(o2iblnd.c:3012:kiblnd_shutdown()) 172.16.50.204@o2ib: waiting for 1 peers to disconnect
&lt;/pre&gt;
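&lt;p&gt;The peer count stays at 1 throughout while the gap between messages roughly doubles (4, 8, 17 and 33 seconds), so the shutdown thread appears to be stuck waiting on that single o2ib peer rather than making progress. A minimal Python sketch that computes those gaps from the syslog timestamps (copied by hand from the lines above; the helper name is hypothetical):&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import datetime

# Times of the "waiting for 1 peers to disconnect" lines above.
# Only the time of day is parsed; all five lines fall on May 7.
STAMPS = ["12:57:05", "12:57:09", "12:57:17", "12:57:34", "12:58:07"]


def gaps_seconds(stamps):
    """Seconds between consecutive HH:MM:SS timestamps on the same day."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_seconds(STAMPS))  # prints [4, 8, 17, 33]
```
&lt;/pre&gt;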


&lt;p&gt;If I downgrade MOFED to 5.1-2.5.8.0 and rebuild Lustre 2.12.6 + &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13783&quot; title=&quot;Support for linux kernel version 5.8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13783&quot;&gt;&lt;del&gt;LU-13783&lt;/del&gt;&lt;/a&gt;, the box is able to lnetctl ping itself on its InfiniBand interface.&lt;/p&gt;

&lt;p&gt;Any ideas, please?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Mark&lt;/p&gt;</description>
                <environment>Client: RHEL 8.3 (4.18.0-240.el8.ppc64le), MOFED 5.2-2.2.0 (prebuilt Mellanox binaries), ppc64le, Lustre 2.12.6 + &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13783&quot; title=&quot;Support for linux kernel version 5.8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13783&quot;&gt;&lt;strike&gt;LU-13783&lt;/strike&gt;&lt;/a&gt;&lt;br/&gt;
&lt;br/&gt;
Lustre client compiled with:&lt;br/&gt;
&lt;br/&gt;
sh autogen.sh &amp;amp;&amp;amp; ./configure --with-linux=/usr/src/kernels/4.18.0-240.el8.ppc64le --with-o2ib=/usr/src/ofa_kernel/default &amp;amp;&amp;amp; make rpms&lt;br/&gt;
&lt;br/&gt;
ko2iblnd options:&lt;br/&gt;
&lt;br/&gt;
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4&lt;br/&gt;
&lt;br/&gt;
lnet.conf:&lt;br/&gt;
&lt;br/&gt;
net:&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- net type: o2ib&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;local NI(s):&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- nid: &lt;a href=&apos;mailto:172.16.50.204@o2ib&apos;&gt;172.16.50.204@o2ib&lt;/a&gt;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;status: up&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;interfaces:&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0: ib0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- net type: tcp&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;local NI(s):&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- nid: &lt;a href=&apos;mailto:172.16.44.4@tcp&apos;&gt;172.16.44.4@tcp&lt;/a&gt;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;status: up&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;interfaces:&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0: enP49p3s0f1&lt;br/&gt;
&lt;br/&gt;
Interfaces:&lt;br/&gt;
&lt;br/&gt;
ib0: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 2044&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;inet 172.16.50.204  netmask 255.255.252.0  broadcast 172.16.51.255&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;inet6 fe80::1e34:da03:7d:6c0e  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;infiniband 00:00:10:87:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;RX packets 172  bytes 34188 (33.3 KiB)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;TX packets 231  bytes 29724 (29.0 KiB)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;br/&gt;
&lt;br/&gt;
enP49p3s0f1: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 1500&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;inet 172.16.44.4  netmask 255.255.248.0  broadcast 172.16.47.255&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;inet6 fe80::a94:efff:fe80:db5f  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ether 08:94:ef:80:db:5f  txqueuelen 1000  (Ethernet)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;RX packets 1873  bytes 395482 (386.2 KiB)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;RX errors 0  dropped 0  overruns 0  frame 0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;TX packets 2644  bytes 421936 (412.0 KiB)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;device interrupt 59  &lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
Server: CentOS 7 + MOFED 4.9 on x86_64, Lustre 2.12.5 (unchanged during this test)</environment>
        <key id="64087">LU-14675</key>
            <summary>LNet not working over IB (RHEL8.3 MOFED 5.2 ppc64le)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="bodgerer">Mark Dixon</reporter>
                        <labels>
                    </labels>
                <created>Fri, 7 May 2021 13:09:23 +0000</created>
                <updated>Fri, 14 May 2021 09:34:49 +0000</updated>
                                            <version>Lustre 2.12.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="301578" author="bodgerer" created="Fri, 14 May 2021 09:34:49 +0000"  >&lt;p&gt;I have done some more testing and found that this is not specific to ppc64le: it affects x86_64 as well. So I&apos;m hoping this is already known about and being worked on in other tickets &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;In short, MOFED 5.2-1.0.4.0 is the only version that both compiles on RHEL 8.3 and gives a working Lustre client. A Lustre client built against MOFED 5.2-2.2.0.0 or 5.3-1.0.0.1 shows the problem described in this ticket.&lt;/p&gt;
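&lt;p&gt;Putting the versions tested so far in order brackets the regression between MOFED 5.2-1.0.4.0 and 5.2-2.2.0.0. A minimal Python sketch of that bracketing; it combines the ppc64le result from the description with the versions above and assumes the behaviour switches from working to broken only once across releases (only these four versions were tried; the helper names are hypothetical):&lt;/p&gt;
&lt;pre&gt;
```python
# MOFED versions reported in this ticket and whether "lnetctl ping" over
# o2ib worked with the resulting Lustre 2.12.6 client build.
RESULTS = {
    "5.1-2.5.8.0": True,   # ppc64le, from the description
    "5.2-1.0.4.0": True,
    "5.2-2.2.0.0": False,
    "5.3-1.0.0.1": False,
}


def version_key(version):
    """Turn a MOFED version like '5.2-2.2.0.0' into a sortable tuple of ints."""
    return tuple(int(part) for part in version.replace("-", ".").split("."))


def regression_window(results):
    """Return (last good, first bad), assuming a single good-to-bad switch."""
    ordered = sorted(results, key=version_key)
    for older, newer in zip(ordered, ordered[1:]):
        if results[older] and not results[newer]:
            return older, newer
    return None


if __name__ == "__main__":
    print(regression_window(RESULTS))  # prints ('5.2-1.0.4.0', '5.2-2.2.0.0')
```
&lt;/pre&gt;
&lt;p&gt;This only narrows where to look (the 5.2-1.0.4.0 to 5.2-2.2.0.0 step); it says nothing about the root cause.&lt;/p&gt;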

&lt;p&gt;Judging by the prebuilt MOFED binaries available, RHEL 8.4 beta appears to need MOFED 5.3 or later, so this issue will very likely prevent migrating clients to RHEL 8.4 when it is released.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>lnet</label>
            <label>mofed</label>
            <label>ppc64le</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01u0n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>