[LU-14675] LNet not working over IB (RHEL8.3 MOFED 5.2 ppc64le) Created: 07/May/21 Updated: 14/May/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mark Dixon | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Client: RHEL 8.3 (4.18.0-240.el8.ppc64le), MOFED 5.2-2.2.0 (prebuilt Mellanox binaries), ppc64le, Lustre 2.12.6 +

Lustre client compiled with:
    sh autogen.sh && ./configure --with-linux=/usr/src/kernels/4.18.0-240.el8.ppc64le --with-o2ib=/usr/src/ofa_kernel/default && make rpms

ko2iblnd options:
    options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

lnet.conf:
    net:

Interfaces:
    ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
    enP49p3s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

Server: CentOS 7 + MOFED 4.9 on x86_64, Lustre 2.12.5 (but not touched during this test) |
||
| Epic/Theme: | lnet, mofed, ppc64le |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi,

I'm trying to get the Lustre client working with RHEL 8.3 and MOFED 5.2 or later on the ppc64le architecture, and have run into trouble.

With the help of cherry picking the commit for

    [root@infer004 ~]# systemctl start lnet

Syslog contains:

    May 7 12:51:17 infer004 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 160, npartitions: 2

After attempting to ping over InfiniBand, the idle system's load average goes from ~0.00 to 1.00, "systemctl stop lnet" hangs and the following is added to syslog:

    May 7 12:57:01 infer004 systemd[1]: Stopping lnet management...

If I downgrade MOFED to 5.1-2.5.8.0 and rebuild Lustre 2.12.6 +

Any ideas, please?

Thanks,

Mark |
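One possible reading of the load average settling at 1.00 on an otherwise idle node is a single task stuck in uninterruptible sleep, since Linux counts tasks in state D towards the load average. The ticket does not confirm this, so the following Python is only a diagnostic sketch: it scans /proc for tasks in state D, and the function names `parse_proc_state` and `uninterruptible_tasks` are hypothetical helpers, not part of Lustre or LNet.

```python
import os
from typing import List, Optional, Tuple


def parse_proc_state(stat_line: str) -> Optional[str]:
    """Return the one-letter task state from the contents of /proc/<pid>/stat."""
    # The second field (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so the state is the first field after the LAST ')'.
    _, sep, rest = stat_line.rpartition(")")
    fields = rest.split()
    return fields[0] if sep and fields else None


def uninterruptible_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, name) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                line = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_proc_state(line) == "D":
            name = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(entry), name))
    return found


if __name__ == "__main__":
    for pid, name in uninterruptible_tasks():
        print(pid, name)
```

Run on the affected client after the ping attempt, any process it prints would be a candidate for whatever is blocking "systemctl stop lnet". It only inspects top-level /proc entries, so individual threads of a multithreaded user process are not listed separately.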
| Comments |
| Comment by Mark Dixon [ 14/May/21 ] |
|
I have done some more work and discovered that this is not a ppc64le-specific issue: it applies to x86_64 as well. So I'm hoping that this is all known about and being worked on in other tickets.

Essentially, MOFED 5.2-1.0.4.0 is the only version that compiles on RHEL 8.3 and where the Lustre client works. A Lustre client built on top of MOFED 5.2-2.2.0.0 or 5.3-1.0.01 shows the problem described in this ticket.

Looking at which prebuilt MOFED binaries are available, it looks like RHEL 8.4 beta may need MOFED 5.3 or later, so this issue will in all likelihood prevent the migration of clients to RHEL 8.4 when it's released. |
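The results reported in this comment can be kept as a small version matrix so that further MOFED releases are easy to add and sort. The Python below is a minimal sketch under that assumption: `parse_mofed_version`, `first_failing` and the `REPORTED` table are hypothetical names, the version strings are copied as written in the comment, and the only data is what the ticket itself reports for a Lustre 2.12.6 client on RHEL 8.3.

```python
from typing import Dict, Optional, Tuple


def parse_mofed_version(version: str) -> Tuple[int, ...]:
    """Turn a MOFED version such as '5.2-1.0.4.0' into a sortable tuple of ints."""
    return tuple(int(part) for part in version.replace("-", ".").split("."))


# Outcomes as reported in this ticket; nothing here is independently verified.
REPORTED: Dict[str, str] = {
    "5.2-1.0.4.0": "works",
    "5.2-2.2.0.0": "fails",
    "5.3-1.0.01": "fails",
}


def first_failing(results: Dict[str, str]) -> Optional[str]:
    """Return the lowest version marked 'fails', or None if none fail."""
    failing = [v for v, status in results.items() if status == "fails"]
    return min(failing, key=parse_mofed_version) if failing else None


if __name__ == "__main__":
    for version in sorted(REPORTED, key=parse_mofed_version):
        print(f"{version}: {REPORTED[version]}")
    print("earliest failing release:", first_failing(REPORTED))
```

Sorting on the parsed tuple rather than the raw string avoids misordering releases once any component reaches two digits.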