[LU-16719] ib_xxx symbol mismatch between in-kernel and mlx OFED Created: 06/Apr/23  Updated: 05/Jun/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Blocker
Reporter: Shuichi Ihara
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16662 Linux 5.19+ break configure test comp... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Commit 321a533b86 (LU-16662 autoconf: fix configure test compile for CONFIG_KEYS) introduced a regression: ko2iblnd built against the mlx OFED shown below cannot be loaded because of ib_xxx symbol version mismatches. Before 321a533b86, it worked fine.

[root@ec01 ~]# uname -r
4.18.0-425.13.1.el8_7.x86_64
[root@ec01 ~]# ofed_info -n
5.8-1.1.2.1

[root@ec01 lustre-release]# git clean -d -x -f; sh ./autogen.sh; ./configure --with-o2ib=/usr/src/ofa_kernel/default; make rpms
[root@ec01 lustre-release]# modprobe lustre
modprobe: ERROR: could not insert 'lustre': Invalid argument
[root@ec01 lustre-release]# lctl get_param version
version=2.15.54_159_g321a533
Apr  6 22:18:21 ec01 kernel: libcfs: HW NUMA nodes: 1, HW CPU cores: 32, npartitions: 8
Apr  6 22:18:21 ec01 kernel: alg: No test for adler32 (adler32-zlib)
Apr  6 22:18:21 ec01 kernel: Key type ._llcrypt registered
Apr  6 22:18:21 ec01 kernel: Key type .llcrypt registered
Apr  6 22:18:21 ec01 kernel: Lustre: Lustre: Build Version: 2.15.54_159_g321a533
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol __ib_alloc_pd
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol __ib_alloc_pd (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr_user
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_dereg_mr_user (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_reject (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_disconnect (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol __rdma_create_kernel_id
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol __rdma_create_kernel_id (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_register_event_handler
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_register_event_handler (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_resolve_route (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_unregister_event_handler
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_unregister_event_handler (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_bind_addr (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_create_qp (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_map_mr_sg
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_map_mr_sg (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_query_port
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_query_port (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_notify
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_notify (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_listen
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_listen (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_qp
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol __ib_create_cq
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol __ib_create_cq (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_mr
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_alloc_mr (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_connect_locked
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_connect_locked (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_set_reuseaddr
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_set_reuseaddr (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq_user
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_destroy_cq_user (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_modify_qp (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_dma_virt_map_sg
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_dma_virt_map_sg (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_destroy_id (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol rdma_accept
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol rdma_accept (err -22)
Apr  6 22:18:22 ec01 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd_user
Apr  6 22:18:22 ec01 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd_user (err -22)
Apr  6 22:18:22 ec01 kernel: LNetError: 2824746:0:(api-ni.c:2639:lnet_load_lnd()) Can't load LND o2ib, module ko2iblnd, rc=256
Apr  6 22:18:22 ec01 kernel: LustreError: 2824746:0:(events.c:642:ptlrpc_init_portals()) network initialisation failed
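
The "disagrees about version of symbol" messages above mean that the CRCs recorded in ko2iblnd differ from the ones exported by the loaded ib_core/rdma_cm modules. One possible way to see which tree the module was actually built against is to compare the CRCs stored in the module with a Module.symvers file. The helper below is only a sketch and not part of the original report: the function name crc_mismatches is made up for illustration, and it assumes both inputs carry the CRC in column 1 and the symbol name in column 2, which is the layout of `modprobe --dump-modversions` output and of Module.symvers.

```shell
#!/bin/sh
# Hypothetical helper (not from the Lustre tree): list symbols whose CRC in a
# module differs from the CRC recorded in a Module.symvers file.
#
#   $1: saved output of `modprobe --dump-modversions <module.ko>`
#   $2: Module.symvers of the tree the module is expected to match
#
# Assumption: both files have the CRC in column 1 and the symbol in column 2.
crc_mismatches() {
    awk '
        # First file read (Module.symvers): remember the exported CRC per symbol.
        NR == FNR { exported[$2] = tolower($1); next }
        # Second file read (module CRCs): print symbols whose CRC differs.
        ($2 in exported) && tolower($1) != exported[$2] {
            printf "%s module=%s symvers=%s\n", $2, tolower($1), exported[$2]
        }
    ' "$2" "$1"
}
```

A possible usage, with the module path left as a placeholder and the Module.symvers locations assumed rather than taken from this report: save `modprobe --dump-modversions <path to ko2iblnd.ko>` to a file, then run crc_mismatches against /usr/src/ofa_kernel/default/Module.symvers and against /lib/modules/$(uname -r)/build/Module.symvers. If the module only disagrees with the OFED file, that would suggest the build picked up the in-kernel symbol versions instead of the OFED ones.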


 Comments   
Comment by Peter Jones [ 05/Jun/23 ]

Removing affects version 2.15.3 because it does not look like b2_15 should be impacted. Please speak up if this issue is seen on b2_15 because then something is off with the analysis to date.

Generated at Sat Feb 10 03:29:25 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.