Details
-
Improvement
-
Resolution: Fixed
-
Medium
-
Lustre 2.15.8
-
None
-
Server 2.12.x, mixture of clients: rhel 8.10 (ppc64le), Rocky 9.7 (aarch64+64k)
Description
Hello!
NVIDIA now release their out-of-tree InfiniBand drivers in a package called DOCA instead of MLNX_OFED. On EL they've renamed some RPM packages and (more significantly) they now seem to maintain the kernel OFA source using DKMS, which the RPM database knows nothing about. As a result, the existing detection logic cannot work. For example, on one of our 2.15.8 clients:
sh autogen.sh
./configure --with-linux=/usr/src/kernels/$(uname -r)
...
checking whether to use Compat RDMA... /usr/bin/ofed_info
rpm: no arguments given for query
configure: error: You seem to have an OFED installed but have not installed it's devel package.
If you still want to build Lustre for your OFED I/B stack, you need to install its devel headers RPM.
Instead, if you want to build Lustre for your kernel's built-in I/B stack rather than your installed OFED stack, either remove the OFED package(s) or use --with-o2ib=no.
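To illustrate the kind of fallback that could avoid the empty rpm query shown above, here is a minimal sketch that locates an OFA kernel source tree by probing the filesystem instead of the RPM database. It is not the attached patch and not Lustre's actual configure logic. The function name find_ofa_src is hypothetical, and using a Module.symvers file as the marker of a usable tree is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print the first candidate
# directory that looks like a built OFA kernel source tree.
# Assumption: a usable tree contains a Module.symvers file.
find_ofa_src() {
    for dir in "$@"; do
        if [ -f "$dir/Module.symvers" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # No candidate matched; the caller decides how to proceed.
    return 1
}
```

The candidate directories are passed in by the caller because where DOCA's DKMS packaging places the source is exactly what is uncertain here. A path such as /usr/src/ofa_kernel/default would be one guess, not a confirmed location. If a tree is found, it could presumably be handed to configure through --with-o2ib=<path> rather than relying on auto-detection.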
Can this be resolved, please?
I've attached a very naive fix that works for me, but I cannot vouch that it handles enough use cases.
I'd previously tried porting LU-18002 to 2.15.8, but it didn't resolve this.
Thanks,
Mark