[LU-16050] ofed_info does not show mlnx-ofed-kernel-modules
Created: 27/Jul/22  Updated: 26/Oct/22  Resolved: 17/Sep/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0, Lustre 2.15.1
Fix Version/s: Lustre 2.16.0, Lustre 2.15.2

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: Jian Yu
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After installing MLNX_OFED by running the mlnxofedinstall command, I found that the mlnx-ofed-kernel-modules package was not listed in the output of ofed_info:

# tar xzf MLNX_OFED_LINUX-5.6-2.0.9.0-ubuntu22.04-x86_64.tgz 
# cd MLNX_OFED_LINUX-5.6-2.0.9.0-ubuntu22.04-x86_64/
# ./mlnxofedinstall --add-kernel-support --all --force
# /etc/init.d/openibd restart

# dpkg -S /usr/src/ofa_kernel/x86_64/5.15.0-41-generic/
mlnx-ofed-kernel-modules: /usr/src/ofa_kernel/x86_64/5.15.0-41-generic

# ofed_info | awk '{print $2}' | grep mlnx-ofed
mlnx-ofed-kernel-utils

There is no mlnx-ofed-kernel-modules in the output, which caused Lustre's configure to hit the following error:

checking whether to use Compat RDMA... /usr/bin/ofed_info
dpkg-query: error: --listfiles needs at least one package name argument

The relevant code is in lnet/autoconf/lustre-lnet.m4:

case $with_o2ib in
        yes)    AS_IF([which ofed_info 2>/dev/null], [
                        AS_IF([test x$uses_dpkg = xyes], [
                                OFED_INFO="ofed_info | awk '{print \[$]2}'"
                                LSPKG="dpkg --listfiles"
                        ], [
                                OFED_INFO="ofed_info"
                                LSPKG="rpm -ql"
                        ])
                        O2IBPATHS=$(eval $OFED_INFO |
                                    egrep -w 'mlnx-ofed-kernel-dkms|mlnx-ofa_kernel-devel|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' |
                                    xargs $LSPKG | grep -v 'ofa_kernel-' | grep rdma_cm.h | sed 's/\/include\/rdma\/rdma_cm.h//')
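
The failure can be reproduced without an OFED install. The following is a minimal sketch, not part of the Lustre tree: it feeds the single package name that ofed_info printed above through the same egrep filter and shows that nothing survives it. The variable names (ofed_pkgs, pattern, matches, result) are illustrative assumptions.

```shell
#!/bin/sh
# Package names as printed by `ofed_info | awk '{print $2}' | grep mlnx-ofed`
# in the report above (simulated here; ofed_info itself is not called).
ofed_pkgs='mlnx-ofed-kernel-utils'

# The same alternation the configure fragment passes to egrep -w.
pattern='mlnx-ofed-kernel-dkms|mlnx-ofa_kernel-devel|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel'

# grep exits non-zero when nothing matches; `|| true` keeps the sketch
# usable under `set -e`.
matches=$(printf '%s\n' "$ofed_pkgs" | grep -Ew "$pattern" || true)

if [ -z "$matches" ]; then
    result='no package matched'
else
    result="matched: $matches"
fi
echo "$result"

# GNU xargs runs its command once even when stdin is empty (unless -r /
# --no-run-if-empty is given), so the package-listing command would be
# started with no package name at all. Other xargs implementations may
# skip the command instead.
printf '' | xargs echo 'xargs started the command with no arguments:'
```

Because the filter yields an empty list, `xargs $LSPKG` ends up invoking `dpkg --listfiles` without a package name, which is consistent with the dpkg-query error quoted above.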


 Comments   
Comment by Gerrit Updater [ 27/Jul/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48047
Subject: LU-16050 build: replace ofed_info with dpkg/rpm
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 171b16923c83c5c68f500788ae040d3016dd5df2

Comment by Jian Yu [ 27/Jul/22 ]

Hi nathand,
Could you please try the latest patch set 2 of https://review.whamcloud.com/48047?
It works on my node with mlnx-ofed-kernel-dkms installed.
The path can be detected now and there is no need to specify it with "--with-o2ib".

Additionally, another fix would be to have the "make dkms-debs" actually honor the original "./configure" params.

The fix needs to be made in debian/dkms.conf.in. I will look into the details to see why the author created that file with hard-coded params.
Before I make any changes, please try updating that file to adjust the configure params for the dkms package.

Comment by Nathan Dauchy [ 31/Jul/22 ]

Jian,

Patch set 2 does correctly find the IB headers without needing to specify "--with-o2ib", and it works both for the initial ./configure and for "make dkms-debs -j".  This was tested with the tarball Patrick provided in NVDA-149, more or less master.

./configure --disable-dependency-tracking --with-linux=/usr/src/linux-headers-$(uname -r) --disable-snmp --enable-quota --disable-server --without-zfs --disable-ldiskfs --disable-gss --disable-crypto
checking whether to use Compat RDMA... /usr/bin/ofed_info
yes
checking whether to use any OFED backport headers... no
checking whether to enable OpenIB gen2 support... yes
configure: adding /usr/src/ofa_kernel/x86_64/5.15.0-40-generic/Module.symvers to Symbol Path

When installing the resulting packages and triggering the DKMS build, everything seemed to finish compiling fine, modules loaded, and o2ib lnet pings worked. Looks good!

Thanks,
Nathan

Comment by Jian Yu [ 01/Aug/22 ]

Thank you for verifying, Nathan.

Comment by Gerrit Updater [ 17/Sep/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/48047/
Subject: LU-16050 build: replace ofed_info with dpkg/rpm
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3a7930e63c15b0fbe51ac73db81a1186939115bb

Comment by Peter Jones [ 17/Sep/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 19/Sep/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48592
Subject: LU-16050 build: replace ofed_info with dpkg/rpm
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 1d7be9d8e70ca84b92cb59480b62eb2cc0ce0424

Comment by Gerrit Updater [ 26/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48592/
Subject: LU-16050 build: replace ofed_info with dpkg/rpm
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 3c8812e6d364829c4faa78fe02feda755c83164a
