Problem Description
When using DKMS to install the Lustre kernel modules in a chroot or container environment, MOFED is not correctly detected, so ko2iblnd.ko is omitted from the installation (despite being built). This is problematic for OS image builds in a chroot or container environment where the targeted kernel version and sources don't match those of the running kernel (as reported by "uname -r"). The problem stems from the dkms.conf file filtering MOFED source paths by the host's kernel version, finding no match, and then leaving the ext_mofed flag set to "no".
Evidence
The output below is from running dkms install <lustre_module> -k <kernel> after adding DEBUG print statements to /usr/src/lustre-client-2.15.3.2_cray_45_g1b746e4/dkms.conf.
I've uploaded the full log output as dkms_install_B14.log, but we'll go through the important bits below:
cray-ims:~ # dkms install lustre-client/2.15.3.2_cray_45_g1b746e4 -k 5.14.21-150500.55.59_13.0.72-cray_shasta_c
DEBUG: kernelver=5.14.21-150500.55.59_13.0.72-cray_shasta_c module=lustre-client module_version=2.15.3.2_cray_45_g1b746e4 dkms_tree=/var/lib/dkms source_tree=/usr/src
DEBUG: o2ib=
DEBUG: pkgs=mlnx-ofed-kernel-dkms|mlnx-ofed-kernel-modules|mlnx-ofa_kernel-devel|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel
DEBUG: paths=/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c
DEBUG: find /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c -name rdma_cm.h: /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/include/rdma/rdma_cm.h
DEBUG: uname -r: 6.1.38
DEBUG: epaths=
DEBUG: ext_mofed=no, int_mofed=no
Creating symlink /var/lib/dkms/lustre-client/2.15.3.2_cray_45_g1b746e4/source -> /usr/src/lustre-client-2.15.3.2_cray_45_g1b746e4
Above we see the initial invocation of the dkms.conf file, and see the trace showing why ext_mofed and int_mofed end up as "no". Ultimately it's because we were filtering the paths variable based on the uname -r result, instead of the target kernel (kernelver). This is all despite MOFED being in place for the target kernel.
Further on, the ./configure output (invoked by the DKMS pre-build script) shows that the MOFED sources are in fact found and o2ib support is enabled for the build:
==============================================================================
checking whether to enable tunable backoff TCP support... yes
checking if Linux kernel has tunable backoff TCP support... no
checking whether to use Compat RDMA... yes
checking whether to use any OFED backport headers... no
checking whether to enable OpenIB gen2 support... yes
configure: adding /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/Module.symvers to Symbol Path O2IB
checking if Linux kernel has kthread_worker... no
checking whether to enable GNI lnd... no
checking if Linux kernel exports 'kmap_to_page'... no
configure: Lustre kernel checks
==============================================================================
Next, we can see that DKMS skips the installation of ko2iblnd.ko and installs only ksocklnd.ko, which is installed unconditionally by default.
Running the post_build script:
cleaning build area...

lnet_selftest.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/

lnet.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/

ksocklnd.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/

libcfs.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
...
Finally, a search of /lib/modules confirms that ksocklnd.ko was the only LND kernel object installed:
cray-ims:~ # find /lib/modules -name "*lnd.ko*"
/lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ksocklnd.ko
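The filtering failure can be reproduced outside of DKMS with the values from the DEBUG trace. The following is a minimal sketch, not the actual dkms.conf code: the variable names host_kernel and found are made up for illustration, the host kernel is hard-coded to the 6.1.38 reported in the log so the result does not depend on where the sketch is run, and the yes/no decision is a simplified stand-in for the real logic.

```shell
#!/bin/bash
# Sketch only: mimics the path filter described above using values from the log.
# host_kernel and found are illustrative names, not variables from dkms.conf.
host_kernel="6.1.38"   # what `uname -r` returned inside the chroot
found="/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/include/rdma/rdma_cm.h"

# Keep only header paths that mention the host kernel or "default",
# then strip the header suffix to get the MOFED source directory.
epaths=$(echo "$found" | grep -F -e "$host_kernel" -e default | sed -e 's:/include/rdma/rdma_cm.h::')

# Simplified assumption: external MOFED counts as present only if a path survived.
ext_mofed="no"
if [ -n "$epaths" ]; then
    ext_mofed="yes"
fi

echo "epaths=$epaths"
echo "ext_mofed=$ext_mofed"
```

Because the only MOFED tree is named after the target kernel and contains neither "6.1.38" nor "default", nothing survives the filter, which matches the empty epaths and ext_mofed=no seen in the trace.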
Fix
The fix should be to replace the "grep -F -e $(uname -r) -e default" filter with a grep on a filter derived from the targeted kernel version variable (kernelver):
Example:
# Filter MOFED source paths by the target kernel's major.minor.patch-ABI version
filter=$(echo $kernelver | grep -P -o "\d+\.\d+\.\d+-\d+")
epaths=$(find $paths -name rdma_cm.h | grep -F -e "$filter" | sed -e 's:/include/rdma/rdma_cm.h::')
This would restrict the MOFED source entries to those for our targeted kernel:
DEBUG: o2ib=
DEBUG: pkgs=mlnx-ofed-kernel-dkms|mlnx-ofed-kernel-modules|mlnx-ofa_kernel-devel|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel
DEBUG: paths=/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c
DEBUG: find /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c -name rdma_cm.h: /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/include/rdma/rdma_cm.h
DEBUG: kernelver=5.14.21-150500.55.59_13.0.72-cray_shasta_c
DEBUG: epaths=/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c
DEBUG: ext_mofed=yes, int_mofed=no
....
ksocklnd.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/

ko2iblnd.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
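It may help to see what the proposed regular expression actually extracts, since the MOFED tree in this environment was built for a slightly different kernel release (55.49_13.0.56) than the targeted one (55.59_13.0.72). The sketch below uses the values from the trace; mofed_dir is an illustrative name, and grep -P assumes a GNU grep built with PCRE support in the build environment.

```shell
#!/bin/bash
# Sketch only: shows what the proposed filter extracts from the target kernel
# version and that it matches the MOFED source path seen in the trace.
kernelver="5.14.21-150500.55.59_13.0.72-cray_shasta_c"
mofed_dir="/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c"  # illustrative name

# major.minor.patch-ABI prefix of the target kernel (assumes GNU grep with -P)
filter=$(echo "$kernelver" | grep -P -o "\d+\.\d+\.\d+-\d+")
echo "filter=$filter"

# The MOFED path shares that prefix, so it survives the filter even though
# the rest of the release string differs.
epaths=$(echo "$mofed_dir/include/rdma/rdma_cm.h" | grep -F -e "$filter" | sed -e 's:/include/rdma/rdma_cm.h::')
echo "epaths=$epaths"
```

Here filter comes out as 5.14.21-150500, which is why the 55.49 MOFED tree is still accepted for the 55.59 kernel. One behaviour worth keeping in mind with this approach: if kernelver did not contain a major.minor.patch-ABI sequence, filter would be empty, and an empty fixed-string pattern matches every line, so no paths would be filtered out.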
Fix and Install Logs
I've done a DKMS build/install of cray-2.15-int (which tracks WC master) and compared it against a build/install of cray-2.15-int with the patch (fe75881e1e) cherry-picked on top.
Here are the build/install logs for the cray-2.15-int branch without the patch: dkms_install_cray-2.15-int.log
...
ksocklnd.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
...
+ find /lib/modules -iname '*lnd*.ko'
/lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ksocklnd.ko
With just cray-2.15-int, only ksocklnd.ko was installed by DKMS, and ko2iblnd.ko is missing.
Here are the build/install logs for cray-2.15-int with fe75881e1e: dkms_install_fe75881e1e.log
...
ksocklnd.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/

ko2iblnd.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
...
+ find /lib/modules -iname '*lnd*.ko'
/lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ko2iblnd.ko
/lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ksocklnd.ko
With the patch applied, both ksocklnd.ko and ko2iblnd.ko are installed by DKMS.
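For image builds it can be handy to turn the manual find check into a pass/fail step. The helper below is a hypothetical sketch: the function name check_lnd_modules and its arguments are not part of Lustre or DKMS. It assumes the modules land somewhere under <modules_root>/<kernelver>, as in the logs above, possibly with a compression suffix after .ko.

```shell
#!/bin/bash
# Hypothetical helper: fail if an expected LND module is missing for a kernel.
# Usage: check_lnd_modules <modules_root> <kernelver> <module>...
check_lnd_modules() {
    local root="$1" kver="$2" mod missing=0
    shift 2
    for mod in "$@"; do
        # Match mod.ko as well as compressed variants such as mod.ko.xz
        if [ -z "$(find "$root/$kver" -name "${mod}.ko*" 2>/dev/null)" ]; then
            echo "MISSING: ${mod}.ko for kernel $kver" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the environment from this ticket it could be called as: check_lnd_modules /lib/modules 5.14.21-150500.55.59_13.0.72-cray_shasta_c ksocklnd ko2iblnd. A non-zero return would then stop an image build that runs under set -e.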
FWIW, here's the script I used to build, install, and verify the results above:
#!/bin/bash
set -ex

# Git settings
cd lustre-wc-rel
git fetch -p
git reset --hard HEAD
git checkout cray-2.15-int
git cherry-pick fe75881e1e
git clean -dfx > /dev/null
git log --pretty=oneline | head -4

# Modify this for respective distro you're using
KERNEL_VERSION="5.14.21-150500.55.59_13.0.72"
ARCH="x86_64"
LINUX_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION})
LINUX_OBJ_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION}-obj/${ARCH}/cray_shasta_c)

./LUSTRE-VERSION-GEN

# Modify this to include configure options for the build you're doing
sh ./autogen.sh
./configure \
    --enable-client \
    --disable-server \
    --disable-gss-keyring \
    --enable-gss="no" \
    --enable-mpitests="no" \
    --enable-ldap="no" \
    --with-o2ib="/usr/src/ofa_kernel/default" \
    --with-linux="$LINUX_DIR" \
    --with-linux-obj="$LINUX_OBJ_DIR"

make dkms-rpms
zypper install --allow-unsigned-rpm --no-confirm lustre-client-dkms-*.noarch.rpm
find /lib/modules -iname "*lnd*.ko"