LU-17972

dkms.conf not installing o2iblnd in chroot environment

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      Problem Description

      When DKMS installs the Lustre kernel modules in a chroot or container environment, MOFED is not detected, so ko2iblnd.ko is omitted from the installation even though it is built. This breaks OS image builds in a chroot or container where the targeted kernel version and sources do not match the running kernel (as reported by "uname -r"). The cause is that dkms.conf filters the MOFED source paths by the host's running kernel, finds nothing, and leaves the ext_mofed flag set to "no".

      Evidence

      The output below comes from running dkms install <lustre_module> -k <kernel> after adding DEBUG print statements to /usr/src/lustre-client-2.15.3.2_cray_45_g1b746e4/dkms.conf.

      I've uploaded the full log as dkms_install_B14.log; the important parts are walked through below:

      cray-ims:~ # dkms install lustre-client/2.15.3.2_cray_45_g1b746e4 -k 5.14.21-150500.55.59_13.0.72-cray_shasta_c
      
      DEBUG: kernelver=5.14.21-150500.55.59_13.0.72-cray_shasta_c module=lustre-client module_version=2.15.3.2_cray_45_g1b746e4 dkms_tree=/var/lib/dkms source_tree=/usr/src
      DEBUG: o2ib=
      DEBUG: pkgs=mlnx-ofed-kernel-dkms|mlnx-ofed-kernel-modules|mlnx-ofa_kernel-devel|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel
      DEBUG: paths=/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c
      DEBUG: find /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c -name rdma_cm.h: /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/include/rdma/rdma_cm.h
      DEBUG: uname -r: 6.1.38
      DEBUG: epaths=
      DEBUG: ext_mofed=no, int_mofed=no
      Creating symlink /var/lib/dkms/lustre-client/2.15.3.2_cray_45_g1b746e4/source -> /usr/src/lustre-client-2.15.3.2_cray_45_g1b746e4
      

      This is the initial invocation of dkms.conf, and the trace shows why ext_mofed and int_mofed both end up as "no": the paths variable is filtered by the uname -r result (6.1.38) instead of the target kernel (kernelver), so epaths comes back empty even though MOFED is in place for the target kernel.
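
The empty epaths can be reproduced outside DKMS. The following is a minimal sketch, not the actual dkms.conf code: the values are copied from the DEBUG trace, and the variable names `running`, `found` and `by_running` are made up for illustration.

```shell
#!/bin/bash
# Values copied from the DEBUG trace; variable names are illustrative only.
running="6.1.38"   # what uname -r reports inside the chroot
found="/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/include/rdma/rdma_cm.h"

# Same fixed-string filter as in the report, with uname -r replaced by its
# recorded value. grep exits non-zero when nothing matches, hence "|| true".
by_running=$(printf '%s\n' "$found" | grep -F -e "$running" -e default || true)

echo "epaths when filtered by the running kernel: '${by_running}'"
```

The MOFED path contains neither the host kernel string nor "default", so the filter drops the only candidate.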

      Further on, the ./configure output (invoked by the DKMS pre-build script) shows that the MOFED sources are nonetheless picked up and o2ib support is enabled for the build:

      ==============================================================================
      checking whether to enable tunable backoff TCP support... yes
      checking if Linux kernel has tunable backoff TCP support... no
      checking whether to use Compat RDMA... yes
      checking whether to use any OFED backport headers... no
      checking whether to enable OpenIB gen2 support... yes
      configure: adding /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/Module.symvers to Symbol Path O2IB
      checking if Linux kernel has kthread_worker... no
      checking whether to enable GNI lnd... no
      checking if Linux kernel exports 'kmap_to_page'... no
      configure: Lustre kernel checks
      ==============================================================================
      

      Next, we can see that DKMS skips the installation of ko2iblnd.ko and installs only ksocklnd.ko, which is installed unconditionally by default.

      Running the post_build script:
      cleaning build area...
      
      lnet_selftest.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      
      lnet.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      
      ksocklnd.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      
      libcfs.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      
      ...
      

      Finally, find confirms that ksocklnd.ko was the only LND kernel object installed:

      cray-ims:~ # find /lib/modules -name "*lnd.ko*"
      /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ksocklnd.ko
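
For image builds it can help to turn this check into something scriptable. The following is a rough sketch: the function name `missing_lnds` and the choice of ksocklnd and ko2iblnd as the expected set are assumptions for illustration, not part of Lustre or DKMS.

```shell
#!/bin/bash
# missing_lnds MODULES_ROOT KERNELVER
# Print each expected LND that has no module file (.ko, optionally compressed)
# under MODULES_ROOT/KERNELVER. The list of expected LNDs is an assumption.
missing_lnds() {
    local root=$1 kver=$2 lnd
    for lnd in ksocklnd ko2iblnd; do
        if [ -z "$(find "$root/$kver" -name "${lnd}.ko*" 2>/dev/null | head -n 1)" ]; then
            echo "$lnd"
        fi
    done
}
```

For example, `missing_lnds /lib/modules 5.14.21-150500.55.59_13.0.72-cray_shasta_c` would report ko2iblnd for the install shown above.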
      

      Fix

      The fix should be to simply replace the "grep -F -e $(uname -r) -e default" filter with a grep on a filter derived from the targeted kernel version variable (kernelver):

      Example:

      # Filter MOFED source paths by the target kernel's major.minor.patch-ABI version
      filter=$(echo "$kernelver" | grep -P -o "\d+\.\d+\.\d+-\d+")
      epaths=$(find $paths -name rdma_cm.h |
              grep -F -e "$filter" |
              sed -e 's:/include/rdma/rdma_cm.h::')
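
To see what the filter extracts, here is a small standalone sketch of the same idea. The function names `kernel_prefix` and `select_mofed_paths` are hypothetical, a POSIX character class stands in for the PCRE `\d` so the sketch does not depend on grep -P, and the empty-filter guard is an addition that is not in the proposed dkms.conf change.

```shell
#!/bin/bash
# kernel_prefix KERNELVER
# Print the leading major.minor.patch-ABI part of a kernel version, or nothing.
kernel_prefix() {
    printf '%s\n' "$1" | grep -E -o '[0-9]+\.[0-9]+\.[0-9]+-[0-9]+' | head -n 1
}

# select_mofed_paths KERNELVER
# Read rdma_cm.h paths on stdin and print the matching MOFED source roots.
select_mofed_paths() {
    local filter
    filter=$(kernel_prefix "$1")
    # An empty pattern would make grep -F match every line, so stop early.
    [ -n "$filter" ] || return 0
    grep -F -e "$filter" | sed -e 's:/include/rdma/rdma_cm.h::'
}
```

With the kernelver from the trace, `kernel_prefix` yields 5.14.21-150500. Matching on this prefix rather than the full string matters here, because the MOFED path in the trace carries 55.49_13.0.56 while the target kernel is 55.59_13.0.72.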
      

      This restricts the MOFED source entries to those matching our targeted kernel:

      DEBUG: o2ib=
      DEBUG: pkgs=mlnx-ofed-kernel-dkms|mlnx-ofed-kernel-modules|mlnx-ofa_kernel-devel|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel
      DEBUG: paths=/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c
      DEBUG: find /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c -name rdma_cm.h: /usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c/include/rdma/rdma_cm.h
      DEBUG: kernelver=5.14.21-150500.55.59_13.0.72-cray_shasta_c
      DEBUG: epaths=/usr/src/ofa_kernel/x86_64/5.14.21-150500.55.49_13.0.56-cray_shasta_c
      DEBUG: ext_mofed=yes, int_mofed=no
      
      ....
      
      ksocklnd.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      
      ko2iblnd.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      

      Fix and Install Logs

      I've done a DKMS build/install of cray-2.15-int (which tracks WC master) and compared it against a build/install of cray-2.15-int with the patch (fe75881e1e) cherry-picked on top.

      Here are the build/install logs for the cray-2.15-int branch without the patch: dkms_install_cray-2.15-int.log

      ...
      ksocklnd.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      ...
      + find /lib/modules -iname '*lnd*.ko'
      /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ksocklnd.ko
      

      With just cray-2.15-int, only ksocklnd.ko is installed by DKMS, and ko2iblnd.ko is missing.

      Here are the build/install logs for cray-2.15-int with fe75881e1e: dkms_install_fe75881e1e.log

      ...
      ksocklnd.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      ko2iblnd.ko:
      Running module version sanity check.
       - Original module
         - No original module exists within this kernel
       - Installation
         - Installing to /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/
      ...
      + find /lib/modules -iname '*lnd*.ko'
      /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ko2iblnd.ko
      /lib/modules/5.14.21-150500.55.59_13.0.72-cray_shasta_c/updates/ksocklnd.ko
      

      With the patch applied, both ksocklnd.ko and ko2iblnd.ko are installed by DKMS.

      FWIW, here's the script I used to build, install, and verify the results above:

      #!/bin/bash
      
      set -ex
      
      # Git settings
      cd lustre-wc-rel
      git fetch -p
      git reset --hard HEAD
      git checkout cray-2.15-int
      git cherry-pick fe75881e1e
      git clean -dfx > /dev/null
      git log --pretty=oneline | head -4
      
      # Modify these for the distro and kernel you're building against
      KERNEL_VERSION="5.14.21-150500.55.59_13.0.72"
      ARCH="x86_64"
      LINUX_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION})
      LINUX_OBJ_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION}-obj/${ARCH}/cray_shasta_c)
      
      ./LUSTRE-VERSION-GEN
      
      # Modify this to include configure options for the build you're doing
      sh ./autogen.sh
      ./configure \
        --enable-client \
        --disable-server \
        --disable-gss-keyring \
        --enable-gss="no" \
        --enable-mpitests="no" \
        --enable-ldap="no" \
        --with-o2ib="/usr/src/ofa_kernel/default" \
        --with-linux="$LINUX_DIR" \
        --with-linux-obj="$LINUX_OBJ_DIR"
      
      make dkms-rpms
      
      zypper install --allow-unsigned-rpm --no-confirm lustre-client-dkms-*.noarch.rpm
      find /lib/modules -iname "*lnd*.ko"
      

      Attachments

        1. dkms_install_cray-2.15-int.log
          88 kB
          Caleb Carlson
        2. dkms_install_fe75881e1e.log
          89 kB
          Caleb Carlson

          People

            carlson Caleb Carlson