[LU-9745] dkms-lustre does not install all modules on initial autoinstall

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.1, Lustre 2.11.0
    • Lustre 2.10.0

    Description

      When I run the following command to install DKMS-built Lustre:

      # yum install kernel-devel-[0-9]*_lustre lustre lustre-dkms kmod-lustre-osd-ldiskfs zfs
      
      

      the result is that lustre.ko is the only module built from lustre-dkms that is present in /lib/modules/3.10.0-514.21.1.el7_lustre.x86_64/extra/:

      # ls -l /lib/modules/3.10.0-514.21.1.el7_lustre.x86_64/extra/
      total 4836
      -rw-r--r-- 1 root root 1615824 Jul  7 14:01 lustre.ko
      drwxr-xr-x 3 root root      16 Jul  7 13:55 lustre-osd-ldiskfs
      -rw-r--r-- 1 root root  353632 Jul  7 13:55 splat.ko
      -rw-r--r-- 1 root root  170024 Jul  7 13:55 spl.ko
      -rw-r--r-- 1 root root   14016 Jul  7 13:57 zavl.ko
      -rw-r--r-- 1 root root   75848 Jul  7 13:57 zcommon.ko
      -rw-r--r-- 1 root root 2205152 Jul  7 13:57 zfs.ko
      -rw-r--r-- 1 root root  132488 Jul  7 13:57 znvpair.ko
      -rw-r--r-- 1 root root   34000 Jul  7 13:57 zpios.ko
      -rw-r--r-- 1 root root  330920 Jul  7 13:57 zunicode.ko
      
      

      Notice that all of the other supporting modules are missing.

      If I then uninstall the module and re-run the DKMS kernel post-install hook, to emulate what happens during the yum installation above:

      # dkms uninstall -m lustre/2.10.0_RC1 -k 3.10.0-514.21.1.el7_lustre.x86_64
      # /etc/kernel/postinst.d/dkms 3.10.0-514.21.1.el7_lustre.x86_64

      then /lib/modules/3.10.0-514.21.1.el7_lustre.x86_64/extra/ contains all of the necessary Lustre modules.

      So there seems to be some subtle issue with the lustre-dkms RPM that only occurs during the initial installation.
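The manual recovery described above can be sketched as a small script. The commands and default values are the ones quoted in this report; the DRY_RUN switch and the function names are hypothetical additions for illustration, not part of DKMS or Lustre.

```shell
#!/bin/sh
# Sketch of the manual recovery from this report; not an official procedure.
# MODULE_VERSION and KERNEL default to the values quoted above.
# DRY_RUN is a hypothetical knob so the commands can be previewed safely.

MODULE_VERSION="${MODULE_VERSION:-lustre/2.10.0_RC1}"
KERNEL="${KERNEL:-3.10.0-514.21.1.el7_lustre.x86_64}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_lustre_dkms() {
    # Remove the partially installed module for this kernel...
    run dkms uninstall -m "$MODULE_VERSION" -k "$KERNEL" || return 1
    # ...then re-run the hook that the kernel package normally triggers.
    run /etc/kernel/postinst.d/dkms "$KERNEL" || return 1
    # Show what ended up in the module directory.
    run ls -l "/lib/modules/$KERNEL/extra/"
}
```

Setting DRY_RUN=1 before calling reinstall_lustre_dkms prints the three commands without running them, which is a reasonable first step on a production node.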

      Attachments

        1. config.log
          726 kB
        2. make.log
          55 kB
        3. transcsript
          61 kB

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.11


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28210/
          Subject: LU-9745 dkms: Fix included dkms.conf file
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c99e8035ea543860f6db5e9919ff0045b56d1835

          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28224/
          Subject: LU-9745 dkms: Fix included dkms.conf file
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: a0d855bcf7e55a6c2e0659b5fce7a19a9818d2b0

          gerrit Gerrit Updater added a comment -

          Brian J. Murrell (brian.murrell@intel.com) uploaded a new patch: https://review.whamcloud.com/28224
          Subject: LU-9745 dkms: Fix included dkms.conf file
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: 7a2bdccdbb48873a02be70d843c5203ba8c668ca

          brian Brian Murrell (Inactive) added a comment - edited

          utopiabound: Do you mind if I cherry-pick this to b2_10 so that I can test it?

          utopiabound Nathaniel Clark added a comment -

          Using https://review.whamcloud.com/#/c/28210/3/ works. It was the bad dkms.conf.

          gerrit Gerrit Updater added a comment -

          Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: https://review.whamcloud.com/28210
          Subject: LU-9745 dkms: Fix included dkms.conf file
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 28c700e07d78b04cf7be819c646be2a5b6087301

          utopiabound Nathaniel Clark added a comment -

          I think I know what is going wrong: lustre-dkms uses a "temporary" dkms.conf (which only lists lustre.ko) and then rebuilds it during the dkms process. I'm not sure why, but I aim to fix it.
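One way to check the diagnosis above is to list which modules a given dkms.conf declares and compare them with what is on disk. The following is a minimal sketch that assumes the file declares modules with the standard BUILT_MODULE_NAME[...] directive; the function names are hypothetical, and the file is parsed as text rather than sourced, so values built from shell variables are not expanded.

```shell
#!/bin/sh
# Sketch: inspect a dkms.conf for the modules it declares.
# Assumes one BUILT_MODULE_NAME[...]=name assignment per line.

list_dkms_modules() {
    # $1: path to a dkms.conf file.
    # Strip the "BUILT_MODULE_NAME[...]=" prefix, drop quotes,
    # and cut anything after the first whitespace (e.g. a comment).
    sed -n 's/^[[:space:]]*BUILT_MODULE_NAME\[.*]=//p' "$1" \
        | tr -d "\"'" \
        | sed 's/[[:space:]].*$//'
}

missing_modules() {
    # $1: dkms.conf path, $2: directory tree to search.
    # Prints each declared module with no <name>.ko under $2.
    # Limitation: compressed modules (e.g. .ko.xz) are not matched.
    list_dkms_modules "$1" | while IFS= read -r name; do
        [ -n "$name" ] || continue
        if [ -z "$(find "$2" -name "$name.ko" -print 2>/dev/null)" ]; then
            echo "$name"
        fi
    done
}
```

For example, running list_dkms_modules on the packaged dkms.conf (on many systems DKMS sources live under /usr/src/<module>-<version>/, which is an assumption here) would print only lustre if the file is the "temporary" one described in this comment.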

          brian Brian Murrell (Inactive) added a comment -

          But, as utopiabound points out, it's only the Lustre DKMS module that has this problem.  The ZFS/SPL DKMS modules don't have the same problem.

          So this would appear to be some subtle issue with our module, or at least the way we have written our module is triggering a bug that the ZFS/SPL modules are not.

          The next step is probably to examine our module and the ZFS/SPL modules closely to see what kind of differences exist between them.

          malkolm Malcolm Cowe (Inactive) added a comment -

          The process Nathaniel outlined is the one I also described, and it is the only mechanism that I was successfully able to incorporate (in my command line, the first yum command installs lustre-patched kernel packages from a lustre 2.10 repo).

          Adding the lustre kernel packages and then continuing without a reboot, regardless of combination, leaves the system in the incomplete state when the host is eventually rebooted, where the modules get compiled, but not installed. I'd also point out that the modules are "added" but not installed when the attempt is made without rebooting first.

          To build on Nathaniel's comment, one possible compromise could be to install the kernel-devel package of the "starting" kernel, as well as the desired kernel, to see if that alters the behaviour prior to reboot. This will cause the modules to be built more than once, but it might be a way to meet the requirement of installing all packages prior to reboot.
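The "added but not installed" state mentioned above can be spotted by filtering the output of dkms status. This is a rough sketch that assumes each status line ends in ": <state>", which matches the DKMS releases I am aware of but may differ in others; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print dkms status entries that are not in the "installed" state.
# Assumes every line has the form "<module info>: <state>[ extra text]".

not_installed() {
    # stdin: output of `dkms status`
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        # Keep only the text after the last ": " as the state.
        state="${line##*: }"
        case "$state" in
            installed*) ;;          # fully installed, nothing to report
            *) echo "$line" ;;      # added, built, or anything else
        esac
    done
}
```

Typical use would be dkms status | not_installed, run once before and once after the reboot, to see whether the Lustre entry ever progresses past "added" or "built".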

          People

            utopiabound Nathaniel Clark
            brian Brian Murrell (Inactive)