  1. Lustre
  2. LU-2907

Infiniband HW kernel modules of OFA builds not started at system boot

Details

    • Story
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 6997

    Description

      Symptom:
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      When provisioning test nodes with OFA builds (i.e. an 'external' build of kernel-ib from the OpenFabrics OFED tarballs) based on rhel6 and compiled against kernel version 2.6.32-279, initialization of the Infiniband interfaces (ib0, ib1, ...) fails because the low-level Infiniband HW kernel modules mlx4_core and mlx4_en are not loaded.

      When the kernel-ib HW modules are loaded manually (modprobe mlx4_core, ...), the interfaces are created and operational (i.e. connected to the fabric, IP over IB works, ...).
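
      As an illustration of the manual check described above, the following is a minimal sketch (the helper name missing_modules is hypothetical, not part of any Lustre or OFED tooling). It reads a /proc/modules-style listing on stdin and prints every named module that is not loaded:

```shell
# Hypothetical helper: print each module named on the command line that does
# not appear in a /proc/modules-style listing read from stdin.
# The first whitespace-separated field of each input line is the module name.
missing_modules() (
    listing=$(cat)
    for mod in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            printf '%s\n' "$mod"
        fi
    done
)

# Example (as root on an affected node):
#   for m in $(missing_modules mlx4_core mlx4_en < /proc/modules); do
#       modprobe "$m"
#   done
```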

      The kernel-ib RPM is normally built with a set of startup scripts (/etc/init.d/openibd, links in /etc/rc.d/*, chkconfig execution, ...) to ensure that the Infiniband HW kernel modules are loaded during system start. These files/scripts are missing from the kernel-ib RPM.
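
      One way to confirm whether a given package ships the init script is to inspect its file list. The sketch below assumes the list comes from 'rpm -qlp <package>' and that the script lives at /etc/init.d/openibd or /etc/rc.d/init.d/openibd; the helper name ships_openibd is hypothetical:

```shell
# Hypothetical helper: read an RPM file list on stdin and succeed (exit 0)
# only if it contains the openibd init script at one of the usual paths.
ships_openibd() {
    grep -Eq '^/etc/(rc\.d/)?init\.d/openibd$'
}

# Example (the package file name is a placeholder):
#   if rpm -qlp kernel-ib-<version>.rpm | ships_openibd; then
#       echo "openibd init script is packaged"
#   else
#       echo "openibd init script is missing"
#   fi
```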

      Because of an installation conflict between kernel-ib and the openibd RPM on the canonical distribution 'rhel5', these scripts/files are removed from the OFED kernel-ib SPEC file before the RPMs are built (rpmbuild) by the lbuild script. (See LU-388 for further details.)

      This conflict no longer exists because openib-<version>.rpm is no longer part of rhel6. As a consequence, the functionality for initializing the Infiniband HW is gone as well, because the openib RPM contained the necessary startup scripts:

      rpm -qil --scripts -p openib-1.4.1-5.el5.noarch.rpm
      warning: openib-1.4.1-5.el5.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 192a7d7d: NOKEY
      Name : openib Relocations: (not relocatable)
      Version : 1.4.1 Vendor: Scientific Linux
      Release : 5.el5 Build Date: Wed 31 Mar 2010 12:39:27 AM PDT
      Install Date: (not installed) Build Host: norob.fnal.gov
      Group : System Environment/Base Source RPM: openib-1.4.1-5.el5.src.rpm
      Size : 27021 License: GPL/BSD
      Signature : DSA/SHA1, Wed 31 Mar 2010 12:52:50 PM PDT, Key ID b0b4183f192a7d7d
      URL : http://www.openfabrics.org/
      Summary : OpenIB Infiniband Driver Stack
      Description :
      User space initialization scripts for the kernel InfiniBand drivers
      postinstall scriptlet (using /bin/sh):
      if [ $1 = 1 ]; then
      /sbin/chkconfig --add openibd
      fi
      preuninstall scriptlet (using /bin/sh):
      if [ $1 = 0 ]; then
      /sbin/chkconfig --del openibd
      fi
      /etc/ofed
      /etc/ofed/fixup-mtrr.awk
      /etc/ofed/openib.conf
      /etc/rc.d/init.d/openibd
      /etc/sysconfig/network-scripts/ifup-ib
      /etc/udev/rules.d/90-ib.rules

      The script (openibd) has been 'moved' to the kernel-ib package as of OFED version 1.5.*.

      To overcome this situation, the following code change in lustre-reviews/build/lbuild (inside the loop beginning at line 1216; 'for file in $(ls ${TOPDIR}/lustre/build/patches/ofed/*.patch); do')

      # Only collect ed fragments whose path contains the canonical target
      if [[ $file =~ "${CANONICAL_TARGET}" ]]; then
          ed_fragment3="$ed_fragment3
      $(cat $file)"
          let n=$n+1
      fi

      and a rename of the ed script (which removes the packaging of the openibd files and scripts) from

      01-play-nice-with-RHEL5.ed
      to
      01-play-nice-with-rhel5.ed

      are necessary. This ensures that kernel-ib OFA builds for rhel5 are created without the openibd scripts, while the scripts remain available in the rhel6 RPMs.
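
      The effect of the filter and of the rename can be sketched in isolation. The function below is a simplified stand-in, not the actual lbuild code: the name collect_ed_fragments is hypothetical, it matches on the file name only rather than the full path, and it assumes the canonical target is a lowercase string such as rhel5 or rhel6. Because the match is case-sensitive, a fragment named ...-RHEL5.ed would not be selected for the target rhel5, which is why the rename matters:

```shell
# Simplified stand-in for the lbuild loop: concatenate the contents of every
# file in directory $1 whose name contains the canonical target $2.
# The match is case-sensitive and uses the file name only (an assumption of
# this sketch; the change above matches against the full path).
collect_ed_fragments() (
    dir=$1
    target=$2
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        case ${file##*/} in
            *"$target"*) cat "$file" ;;
        esac
    done
)

# Example:
#   collect_ed_fragments "${TOPDIR}/lustre/build/patches/ofed" rhel5
```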

            People

              heckes Frank Heckes (Inactive)