Details
- Story
- Resolution: Fixed
- Blocker
- Lustre 2.4.0
- 6997
Description
Symptom:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When provisioning test nodes with ofa builds (i.e. 'external' builds of kernel-ib based on OpenFabrics OFED tarballs) based on rhel6 and compiled against kernel version 2.6.32-279, the initialization of the InfiniBand interfaces (ib0, ib1, ...) fails because the low-level InfiniBand hardware kernel modules mlx4_core and mlx4_en are not loaded.
When the kernel-ib hardware modules are loaded manually (modprobe mlx4_core, ...), the interfaces are created and operational (i.e. connected to the fabric, IP over IB works, ...).
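The manual workaround can be scripted. The following is a minimal sketch, assuming the required modules are the two named above; `missing_ib_modules` is a hypothetical helper name, not part of OFED or lbuild. It only reports which modules are absent from a /proc/modules-style listing and leaves the modprobe call (which needs root) to the caller.

```shell
# Hypothetical helper: print each module named in the arguments that does
# not appear in the /proc/modules-style listing read from stdin.
missing_ib_modules() (
    loaded=$(awk '{print $1}')      # first field of each line is the module name
    for mod in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qxF -- "$mod"; then
            echo "$mod"
        fi
    done
)
```

Run as root, something like `for m in $(missing_ib_modules mlx4_core mlx4_en < /proc/modules); do modprobe "$m"; done` would then load whatever is missing.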
The kernel-ib RPM is normally built with a set of startup scripts (/etc/init.d/openibd, links in /etc/rc.d/*, chkconfig execution, ...) to ensure that the InfiniBand hardware kernel modules are loaded during system start. These files and scripts are missing from the kernel-ib RPM.
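Whether a given package carries the init script can be checked from its file list. This is a rough sketch, assuming the script is installed under /etc/init.d or /etc/rc.d/init.d; `has_openibd_script` is a hypothetical helper that reads a file list (such as the output of `rpm -qlp`) on stdin.

```shell
# Hypothetical helper: succeed if the file list on stdin contains the
# openibd init script in one of the two assumed locations.
has_openibd_script() {
    grep -Eq '^/etc/(rc\.d/)?init\.d/openibd$'
}
```

A possible invocation is `rpm -qlp kernel-ib-<version>.rpm | has_openibd_script || echo "openibd init script missing"`.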
Because of an installation conflict between kernel-ib and the openibd RPM on the canonical distribution 'rhel5', the scripts/files were removed from the OFED kernel-ib SPEC file before the RPMs were created (rpmbuild) by the lbuild script. (See LU-388 for further details.)
This conflict no longer exists, since openib-<version>.rpm is not part of rhel6 anymore. As a consequence, the functionality for initializing the InfiniBand hardware is gone too, because the openib RPM contain(ed) the necessary startup scripts:
rpm -qil --scripts -p openib-1.4.1-5.el5.noarch.rpm
warning: openib-1.4.1-5.el5.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 192a7d7d: NOKEY
Name : openib Relocations: (not relocatable)
Version : 1.4.1 Vendor: Scientific Linux
Release : 5.el5 Build Date: Wed 31 Mar 2010 12:39:27 AM PDT
Install Date: (not installed) Build Host: norob.fnal.gov
Group : System Environment/Base Source RPM: openib-1.4.1-5.el5.src.rpm
Size : 27021 License: GPL/BSD
Signature : DSA/SHA1, Wed 31 Mar 2010 12:52:50 PM PDT, Key ID b0b4183f192a7d7d
URL : http://www.openfabrics.org/
Summary : OpenIB Infiniband Driver Stack
Description :
User space initialization scripts for the kernel InfiniBand drivers
postinstall scriptlet (using /bin/sh):
if [ $1 = 1 ]; then
/sbin/chkconfig --add openibd
fi
preuninstall scriptlet (using /bin/sh):
if [ $1 = 0 ]; then
/sbin/chkconfig --del openibd
fi
/etc/ofed
/etc/ofed/fixup-mtrr.awk
/etc/ofed/openib.conf
/etc/rc.d/init.d/openibd
/etc/sysconfig/network-scripts/ifup-ib
/etc/udev/rules.d/90-ib.rules
The script (openibd) has been 'moved' to the kernel-ib package for OFED version 1.5.*.
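To overcome this, the following code change in lustre-reviews/build/lbuild (inside the loop beginning at line 1216; `for file in $(ls ${TOPDIR}/lustre/build/patches/ofed/*.patch); do`)
    # append the file only if its name contains the canonical target (e.g. rhel5);
    # the quoted right-hand side makes =~ a literal, case-sensitive match
    if [[ $file =~ "${CANONICAL_TARGET}" ]]; then
        ed_fragment3="$ed_fragment3
$(cat "$file")"
        let n=$n+1
    fi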
and a rename of the ed script (which removes the packaging of the openibd files and scripts) from
01-play-nice-with-RHEL5.ed
to
01-play-nice-with-rhel5.ed
are necessary. This ensures that kernel-ib ofa builds for rhel5 are created without the openibd scripts, while the scripts remain available in the rhel6 RPMs.
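The effect of the rename can be illustrated in isolation. This is a minimal sketch, not code from lbuild: `select_ed_scripts` is a hypothetical helper, and it uses a `case` pattern as a portable stand-in for the literal `=~` match. Because the match is case-sensitive, a script named with 'RHEL5' is never selected for the lowercase target 'rhel5', while the renamed script is selected for rhel5 and skipped for rhel6.

```shell
# Hypothetical helper: print the names of the .ed scripts in directory $1
# whose file name contains the target string $2 (case-sensitive).
select_ed_scripts() (
    dir=$1
    target=$2
    for file in "$dir"/*.ed; do
        [ -e "$file" ] || continue      # skip the unexpanded glob if no .ed files exist
        name=${file##*/}                # strip the directory part
        case $name in
            *"$target"*) echo "$name" ;;    # literal substring match on the file name
        esac
    done
)
```

With a directory that holds only 01-play-nice-with-rhel5.ed, `select_ed_scripts <dir> rhel5` prints that file name and `select_ed_scripts <dir> rhel6` prints nothing.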
Issue Links
- is duplicated by: LU-2972 Execution conflict of OFED initialisation script (Closed)
chris: Because the standard OFED build assumes a "vanilla" Linux installation, it does not really take into account vendor "Value Add" such as Red Hat has done with their "rdma" package. Ideally, their packaging process should try to figure out whether it needs to interoperate with the vendor's "Value Add", but I don't believe it does.