[LU-2907] Infiniband HW kernel modules of OFA builds not started at system boot Created: 05/Mar/13 Updated: 23/Apr/13 Resolved: 07/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Story | Priority: | Blocker |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Frank Heckes (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB | ||
| Issue Links: |
|
||||||||||||
| Rank (Obsolete): | 6997 | ||||||||||||
| Description |
|
Symptom: When the kernel-ib HW modules are loaded manually (modprobe mlx4_core, ...), the interfaces are created and operational (i.e. connected to the fabric, IP over IB works, ...). The kernel-ib RPM is normally built with a set of startup scripts (/etc/init.d/openibd, links in /etc/rc.d/*, chkconfig execution, ...) to ensure that the Infiniband HW kernel modules are loaded during system start. These files/scripts are missing from the kernel-ib RPM. Because of an installation conflict between kernel-ib and the openib RPM on the canonical distribution 'rhel5', the scripts/files were removed from the OFED kernel-ib SPEC file before the RPMs are created (rpmbuild) with the help of the lbuild script. (See This conflict no longer exists, since openib-<version>.rpm isn't part of rhel6 anymore. As a side effect, the functionality of initializing the Infiniband HW is gone too, because the openib RPM contain(ed) the necessary startup scripts: rpm -qil --scripts -p openib-1.4.1-5.el5.noarch.rpm The script (openibd) has been 'moved' to the kernel-ib package for OFED version 1.5.*. To overcome this, the following code change in lustre-reviews/build/lbuild is necessary (inside the loop beginning at line 1216, `for file in $(ls ${TOPDIR}/lustre/build/patches/ofed/*.patch); do`): if [[ $file =~ "${CANONICAL_TARGET}" ]], together with renaming the ed script (which removes the packaging of the openibd files and scripts) from 01-play-nice-with-RHEL5.ed. This will ensure that kernel-ib ofa builds for rhel5 are created without the openibd scripts, while the scripts remain available in the rhel6 RPMs. |
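The target-based selection proposed above can be sketched as follows. This is a minimal illustration, not the actual lbuild code: the function name `select_ed_scripts` and the file names in the usage note are hypothetical, and it uses a POSIX `case` substring match in place of the bash `=~` test quoted in the description.

```shell
# Hypothetical helper (not part of lbuild): print only those scripts whose
# file name contains the canonical target name, e.g. "rhel5".
select_ed_scripts() {
    target="$1"
    shift
    for file in "$@"; do
        # Match against the base name only, so directory names do not count.
        case "${file##*/}" in
            *"$target"*) printf '%s\n' "$file" ;;
        esac
    done
}
```

For example, `select_ed_scripts rhel5 ofed/01-play-nice-with-rhel5.ed ofed/02-other-rhel6.ed` prints only the first path. The match is case-sensitive, so a script named with an upper-case `RHEL5` would not be selected for a lower-case target name, which may be one reason the rename is part of the proposal.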
| Comments |
| Comment by Brian Murrell (Inactive) [ 05/Mar/13 ] |
|
Frank, I don't understand. We talked about this at quite some length (must have been several hours over a few conversations) and I thought we came to the same conclusion. I thought we had agreed that the patching (01-play-nice-with-RHEL5.ed) in lbuild should stay as it is for both EL5 and EL6 and the solution to the problem of initializing drivers on EL6 was the job of the rdma initscript from the rdma RPM. i.e. simply "yum install rdma" on EL6 nodes to get initscripts to load the I/B drivers. Has something changed since those conversations? |
| Comment by Frank Heckes (Inactive) [ 06/Mar/13 ] |
|
Hi Brian, well, the problem is that the rdma RPM (script) was there from the beginning, i.e. it was installed during the node provisioning: and it failed. I looked at the rdma (/etc/init.d/rdma) script again and reconsidered it, but it will initialize the Infiniband interface with an IP Address It fails during system boot: Bringing up interface ib0: Device ib0 does not seem to be present, delaying initialization. No hardware was detected: Upper layer protocol modules: User space access modules: Connection management modules: Configured IPoIB interfaces: none rdma script is active: Starting the IB HW modules manually (mlx4_core, mlx4_en, mlx4_ib) 'fixes' the problem. Further, I found that the openib RPM (part of the rhel5 distro) contained the /etc/init.d/openibd script, which starts the HW modules. For rhel6 the distro no longer contains the openib RPM, so there's no conflict. At first glance there's the strange fact that the 'inkernel' build initializes the Infiniband card correctly. But the reason is ./lib/modules/2.6.32-279.14.1.el6_lustre.g044a3a2.x86_64: This is also visible from the system boot messages: mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011) Setting hostname client-7.lab.whamcloud.com: [ OK ] i.e. the modules are loaded before init executes the run-level scripts. This could be a workaround for the OFA builds, too, i.e. add mlx4_{core,en,ib} to /etc/sysconfig/kernel |
| Comment by Brian Murrell (Inactive) [ 06/Mar/13 ] |
|
Frank, To be clear, the conflict is not simply one of RPM naming; it's one of having multiple initscripts trying to do the same things. If we install an initscript in kernel-ib that fiddles with I/B and the user decides to also install the rdma RPM, which also fiddles with I/B, there is a conflict, as both should not be trying to do the same thing. Ultimately we need our added kernel-ib to integrate with the base O/S as closely as we can. So the question becomes: why are the mlx4_* modules from the stock kernel loaded during boot, but not loaded when they are supplied by kernel-ib? Perhaps you need to compare the operation of the rdma initscript with and without kernel-ib. You could insert the following two commands right after the first line (i.e. after the #! line) of the rdma initscript: exec 2>/tmp/rdma.debug; set -x This will log the xtrace of that initscript to /tmp/rdma.debug. Do that and boot both with kernel-ib installed and without it installed, then compare the traces to see if they operate differently and, if they do, why. It might also be worthwhile taking an inventory of the loaded modules (i.e. lsmod) before and after the rdma initscript runs during boot. You could add an "lsmod > /tmp/before" to the initscript before it calls start() and an "lsmod > /tmp/after" after it returns from start() and, again, run that with and without kernel-ib to see what difference in behaviour there is. Ultimately what you have here is a case where something ought to work but doesn't. In such cases it's usually better to understand why the thing that ought to work doesn't, and approach from there. The problem is that I don't think we yet know why it doesn't work, so any attempt to band-aid it is likely to cause some other unexpected problem, and that might not surface until it's out in the field, where it becomes a customer support problem (i.e. much more expensive to deal with) and a mea culpa. |
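The before/after comparison suggested above can be sketched as a small helper. This is only an illustration: the function name `new_modules` is hypothetical, and it assumes both files hold raw `lsmod` output, including the header line.

```shell
# Hypothetical helper: print the modules that appear in the "after" lsmod
# snapshot but not in the "before" snapshot.
new_modules() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || return 1
    # Skip the lsmod header line and keep only the module name column.
    awk 'NR > 1 { print $1 }' "$1" | sort > "$before_sorted"
    awk 'NR > 1 { print $1 }' "$2" | sort > "$after_sorted"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

Running `new_modules /tmp/before /tmp/after` once with kernel-ib installed and once without would show which modules the rdma initscript itself loaded in each case.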
| Comment by Frank Heckes (Inactive) [ 06/Mar/13 ] |
|
Hi Brian, the reason why the stock ('inkernel') build starts the mlx4_{core,en,ib} modules is that they're included in the initial ramdisk of the Lustre kernel (see my previous comment above). I think that is well understood. We could use the same idea for the external (OFA) builds, to avoid the risk of any clashes between whatever scripts are available in the distro and the kernel-ib scripts. For rhel5 there used to be a dedicated RPM (openib; listed in my first comment) that contained the init script '/etc/init.d/openibd', which is (was) supplied by the OFED-1.4.* kernel-ib RPM, too. That was the conflict resolved in But I agree rdma and openibd (of the OFED-1.5.4 kernel-ib) can modprobe the same modules (besides the HW core modules |
| Comment by Brian Murrell (Inactive) [ 06/Mar/13 ] |
|
If the mlx4_* modules really are only being installed by virtue of them being in the ramdisk, why do they not get included in the ramdisk when kernel-ib is installed? i.e. Why do we have to modify /etc/sysconfig/kernel for the kernel-ib case and not for the stock kernel case? |
| Comment by Chris Gearing (Inactive) [ 06/Mar/13 ] |
|
Brian: I have little insight into the detail on this. But I am surprised that the standard OFED build would not be the best outcome, why do we need to modify the standard build? Or more correctly why would the standard build be of a form that is not providing the best functionality? Frank: Is it the case that the standard OFED build, without the spec file change, builds, installs and runs properly - or have I missed something? |
| Comment by Brian Murrell (Inactive) [ 06/Mar/13 ] |
|
chris: Because the standard OFED build assumes a "vanilla" Linux installation and does not really take into account vendor "Value Add" such as RedHat has done with their "rdma" package. Ideally, their packaging process should try to figure out if they need to interoperate with the vendor's "Value Add", but I don't believe it does. |
| Comment by Frank Heckes (Inactive) [ 07/Mar/13 ] |
|
Created a change for lbuild to alter the kernel-ib SPEC file based on the canonical target name of the distribution (this will preserve the changes for rhel5). In parallel, I will continue investigating the option of adding mlx4_{core,en,ib} to the initrd, and why this isn't done for ofa builds. |
| Comment by Frank Heckes (Inactive) [ 15/Mar/13 ] |
|
For the inkernel build, mlx4_core and mlx4_en are not part of the initramfs. I checked the initrd.kdump file by mistake. Anyway, the important finding is that the modules are started before the execution of the /etc/init.d/rdma script. For the inkernel build the following sequence relevant to the infiniband initialization is performed: init run /etc/rc.d/rc.sysinit The 'critical' part for the rdma script is whether mlx4_core is loaded or not. If the module is not present the The behaviour (for the inkernel build) can be repeated at run-time by running udevadm monitor --environment and by executing --> this will remove all mlx4_* modules and the HCA (infiniband) card from the OS Executing: adds the hardware and udevd starts the mlx4_en, mlx4_core drivers (see client-7-) If the hardware isn't removed, but all mlx4_* modules are unloaded, udevd reloads mlx4_core, mlx4_en For ofa builds only the HCA is detected, but the drivers are not loaded. The reason is a duplicate entry in client-7-modules.alias-ofa: alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_en Removing the entry for mlx4_en fixes the problem, and the rdma script works for ofa, too. |
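A check for this kind of duplicate can be sketched as below. It assumes that "duplicate entry" means the same alias pattern being mapped to more than one module in a modules.alias-style file; the function name `duplicate_aliases` is hypothetical.

```shell
# Hypothetical helper: list every alias pattern in a modules.alias-style file
# that is claimed by more than one module, followed by those modules.
duplicate_aliases() {
    awk '$1 == "alias" {
             # Count each (pattern, module) pair only once.
             if (!(($2, $3) in seen)) {
                 seen[$2, $3] = 1
                 count[$2]++
                 mods[$2] = mods[$2] " " $3
             }
         }
         END {
             for (a in count)
                 if (count[a] > 1)
                     print a ":" mods[a]
         }' "$1" | sort
}
```

Pointing it at a saved copy of the alias file (such as the client-7-modules.alias-ofa file mentioned above) would print one line per pattern that more than one module claims.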
| Comment by Brian Murrell (Inactive) [ 15/Mar/13 ] |
|
Frank, Was this discovery:
made after we spoke on Friday? i.e. is that the smoking gun and if we figure out why that duplicate entry (which is only there when using the OFA I/B, is that right?) is being created it will resolve the issue and the rdma initscript will be fully-functional? |
| Comment by Frank Heckes (Inactive) [ 16/Mar/13 ] |
|
Yes, that's right, so we have two potential solutions for the problem. I haven't found out yet why the entries for mlx4_en are created. I'll check that on Monday. |
| Comment by Frank Heckes (Inactive) [ 18/Mar/13 ] |
|
For both client and server ofa builds, the modules mlx4_core, mlx4_en won't be loaded by udevd (started from /etc/rc.d/rc.sysinit) if the configuration file '/etc/modprobe.d/mlx4_en.conf' is present. If the file is removed (or moved to another directory or file name), startup of mlx4_core, mlx4_en works and Content of the file reads as: The file is owned by the external OFED kernel-ib RPM: The failed startup of the modules when 'mlx4_en.conf' is present can be reproduced by: mapping. The easiest fix for the problem will be to remove the file '/etc/modprobe.d/mlx4_en.conf' from the 'packaging list' of the rpmbuild spec file for the OFED kernel-ib modules RPM. |
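Files of this kind can be located with a simple scan. The following is a sketch only: the function name `find_install_overrides` is hypothetical, and it assumes the interfering configuration uses modprobe's `install` directive in a `*.conf` file.

```shell
# Hypothetical helper: list the *.conf files in a modprobe.d-style directory
# that contain an "install" directive for the given module.
find_install_overrides() {
    module="$1"
    dir="${2:-/etc/modprobe.d}"
    grep -l -E "^[[:space:]]*install[[:space:]]+${module}([[:space:]]|\$)" \
        "$dir"/*.conf 2>/dev/null
}
```

For example, `find_install_overrides mlx4_core` prints the matching files under /etc/modprobe.d; `rpm -qf` on each result then shows which package owns it.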
| Comment by Brian Murrell (Inactive) [ 19/Mar/13 ] |
Ahhh. Nice detective work Frank! This /etc/modprobe.d/mlx4_en.conf is marginally interesting. Reformatting its lack of whitespace for ease of reading:

install mlx4_core modprobe --ignore-install $((modprobe -c | grep -wq "^allow_unsupported_modules") && echo '--allow-unsupported-modules') mlx4_core &&
    if [ -e /etc/infiniband/openib.conf ]; then
        if ( grep -q "^MLX4_EN_LOAD=yes" /etc/infiniband/openib.conf > /dev/null 2>&1); then
            modprobe mlx4_en
        fi
    else
        modprobe mlx4_en
    fi

install mlx4_en modprobe --ignore-install $((modprobe -c | grep -wq "^allow_unsupported_modules") && echo '--allow-unsupported-modules') mlx4_en &&
    if [ -e /etc/infiniband/openib.conf ]; then
        if ( grep -q "^RUN_SYSCTL=yes" /etc/infiniband/openib.conf > /dev/null 2>&1); then
            /sbin/sysctl_perf_tuning load
        fi
    fi

remove mlx4_en /sbin/sysctl_perf_tuning unload ; modprobe -r --ignore-remove mlx4_en

It's an interesting little bit of code. One thing about it worth noting is the reference to /etc/infiniband/openib.conf. Is that file used for anything other than this module installation configuration? If not, we might as well remove it from the kernel-ib package as well. |
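The decision that the `install mlx4_core` rule makes about mlx4_en can be isolated as a small predicate. This is a sketch that mirrors only the quoted `if`/`else` logic; the function name `should_load_mlx4_en` and the optional path argument are hypothetical.

```shell
# Hypothetical helper mirroring the quoted rule: succeed (exit 0) when the
# install rule for mlx4_core would go on to "modprobe mlx4_en".
should_load_mlx4_en() {
    conf="${1:-/etc/infiniband/openib.conf}"
    if [ -e "$conf" ]; then
        # File present: mlx4_en is loaded only if it is explicitly enabled.
        grep -q "^MLX4_EN_LOAD=yes" "$conf"
    else
        # File absent: mlx4_en is loaded unconditionally.
        return 0
    fi
}
```

One consequence of this logic is that an openib.conf that exists but lacks the `MLX4_EN_LOAD=yes` line makes the rule skip mlx4_en.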
| Comment by Frank Heckes (Inactive) [ 20/Mar/13 ] |
|
No, the file (/etc/infiniband/openib.conf) is there, but not the entries that the grep command searches for. I'm sorry, I forgot to append the line: to 01-play-nice-with-rhel5-rhel6.ed. Push it to git. |
| Comment by Peter Jones [ 07/Apr/13 ] |
|
Landed for 2.4 |