Lustre / LU-2907

Infiniband HW kernel modules of OFA builds not started at system boot

Details

    • Type: Story
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • 6997

    Description

      Symptom:
      When provisioning test nodes with ofa builds (i.e. an 'external' build of kernel-ib based on the OpenFabrics OFED tarballs) for rhel6, compiled against kernel version 2.6.32-279, the initialization of the InfiniBand interfaces (ib0, ib1, ...) fails because the low-level kernel InfiniBand HW modules mlx4_core and mlx4_en are not loaded.

      When the kernel-ib HW modules are loaded manually (modprobe mlx4_core, ...), the interfaces are created and operational (i.e. connected to the fabric, IP over IB works, ...).
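
      For reference, loading the modules by hand on an affected node looks roughly like this (module names are those from this ticket; the ib0 interface name and an existing ifcfg-ib0 configuration are assumptions):

      modprobe mlx4_core
      modprobe mlx4_en
      modprobe mlx4_ib
      ifup ib0    # assumes /etc/sysconfig/network-scripts/ifcfg-ib0 exists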

      The kernel-ib RPM is normally built with a set of startup scripts (/etc/init.d/openibd, links in /etc/rc.d/*, chkconfig execution, ...) to ensure that the InfiniBand HW kernel modules are loaded during system start. These files/scripts are missing from the kernel-ib RPM.

      Due to an installation conflict between kernel-ib and the openib RPM of the canonical distribution 'rhel5', the scripts/files were removed from the OFED kernel-ib SPEC file before the RPMs are created (rpmbuild) by the lbuild script. (See LU-388 for further details.)

      This conflict no longer exists, since openib-<version>.rpm is not part of rhel6 anymore. However, the functionality of initializing the InfiniBand HW is gone too, because the openib RPM contain(ed) the necessary startup scripts:

      rpm -qil --scripts -p openib-1.4.1-5.el5.noarch.rpm
      warning: openib-1.4.1-5.el5.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 192a7d7d: NOKEY
      Name : openib Relocations: (not relocatable)
      Version : 1.4.1 Vendor: Scientific Linux
      Release : 5.el5 Build Date: Wed 31 Mar 2010 12:39:27 AM PDT
      Install Date: (not installed) Build Host: norob.fnal.gov
      Group : System Environment/Base Source RPM: openib-1.4.1-5.el5.src.rpm
      Size : 27021 License: GPL/BSD
      Signature : DSA/SHA1, Wed 31 Mar 2010 12:52:50 PM PDT, Key ID b0b4183f192a7d7d
      URL : http://www.openfabrics.org/
      Summary : OpenIB Infiniband Driver Stack
      Description :
      User space initialization scripts for the kernel InfiniBand drivers
      postinstall scriptlet (using /bin/sh):
      if [ $1 = 1 ]; then
      /sbin/chkconfig --add openibd
      fi
      preuninstall scriptlet (using /bin/sh):
      if [ $1 = 0 ]; then
      /sbin/chkconfig --del openibd
      fi
      /etc/ofed
      /etc/ofed/fixup-mtrr.awk
      /etc/ofed/openib.conf
      /etc/rc.d/init.d/openibd
      /etc/sysconfig/network-scripts/ifup-ib
      /etc/udev/rules.d/90-ib.rules
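
      For comparison, the equivalent registration in a kernel-ib spec file would be expected to look roughly like the following sketch (the layout is assumed; the actual OFED spec fragment is not quoted here):

      %post
      if [ $1 = 1 ]; then
          /sbin/chkconfig --add openibd
      fi

      %preun
      if [ $1 = 0 ]; then
          /sbin/chkconfig --del openibd
      fi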

      The script (openibd) has been 'moved' to the kernel-ib package for OFED version 1.5.*.

      To overcome the situation, two changes to lustre-reviews/build/lbuild are necessary. First, the following code change inside the loop beginning at line 1216 (`for file in $(ls ${TOPDIR}/lustre/build/patches/ofed/*.patch); do`):

      if [[ $file =~ "${CANONICAL_TARGET}" ]]; then
          ed_fragment3="$ed_fragment3
      $(cat $file)"
          let n=$n+1
      fi

      Second, the ed script (which removes the packaging of the openibd files and scripts) has to be renamed from

      01-play-nice-with-RHEL5.ed
      to
      01-play-nice-with-rhel5.ed

      This ensures that kernel-ib ofa-builds for rhel5 are still created without the openibd scripts, while making them available in the rhel6 RPMs.
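
      The effect of the filename test can be illustrated in isolation. This is only a sketch using bash's [[ =~ ]] operator, not the actual lbuild code, and the CANONICAL_TARGET values are assumed:

      # rhel5 build: the lower-cased ed script name now matches, so the fragment
      # (which strips the openibd files from the kernel-ib SPEC) is applied
      CANONICAL_TARGET=rhel5
      file=01-play-nice-with-rhel5.ed
      [[ $file =~ "${CANONICAL_TARGET}" ]] && echo "apply $file"

      # rhel6 build: the name does not match, the fragment is skipped and the
      # openibd init script stays packaged in the rhel6 kernel-ib RPM
      CANONICAL_TARGET=rhel6
      [[ $file =~ "${CANONICAL_TARGET}" ]] || echo "skip $file"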

      Attachments

        Issue Links

          Activity

            [LU-2907] Infiniband HW kernel modules of OFA builds not started at system boot

            heckes Frank Heckes (Inactive) added a comment -

            Yes, that's right, so we have two potential solutions for the problem. I haven't found out yet why the entries for mlx4_en are created. I'll check that on Monday.

            brian Brian Murrell (Inactive) added a comment -

            Frank,

            Was this discovery:

            For ofa builds only the HCA is detected, but the drivers aren't loaded. The reason is a duplicate entry in
            modules.alias for the ofa build:

            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_en
            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_core

            Removing the entry for mlx4_en fixes the problem and rdma scripts works for ofa, too.

            made after we spoke on Friday? i.e. is that the smoking gun, and if we figure out why that duplicate entry (which is only there when using the OFA I/B, is that right?) is being created, will that resolve the issue and leave the rdma initscript fully functional?

            heckes Frank Heckes (Inactive) added a comment - - edited

            For the inkernel build, mlx4_core and mlx4_en are not part of the initramfs; I had checked the initrd.kdump file by mistake. Anyway, the important finding is that the modules are loaded before the execution of the /etc/init.d/rdma script.

            For the inkernel build the following sequence, relevant to the InfiniBand initialization, is performed:

            init runs /etc/rc.d/rc.sysinit
            /etc/rc.sysinit runs /sbin/start_udev
            /sbin/start_udev runs udevd
            udevd receives an event from the kernel that the HCA interface is available
            udevd triggers the load of mlx4_core and mlx4_en
            /etc/rc.sysinit executes the active run-level scripts
            rdma is executed
            if mlx4_core is loaded, mlx4_ib is loaded ---> which creates the interfaces (ib0, ib1, ...)
            if the interface (ib0) is available, the IP configuration is done
            rdma finishes with success

            The 'critical' part for the rdma script is whether mlx4_core is loaded or not. If the module is not present, the
            initialization of the InfiniBand interface fails.

            The behaviour (for the inkernel build) can be reproduced at run-time by running udevadm monitor --environment and by executing:

            /etc/init.d/rdma stop
            echo 1 > /sys/devices/pci0000\:00/0000\:00\:03.0/0000\:02\:00.0/remove

            --> this removes all mlx4_* modules and the HCA (InfiniBand) card from the OS

            Executing:

            echo 1 > /sys/bus/pci/rescan

            adds the hardware back, and udevd loads the mlx4_en and mlx4_core drivers (see client-7-)

            If the hardware isn't removed but all mlx4_* modules are unloaded, udevd reloads mlx4_core and mlx4_en
            when the IB interface is started via /etc/init.d/rdma.
            The startup is handled by the entry:
            alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_core

            For ofa builds only the HCA is detected, but the drivers aren't loaded. The reason is a duplicate entry in
            modules.alias for the ofa build:

            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_en
            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_core

            Removing the entry for mlx4_en fixes the problem and the rdma script works for ofa, too.
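
            As a quick check for the duplicate mapping, the alias database can be queried directly; a sketch (the modules.alias location is the standard one, and the PCI address follows the device shown earlier in this ticket):

            # list every modules.alias entry for the ConnectX PCI device ID
            grep 'v000015B3d0000673C' /lib/modules/$(uname -r)/modules.alias

            # or resolve the modalias reported by the device itself; with the duplicate
            # entry present this prints both mlx4_en and mlx4_core
            modprobe --resolve-alias $(cat /sys/bus/pci/devices/0000:02:00.0/modalias)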


            heckes Frank Heckes (Inactive) added a comment -

            Created a change for lbuild to alter the kernel-ib SPEC file based on the canonical target name of the distribution (this will preserve the changes for rhel5).

            Also continuing to investigate the option of adding the mlx4_{core,en,ib} modules to the initrd, and why this isn't done for ofa builds in parallel.
            brian Brian Murrell (Inactive) added a comment - - edited

            chris: Because the standard OFED build assumes a "vanilla" Linux installation, it does not really take into account vendor "value add" such as Red Hat has done with their "rdma" package. Ideally, their packaging process should try to figure out whether it needs to interoperate with the vendor's "value add", but I don't believe it does.


            chris Chris Gearing (Inactive) added a comment -

            Brian: I have little insight into the details here, but I am surprised that the standard OFED build would not be the best outcome. Why do we need to modify the standard build? Or, more correctly, why would the standard build be in a form that does not provide the best functionality?

            Frank: Is it the case that the standard OFED build, without the spec file change, builds, installs and runs properly - or have I missed something?


            brian Brian Murrell (Inactive) added a comment -

            If the mlx4_* modules really are only being loaded by virtue of being in the ramdisk, why do they not get included in the ramdisk when kernel-ib is installed? i.e. why do we have to modify /etc/sysconfig/kernel for the kernel-ib case and not for the stock kernel case?


            heckes Frank Heckes (Inactive) added a comment -

            Hi Brian,

            The reason why the stock ('inkernel') build starts the mlx4_{core,en,ib} modules is that they're included in the initial ramdisk of the Lustre kernel (--> see my previous comment above). I think that is well understood.

            We could use the same idea for the external (OFA) builds to avoid the risk of clashes between whatever scripts are available in the distro and the kernel-ib scripts.
            This could be done by adding the modules to /etc/sysconfig/kernel and re-creating the Lustre-kernel init-ramdisk, as said above.
            In that case, applying the ed-script inside the lbuild script could be left as it is.

            For rhel5 there used to be a dedicated RPM (openib; listed in my first comment) that contained the init script '/etc/init.d/openibd', which is (was) also supplied by the OFED-1.4.* kernel-ib RPM. That was the conflict resolved in LU-388.
            For rhel6 the openib RPM doesn't exist anymore, i.e. the packaging has changed.

            But I agree that rdma and openibd (from the OFED-1.5.4 kernel-ib) can modprobe the same modules (besides the HW core modules) and set the IP address twice; that won't do any harm, I guess. (I'll try that on Toro: client-7.)


            brian Brian Murrell (Inactive) added a comment -

            Frank,

            To be clear, the conflict is not simply one of RPM naming, but of having multiple initscripts trying to do the same things. If we install an initscript in kernel-ib that fiddles with I/B and the user decides to also install the rdma RPM, which also fiddles with I/B, there is a conflict, as both should not be trying to do the same thing. Ultimately we need our added kernel-ib to integrate with the base O/S as closely as we can.

            So the question becomes: why are the mlx4_* modules from the stock kernel loaded during boot, but not when they are supplied by kernel-ib?

            Perhaps you need to compare the operation of the rdma initscript with and without kernel-ib. You could insert the following right after the first line (i.e. after the #! line) of the rdma initscript:

            exec 2>/tmp/rdma.debug
            set -x
            

            This will log the xtrace of that initscript to /tmp/rdma.debug. Do that, boot once with kernel-ib installed and once without it, then compare the two traces to see whether they operate differently and, if they do, why.

            It might also be worthwhile taking an inventory of the loaded modules (i.e. lsmod) before and after the rdma initscript runs during boot. You could add an "lsmod > /tmp/before" to the initscript before it calls start() and an "lsmod > /tmp/after" after it returns from start(), and again run that with and without kernel-ib to see what difference in behaviour there is.
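
            A minimal sketch of that instrumentation (the layout of the case statement in /etc/init.d/rdma is assumed here, not copied from the actual script):

            #!/bin/sh
            exec 2>/tmp/rdma.debug          # xtrace output goes to this file
            set -x

            case "$1" in
                start)
                    lsmod > /tmp/before     # module inventory before start()
                    start
                    rc=$?
                    lsmod > /tmp/after      # module inventory after start() returns
                    exit $rc
                    ;;
            esac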

            Ultimately what you have here is a case where something ought to work but doesn't. In such cases it's usually better to understand why the thing that ought to work doesn't, and approach it from there. The problem is that I don't think we yet know why it doesn't actually work, so any attempt to band-aid it has a likelihood of causing some other unexpected problem, and it might not happen until it's out in the field, where it becomes a customer support problem (i.e. much more expensive to deal with) and a mea culpa.

            heckes Frank Heckes (Inactive) added a comment - - edited

            Hi Brian,

            well, the problem is that the rdma RPM (script) was there from the beginning, i.e. it was installed during the node provisioning:
            ...
            rng-tools-2-13.el6_2.x86_64 Mon 04 Mar 2013 08:39:51 AM PST
            readahead-1.5.6-1.el6.x86_64 Mon 04 Mar 2013 08:39:51 AM PST
            rdma-3.3-4.el6_3.noarch Mon 04 Mar 2013 08:39:51 AM PST
            quota-3.17-16.el6.x86_64 Mon 04 Mar 2013 08:39:51 AM PST
            microcode_ctl-1.17-11.el6.x86_64 Mon 04 Mar 2013 08:39:51 AM PST
            ...
            ...

            and it failed.

            I really looked at and reconsidered the rdma (/etc/init.d/rdma) script again, but it will only initialize the InfiniBand interface with an IP address
            if the card has been recognized by the OS. This is only the case if the modules mlx4_core, mlx4_en and mlx4_ib are loaded, which is what the rdma script
            doesn't provide.

            It fails during system boot:

            Bringing up interface ib0: Device ib0 does not seem to be present, delaying initialization.
            [FAILED]

            No hardware was detected:
            [root@client-7 ~]# /etc/init.d/rdma status
            Low level hardware support loaded:
            none found

            Upper layer protocol modules:
            ib_ipoib

            User space access modules:
            rdma_ucm ib_ucm ib_uverbs ib_umad

            Connection management modules:
            rdma_cm ib_cm iw_cm

            Configured IPoIB interfaces: none
            Currently active IPoIB interfaces: none
            [root@client-7 ~]# ip link
            1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
            link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
            2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
            link/ether 00:30:48:f7:72:4e brd ff:ff:ff:ff:ff:ff
            3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
            link/ether 00:30:48:f7:72:4f brd ff:ff:ff:ff:ff:ff

            rdma script is active:
            [root@client-7 ~]# chkconfig --list rdma
            rdma 0:off 1:off 2:on 3:on 4:on 5:on 6:off

            Starting the IB HW modules manually (mlx4_core, mlx4_en, mlx4_ib) 'fixes' the problem.

            Further, I found that the openib RPM (part of the rhel5 distro) contained the /etc/init.d/openibd script that starts the HW modules.
            This is only the case for rhel5 and OFED-1.4.*.

            For rhel6 the distro no longer contains the openib RPM. Therefore there's no conflict.

            At first glance it seems strange that the 'inkernel' build initializes the InfiniBand card correctly. But the reason is
            that the modules are part of the initial ramdisk (extracted from the inkernel build #180@lustre-b2_1):

            ./lib/modules/2.6.32-279.14.1.el6_lustre.g044a3a2.x86_64:
            total 4548
            -rw-r--r-- 1 root root  23712 Mar  6 03:31 acpi-cpufreq.ko
            -rw-r--r-- 1 root root  85080 Mar  6 03:31 ahci.ko
            -rw-r--r-- 1 root root  13672 Mar  6 03:31 ata_generic.ko
            ...
            ...
            ...
            -rw-r--r-- 1 root root  36240 Mar  6 03:31 microcode.ko
            -rw-r--r-- 1 root root 300952 Mar  6 03:31 mlx4_core.ko
            -rw-r--r-- 1 root root 126960 Mar  6 03:31 mlx4_en.ko
            -rw-r--r-- 1 root root  99544 Mar  6 03:31 mlx4_ib.ko
            -rw-r--r-- 1 root root  21055 Mar  6 03:31 modules.alias

            This is also visible from the system boot messages:

            mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
            mlx4_core: Initializing 0000:02:00.0
            mlx4_core 0000:02:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
            mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.0 (Dec 2011)
            mlx4_en 0000:02:00.0: UDP RSS is not supported on this device.
            mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)

            Setting hostname client-7.lab.whamcloud.com: [ OK ]
            Setting up Logical Volume Management: No volume groups found
            [ OK ]

            i.e. the modules are loaded before init executes the run-level scripts.

            This could be a workaround for the OFA builds, too, i.e. add mlx4_{core,en,ib} to /etc/sysconfig/kernel to ensure they are started at system boot.
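
            A sketch of what such a workaround could look like on a RHEL6-style system; this uses dracut's add_drivers mechanism to pull the modules into the initramfs (the use of /etc/dracut.conf is an assumption, not something taken from this ticket):

            # force the Mellanox HW drivers into the initramfs of the running kernel
            echo 'add_drivers+=" mlx4_core mlx4_en mlx4_ib "' >> /etc/dracut.conf
            dracut --force /boot/initramfs-$(uname -r).img $(uname -r)

            # afterwards the modules should show up in the regenerated image
            lsinitrd /boot/initramfs-$(uname -r).img | grep mlx4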


            brian Brian Murrell (Inactive) added a comment -

            Frank,

            I don't understand. We talked about this at quite some length (it must have been several hours over a few conversations) and I thought we came to the same conclusion: that the patching (01-play-nice-with-RHEL5.ed) in lbuild should stay as it is for both EL5 and EL6, and that the solution to the problem of initializing drivers on EL6 was the job of the rdma initscript from the rdma RPM, i.e. simply "yum install rdma" on EL6 nodes to get an initscript that loads the I/B drivers.

            Has something changed since those conversations?


            People

              Assignee: heckes Frank Heckes (Inactive)
              Reporter: heckes Frank Heckes (Inactive)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: