
Infiniband HW kernel modules of OFA builds not started at system boot

Details

    • Story
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 6997

    Description

      Symptom:
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      When test nodes are provisioned with ofa builds (i.e. an 'external' build of kernel-ib based on OpenFabrics OFED tarballs) for rhel6, compiled against kernel version 2.6.32-279, the initialization of the Infiniband interfaces (ib0, ib1, ...) fails because the low-level Infiniband HW kernel modules mlx4_core and mlx4_en are not loaded.

      When the kernel-ib HW modules are loaded manually (modprobe mlx4_core, ...), the interfaces are created and operational (i.e. connected to the fabric, IP over IB works, ...).

      The kernel-ib RPM is normally built with a set of startup scripts (/etc/init.d/openibd and links in /etc/rc.d/*, chkconfig execution, ...) to ensure that the Infiniband HW kernel modules are loaded during system start. These files/scripts are missing from the kernel-ib RPM.

      Because of an installation conflict between kernel-ib and the openib RPM on the canonical distribution 'rhel5', the scripts/files are removed from the OFED kernel-ib SPEC file before the RPMs are created (rpmbuild) by the lbuild script. (See LU-388 for further details.)

      This conflict no longer exists, since openib-<version>.rpm isn't part of rhel6 anymore. As a consequence, the functionality for initializing the Infiniband HW is gone too, because the openib RPM contain(ed) the necessary startup scripts:

      rpm -qil --scripts -p openib-1.4.1-5.el5.noarch.rpm
      warning: openib-1.4.1-5.el5.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 192a7d7d: NOKEY
      Name : openib Relocations: (not relocatable)
      Version : 1.4.1 Vendor: Scientific Linux
      Release : 5.el5 Build Date: Wed 31 Mar 2010 12:39:27 AM PDT
      Install Date: (not installed) Build Host: norob.fnal.gov
      Group : System Environment/Base Source RPM: openib-1.4.1-5.el5.src.rpm
      Size : 27021 License: GPL/BSD
      Signature : DSA/SHA1, Wed 31 Mar 2010 12:52:50 PM PDT, Key ID b0b4183f192a7d7d
      URL : http://www.openfabrics.org/
      Summary : OpenIB Infiniband Driver Stack
      Description :
      User space initialization scripts for the kernel InfiniBand drivers
      postinstall scriptlet (using /bin/sh):
      if [ $1 = 1 ]; then
      /sbin/chkconfig --add openibd
      fi
      preuninstall scriptlet (using /bin/sh):
      if [ $1 = 0 ]; then
      /sbin/chkconfig --del openibd
      fi
      /etc/ofed
      /etc/ofed/fixup-mtrr.awk
      /etc/ofed/openib.conf
      /etc/rc.d/init.d/openibd
      /etc/sysconfig/network-scripts/ifup-ib
      /etc/udev/rules.d/90-ib.rules

      The script (openibd) has been 'moved' to the kernel-ib package for OFED version 1.5.*.

      To overcome the situation, the following code change in lustre-reviews/build/lbuild (inside the loop beginning at line 1216; `for file in $(ls ${TOPDIR}/lustre/build/patches/ofed/*.patch); do`)

      # apply only the ed scripts whose name contains the canonical target
      if [[ $file =~ "${CANONICAL_TARGET}" ]]; then
          ed_fragment3="$ed_fragment3
      $(cat $file)"
          let n=$n+1
      fi

      and a rename of the ed script (which removes the packaging of the openibd files and scripts) from

      01-play-nice-with-RHEL5.ed
      to
      01-play-nice-with-rhel5.ed

      is necessary. This ensures that kernel-ib ofa builds for rhel5 are still created without the openibd scripts, while the scripts are packaged in the rhel6 RPMs.
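
The name-based selection can be illustrated with a small stand-alone sketch. It is not the lbuild code: the function name collect_ed_fragments is hypothetical, and a portable case pattern stands in for the literal =~ match. Assuming CANONICAL_TARGET is the lower-case string rhel5, it also shows why the rename matters: the comparison is case-sensitive, so a script named ...RHEL5.ed would not be selected.

```shell
#!/bin/sh
# Hypothetical helper (not part of lbuild): print the contents of every
# ed script whose file name contains the canonical target, e.g. "rhel5".
# usage: collect_ed_fragments TARGET FILE...
collect_ed_fragments() {
    target=$1
    shift
    for file in "$@"; do
        # Match on the file name only; the comparison is case-sensitive,
        # so "RHEL5" in a file name does not match the target "rhel5".
        case ${file##*/} in
            *"$target"*) cat "$file" ;;
        esac
    done
}
```

Called with rhel6 as the target and only 01-play-nice-with-rhel5.ed on the command line, the function prints nothing, so the openibd packaging would stay in the SPEC file for that target.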


          Activity

            [LU-2907] Infiniband HW kernel modules of OFA builds not started at system boot
            pjones Peter Jones added a comment -

            Landed for 2.4

            heckes Frank Heckes (Inactive) added a comment - - edited

            No, the file (/etc/infiniband/openib.conf) is there, but not the entries the grep command searches for,
            since they are removed by the 01-play-nice.....ed script. But even if I add them, the install directives
            prevent both mlx4_core and mlx4_en from being started.

            I'm sorry, I forgot to append the line:
            g/mlx4_en.conf/d

            to 01-play-nice-with-rhel5-rhel6.ed. Push it to git.
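
For readers unfamiliar with ed: g/mlx4_en.conf/d is a global command that deletes every line matching the pattern. As a rough preview, assuming a made-up file list (the real kernel-ib SPEC file is not shown in this ticket), sed accepts the same pattern and the same delete action:

```shell
#!/bin/sh
# Preview of the ed command g/mlx4_en.conf/d using sed, which accepts the
# same basic regular expression and the same "d" (delete line) action.
# Reads a file list on stdin and writes the filtered list to stdout.
drop_mlx4_en_conf() {
    sed '/mlx4_en.conf/d'
}

# Hypothetical packaging-list entries, for illustration only.
printf '%s\n' \
    '/etc/infiniband/openib.conf' \
    '/etc/modprobe.d/mlx4_en.conf' \
    '/etc/init.d/openibd' | drop_mlx4_en_conf
```

Only the mlx4_en.conf line is dropped; the other two entries pass through unchanged.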


            brian Brian Murrell (Inactive) added a comment -

            Easiest fix for the problem will be to remove the file '/etc/modprobe.d/mlx4_en.conf' from the 'packaging list' of the rpmbuild spec file for the OFED kernel-ib modules RPM.

            Ahhh. Nice detective work, Frank!

            This /etc/modprobe.d/mlx4_en.conf is marginally interesting. Reformatting its lack of whitespace for ease of reading:

            install mlx4_core modprobe --ignore-install $((modprobe -c | grep -wq "^allow_unsupported_modules") &&
                echo '--allow-unsupported-modules') mlx4_core &&
                if [ -e /etc/infiniband/openib.conf ]; then
                    if ( grep -q "^MLX4_EN_LOAD=yes" /etc/infiniband/openib.conf > /dev/null 2>&1); then
                        modprobe mlx4_en
                    fi
                else
                    modprobe mlx4_en
                fi
            install mlx4_en modprobe --ignore-install $((modprobe -c | grep -wq "^allow_unsupported_modules") &&
                echo '--allow-unsupported-modules') mlx4_en &&
                if [ -e /etc/infiniband/openib.conf ]; then
                    if ( grep -q "^RUN_SYSCTL=yes" /etc/infiniband/openib.conf > /dev/null 2>&1); then
                        /sbin/sysctl_perf_tuning load
                    fi
                fi
            remove mlx4_en /sbin/sysctl_perf_tuning unload ; modprobe -r --ignore-remove mlx4_en

            It's an interesting little bit of code. One thing worth noting about it is the reference to /etc/infiniband/openib.conf. Is that file used for anything other than this module installation configuration? If not, we might as well remove it from the kernel-ib package too.
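
The conditional in the mlx4_core rule is easier to follow when isolated. The sketch below is only an illustration: the function name would_load_mlx4_en is made up, and it does not call modprobe. It mirrors the test that decides whether the rule goes on to load mlx4_en.

```shell
#!/bin/sh
# Mirrors the if/else of the "install mlx4_core" rule:
#   - config file missing              -> mlx4_en is loaded
#   - config file has MLX4_EN_LOAD=yes -> mlx4_en is loaded
#   - config file lacks that entry     -> mlx4_en is not loaded
# Returns 0 when mlx4_en would be loaded, non-zero otherwise.
# usage: would_load_mlx4_en [CONFIG_FILE]
would_load_mlx4_en() {
    conf=${1:-/etc/infiniband/openib.conf}
    if [ -e "$conf" ]; then
        grep -q '^MLX4_EN_LOAD=yes' "$conf"
    else
        return 0
    fi
}
```

An openib.conf that exists but has no MLX4_EN_LOAD=yes entry therefore suppresses the chained load of mlx4_en.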

            heckes Frank Heckes (Inactive) added a comment -

            For both client and server ofa builds, the modules mlx4_core and mlx4_en won't be loaded by udevd (started from /etc/rc.d/rc.sysinit) if the configuration file '/etc/modprobe.d/mlx4_en.conf' is present. If the file is removed (or moved to another directory or renamed), startup of mlx4_core and mlx4_en works, and the interface ib0 is therefore configured correctly by the '/etc/init.d/rdma' script.

            The content of the file reads:
            [root@client-7 ~]# cat /etc/modprobe.d/mlx4_en.conf
            install mlx4_core modprobe --ignore-install $((modprobe -c | grep -wq "^allow_unsupported_modules") && echo '--allow-unsupported-modules') mlx4_core && if [ -e /etc/infiniband/openib.conf ]; then if ( grep -q "^MLX4_EN_LOAD=yes" /etc/infiniband/openib.conf > /dev/null 2>&1); then modprobe mlx4_en; fi; else modprobe mlx4_en; fi
            install mlx4_en modprobe --ignore-install $((modprobe -c | grep -wq "^allow_unsupported_modules") && echo '--allow-unsupported-modules') mlx4_en && if [ -e /etc/infiniband/openib.conf ]; then if ( grep -q "^RUN_SYSCTL=yes" /etc/infiniband/openib.conf > /dev/null 2>&1); then /sbin/sysctl_perf_tuning load; fi; fi
            remove mlx4_en /sbin/sysctl_perf_tuning unload ; modprobe -r --ignore-remove mlx4_en

            The file is owned by the external OFED kernel-ib RPM:
            [root@client-7 ~]# rpm -q --whatprovides /etc/modprobe.d/mlx4_en.conf
            kernel-ib-1.5.4-2.6.32_279.14.1.el6_lustre.g1f5b9fe.x86_64.x86_64
            (The same holds for the client kernel-ib RPM; only the version string differs.)

            The failed startup of the modules when 'mlx4_en.conf' is present can be reproduced by:
            1 Removing the HCA (echo 1 > /sys/devices/pci0000\:00/0000\:00\:03.0/0000\:02\:00.0/remove)
            2 Rescanning the PCI bus (echo 1 > /sys/bus/pci/rescan)
            The output of 'udevadm monitor --environment', run simultaneously, shows only the initialization, but no startup of the modules. The same test sequence with 'mlx4_en.conf' removed shows that the modules are loaded correctly according to the modules.{alias,dep} mapping.

            Easiest fix for the problem will be to remove the file '/etc/modprobe.d/mlx4_en.conf' from the 'packaging list' of the rpmbuild spec file for the OFED kernel-ib modules RPM.
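
The two reproduction steps can be wrapped in a small function. This is a sketch, not a tested tool: the function name hotplug_cycle and the optional sysfs-root argument are assumptions, added so that the logic can be tried against a scratch directory. On a real node it needs root and removes the HCA from the running system.

```shell
#!/bin/sh
# usage: hotplug_cycle DEVICE [SYSFS_ROOT]
# DEVICE is the PCI device path below SYSFS_ROOT/devices, for example
# pci0000:00/0000:00:03.0/0000:02:00.0 (the HCA path used in this ticket).
hotplug_cycle() {
    device=$1
    sysfs=${2:-/sys}
    # Step 1: remove the HCA from the OS.
    echo 1 > "$sysfs/devices/$device/remove" || return 1
    # Step 2: rescan the PCI bus so the kernel rediscovers the HCA.
    echo 1 > "$sysfs/bus/pci/rescan"
}
```

Starting 'udevadm monitor --environment' in a second terminal beforehand makes the resulting events visible.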

            heckes Frank Heckes (Inactive) added a comment -

            Yes, that's right, so we have two potential solutions for the problem. I haven't found out yet why the entries for mlx4_en are created. I'll check that on Monday.

            brian Brian Murrell (Inactive) added a comment -

            Frank,

            Was this discovery:

            For ofa builds only the HCA is detected, but the drivers are not loaded. The reason is a duplicate entry in
            modules.alias for the ofa build:

            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_en
            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_core

            Removing the entry for mlx4_en fixes the problem, and the rdma script works for ofa, too.

            made after we spoke on Friday? I.e. is that the smoking gun, and if we figure out why that duplicate entry (which is only there when using the OFA I/B, is that right?) is being created, will that resolve the issue and make the rdma initscript fully functional?
            heckes Frank Heckes (Inactive) added a comment - - edited

            For the in-kernel build, mlx4_core and mlx4_en are not part of the initramfs. I checked the initrd.kdump file by mistake. Anyway, the important finding is that the modules are started before the /etc/init.d/rdma script is executed.

            For the in-kernel build, the following sequence relevant to the Infiniband initialization is performed:

            init runs /etc/rc.d/rc.sysinit
            /etc/rc.sysinit runs /sbin/start_udev
            /sbin/start_udev runs udevd
            udevd receives an event from the kernel that the HCA interface is available
            udevd triggers the load of mlx4_core and mlx4_en
            /etc/rc.sysinit executes the active run-level scripts
            rdma is executed
            if mlx4_core is loaded, mlx4_ib is started ---> which creates the interfaces (ib0, ib...)
            if the interface (ib0) is available, IP configuration is done
            rdma finishes with success

            The 'critical' part for the rdma script is whether mlx4_core is loaded or not. If the module is not present, the
            initialization of the Infiniband interface fails.
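
The dependency on mlx4_core can be checked by hand. The following sketch is not taken from /etc/init.d/rdma, and the function name module_loaded is hypothetical. It only tests whether a module appears in /proc/modules, which lists one loaded module per line with the module name in the first field.

```shell
#!/bin/sh
# usage: module_loaded NAME [MODULES_FILE]
# Returns 0 if NAME is listed as loaded, non-zero otherwise.
# MODULES_FILE defaults to /proc/modules; the override exists for testing.
module_loaded() {
    awk -v name="$1" '
        $1 == name { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/modules}"
}
```

For example, 'module_loaded mlx4_core || echo "mlx4_core missing"' run right before '/etc/init.d/rdma start' shows whether udevd has already loaded the driver.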

            The behaviour (for the in-kernel build) can be reproduced at run time by running udevadm monitor --environment and executing
            /etc/init.d/rdma stop
            echo 1 > /sys/devices/pci0000\:00/0000\:00\:03.0/0000\:02\:00.0/remove

            --> this will remove all mlx4_* modules and the HCA (infiniband) card from the OS

            Executing:
            echo 1 > /sys/bus/pci/rescan

            adds the hardware again, and udevd starts the mlx4_en and mlx4_core drivers (see client-7-)

            If the hardware isn't removed, but all mlx4_* modules are unloaded, udevd reloads mlx4_core and mlx4_en
            when the ib interface is started via /etc/init.d/rdma.
            The startup is handled by the entry:
            alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_core

            For ofa builds only the HCA is detected, but the drivers are not loaded. The reason is a duplicate entry in
            modules.alias for the ofa build:

            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_en
            client-7-modules.alias-ofa:alias pci:v000015B3d0000673Csv*sd*bc*sc*i* mlx4_core

            Removing the entry for mlx4_en fixes the problem, and the rdma script works for ofa, too.
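
A quick way to spot such entries is to list every alias pattern that occurs on more than one line. The sketch below assumes the usual modules.alias layout ("alias PATTERN MODULE", found under /lib/modules/<kernel version>/); the function name duplicate_aliases is made up for this example.

```shell
#!/bin/sh
# usage: duplicate_aliases MODULES_ALIAS_FILE
# Prints each alias pattern that appears on more than one "alias" line,
# followed by the modules named on those lines, in file order.
duplicate_aliases() {
    awk '
        $1 == "alias" { count[$2]++; mods[$2] = mods[$2] " " $3 }
        END { for (p in count) if (count[p] > 1) print p mods[p] }
    ' "$1"
}
```

Run against a file containing the two ofa entries quoted above, it prints the PCI pattern followed by "mlx4_en mlx4_core". A pattern listed for several modules is not necessarily an error, so the output is a starting point for inspection rather than a verdict.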


            heckes Frank Heckes (Inactive) added a comment -

            Created a change for lbuild to alter the kernel-ib SPEC file based on the canonical target name of the distribution (this will preserve the changes for rhel5).

            In parallel, I will continue investigating the option of adding mlx4_{core,en,ib} to the initrd, and why this isn't done for ofa builds.
            brian Brian Murrell (Inactive) added a comment - - edited

            chris: Because the standard OFED build assumes a "vanilla" Linux installation and does not really take into account vendor "value add" such as what RedHat has done with their "rdma" package. Ideally, their packaging process should try to figure out whether it needs to interoperate with the vendor's "value add", but I don't believe it does.


            chris Chris Gearing (Inactive) added a comment -

            Brian: I have little insight into the details of this. But I am surprised that the standard OFED build would not give the best outcome. Why do we need to modify the standard build? Or, more correctly, why would the standard build be of a form that does not provide the best functionality?

            Frank: Is it the case that the standard OFED build, without the spec file change, builds, installs and runs properly, or have I missed something?

            People

              heckes Frank Heckes (Inactive)
              heckes Frank Heckes (Inactive)