lustre[-dkms] needs to automatically account for OFED

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.9.0
    • None
    • None
    • 16631

    Description

      Currently our lustre-dkms package does not build against any installed OFED.

      Background

      Outside of the DKMS world, a user who wants to build Lustre with OFED first builds and installs OFED, then points the Lustre build at that installation by passing its path as the argument to the --with-o2ib switch of Lustre's configure script.

      Problem

      This doesn't work with lustre-dkms because there is no way for the user to provide that OFED path to the DKMS-built Lustre modules.

      Solution

      The nice part is that this solution is general: it will benefit users who compile from source in the traditional manner as well as lustre-dkms users.

      The solution I propose is that the code we currently have to handle --with-o2ib [yes|no|<path_to_ofed>] should, when given simply yes, look for the OFED installation in its expected installed location (i.e. where it ends up when following the official upstream instructions on how to build and install OFED). If the argument is simply yes and OFED is installed, that OFED should be preferred over in-kernel IB.
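The proposed behaviour can be sketched as a small shell function. This is only an illustration of the decision described above, not Lustre's actual configure code: the function name `resolve_o2ib`, the strings it prints, and the idea of passing candidate directories as arguments are all assumptions made for the example.

```shell
# resolve_o2ib VALUE [CANDIDATE_DIR...]
# Hypothetical helper: prints what the build would use for a given
# --with-o2ib VALUE. The candidate directories are supplied by the caller;
# they are NOT taken from Lustre's configure script.
resolve_o2ib() {
    value=$1
    shift
    case "$value" in
        no)
            # o2ib support explicitly switched off.
            echo "disabled"
            ;;
        yes)
            # Prefer the first candidate OFED tree that exists on disk...
            for dir in "$@"; do
                if [ -d "$dir" ]; then
                    echo "$dir"
                    return 0
                fi
            done
            # ...and fall back to the kernel's own IB stack otherwise.
            echo "in-kernel"
            ;;
        *)
            # An explicit path given by the user is used as-is.
            echo "$value"
            ;;
    esac
}
```

For instance, `resolve_o2ib yes /usr/src/ofa_kernel/default` prints that directory if it exists and `in-kernel` otherwise; the directory is just an assumed candidate here.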

          Activity

            [LU-5953] lustre[-dkms] needs to automatically account for OFED
            bfaccini Bruno Faccini (Inactive) added a comment - edited

            This automatic detection mechanism is intended for IEEL and DKMS Lustre RPMs. As an experienced super-user, again, why don't you use the "--with-o2ib=<path>" configure option?

            wangshilong Wang Shilong (Inactive) added a comment

            Generally speaking, I don't think automatic detection is a good idea for super-users. Bruno, maybe giving users options would be better, no?

            bfaccini Bruno Faccini (Inactive) added a comment

            The simplest way to fix this could be to use the "--with-o2ib=<path>" configure option! And also to ask that the latest MLNX_OFED versions continue to provide the %install_path/openib link in their devel RPM?

            simmonsja James A Simmons added a comment

            I'm in the process of setting up a Mellanox stack system, so if I run into this issue I will see what I can do to fix it.
            wangshilong Wang Shilong (Inactive) added a comment - edited

            Hello,

            At least MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64.tar totally breaks your checks.

            Please check it; it can be downloaded from the following link:
            http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

            bfaccini Bruno Faccini (Inactive) added a comment

            Hello Wang,
            Thanks for raising this issue, but I am afraid it may be directly linked to MLNX_OFED being packaged differently (and more recently?) than the OFED RPMs, which was not in the original scope of this ticket and its associated patch.

            OTOH, I remember using the "openib" file/link name as a reference because I thought I had found that it was kept, for historical reasons, across the packaging of the different OFED versions. But it seems MLNX_OFED now breaks this behavior (I checked that it still works with MLNX_OFED_LINUX-2.1-*)?

            wangshilong Wang Shilong (Inactive) added a comment

            Hello,

            We recently failed to build with this patch.

            It was because we failed the following check:
            ofed_info | egrep -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' | xargs rpm -ql | grep '/openib$'

            Here is the real output from our build (rhel6.6 with MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64):

            [root@build01 MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64]# ofed_info | egrep -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' | xargs rpm -ql | grep '/openib'
            /usr/src/ofa_kernel-2.4/ofed_scripts/openib.conf
            /usr/src/ofa_kernel-2.4/ofed_scripts/openibd
            /usr/src/ofa_kernel-2.4/ofed_scripts/openibd.service
            /usr/src/ofa_kernel/default/ofed_scripts/openib.conf
            /usr/src/ofa_kernel/default/ofed_scripts/openibd
            /usr/src/ofa_kernel/default/ofed_scripts/openibd.service

            So maybe you mean '/openibd$' rather than '/openib$'?

            Best regards,
            Wang Shilong
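The difference between the two patterns in the comment above can be checked directly against the quoted file list. The sketch below only replays that list through grep; the helper name `count_matches` is made up for the example, and whether matching `openibd` is the right fix for the configure check is not something this settles.

```shell
# File list copied from the rpm -ql output quoted in the comment above.
ofed_files='/usr/src/ofa_kernel-2.4/ofed_scripts/openib.conf
/usr/src/ofa_kernel-2.4/ofed_scripts/openibd
/usr/src/ofa_kernel-2.4/ofed_scripts/openibd.service
/usr/src/ofa_kernel/default/ofed_scripts/openib.conf
/usr/src/ofa_kernel/default/ofed_scripts/openibd
/usr/src/ofa_kernel/default/ofed_scripts/openibd.service'

# count_matches PATTERN: hypothetical helper that prints how many of the
# listed paths match PATTERN. "|| true" hides grep's non-zero exit status
# when nothing matches.
count_matches() {
    printf '%s\n' "$ofed_files" | grep -c -- "$1" || true
}

count_matches '/openib$'    # prints 0: no path ends in "/openib"
count_matches '/openibd$'   # prints 2: the two "openibd" scripts
```

With this packaging, no file or link named exactly `openib` is present, so the `'/openib$'` check comes back empty.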

            bfaccini Bruno Faccini (Inactive) added a comment

            And detecting unexpected IB setups is somewhat its purpose ...

            simmonsja James A Simmons added a comment

            I looked and found that our build machine has been hosed for a long time. It's just that this patch exposes that the box has a mixed OFED 3.5 and RHEL InfiniBand setup.

            bfaccini Bruno Faccini (Inactive) added a comment

            Hello James,
            Can you post the output of the ofed_info script/cmd from the node where you are building Lustre?
            Thanks in advance.

            simmonsja James A Simmons added a comment

            I'm seeing the following build errors after this was merged.

            checking whether to use Compat RDMA... /usr/bin/ofed_info
            rpm: no arguments given for query
            configure: error:
            You seem to have an OFED installed but have not installed it's devel package.
            If you still want to build Lustre for your OFED I/B stack, you need to install its devel headers RPM.
            Instead, if you want to build Lustre for your kernel's built-in I/B stack rather than your installed OFED stack, either remove the OFED package(s) or use --with-o2ib=no.

            make: *** No rule to make target `rpms'. Stop.

            This is using the default OFED stack with RHEL6. I tried the test you merged in the patch, and this is the result I get:

            ofed_info | egrep -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' | xargs rpm -ql | grep /openib
            rpm: no arguments given for query

            But you test whether the output is null, which is not the case.
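The "rpm: no arguments given for query" message is what one would expect when no package name reaches xargs: GNU xargs runs its command once even on empty input. One possible way to avoid that, shown here only as a sketch and not as what the merged patch does, is the `-r` (`--no-run-if-empty`) option. The wrapper name `run_if_input` is hypothetical, and `echo` stands in for `rpm -ql` so the example runs without rpm.

```shell
# run_if_input CMD [ARGS...]
# Hypothetical wrapper: appends the items read from stdin to CMD and runs it,
# but only if stdin actually contains something (xargs -r).
run_if_input() {
    xargs -r "$@"
}

# No package names on stdin: the command is not run, nothing is printed.
printf '' | run_if_input echo "query ran with:"

# One package name on stdin: the command runs once with the name appended.
printf 'ofa_kernel-devel\n' | run_if_input echo "query ran with:"
```

In the pipeline from the comment above, this would correspond to `xargs -r rpm -ql` in place of `xargs rpm -ql`, so that a missing devel package yields empty output rather than an rpm error.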

            People

              utopiabound Nathaniel Clark
              bfaccini Bruno Faccini (Inactive)