
Lustre 2.10.4 failed to build with MLNX_OFED_LINUX-4.4-1.0.0.0

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.4

    Description

       

      When building Lustre 2.10.4 with the Mellanox OFED stack, I got the following error:

      configure: error: can't compile with OpenIB gen2 headers

      (please see attached file for more details)
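
      (Not part of the original report: when configure stops with this message, the compiler failure behind it is recorded in config.log in the Lustre build directory - standard autoconf behaviour. One way to pull out the relevant section:)

          # Show the failed o2ib/OFED header test that triggered the configure error
          grep -n -B 5 -A 20 "OpenIB gen2" config.log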

      Attachments

        Activity

          [LU-11348] Lustre 2.10.4 failed to build with MLNX_OFED_LINUX-4.4-1.0.0.0
          mdiep Minh Diep added a comment -

          I find it hard to believe that we need a card to build. I have built Lustre on VMs all the time without the card. Perhaps you need to use the --with-linux option?

          -Minh
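
          (For illustration only - a configure invocation of the kind Minh is suggesting. The kernel source path is an assumption based on the usual RHEL kernel-devel layout, not something given in this ticket:)

              # Point Lustre's configure at the kernel build tree explicitly
              # instead of relying on auto-detection.
              cd /tmp/lustre-release
              sh ./autogen.sh
              ./configure --with-linux=/usr/src/kernels/$(uname -r)
              make rpms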


          haisong Haisong Cai (Inactive) added a comment -

          Minh,

          After further troubleshooting, I found the cause of my original issue - Lustre 2.10.4 failing to build on CentOS 6.10.

          It was really a run-time issue with the Lustre build and install process, I believe.

          It turns out the process requires the Mellanox card to be present and its driver loaded in kernel space first. The machine I used to build the Lustre server didn't have the Mellanox card at the time (it had been there before).
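
          (A minimal way to check whether the card and driver were actually available on the build host - not from the original ticket; the mlx4/mlx5 module names are assumptions about the HCA in use:)

              # Is a Mellanox HCA visible on the PCI bus?
              lspci | grep -i mellanox
              # Are the Mellanox core modules loaded?
              lsmod | grep -E 'mlx4_core|mlx5_core|ib_core'
              # Which OFED stack is installed, if any?
              ofed_info -s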

          So even though Lustre 2.10.x isn't officially supported on CentOS 6.* anymore, one can still build it on that platform and use it.

          You can close this ticket.

          Thanks for all the help,

          Haisong


          haisong Haisong Cai (Inactive) added a comment -

          Minh,

          I followed your suggestion and built the 2.10.4 client successfully on CentOS 6.10.

          The procedure is very close to what I used for building the server, minus ZFS.

          command summary:

          tar zxvf /share/apps/src/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64.tgz

          cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64
          ./mlnx_add_kernel_support.sh -m /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64

          tar xvf /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext.tgz
          cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext
          ./mlnxofedinstall

          [root@badger-oss-4-1 MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext]# ofed_info -s
          MLNX_OFED_LINUX-4.4-1.0.0.0:

              1. Lustre client build
                cd /tmp/
                tar zxvf /share/apps/src/lustre-release_git.tgz
                cd /tmp/lustre-release
                sh ./autogen.sh
                ./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default && make && make rpms
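
          (A possible follow-up not shown in the ticket - installing the freshly built client packages. The RPM names below are the usual ones produced by a --disable-server build and may differ:)

              # "make rpms" normally leaves the packages at the top of the build tree
              cd /tmp/lustre-release
              yum localinstall lustre-client-*.rpm kmod-lustre-client-*.rpm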

           

          I also saved logs. Let me know if you would like them too.

          pjones Peter Jones added a comment -

          Haisong

          Sorry not to have been clear enough - the 2.10.x LTS branch is absolutely supported, but only for the Linux distributions outlined in the support matrix - https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix . As you can see, we only offer support for RHEL 7.x servers at present.

          Reviewing the exchange on the EDU-91 ticket, it is clear to me that the enquiry about whether you were planning to upgrade to 2.10.x assumed you realized that this would also mean upgrading the Linux distribution on the storage servers accordingly.

          We don't try to break support for older Linux distributions, so they can often keep working long after they have been dropped from the combinations we build and test. I am more surprised that RHEL 6.x servers only stopped working so recently than I am to learn that they do not work. It has been some years since we actively supported RHEL 6.x servers on master (they were deprecated for the Lustre 2.8 release cycle).

          Given your requirements I see two options:

          1) Upgrade your storage servers distro to RHEL 7.5 and keep your existing kernel on your clients, using 2.10.5

          2) Keep your storage servers on 2.7 FE (the last release to support RHEL 6.x servers) and see whether it is possible to port the LU-7199 fix back to that version

          Personally, I think that option #1 is the better approach. TBH I am surprised that you are still using RHEL 6.x for a Lustre on ZFS deployment - a study conducted back when Rick Wagner was at SDSC showed that there were performance advantages to using RHEL 7.x over RHEL 6.x for Lustre on ZFS. Many sites take this approach and treat the storage servers as a "black box" rather than trying to have a single kernel version used across both Lustre servers and clients.

          Peter


          haisong Haisong Cai (Inactive) added a comment -

          I should also add that, on Aug 23, I used a CentOS 6.7 server + ZFS 0.7.9 with Lustre 2.10.4 (git checkout v2_10_4) and successfully built and installed the server.

          Now that process can't be repeated any more.

          Haisong


          haisong Haisong Cai (Inactive) added a comment -

          Thank you for the update.

          However, in my earlier ticket, EDU-91, an engineer suggested that we upgrade the Lustre server from FE 2.7.x to 2.10.x for that particular MDS bug.

          Now that we are preparing for such an upgrade, is LTS no longer supported? I am assuming LTS refers to all of 2.10.x (according to the release note here: http://lustre.org/lustre-2-10-0-released).

           

          Thanks,

          Haisong

          pjones Peter Jones added a comment -

          On the LTS branch, we offer support for RHEL 6.x only as Lustre clients.


          haisong Haisong Cai (Inactive) added a comment -

          Minh,

          Are the following compatible:

          CentOS 6.10 (kernel 2.6.32-754.3.5.el6.x86_64)

          ZFS 0.7.9

          Lustre server 2.10.4

          Mellanox MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64

          I still can't build Lustre server.

          Thanks,

          Haisong


          haisong Haisong Cai (Inactive) added a comment -

          Minh,

          I followed the suggested steps, but it still failed.

          Including compilation log and installation log here.

          Thanks,

          Haisong

          (attachment: lustre_compiling_installing_log)

          mdiep Minh Diep added a comment -

          Hi Haisong,
          Below is my recommendation for el7, but it also applies to el6.
          The dependency errors are due to differences in the build process between Lustre and MOFED. Here is one way that works:

          1. Download MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64.tgz; unpack it and cd into MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64.
          2. You need to use mlnx_add_kernel_support.sh since the kernel is newer than what MOFED supports.
          Run mlnx_add_kernel_support.sh --kmp (to enable KMP support, since it is disabled by default).
          3. Once the build is done, there is a new tarball in /tmp/MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64-ext.tgz.
          4. Unpack MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64-ext.tgz and go to MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64-ext/RPMS.
          5. Install these RPMs (others too if you think you'll need them):

          yum localinstall mlnx-ofa_kernel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.rhel7u5.x86_64.rpm kmod-mlnx-ofa_kernel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.201808301559.rhel7u5.x86_64.rpm mlnx-ofa_kernel-devel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.rhel7u5.x86_64.rpm

          Now you can build Lustre:
          ./configure --with-o2ib=/usr/src/ofa_kernel/default && make && make rpms

          Please let me know the result
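
          (Before the configure step it may be worth confirming that the MOFED development tree installed by those RPMs is where --with-o2ib expects it. A sketch only - exact header paths can vary between MOFED releases:)

              # Confirm the MOFED kernel-devel tree that --with-o2ib points at
              rpm -q mlnx-ofa_kernel-devel
              ls /usr/src/ofa_kernel/default/include/rdma/ib_verbs.h
              ls /usr/src/ofa_kernel/default/Module.symvers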


           

          haisong Haisong Cai (Inactive) added a comment -

          Hi Minh,

          I tried to compile server 2.10.4 and 2.10.5 on a CentOS 6.10, ZFS 0.7.9 machine.

          The Lustre server RPMs built, but when I try to install them I get errors - see the attachment: lustre-installation-errors.

           


          People

            Assignee: mdiep Minh Diep
            Reporter: haisong Haisong Cai (Inactive)
            Votes: 0
            Watchers: 3
