
Lustre 2.10.4 failed to build with MLNX_OFED_LINUX-4.4-1.0.0.0

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version: Lustre 2.10.4

    Description

       

      When building Lustre 2.10.4 with Mellanox OFED stack, got errors:

      configure: error: can't compile with OpenIB gen2 headers

      (please see attached file for more details)
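
      When configure fails like this, the underlying compiler errors are recorded by autoconf in config.log in the build directory. A generic way to pull out the relevant section (a standard autoconf tip, not specific to this ticket's attachment):

```shell
# Autoconf writes each failed test program and its compiler errors to
# config.log in the directory where ./configure was run; search it for
# the o2ib header check that produced the error message.
LOG=config.log
if [ -f "$LOG" ]; then
    # Show a little context around the failing OpenIB test
    grep -n -B2 -A10 "OpenIB gen2" "$LOG"
else
    echo "no $LOG here; run ./configure in the lustre-release tree first"
fi
```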

      Attachments

        Activity

          [LU-11348] Lustre 2.10.4 failed to build with MLNX_OFED_LINUX-4.4-1.0.0.0
          mdiep Minh Diep added a comment -

          Thanks


          In the course of my week-long troubleshooting, I tried everything I could think of, including the one you suggested here.

          At this point, I am certain that with the IB card in the server, all my configuration options will likely succeed. In that regard, the configure part of the compilation step is pretty smart about figuring out the proper configuration parameters.

          Haisong

          mdiep Minh Diep added a comment -

          If you use --with-o2ib, configure will assume that you want to build against whatever IB stack is installed on the system, which means you need to have an IB card.
          If you don't have an IB driver or card, use --with-o2ib=/usr/src/ofa_kernel/default
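
          The two cases above can be sketched as a small shell helper. The /usr/src/ofa_kernel/default path is the MLNX OFED default quoted in this thread; the helper itself is illustrative, not part of the Lustre build system:

```shell
# Pick the o2ib configure flag based on whether external OFED
# development headers are installed (MLNX OFED's default location).
OFA_SRC=/usr/src/ofa_kernel/default
if [ -d "$OFA_SRC" ]; then
    # External OFED stack present: point configure at it explicitly
    O2IB_FLAG="--with-o2ib=$OFA_SRC"
else
    # No external stack: let configure autodetect in-kernel IB support
    O2IB_FLAG="--with-o2ib=yes"
fi
echo "would run: ./configure $O2IB_FLAG ..."
```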


           

          I used the --with-linux option.

          With the Mellanox card, this time I only ran with

          ./configure --enable-server --disable-ldiskfs --with-o2ib=yes

          This is the option I usually use.

          Thanks,

          Haisong

          mdiep Minh Diep added a comment -

          I find it hard to believe that we need a card to build. I have built Lustre on VMs all the time without the card. Perhaps you need to use the --with-linux option?

          -Minh


          Minh,

          After further troubleshooting, I found the cause of my original issue - Lustre 2.10.4 failing to build on CentOS 6.10.

          It was really a run-time issue with the Lustre build and install process, I believe.

          It turns out the process requires the Mellanox card to be present and its driver loaded in kernel space first. The machine I used to build the Lustre server didn't have the Mellanox card (it was there before) at the time.

          So even though Lustre 2.10.x isn't officially supported on CentOS 6.* anymore, one can still build it on that platform and use it.
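
          A minimal sketch of how one might check for the card and driver that this comment says the build depended on. The "mellanox"/"mlx" match strings are assumptions about vendor naming, and both checks degrade gracefully if lspci/lsmod are unavailable:

```shell
# Best-effort checks: is a Mellanox IB card visible on the PCI bus,
# and is an mlx* kernel module currently loaded?
HAVE_IB_CARD=no
if command -v lspci >/dev/null 2>&1 && lspci | grep -qi mellanox; then
    HAVE_IB_CARD=yes
fi
HAVE_IB_DRIVER=no
if command -v lsmod >/dev/null 2>&1 && lsmod | grep -q '^mlx'; then
    HAVE_IB_DRIVER=yes
fi
echo "IB card present: $HAVE_IB_CARD, mlx driver loaded: $HAVE_IB_DRIVER"
```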

          You can close this ticket.

          Thanks for all the help,

          Haisong


          Minh,

          I followed your suggestion and built the 2.10.4 client successfully on CentOS 6.10.

          The procedure is very close to what I used for building the server, minus ZFS.

          command summary:

          tar zxvf /share/apps/src/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64.tgz

          cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64
          ./mlnx_add_kernel_support.sh -m /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64

          tar xvf /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext.tgz
          cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext
          ./mlnxofedinstall

          [root@badger-oss-4-1 MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext]# ofed_info -s
          MLNX_OFED_LINUX-4.4-1.0.0.0:

              1. Lustre client build
                cd /tmp/
                tar zxvf /share/apps/src/lustre-release_git.tgz
                cd /tmp/lustre-release
                sh ./autogen.sh
                ./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default && make && make rpms

           

          I also saved logs. Let me know if you like them too.

          pjones Peter Jones added a comment -

          Haisong

          Sorry to not have been clear enough - the 2.10.x LTS branch is absolutely supported, but only for Linux distributions as outlined in the support matrix - https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix . As you can see, we only offer support for RHEL 7.x servers at present.

          Reviewing the exchange on the EDU-91 ticket it is clear to me that the enquiry as to whether you were planning to upgrade to 2.10.x assumed that you realized that the implication of this would also mean to upgrade the Linux distribution on the storage servers accordingly.

          We don't try to break support for older Linux distributions, so they can often stay working long after they have been dropped from the combinations we build and test. I am more surprised that RHEL 6.x servers only stopped working so recently than I am to learn that they do not work. It is some years since we have been actively supporting RHEL 6.x servers on master (they were deprecated during the Lustre 2.8 release cycle).

          Given your requirements I see two options:

          1) Upgrade your storage servers distro to RHEL 7.5 and keep your existing kernel on your clients, using 2.10.5

          2) Keep your storage servers on 2.7 FE (the last release to support RHEL 6.x servers) and see whether it is possible to port the LU-7199 fix back to that version

          Personally, I think that option #1 is the better approach. TBH I am surprised that you are still using RHEL 6.x for a Lustre on ZFS deployment - a study conducted back when Rick Wagner was at SDSC showed that there were performance advantages to using RHEL 7.x over RHEL 6.x for Lustre on ZFS. Many sites take this approach and treat the storage servers as a "black box" rather than trying to have a single kernel version used across both Lustre servers and clients.

          Peter


          I should also add that, on Aug 23, I used a CentOS 6.7 server + ZFS 0.7.9 with Lustre 2.10.4 (git checkout v2_10_4) and successfully built and installed the server.

          Now that process can't be repeated any more.

          Haisong


          Thank you for the update.

          However, in my earlier ticket, EDU-91, an engineer suggested that we upgrade the Lustre server from FE 2.7.x to 2.10.x for that particular MDS bug.

          Now that we are preparing for such an upgrade, is LTS no longer supported? Here I am assuming LTS refers to all of 2.10.x (according to the release note here: http://lustre.org/lustre-2-10-0-released)

           

          Thanks,

          Haisong


          People

            mdiep Minh Diep
            haisong Haisong Cai (Inactive)