Can't compile lustre client against MLNX OFED-5.2-1.0.4 on Centos 7.8

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.7
    • Affects Version/s: Lustre 2.12.5
    • Labels: None
    • Environment: Dell and Lenovo hardware. MLNX OFED-5.2-1.0.4. Lustre 2.12.5. OS is CentOS 7.8. Kernel is 3.10.0-1127.19.1.el7.x86_64

    Description

      Hello, I am trying to install Lustre via DKMS on our LNet routers, which have ConnectX-5 cards installed, on CentOS 7.8 with kernel 3.10.0-1127.19.1.el7.x86_64. Mellanox released its latest driver version, OFED-5.2-1.0.4, yesterday (Jan 4, 2021). When DKMS tries to compile Lustre, it fails with the following at the end:

      configure: LNet kernel checks
      ==============================================================================
      checking whether to enable CPU affinity support... yes
      checking if Linux kernel has cpu affinity support... yes
      checking whether to enable tunable backoff TCP support... yes
      checking if Linux kernel has tunable backoff TCP support... no
      checking whether to use Compat RDMA... /bin/ofed_info
      no
      configure: error: no OFED nor kernel OpenIB gen2 headers present
      configure error, check /var/lib/dkms/lustre-client/2.12.5/build/config.log

      Building module:
      cleaning build area...(bad exit status: 2)
      make -j8 KERNELRELEASE=3.10.0-1127.19.1.el7.x86_64...(bad exit status: 2)
      Error! Bad return status for module build on kernel: 3.10.0-1127.19.1.el7.x86_64 (x86_64)
      Consult /var/lib/dkms/lustre-client/2.12.5/build/make.log for more information.
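The configure error above points at config.log, which is large (208 kB in the attachment). A minimal sketch for pulling out just the InfiniBand-related lines follows. `show_ofed_checks` is a hypothetical helper name, and the keyword list is an assumption about what is relevant, not something taken from the Lustre build system.

```shell
#!/bin/sh
# Print the numbered config.log lines that mention the InfiniBand stack checks.
# Assumption: "ofed", "openib" and "compat rdma" are the keywords of interest.
show_ofed_checks() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -n -i -E 'ofed|openib|compat rdma' "$log"
}
```

On the failing router this could be called as `show_ofed_checks /var/lib/dkms/lustre-client/2.12.5/build/config.log`.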

      I also verified that the required MLNX rpms are installed.
      On the machine I am trying to install on, ibstat reports that both cards are Active with a physical state of LinkUp:

      [root@lnet08 ~]# ibstat
      CA 'mlx5_0'
      CA type: MT4119
      Number of ports: 1
      Firmware version: 16.26.1040
      Hardware version: 0
      Node GUID: 0xb8599f03002f8318
      System image GUID: 0xb8599f03002f8318
      Port 1:
      State: Active
      Physical state: LinkUp
      Rate: 100
      Base lid: 1522
      LMC: 0
      SM lid: 1434
      Capability mask: 0x2651e848
      Port GUID: 0xb8599f03002f8318
      Link layer: InfiniBand
      CA 'mlx5_1'
      CA type: MT4119
      Number of ports: 1
      Firmware version: 16.26.1040
      Hardware version: 0
      Node GUID: 0xb8599f03002f8319
      System image GUID: 0xb8599f03002f8318
      Port 1:
      State: Active
      Physical state: LinkUp
      Rate: 56
      Base lid: 2260
      LMC: 0
      SM lid: 158
      Capability mask: 0x2651e848
      Port GUID: 0xb8599f03002f8319
      Link layer: InfiniBand
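When the same check has to be repeated on several routers, the ibstat output can be condensed to one line per port. This is only a sketch: `summarize_ibstat` is a hypothetical helper name, and it assumes the field labels shown above (`State:`, `Physical state:`, `Rate:`) appear in that order for every port.

```shell
#!/bin/sh
# Condense `ibstat` output to "<CA> <state> <physical state> <rate>" per port.
# Assumption: Rate: is the last of the three fields printed for each port.
summarize_ibstat() {
    awk '
        $1 == "CA" && $2 != "type:"        { ca = $2; gsub("\047", "", ca) }
        $1 == "State:"                     { state = $2 }
        $1 == "Physical" && $2 == "state:" { phys = $3 }
        $1 == "Rate:"                      { print ca, state, phys, $2 }
    '
}
```

Usage would be `ibstat | summarize_ibstat`.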

      Any ideas on how to get this to work?

      Thanks,
      Mike

      Attachments

        1. autogen.sh
          0.3 kB
        2. config.log
          208 kB
        3. lustre-version.m4
          1 kB

          Activity

            [LU-14297] Can't compile lustre client against MLNX OFED-5.2-1.0.4 on Centos 7.8

            mre64 Michael Ethier (Inactive) added a comment -

            Hi Jian,
            I followed your instructions and that seems to have worked; the lnet route is running. I need to rebuild 9 other lnet routers, and this is the procedure I should follow, correct? Or is there going to be an "official" release that will include this fix soon?
            It won't be an official version of 2.12.5, correct?
            Thanks,
            Mike

            yujian Jian Yu added a comment -

            Hi Mike,
            The same ones as those in #comment-288967


            mre64 Michael Ethier (Inactive) added a comment -

            Hi Jian,
            Are the patches I should apply the same ones or different ones? Can you give me pointers to them?
            Thanks,
            Mike

            yujian Jian Yu added a comment -

            And before running autogen.sh, the attached lustre-version.m4 also needs to be put into /usr/src/lustre-client-2.12.5/config.
            The following steps work for me from scratch:

            # rpm -ivh lustre-client-dkms-2.12.5-1.el7.noarch.rpm
            # cd /usr/src/lustre-client-2.12.5/
            # patch -p1 < /root/0001-LU-13761-o2ib-Fix-compilation-with-MOFED-5.1.patch 
            # patch -p1 < /root/0001-LU-13783-o2iblnd-make-FMR-pool-support-optional.patch
            # cp /root/autogen.sh .
            # cp /root/lustre-version.m4 config/
            # sh ./autogen.sh 
            # dkms install -k $(uname -r) lustre-client/2.12.5
            ...
            ...
             - Installation
               - Installing to /lib/modules/3.10.0-1127.19.1.el7.x86_64/extra/
            Adding any weak-modules
            
            depmod....
            
            DKMS: install completed.
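For repeating these steps on further routers, they can be wrapped in a small script. This is a sketch under stated assumptions, not an official procedure: the `FILES`, `SRC` and `DRY_RUN` variables and the `run`/`rebuild_client` helpers are hypothetical names, the RPM is assumed to sit in the same directory as the patches, and `patch -i` is used in place of the `<` redirection so each command can be printed before it is run.

```shell
#!/bin/sh
# Sketch: replay the steps above on one router.
# Assumptions (not from the ticket): the RPM, both patches, autogen.sh and
# lustre-version.m4 all sit in $FILES, and DRY_RUN=1 (the default here) only
# prints each command so it can be reviewed before anything is changed.
set -eu

FILES=${FILES:-/root}
SRC=${SRC:-/usr/src/lustre-client-2.12.5}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rebuild_client() {
    run rpm -ivh "$FILES/lustre-client-dkms-2.12.5-1.el7.noarch.rpm"
    run cd "$SRC"
    run patch -p1 -i "$FILES/0001-LU-13761-o2ib-Fix-compilation-with-MOFED-5.1.patch"
    run patch -p1 -i "$FILES/0001-LU-13783-o2iblnd-make-FMR-pool-support-optional.patch"
    run cp "$FILES/autogen.sh" .
    run cp "$FILES/lustre-version.m4" config/
    run sh ./autogen.sh
    run dkms install -k "$(uname -r)" lustre-client/2.12.5
}

rebuild_client
```

Running it as is only lists the commands; setting `DRY_RUN=0` executes them, and `set -e` stops at the first failing step.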
            
            yujian Jian Yu added a comment -

            Hi Mike,
            I can reproduce your issue. After applying the patches, could you please run the attached autogen.sh under /usr/src/lustre-client-2.12.5 before running dkms install ...?

            # pwd
            /usr/src/lustre-client-2.12.5
            # sh ./autogen.sh
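The patches modify lnet/autoconf/lustre-lnet.m4, so a configure script generated before patching will not contain the new checks. A rough way to detect that situation is to compare timestamps. This is a heuristic sketch only: `configure_is_stale` is a hypothetical helper name, and it assumes the DKMS pre_build step runs the existing configure script rather than regenerating it.

```shell
#!/bin/sh
# Succeed (exit 0) when configure is missing or older than some .m4 file,
# i.e. when autogen.sh probably needs to be run again.
configure_is_stale() {
    src="$1"
    [ -f "$src/configure" ] || return 0
    # find -newer prints .m4 files modified after configure was written
    newer=$(find "$src" -name '*.m4' -newer "$src/configure" -print)
    [ -n "$newer" ]
}
```

For example, `configure_is_stale /usr/src/lustre-client-2.12.5 && echo "rerun autogen.sh"` before starting the DKMS build.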
            

            mre64 Michael Ethier (Inactive) added a comment -

            So I have an lnet router out of service that I was trying to get running with the latest MOFED and Lustre 2.12.5. Should I just rebuild it back to its previous functioning setup? I don't want to leave it down for a long time.

            pjones Peter Jones added a comment -

            My suggestion is that we expedite landing https://review.whamcloud.com/#/c/41152/ to b2_12, and then the tip of b2_12 will be what is needed to build 2.12.6 for MOFED 5.2. We have not thought about 2.12.7 timing yet, but we will certainly want to include this fix.


            mre64 Michael Ethier (Inactive) added a comment -

            Hi Jian,
            Any luck in trying my method?
            Thanks,
            Mike

            yujian Jian Yu added a comment -

            You're welcome, Mike. I'm not sure when the next 2.12.x version will be released.
            I directly installed the lustre-client-dkms rpm generated by the Jenkins build system, https://build.whamcloud.com/job/lustre-reviews/78580/arch=x86_64,build_type=client,distro=el7.8,ib_stack=inkernel/artifact/artifacts/RPMS/x86_64/lustre-client-dkms-2.12.5_1_g726eed2-1.el7.noarch.rpm, without problems.
            I will try your method to see how it goes.


            mre64 Michael Ethier (Inactive) added a comment -

            Hi Jian,
            I just applied the 2 patches you recommended to Lustre 2.12.5 and it's still failing the same way. How exactly are you applying those 2 patches? This is what I did:

            [root@cannonlnet08 lustre-client-2.12.5]# pwd
            /usr/src/lustre-client-2.12.5
            [root@cannonlnet08 lustre-client-2.12.5]# patch -p1 < ~/14e02fb3.diff
            patching file lnet/autoconf/lustre-lnet.m4
            Hunk #3 succeeded at 567 with fuzz 2 (offset -23 lines).
            patching file lnet/klnds/o2iblnd/o2iblnd.c
            patching file lnet/klnds/o2iblnd/o2iblnd.h
            patching file lnet/klnds/o2iblnd/o2iblnd_cb.c
            [root@cannonlnet08 lustre-client-2.12.5]# patch -p1 < ~/ba702c79.diff
            patching file lnet/autoconf/lustre-lnet.m4
            Hunk #1 succeeded at 579 with fuzz 2 (offset 9 lines).
            patching file lnet/klnds/o2iblnd/o2iblnd_cb.c
            Hunk #1 succeeded at 2418 (offset 11 lines).

            Then I started the build:
            [root@cannonlnet08 lustre-client-2.12.5]# dkms install -k $(uname -r) lustre-client/2.12.5

            Kernel preparation unnecessary for this kernel. Skipping...

            Running the pre_build script:
            checking build system type... x86_64-unknown-linux-gnu
            ...
            ...


            mre64 Michael Ethier (Inactive) added a comment -

            BTW, do you know when this issue will be fixed in the general Lustre release? 2.12.6 is already released.

            People

              Assignee: yujian Jian Yu
              Reporter: mre64 Michael Ethier (Inactive)