[LU-11348] Lustre 2.10.4 failed to build with MLNX_OFED_LINUX-4.4-1.0.0.0 Created: 06/Sep/18  Updated: 14/Sep/18  Resolved: 14/Sep/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Haisong Cai (Inactive) Assignee: Minh Diep
Resolution: Not a Bug Votes: 0
Labels: build
Environment:

[root@wombat-oss-21-2 ~]# uname -a
Linux wombat-oss-21-2.local 2.6.32-573.7.1.el6.x86_64 #1 SMP Tue Sep 22 22:00:00 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@wombat-oss-21-2 ~]# rpm -aq | egrep "spl|zfs"
spl-0.7.9-1.el6.x86_64
zfs-0.7.9-1.el6.x86_64
kmod-spl-0.7.9-1.el6.x86_64
kmod-spl-devel-0.7.9-1.el6.x86_64
kmod-zfs-0.7.9-1.el6.x86_64
zfs-dracut-0.7.9-1.el6.x86_64
kmod-zfs-devel-0.7.9-1.el6.x86_64
lustre-osd-zfs-mount-2.10.4-1.el6.x86_64
libzfs2-0.7.9-1.el6.x86_64
libzfs2-devel-0.7.9-1.el6.x86_64
kmod-lustre-osd-zfs-2.10.4-1.el6.x86_64


Attachments: HTML File compiling_errors     HTML File lustre-installation-errors     HTML File lustre_compiling_installing_log    
Epic/Theme: Mellanox
Severity: 3
Epic: server
Rank (Obsolete): 9223372036854775807

 Description   

 

When building Lustre 2.10.4 with Mellanox OFED stack, got errors:

configure: error: can't compile with OpenIB gen2 headers

(please see attached file for more details)



 Comments   
Comment by Peter Jones [ 06/Sep/18 ]

Minh

Could you please advise

Peter

Comment by Minh Diep [ 06/Sep/18 ]

Hi haisong,

I think you need --with-o2ib=/usr/src/ofa_kernel/default

Comment by Haisong Cai (Inactive) [ 06/Sep/18 ]

Hi Minh,

Good to see you!

Well, I tried it, with various alternative paths besides /usr/src/ofa_kernel/default, such as mlnx-ofa_kernel-4.4. The same errors came out. I also noticed that LU-6790 mentions CentOS 6.6 not being supported? More system info below:

  # ofed_info -s
    MLNX_OFED_LINUX-4.4-1.0.0.0:
  # ls -1 /usr/src/
    debug
    kernels
    linux-2.6
    mlnx-ofa_kernel
    mlnx-ofa_kernel-4.4
    ofa_kernel
    ofa_kernel-4.4
    spl-0.6.4.2
    spl-0.7.9
    zfs-0.6.4.2
    zfs-0.7.9

 

Thanks,

Haisong
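Several candidate trees are mentioned above for --with-o2ib. A minimal sketch for checking which of them look usable before running configure follows. It assumes that the o2ib checks need the RDMA headers rdma_cm.h, ib_cm.h and ib_verbs.h under include/rdma; that header list is an assumption and is not stated in this ticket. The function names are hypothetical.

```shell
# Succeed if DIR contains the RDMA headers assumed to be needed by the
# o2ib configure checks (the header list is an assumption, see above).
has_rdma_headers() {
    for hdr in rdma_cm.h ib_cm.h ib_verbs.h; do
        [ -e "$1/include/rdma/$hdr" ] || return 1
    done
    return 0
}

# Print one line per candidate directory, marking it usable or skipped.
report_o2ib_candidates() {
    for dir in "$@"; do
        if has_rdma_headers "$dir"; then
            printf 'usable   %s\n' "$dir"
        else
            printf 'skipped  %s\n' "$dir"
        fi
    done
}
```

For example: report_o2ib_candidates /usr/src/ofa_kernel/default /usr/src/mlnx-ofa_kernel-4.4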

Comment by Minh Diep [ 06/Sep/18 ]

Hey,

yes, el6.6 is quite old

Comment by Haisong Cai (Inactive) [ 10/Sep/18 ]

 

Hi Minh,

I tried to compile server 2.10.4 and 2.10.5 on a CentOS 6.10, ZFS 0.7.9 machine.

The Lustre server RPMs built, but when I try to install them I get errors. See the attachment lustre-installation-errors.

 

Comment by Minh Diep [ 10/Sep/18 ]

Hi Haisong,
The recommendation below was written for el7 but also applies to el6.
The dependency errors are due to differences in the build process between Lustre and MOFED. Here is one way that works:

1. Download MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64.tgz, extract it, and cd into MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64.
2. Use mlnx_add_kernel_support.sh, since the kernel is newer than what MOFED supports.
Run mlnx_add_kernel_support.sh --kmp (this enables KMP support, which is disabled by default).
3. Once the build is done, there is a new tarball at /tmp/MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64-ext.tgz
4. Extract MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64-ext.tgz and go to MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64-ext/RPMS
5. Install these RPMs (and others too if you think you'll need them):

yum localinstall mlnx-ofa_kernel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.rhel7u5.x86_64.rpm kmod-mlnx-ofa_kernel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.201808301559.rhel7u5.x86_64.rpm mlnx-ofa_kernel-devel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.rhel7u5.x86_64.rpm

Now you can build Lustre:
./configure --with-o2ib=/usr/src/ofa_kernel/default && make && make rpms

Please let me know the result
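The steps above can be sketched as a dry run that only prints the commands for a given MOFED bundle name, so the derived -ext names can be reviewed before anything is executed. This is a sketch, not a tested installer: the function name is hypothetical, the RPM file names are left as <ver> placeholders because they depend on the build, and the configure line uses the --with-o2ib spelling given earlier in the thread.

```shell
# Print (do not run) the build steps for a MOFED bundle such as
# MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64.
print_mofed_lustre_plan() {
    bundle=$1
    ext="${bundle}-ext"   # name of the rebuilt bundle, per step 3
    printf '%s\n' \
        "tar xzf ${bundle}.tgz" \
        "cd ${bundle} && ./mlnx_add_kernel_support.sh --kmp" \
        "tar xzf /tmp/${ext}.tgz" \
        "cd ${ext}/RPMS" \
        "yum localinstall mlnx-ofa_kernel-<ver>.rpm kmod-mlnx-ofa_kernel-<ver>.rpm mlnx-ofa_kernel-devel-<ver>.rpm" \
        "(in the Lustre source tree) ./configure --with-o2ib=/usr/src/ofa_kernel/default && make && make rpms"
}
```

For example: print_mofed_lustre_plan MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64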

Comment by Haisong Cai (Inactive) [ 10/Sep/18 ]

Minh,

I followed the suggested steps, but the build still failed.

I am including the compilation and installation logs here (attachment: lustre_compiling_installing_log).

Thanks,

Haisong

Comment by Haisong Cai (Inactive) [ 12/Sep/18 ]

Minh,

Are the following compatible:

CentOS6.10 (kernel 2.6.32-754.3.5.el6.x86_64)

ZFS 0.7.9

Lustre server 2.10.4

Mellanox MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64

I still can't build Lustre server.

Thanks,

Haisong

Comment by Peter Jones [ 12/Sep/18 ]

On the LTS branch, we offer support for RHEL 6.x only for Lustre clients.

Comment by Haisong Cai (Inactive) [ 12/Sep/18 ]

Thank you for the update.

However, in my earlier ticket, EDU-91, the engineer suggested that we upgrade the Lustre server from FE 2.7.x to 2.10.x for that particular MDS bug.

Now that we are preparing for such an upgrade, LTS is no longer supported? Here I am assuming LTS refers to all 2.10.x releases (according to the release note at http://lustre.org/lustre-2-10-0-released)

 

Thanks,

Haisong

Comment by Haisong Cai (Inactive) [ 12/Sep/18 ]

I should also add that, on Aug 23, I used a CentOS6.7 server + ZFS 0.7.9, with Lustre 2.10.4 (git checkout v2_10_4) and successfully built and installed the server.

Now that process can't be repeated any more.

Haisong

Comment by Peter Jones [ 12/Sep/18 ]

Haisong

Sorry to not have been clear enough - the 2.10.x LTS branch is absolutely supported, but only for Linux distributions as outlined in the support matrix - https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix . As you can see, we only offer support for RHEL 7.x servers at present.

Reviewing the exchange on the EDU-91 ticket, it is clear to me that the enquiry as to whether you were planning to upgrade to 2.10.x assumed you realized that this would also mean upgrading the Linux distribution on the storage servers accordingly.

We don't try to break support for older Linux distributions, so they can often keep working long after they have been dropped from the combinations we build and test. I am more surprised that RHEL 6.x servers stopped working only so recently than to learn that they do not work. It is some years since we actively supported RHEL 6.x servers on master (they were deprecated for the Lustre 2.8 release cycle).

Given your requirements I see two options:

1) Upgrade your storage servers distro to RHEL 7.5 and keep your existing kernel on your clients, using 2.10.5

2) Keep your storage servers on 2.7 FE (the last release to support RHEL 6.x servers) and see whether it is possible to port the LU-7199 fix back to that version

Personally, I think that option #1 is the better approach. TBH I am surprised that you are still using RHEL 6.x for a Lustre on ZFS deployment. A study conducted back when Rick Wagner was at SDSC showed that there were performance advantages to using RHEL 7.x over RHEL 6.x for Lustre on ZFS. Many sites take this approach and treat the storage servers as a "black box" rather than trying to have a single kernel version across both Lustre servers and clients.

Peter

Comment by Haisong Cai (Inactive) [ 12/Sep/18 ]

Minh,

I followed your suggestion and built the 2.10.4 client successfully on CentOS 6.10.

The procedure is very close to what I used for building the server, minus ZFS.

Command summary:

tar zxvf /share/apps/src/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64.tgz

cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64
./mlnx_add_kernel_support.sh -m /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64

tar xvf /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext.tgz
cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext
./mlnxofedinstall

[root@badger-oss-4-1 MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext]# ofed_info -s
MLNX_OFED_LINUX-4.4-1.0.0.0:

# Lustre client build
cd /tmp/
tar zxvf /share/apps/src/lustre-release_git.tgz
cd /tmp/lustre-release
sh ./autogen.sh
./configure --disable-server --with-obib=/usr/src/ofa_kernel/default && make && make rpms

 

I also saved the logs. Let me know if you would like them too.

Comment by Haisong Cai (Inactive) [ 13/Sep/18 ]

Minh,

After further troubleshooting, I found the cause of the original issue (Lustre 2.10.4 failing to build on CentOS 6.10).

I believe it was really a run-time issue with the Lustre build and install process.

It turns out the process requires a Mellanox card to be present and its driver loaded in kernel space first. The machine I used to build the Lustre server didn't have the Mellanox card at the time (it had one before).

So even though Lustre 2.10.x isn't officially supported on CentOS 6.* anymore, one can still build it on that platform and use it.

You can close this ticket.

Thanks for all the help,

Haisong
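The comment above ties the failure to the Mellanox driver not being loaded on the build host. A minimal sketch of checking that before starting a build follows. It reads a /proc/modules-style listing, and it assumes the relevant drivers are mlx5_core or mlx4_core; those module names are an assumption and are not named in this ticket. The function names are hypothetical.

```shell
# Succeed if module NAME appears in a /proc/modules-style listing.
# The listing path is a parameter so the check can be tried on a saved copy.
module_loaded() {
    name=$1
    listing=${2:-/proc/modules}
    [ -r "$listing" ] || return 1
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit(found ? 0 : 1) }' "$listing"
}

# Succeed if any of the assumed Mellanox core drivers is loaded.
mlnx_driver_loaded() {
    for mod in mlx5_core mlx4_core; do
        module_loaded "$mod" "$1" && return 0
    done
    return 1
}
```

For example: mlnx_driver_loaded || echo "no Mellanox core driver loaded on this host"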

Comment by Minh Diep [ 13/Sep/18 ]

I find it hard to believe that we need a card to build. I build Lustre on VMs all the time without a card. Perhaps you need to use the --with-linux option?

-Minh

Comment by Haisong Cai (Inactive) [ 13/Sep/18 ]

 

I did use the --with-linux option.

With the Mellanox card installed, this time I only ran:

./configure --enable-server --disable-ldiskfs --with-o2ib=yes

This is the option I usually use.

Thanks,

Haisong

Comment by Minh Diep [ 13/Sep/18 ]

If you use --with-o2ib, configure will assume that you want to build against whatever IB stack is installed on the system, which means you need to have an IB card.
If you don't have an IB driver or card, use --with-o2ib=/usr/src/ofa_kernel/default
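The choice described above can be sketched as a small helper that prefers an explicit source tree when one exists and otherwise falls back to --with-o2ib=yes, the form used earlier in this thread. This is a sketch under that assumption only: the helper name is hypothetical, and it does not verify that the tree is complete.

```shell
# Echo the --with-o2ib argument to pass to configure.
# If TREE is an existing directory, point configure at it explicitly;
# otherwise fall back to "yes" and let configure inspect the running system.
o2ib_configure_arg() {
    tree=${1:-/usr/src/ofa_kernel/default}
    if [ -d "$tree" ]; then
        printf '%s\n' "--with-o2ib=$tree"
    else
        printf '%s\n' "--with-o2ib=yes"
    fi
}
```

For example: ./configure --enable-server --disable-ldiskfs "$(o2ib_configure_arg)"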

Comment by Haisong Cai (Inactive) [ 13/Sep/18 ]

In the course of my week-long troubleshooting, I tried everything I could think of, including the option you suggested here.

At this point, I am certain that with the IB card in the server, all of my configuration options will likely succeed. In that regard, the configure part of the compilation step is pretty smart about figuring out the proper configuration parameters.

Haisong

Comment by Minh Diep [ 14/Sep/18 ]

Thanks
