[LU-11348] Lustre 2.10.4 failed to build with MLNX_OFED_LINUX-4.4-1.0.0.0 Created: 06/Sep/18 Updated: 14/Sep/18 Resolved: 14/Sep/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Haisong Cai (Inactive) | Assignee: | Minh Diep |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | build | ||
| Environment: |
[root@wombat-oss-21-2 ~]# uname -a
[root@wombat-oss-21-2 ~]# rpm -aq | egrep "spl|zfs" |
||
| Attachments: |
|
| Epic/Theme: | Mellanox |
| Severity: | 3 |
| Epic: | server |
| Description |
|
When building Lustre 2.10.4 against the Mellanox OFED stack, configure fails with this error: configure: error: can't compile with OpenIB gen2 headers (please see the attached file for more details) |
| Comments |
| Comment by Peter Jones [ 06/Sep/18 ] |
|
Minh, could you please advise? Peter |
| Comment by Minh Diep [ 06/Sep/18 ] |
|
Hi Haisong, I think you need --with-o2ib=/usr/src/ofa_kernel/default |
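A minimal sketch of the suggestion above, for illustration only: the helper checks that an OFED kernel source tree exists before building the `--with-o2ib=` option for configure. The function name `o2ib_flag` and the `OFA_DIR` environment variable are hypothetical and not part of the Lustre build system. Only the `--with-o2ib=` option and the /usr/src/ofa_kernel/default path come from this ticket.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print a --with-o2ib option
# for the given OFED kernel source tree, or fail if the tree is missing.
o2ib_flag() {
    ofa_dir="${1:-/usr/src/ofa_kernel/default}"
    if [ -d "$ofa_dir" ]; then
        printf '%s\n' "--with-o2ib=$ofa_dir"
    else
        echo "OFED kernel sources not found at $ofa_dir" >&2
        return 1
    fi
}

# Show the configure command that would be run, without running it.
# OFA_DIR is an assumed override variable, used here only for illustration.
if flag=$(o2ib_flag "${OFA_DIR:-/usr/src/ofa_kernel/default}"); then
    echo "./configure $flag"
else
    echo "set OFA_DIR to the path of your OFED kernel sources" >&2
fi
```

Printing the command rather than running it makes it possible to confirm the path before a long build starts.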
| Comment by Haisong Cai (Inactive) [ 06/Sep/18 ] |
|
Hi Minh, Good to see you! Well, I tried it with various alternative paths besides /usr/src/ofa_kernel/default, such as mlnx-ofa_kernel-4.4. The same errors came out. I also noticed
Thanks, Haisong |
| Comment by Minh Diep [ 06/Sep/18 ] |
|
Hey, yes, el6.6 is quite old |
| Comment by Haisong Cai (Inactive) [ 10/Sep/18 ] |
|
Hi Minh, I tried to compile server 2.10.4 and 2.10.5 on a CentOS 6.10, ZFS 0.7.9 machine. The Lustre server RPMs built, but when I try to install them I get errors; see the attachment lustre-installation-errors
|
| Comment by Minh Diep [ 10/Sep/18 ] |
|
Hi Haisong,
1. Download MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64.tgz; unzip it and cd MLNX_OFED_LINUX-4.4-2.0.7.0-rhel7.5-x86_64
yum localinstall \
    mlnx-ofa_kernel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.rhel7u5.x86_64.rpm \
    kmod-mlnx-ofa_kernel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.201808301559.rhel7u5.x86_64.rpm \
    mlnx-ofa_kernel-devel-4.4-OFED.4.4.2.0.7.1.gee7aa0e.rhel7u5.x86_64.rpm
Now you can build Lustre. Please let me know the result. |
| Comment by Haisong Cai (Inactive) [ 10/Sep/18 ] |
|
Minh, I followed the suggested steps, but it still failed. I am including the compilation log and installation log here. Thanks, |
| Comment by Haisong Cai (Inactive) [ 12/Sep/18 ] |
|
Minh, are the following compatible?
CentOS6.10 (kernel 2.6.32-754.3.5.el6.x86_64)
ZFS 0.7.9
Lustre server 2.10.4
Mellanox MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64
I still can't build the Lustre server. Thanks, Haisong |
| Comment by Peter Jones [ 12/Sep/18 ] |
|
On the LTS branch, we only offer support for Lustre clients on RHEL 6.x, not servers. |
| Comment by Haisong Cai (Inactive) [ 12/Sep/18 ] |
|
Thank you for the update. However, in my earlier ticket, EDU-91, the engineer suggested that we upgrade the Lustre servers from FE2.7.x to 2.10.x to address that particular MDS bug. Now that we are preparing for that upgrade, is LTS no longer supported? I am assuming here that LTS refers to all 2.10.x releases (according to the release note at http://lustre.org/lustre-2-10-0-released)
Thanks, Haisong |
| Comment by Haisong Cai (Inactive) [ 12/Sep/18 ] |
|
I should also add that, on Aug 23, I used a CentOS6.7 server + ZFS 0.7.9, with Lustre 2.10.4 (git checkout v2_10_4) and successfully built and installed the server. Now that process can't be repeated any more. Haisong |
| Comment by Peter Jones [ 12/Sep/18 ] |
|
Haisong,
Sorry to not have been clear enough - the 2.10.x LTS branch is absolutely supported, but only for Linux distributions as outlined in the support matrix - https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix . As you can see, we only offer support for RHEL 7.x servers at present.
Reviewing the exchange on the EDU-91 ticket, it is clear to me that the enquiry as to whether you were planning to upgrade to 2.10.x assumed you realized that this would also mean upgrading the Linux distribution on the storage servers accordingly.
We don't try to break support for older Linux distributions, so they can often stay working long after they have been dropped from the combinations we build and test. I am more surprised that RHEL 6.x servers stopped working only so recently than to learn that they do not work. It is some years since we actively supported RHEL 6.x servers on master (they were deprecated for the Lustre 2.8 release cycle).
Given your requirements I see two options:
1) Upgrade your storage servers' distro to RHEL 7.5 and keep your existing kernel on your clients, using 2.10.5
2) Keep your storage servers on 2.7 FE (the last release to support RHEL 6.x servers) and see whether it is possible to port the
Personally, I think that option #1 is the better approach. TBH I am surprised that you are still using RHEL 6.x for a Lustre on ZFS deployment - a study conducted back when Rick Wagner was at SDSC showed that there were performance advantages to using RHEL 7.x over RHEL 6.x for Lustre on ZFS. Many sites will take this approach and treat the storage servers as a "black box" rather than trying to have a single kernel version used across both Lustre servers and clients.
Peter |
| Comment by Haisong Cai (Inactive) [ 12/Sep/18 ] |
|
Minh, I followed your suggestion and built the 2.10.4 client successfully on CentOS6.10. The procedure is very close to what I used for building the server, minus ZFS. Command summary:
tar zxvf /share/apps/src/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64.tgz
cd MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64
tar xvf /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext.tgz
[root@badger-oss-4-1 MLNX_OFED_LINUX-4.4-1.0.0.0-rhel6.10-x86_64-ext]# ofed_info -s
I also saved the logs. Let me know if you would like them too. |
| Comment by Haisong Cai (Inactive) [ 13/Sep/18 ] |
|
Minh, After further troubleshooting, I found the cause of the original issue - Lustre 2.10.4 failed to build on CentOS6.10. I believe it was really a run-time issue with the Lustre build and install process. It turns out the process requires a Mellanox card to be present and its driver loaded in kernel space first. The machine I used to build the Lustre server didn't have the Mellanox card at the time (it was there before). So even though Lustre 2.10.X isn't officially supported on CentOS6.* anymore, one can still build it on the platform and use it. You can close this ticket. Thanks for all the help, Haisong |
| Comment by Minh Diep [ 13/Sep/18 ] |
|
I find it hard to believe that we need a card to build. I build Lustre on VMs all the time without the card. Perhaps you need to use the --with-linux option? -Minh |
| Comment by Haisong Cai (Inactive) [ 13/Sep/18 ] |
|
I used the --with-linux option. With the Mellanox card installed, this time I only ran:
./configure --enable-server --disable-ldiskfs --with-o2ib=yes
These are the options I usually use. Thanks, Haisong |
| Comment by Minh Diep [ 13/Sep/18 ] |
|
If you use --with-o2ib, configure will assume that you want to build with whatever IB stack is installed on the system, which means you need to have an IB card. |
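One way to check the condition described above before running configure is to see whether an InfiniBand kernel module is currently loaded. The sketch below is illustrative only: the function name `ib_module_loaded`, the choice of `ib_core` as the module to look for, and reading /proc/modules are all assumptions. Nothing in this ticket confirms that the Lustre configure script performs this particular check.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of Lustre): succeed if the named
# kernel module appears in the modules file.
# $1 = module name, $2 = modules file (defaults to /proc/modules)
ib_module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    [ -r "$modules_file" ] || return 1
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

if ib_module_loaded ib_core; then
    echo "ib_core is loaded"
else
    echo "ib_core is not loaded; --with-o2ib=yes may not find an IB stack (assumption based on this thread)"
fi
```

Taking the modules file as a parameter keeps the check testable on a machine without any IB hardware.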
| Comment by Haisong Cai (Inactive) [ 13/Sep/18 ] |
|
In the course of my week-long troubleshooting, I tried everything I could think of, including the one you suggested here. At this point, I am certain that with the IB card in the server, all my configuration options will likely succeed. In that regard, the configure part of the compilation step is pretty smart about figuring out the proper configuration parameters. Haisong |
| Comment by Minh Diep [ 14/Sep/18 ] |
|
Thanks |