[LU-5953] lustre[-dkms] needs to automatically account for OFED Created: 25/Nov/14  Updated: 06/Oct/16  Resolved: 20/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Major
Reporter: Bruno Faccini (Inactive) Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5597 Kernel Module.symvers take precendene... Resolved
is related to LU-6083 IB with Ubuntu 14.04 client Resolved
Rank (Obsolete): 16631

 Description   

Currently our lustre-dkms package does not build against any installed OFED.

Background

Outside of the DKMS world, a user who wants to build Lustre with OFED first builds and installs OFED, then points Lustre's configure script at that installation with the --with-o2ib switch, giving the path to the OFED installation as its argument.
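For illustration, the traditional flow can be sketched as a small dry-run helper. This is only a sketch: the function name `o2ib_configure_cmd` is hypothetical, and it prints the configure command instead of running it, so nothing about the real build is assumed beyond the --with-o2ib=<path> switch described above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the configure command for an
# OFED tree that the user has already built and installed.
# The path is always supplied by the caller; nothing is guessed.
o2ib_configure_cmd() {
    ofed_path=$1
    if [ ! -d "$ofed_path" ]; then
        echo "error: $ofed_path is not a directory" >&2
        return 1
    fi
    printf './configure --with-o2ib=%s\n' "$ofed_path"
}
```

Called as `o2ib_configure_cmd /path/to/installed/ofed`, it echoes the command a user would run from the Lustre source tree.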

Problem

This doesn't work with lustre-dkms because there is no way for the user to provide that OFED path to the DKMS build of the Lustre modules.

Solution

The nice part is that this solution is general and will be of benefit to users who compile from source in the traditional manner as well as benefiting lustre-dkms users.

The solution that I propose is to change the code that currently handles --with-o2ib [yes|no|<path_to_ofed>]. When given simply yes, it should look for the OFED installation in its expected installed location (i.e. the location that results from following the official upstream instructions on how to build and install OFED). If OFED is installed there, it should be preferred over in-kernel IB.
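A minimal sketch of that resolution order, written as a standalone shell function rather than as the actual configure macro. The names `resolve_o2ib` and `OFED_CANDIDATES` are hypothetical, and the two default candidate directories are assumptions borrowed from paths mentioned elsewhere in this ticket, not an authoritative list of OFED install locations.

```shell
#!/bin/sh
# OFED_CANDIDATES: hypothetical, space-separated list of directories where
# an installed OFED devel tree might live (assumed defaults, see above).
: "${OFED_CANDIDATES:=/usr/src/ofa_kernel/default /usr/src/ofa_kernel}"

# Print what the build should use for a given --with-o2ib argument:
#   no     -> "disabled"
#   yes    -> first existing candidate directory, else "in-kernel"
#   <path> -> the path exactly as given
resolve_o2ib() {
    case $1 in
        no)
            echo disabled ;;
        yes)
            for dir in $OFED_CANDIDATES; do
                if [ -d "$dir" ]; then
                    echo "$dir"      # installed OFED wins over in-kernel IB
                    return 0
                fi
            done
            echo in-kernel ;;
        *)
            echo "$1" ;;             # explicit path: never second-guessed
    esac
}
```

Keeping the explicit-path branch untouched means users who already pass --with-o2ib=<path> see no change in behavior; only the plain yes case gains the lookup.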



 Comments   
Comment by Dmitry Eremin (Inactive) [ 25/Nov/14 ]

Hmm. What should you do if many different OFEDs are installed on the machine? Which version should you choose?
In more complicated systems there can be several kernels, each compiled with an appropriate version of OFED, and the user selects at boot time which one to boot. How are you going to understand this configuration?

Comment by Peter Jones [ 25/Nov/14 ]

http://review.whamcloud.com/#/c/12686/

Comment by Dmitry Eremin (Inactive) [ 25/Nov/14 ]

My understanding is that we cannot try to discover the OFED version in build scripts. This is unacceptable for a correct build process! We should avoid any heuristics in the build process. Everything we need should be specified explicitly by the user through command-line options or the spec file. Probably, when we generate a DKMS spec file, we can specify a dependency on a particular version of OFED.

This is a bad idea and will potentially bring us a lot of issues from customers when this logic silently selects an incorrect version of OFED.

Comment by Dmitry Eremin (Inactive) [ 25/Nov/14 ]

One possible solution is to pass the rpmbuild command an option such as "--with mlx_ofed" and have the following in the .spec file:

%if %{with mlx_ofed}
BuildRequires: mlnx-ofa_kernel-devel
%define ofed_path /usr/src/ofa_kernel
%endif

Comment by James A Simmons [ 04/Dec/14 ]

Will this resolve LU-5597 as well?

Comment by Bruno Faccini (Inactive) [ 05/Dec/14 ]

James, I don't think so, because in LU-5597 the explicit "--with-o2ib=<path>" way of specifying the [M]OFED devel/headers location was used, whereas the purpose of this ticket and its associated patch is to automate it. I believe the LU-5597 issue is more of a problem that occurred during inter-module symbol/address resolution. I wonder whether the MOFED modules were already compiled and installed by the time the Lustre install reached the depmod step?

Comment by Gerrit Updater [ 25/Mar/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12686/
Subject: LU-5953 build: use installed OFED by default
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1222a7114a5558a2f5b81cb570966546e37dec48

Comment by James A Simmons [ 26/Mar/15 ]

I'm seeing the following build errors after this was merged.

checking whether to use Compat RDMA... /usr/bin/ofed_info
rpm: no arguments given for query
configure: error:
You seem to have an OFED installed but have not installed it's devel package.
If you still want to build Lustre for your OFED I/B stack, you need to install its devel headers RPM.
Instead, if you want to build Lustre for your kernel's built-in I/B stack rather than your installed OFED stack, either remove the OFED package(s) or use --with-o2ib=no.

make: *** No rule to make target `rpms'. Stop.

This is using the default OFED stack with RHEL6. I tried the test you merged in the patch and this is the result I get.

ofed_info | egrep -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' | xargs rpm -ql | grep /openib
rpm: no arguments given for query

But you test whether the output is null, which is not the case.
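The "rpm: no arguments given for query" message is what rpm prints when it is invoked in query mode without any package names, which is what happens when nothing upstream of xargs matches. A minimal sketch of one way to guard against that follows. It is not the check from the merged patch: the function name `list_devel_files` and its optional query-command argument are hypothetical, introduced only so the example can be exercised without rpm.

```shell
#!/bin/sh
# Read ofed_info-style output on stdin (one package per line) and list the
# files of any OFED devel packages found.
# $1 (hypothetical parameter): the query command, assumed "rpm -ql" by default.
# Returns 1 without running the query when no devel package matched.
list_devel_files() {
    query=${1:-rpm -ql}
    pkgs=$(grep -E -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' || true)
    if [ -z "$pkgs" ]; then
        return 1                     # nothing to query: skip rpm entirely
    fi
    # $query is left unquoted on purpose so "rpm -ql" splits into two words.
    printf '%s\n' "$pkgs" | xargs $query
}
```

On a build host this would be used as `ofed_info | list_devel_files | grep /openib`, with the caller checking the exit status to tell "no devel package" apart from "devel package without the expected file".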

Comment by Bruno Faccini (Inactive) [ 27/Mar/15 ]

Hello James,
Can you post the output of ofed_info script/cmd from the node where you are building Lustre?
Thanks in advance.

Comment by James A Simmons [ 27/Mar/15 ]

I looked and found that our build machine has been hosed for a long time. It's just that this patch exposes that the box has a mixed OFED 3.5 and RHEL InfiniBand setup.

Comment by Bruno Faccini (Inactive) [ 27/Mar/15 ]

And that is somewhat its purpose: to detect unexpected IB setups ...

Comment by Wang Shilong (Inactive) [ 02/Apr/15 ]

Hello,

Our build recently started failing with this patch.

It fails on the following check:
ofed_info | egrep -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' | xargs rpm -ql | grep '/openib$'

Here is the actual output from our build (rhel6.6 with MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64):

[root@build01 MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64]# ofed_info | egrep -w 'compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel' | xargs rpm -ql | grep '/openib'
/usr/src/ofa_kernel-2.4/ofed_scripts/openib.conf
/usr/src/ofa_kernel-2.4/ofed_scripts/openibd
/usr/src/ofa_kernel-2.4/ofed_scripts/openibd.service
/usr/src/ofa_kernel/default/ofed_scripts/openib.conf
/usr/src/ofa_kernel/default/ofed_scripts/openibd
/usr/src/ofa_kernel/default/ofed_scripts/openibd.service

So maybe you meant '/openibd$' rather than '/openib$'?

Best regards,
Wang Shilong
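
The anchoring difference can be checked directly against the six paths listed above. This is only an illustration of how the two patterns behave on that file list; `count_matches` is a hypothetical helper and nothing here is taken from the patch.

```shell
#!/bin/sh
# File list copied from the rpm -ql output quoted above.
paths='/usr/src/ofa_kernel-2.4/ofed_scripts/openib.conf
/usr/src/ofa_kernel-2.4/ofed_scripts/openibd
/usr/src/ofa_kernel-2.4/ofed_scripts/openibd.service
/usr/src/ofa_kernel/default/ofed_scripts/openib.conf
/usr/src/ofa_kernel/default/ofed_scripts/openibd
/usr/src/ofa_kernel/default/ofed_scripts/openibd.service'

# Hypothetical helper: count the paths matched by a grep pattern.
# "|| true" keeps a zero-match result from being treated as a failure.
count_matches() {
    printf '%s\n' "$paths" | grep -c -- "$1" || true
}

count_matches '/openib$'     # prints 0: no path ends in /openib
count_matches '/openibd$'    # prints 2: the two openibd scripts
```

With this packaging, the '/openib$' pattern finds nothing, so a check that relies on it concludes the devel package is missing even though it is installed.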

Comment by Bruno Faccini (Inactive) [ 02/Apr/15 ]

Hello Wang,
Thanks for raising this issue, but I am afraid it may be directly linked to MLNX_OFED's packaging being different (and more recent?) from that of the OFED RPMs, which was not in the original scope of this ticket and its associated patch.

OTOH, I remember that I used the "openib" file/link name as a reference because I thought I had found that it was kept, for historical reasons, in the packaging of the different OFED versions. But it seems MLNX_OFED now breaks this behavior (I checked that it is still true/working with MLNX_OFED_LINUX-2.1-*)?

Comment by Wang Shilong (Inactive) [ 02/Apr/15 ]

Hello,

At least MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64.tar totally breaks your checks.

Please check it; it can be downloaded from the following link:
http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

Comment by James A Simmons [ 02/Apr/15 ]

I'm in the process of setting up a Mellanox stack system so if I run into this issue I will see what I can do to fix it.

Comment by Bruno Faccini (Inactive) [ 02/Apr/15 ]

The simplest way to fix this could be to use the "--with-o2ib=<path>" configure option! And also to ask that the latest MLNX_OFED versions continue to provide the %install_path/openib link in their devel RPM?

Comment by Wang Shilong (Inactive) [ 02/Apr/15 ]

Generally speaking, I don't think that automatic detection is a good idea for super users. Bruno, maybe giving users options would be better, no?

Comment by Bruno Faccini (Inactive) [ 02/Apr/15 ]

This automatic detection mechanism is intended for the IEEL and DKMS Lustre RPMs. As an experienced super-user, again, why don't you use the "--with-o2ib=<path>" configure option?

Comment by Gerrit Updater [ 31/May/16 ]

Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/20523
Subject: LU-5953 build: use installed OFED by default with dpkg
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 53d4d4ba028f2d2df62d836dc43b2ee4ae66ac4e

Comment by Gerrit Updater [ 20/Jun/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20523/
Subject: LU-5953 build: use installed OFED by default with dpkg
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce8389306ad009f59eb5203260df38ddda16828d

Generated at Sat Feb 10 01:55:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.