[LU-1327] No OFED support for RHEL5.8 2.6.18-308.1.1.el5 Created: 12/Apr/12  Updated: 03/May/12  Resolved: 30/Apr/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Johann Lombardi (Inactive) Assignee: Minh Diep
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 2204

 Description   

RHEL5.8 support has just landed on b1_8, and unfortunately no external version of OFED supports this kernel yet. As a result, builds with external OFED are failing.
Shall we disable builds with external OFED temporarily?



 Comments   
Comment by Peter Jones [ 12/Apr/12 ]

I just added Brian and Minh as watchers. I would like to know how likely it is that there will soon be an OFED release that supports RHEL5.8.

Also, any idea why there is an OFED 3.2 version? That seems a jump from the current 1.5.x....

Comment by Brian Murrell (Inactive) [ 12/Apr/12 ]

My guess re: OFED 3.2: they are changing their development model so that they work against what is in the kernel (i.e. backporting by using the "compat" library) and are adjusting their version numbers to be in parity with the kernel they are working against.

This might help explain: http://lists.openfabrics.org/pipermail/ewg/2012-February/017303.html

This git tree shows some sign of backports for RHEL 6.2: http://git.openfabrics.org/git?p=compat-rdma/compat-rdma.git;a=summary but nothing older (i.e. RHEL 5.8)

As for RHEL 5.8 support elsewhere, I don't really know. I have been "unplugged" from the OFED development stream for quite a while now. IMHO, if we want to be tracking and supporting OFED for our releases, we need to have somebody at least monitoring what's going on with the OFA group and what they are doing so that we don't wind up "out of the loop" like this and trying to piece together what they are up to.

That said, there doesn't seem to be any new activity in http://git.openfabrics.org/git?p=ofed_1_5/linux-2.6.git;a=summary with 1.5.4.1 being the last thing done there which only seems to have support for RHEL up to 5.7.

Comment by Chris Gearing (Inactive) [ 16/Apr/12 ]

For b1_8 master builds, external OFED is now not supported. Rather than just removing the builds, which could lead to confusion, a README file is produced with the following contents:

16th April 2012.
****************
OFED does not currently support RHEL5.8 and so cannot be built for b1_8. See LU-1327 for more information.
If 5.8 support becomes available for OFED then raise a ticket to have the build re-enabled.

Comment by James A Simmons [ 16/Apr/12 ]

I have gotten Lustre 1.8 on RHEL5.8 to build with OFED 1.5.4.1. You need to set the BACKPORT_DIR variable to 2.6.18-EL5.7. I also have built and tested a file system with this combo.

Comment by Peter Jones [ 16/Apr/12 ]

Minh

Could you please try this out?

Thanks

Peter

Comment by Peter Jones [ 16/Apr/12 ]

...and thanks for the suggestion James!

Comment by Minh Diep [ 16/Apr/12 ]

Hi James,

Here is the configure command we ran to build ofed

+ ./configure --prefix=/usr --kernel-version 2.6.18-308.1.1.el5 --kernel-sources /home/mdiep/build/b1_8/lustre/reused/usr/src/kernels/2.6.18-308.1.1.el5-x86_64 --modules-dir /lib/modules/2.6.18-308.1.1.el5/updates --without-quilt --with-core-mod --with-ipoib-mod --with-sdp-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-rds-mod --with-qlgc_vnic-mod --with-madeye-mod --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-cxgb3-mod --with-nes-mod
config.mk does not exist. running ofed_patch.sh
/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/ofed_scripts/ofed_patch.sh --kernel-version 2.6.18-308.1.1.el5 --without-quilt

...
and we see that it actually used the 2.6.18-EL5.7 backport patches

Applying patches for 2.6.18-EL5.7 kernel:
/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/kernel_patches/backport/2.6.18-EL5.7/2_misc_device_to_2_6_19.patch
patching file drivers/infiniband/core/ucma.c
Hunk #1 succeeded at 1298 (offset 91 lines).
Hunk #3 succeeded at 1326 (offset 91 lines).

...

but it still failed

-I/home/mdiep/build/b1_8/lustre/reused/usr/src/kernels/2.6.18-308.1.1.el5-x86_64/arch//include \
-Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef -Werror-implicit-function-declaration -fno-delete-null-pointer-checks -fwrapv -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(notice)" -D"KBUILD_MODNAME=KBUILD_STR(ib_sa)" -c -o /home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/drivers/infiniband/core/.tmp_notice.o /home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/drivers/infiniband/core/notice.c
In file included from /home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/drivers/infiniband/core/notice.c:37:
/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/kernel_addons/backport/2.6.18-EL5.7/include/linux/pci.h:164: error: conflicting types for ‘pci_pcie_cap’
include/linux/pci.h:1015: error: previous definition of ‘pci_pcie_cap’ was here
make[4]: *** [/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/drivers/infiniband/core/notice.o] Error 1
make[3]: *** [/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/drivers/infiniband/core] Error 2
make[2]: *** [/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1/drivers/infiniband] Error 2
make[1]: *** [_module_/home/mdiep/build/b1_8/lustre/BUILD/ofa_kernel-1.5.4.1] Error 2
make[1]: Leaving directory `/home/mdiep/build/b1_8/lustre/reused/usr/src/kernels/2.6.18-308.1.1.el5-x86_64'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.763 (%build)
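
The failure above comes down to a single compiler diagnostic buried in a long log. As a minimal sketch (not part of the Lustre or OFED build scripts), the following pulls "conflicting types" errors out of build output. The function name `find_type_conflicts` and the sample log are illustrative assumptions; the sample paths are shortened versions of the ones in the log above.

```python
import re

# Matches gcc diagnostics of the form
#   <file>:<line>: error: conflicting types for 'name'
# gcc prints curly or plain quotes depending on the locale, so accept both.
CONFLICT_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+): error: "
    r"conflicting types for [‘'`](?P<symbol>\w+)[’']"
)


def find_type_conflicts(log_text):
    """Return a (symbol, file, line) tuple for each 'conflicting types' error."""
    conflicts = []
    for raw in log_text.splitlines():
        match = CONFLICT_RE.match(raw.strip())
        if match:
            conflicts.append(
                (match.group("symbol"), match.group("file"), int(match.group("line")))
            )
    return conflicts


# Sample input: paths shortened for readability (assumption, not the full log).
sample_log = """\
In file included from drivers/infiniband/core/notice.c:37:
kernel_addons/backport/2.6.18-EL5.7/include/linux/pci.h:164: error: conflicting types for ‘pci_pcie_cap’
include/linux/pci.h:1015: error: previous definition of ‘pci_pcie_cap’ was here
"""

for symbol, path, line in find_type_conflicts(sample_log):
    print(f"{symbol}: redefined at {path}:{line}")
```

Run against the real log, this would point at the EL5.7 backport copy of pci.h as the place where the duplicate definition lives.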

Comment by James A Simmons [ 17/Apr/12 ]

Oh, I forgot I did patch the OFED kernel for this problem. The question is how to work this so your build system would work out of the box. Thinking.... Perhaps we can add a kernel patch in the Lustre tree to make this work. Yes, I know it is hacky, but it would work. It seems we have no choice since OFED is moving on to 3.2 now, which is not even ready. Let me see what I can come up with.

Comment by Peter Jones [ 17/Apr/12 ]

James

I think that the correct approach would be for us to proceed with 1.8.8-wc1 with in-kernel OFED only; those looking to use external OFED with 1.8.8-wc1 will then be able to do so with a patched version of OFED.

Peter

Comment by Minh Diep [ 30/Apr/12 ]

James,

Can I close this bug since it has nothing to do with Lustre? We can reopen it when OFED provides a patch for the new kernel.

Comment by James A Simmons [ 30/Apr/12 ]

Yes go ahead. It's not really a Lustre issue. Once OFED 3.2 comes out then we can deal with this.

Comment by Minh Diep [ 30/Apr/12 ]

closing, please reopen when it's needed

Comment by Minh Diep [ 02/May/12 ]

I have tried OFED 1.5.3.2 and it worked.

Comment by Peter Jones [ 02/May/12 ]

To be clear, this is the Mellanox version of OFED.

Comment by Minh Diep [ 03/May/12 ]

Sorry about not being clear. I have tried both the Mellanox version, which I think is based on OFED 1.5.3.2, and standard OFED 1.5.3.2. I believe they worked because this OFED version does not have the patch to the pci.h file that declares pci_pcie_cap (which the RHEL kernel added in 5.8).
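
To illustrate that reasoning, here is a rough sketch of a pre-build check that compares function-like names in a kernel header with those in a backport header and reports the overlap. Everything in it is an assumption for illustration: the helper names are hypothetical, the header snippets are invented stand-ins rather than the real contents of the RHEL or OFED pci.h files, and the regex is a crude approximation, not a C parser.

```python
import re

# Crude pattern: an identifier immediately followed by "(".
# It also matches calls inside function bodies, so treat hits as candidates only.
FUNC_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}


def function_like_names(header_text):
    """Collect identifiers that look like function declarations or calls."""
    return {name for name in FUNC_RE.findall(header_text) if name not in C_KEYWORDS}


def overlapping_symbols(kernel_header, backport_header):
    """Names that appear in both headers and may therefore be defined twice."""
    return sorted(function_like_names(kernel_header) & function_like_names(backport_header))


# Invented stand-ins for the headers discussed in this ticket.
kernel_el58 = """
int pci_enable_device(struct pci_dev *dev);
int pci_pcie_cap(struct pci_dev *dev);
"""
backport_with_patch = """
static inline int pci_pcie_cap(struct pci_dev *dev);
"""
backport_without_patch = """
static inline int some_other_backport_helper(struct pci_dev *dev);
"""

print(overlapping_symbols(kernel_el58, backport_with_patch))
print(overlapping_symbols(kernel_el58, backport_without_patch))
```

A backport header that also declares pci_pcie_cap shows up as an overlap, while one that lacks the declaration (the situation described for 1.5.3.2) does not. A name overlap is only a hint; whether it produces a "conflicting types" error depends on the actual signatures.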
