[LU-3389] Lustre b2_1 build failed on RHEL6.4 with OFA IB stack Created: 24/May/13  Updated: 09/Dec/13  Resolved: 12/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Minh Diep
Resolution: Fixed Votes: 0
Labels: mn1
Environment:

Distro: RHEL6.4
Network: OFA IB


Severity: 3
Rank (Obsolete): 8392

 Description   

After http://review.whamcloud.com/5504 landed on the Lustre b2_1 branch, builds on the RHEL6.4 distro with the OFA IB stack have been failing:

http://build.whamcloud.com/job/lustre-b2_1/198/
http://build.whamcloud.com/job/lustre-b2_1/203/

The OFA version is 1.5.4.



 Comments   
Comment by Jian Yu [ 24/May/13 ]

I saw that http://review.whamcloud.com/5688 also landed on the Lustre b2_1 branch. Should we switch to building with "--ofed-version=3.5"?

Comment by Peter Jones [ 24/May/13 ]

If it works then sure, but I think that even 3.5 only supports RHEL6.3, not RHEL 6.4. I had heard some mention of an OFED 3.5.1. See LU-2975. For now I think that we will just disable the external OFED build as we have done on master, and I have opened a TT ticket (TT-1291) to track this.

Comment by Yang Sheng [ 24/May/13 ]

It looks like 3.5.1 is present in the OFED daily builds. The creation date was May 22. Hopefully it will be released soon.

Comment by Shuichi Ihara (Inactive) [ 24/May/13 ]

Yes, I've tested OFED-3.5.1 with RHEL6.4 and it works. However, it still needs some changes to build with Lustre.
I'm working on this and will update here. By the way, MLNX_OFED_LINUX-2.0-2.0.5 is based on OFED-3.x, and it works on RHEL6.4 with Lustre for both servers and clients.

Comment by Peter Jones [ 24/May/13 ]

Thanks Ihara! For the immediate releases scheduled for this quarter I think that the only options would be either to not test external OFED or to test OFA OFED, though I definitely think that there is a case to be made for us to look at Mellanox OFED as a possibility for the future. I did raise this suggestion at the most recent CDWG but there was not strong interest from others present. So, for the time being - do you expect to be able to supply a patch to allow us to support RHEL 6.4 and OFED 3.5.1 in the next day or so?

Comment by Shuichi Ihara (Inactive) [ 24/May/13 ]

Peter,
I just figured out that not many patches are needed to build Lustre with OFED-3.5.1 against the RHEL6.4 kernel.

Here is a quick workaround for Lustre (master and b2_1) with the RHEL6.4 kernel and OFED-3.5.1.

# EXTRA_LNET_INCLUDE="-DCONFIG_COMPAT_RHEL_6_4" ./configure --with-o2ib=/usr/src/compat-rdma --with-linux=/usr/src/kernels/2.6.32-358.6.1.el6_lustre.x86_64

I've confirmed that the compile works with OFED-3.5.1, and I am working on auto-detecting RHEL6.4 and adding -DCONFIG_COMPAT_RHEL_6_4 to EXTRA_LNET_INCLUDE at configure time. I will do some tests and push patches soon.
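
The auto-detection idea described above can be sketched in plain shell. This is only an illustration, not the contents of the patch under review: the helper name rhel64_compat_flag and the KVER variable are hypothetical, and the sketch assumes that RHEL 6.4 kernels can be recognised by the 2.6.32-358 base release seen in the kernel path above.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the Lustre tree: print the compat
# define needed for a given kernel release string, or nothing at all.
rhel64_compat_flag() {
    case "$1" in
        # Assumption: RHEL 6.4 kernels carry the 2.6.32-358 base release,
        # as in 2.6.32-358.6.1.el6_lustre.x86_64.
        2.6.32-358|2.6.32-358.*) echo "-DCONFIG_COMPAT_RHEL_6_4" ;;
        *) ;;
    esac
}

# Kernel release to build against; defaults to the running kernel.
KVER="${KVER:-$(uname -r)}"

# Export the result so that a later ./configure run inherits it.
EXTRA_LNET_INCLUDE="$(rhel64_compat_flag "$KVER")"
export EXTRA_LNET_INCLUDE
echo "EXTRA_LNET_INCLUDE='$EXTRA_LNET_INCLUDE'"
```

Running ./configure with the --with-o2ib and --with-linux options from the workaround in the same shell would then pick up the define only on matching kernels, and leave EXTRA_LNET_INCLUDE empty elsewhere.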

By the way, I would also suggest landing the patch for LU-3166 (http://review.whamcloud.com/6048). It is needed for bonding configurations with the OFED-3.x stack.

Comment by Shuichi Ihara (Inactive) [ 24/May/13 ]

http://review.whamcloud.com/6448

patch to build master on RHEL6.4 + OFED-3.5.1.

Comment by Shuichi Ihara (Inactive) [ 15/Jul/13 ]

The patches work to build OFED-3.5-1 against RHEL6.4, but Mellanox OFED currently uses different macro names.
Here is a quick description of how to build MLNX_OFED_LINUX-2.0-2.0.5 against the latest Lustre-patched kernel based on RHEL6.4. For now, we need to add the macro names by hand, which we would like to avoid, because OFED does not detect the kernel version automatically. I asked Mellanox, and they seem to be trying to fix this in a future release.

# EXTRA_LNET_INCLUDE="-DCONFIG_COMPAT_IS_PHYS_ID_STATE -DCONFIG_COMPAT_IS_PCI_PHYSFN \
-DCONFIG_COMPAT_IS_KSTRTOX -DCONFIG_COMPAT_IS_BITOP \
-DCONFIG_COMPAT_NETLINK_3_7 -DCONFIG_COMPAT_IS_IP_TOS2PRIO \
-DCONFIG_COMPAT_IS_NETIF_RSS_QUEUES -DCONFIG_COMPAT_IS_NOOP_LLSEEK \
-DCONFIG_COMPAT_IS_SIMPLE_OPEN -DCONFIG_COMPAT_RCU \
-DCONFIG_COMPAT_HAS_NUM_CHANNELS -DCONFIG_COMPAT_ETHTOOL_OPS_EXT" \
./configure --with-o2ib=/usr/src/ofa_kernel

# EXTRA_LNET_INCLUDE="-DCONFIG_COMPAT_IS_PHYS_ID_STATE -DCONFIG_COMPAT_IS_PCI_PHYSFN \
-DCONFIG_COMPAT_IS_KSTRTOX -DCONFIG_COMPAT_IS_BITOP \
-DCONFIG_COMPAT_NETLINK_3_7 -DCONFIG_COMPAT_IS_IP_TOS2PRIO \
-DCONFIG_COMPAT_IS_NETIF_RSS_QUEUES -DCONFIG_COMPAT_IS_NOOP_LLSEEK \
-DCONFIG_COMPAT_IS_SIMPLE_OPEN -DCONFIG_COMPAT_RCU \
-DCONFIG_COMPAT_HAS_NUM_CHANNELS -DCONFIG_COMPAT_ETHTOOL_OPS_EXT" make rpms
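
Since the same macro list has to be passed to both ./configure and make rpms, one way to avoid typing it twice is to build it once and export it. The following is a minimal sketch under that assumption; the variable MLNX_COMPAT_MACROS and the function build_compat_flags are hypothetical names, and the macro names themselves are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper: list the Mellanox compat macro suffixes once.
MLNX_COMPAT_MACROS="
IS_PHYS_ID_STATE IS_PCI_PHYSFN IS_KSTRTOX IS_BITOP
NETLINK_3_7 IS_IP_TOS2PRIO IS_NETIF_RSS_QUEUES IS_NOOP_LLSEEK
IS_SIMPLE_OPEN RCU HAS_NUM_CHANNELS ETHTOOL_OPS_EXT
"

# Turn each suffix into a -DCONFIG_COMPAT_<suffix> flag, space separated.
build_compat_flags() {
    flags=""
    for name in $1; do
        flags="${flags:+$flags }-DCONFIG_COMPAT_$name"
    done
    echo "$flags"
}

EXTRA_LNET_INCLUDE="$(build_compat_flags "$MLNX_COMPAT_MACROS")"
export EXTRA_LNET_INCLUDE
echo "$EXTRA_LNET_INCLUDE"

# With the variable exported, both steps inherit the same flags:
#   ./configure --with-o2ib=/usr/src/ofa_kernel
#   make rpms
```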

Comment by Jodi Levi (Inactive) [ 29/Jul/13 ]

Now that the 2 patches have landed, can this ticket be closed? Or is more work needed in this ticket for OFED?

Comment by Jian Yu [ 30/Jul/13 ]

Or is more work needed in this ticket for OFED?

1) the patches need to be landed on Lustre b2_1 and b2_4 branches
2) OFA builds need to be enabled on Jenkins

Comment by Minh Diep [ 02/Aug/13 ]

b2_1 and b2_4 are now building with kernel 2.6.32-358.11.1, which has no external OFED support (or it may be broken even if it builds OK).

Comment by Shuichi Ihara (Inactive) [ 02/Aug/13 ]

I have ported the patches to b2_1 and b2_4 and will post them shortly.

Comment by Shuichi Ihara (Inactive) [ 02/Aug/13 ]

http://review.whamcloud.com/7216 for b2_4
http://review.whamcloud.com/7217 for b2_1

Comment by Peter Jones [ 12/Aug/13 ]

Landed for 2.4.1 and 2.5. Will land to b2_1 if/when work on 2.1.7 starts.

Comment by Wally Wang (Inactive) [ 28/Nov/13 ]

Lustre 2.5 server build fails with OFED 3.5-2. It appears to be unnecessarily providing -DCONFIG_COMPAT_RHEL_6_4 and -I${O2IBPATH}/include to the ldiskfs build. And we probably don't need O2IBPATH for any modules not related to O2IB.

[   96s] make[3]: Leaving directory `/home/abuild/rpmbuild/BUILD/cray-lustre/lustre'
[   96s] /usr/bin/make CC="gcc"  -C /usr/src/linux-2.6.32-358.el6_1.0000.7630-obj/x86_64/cray_ari_s_cos	     \
[   96s] 	-f /home/abuild/rpmbuild/BUILD/cray-lustre/build/Makefile LUSTRE_LINUX_CONFIG=/usr/src/linux-2.6.32-358.el6_1.0000.7630-obj/x86_64/cray_ari_s_cos/.config \
[   96s] 	LINUXINCLUDE='-DCONFIG_COMPAT_RHEL_6_4 -I/usr/src/kernel-modules-ofed/x86_64/cray_ari_s_cos/include -I$(srctree)/arch/$(SRCARCH)/include -I$(srctree)/arch/$(SRCARCH)/include/generated -Iinclude $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) -I$(srctree)/arch/$(SRCARCH)/include/uapi -Iarch/$(SRCARCH)/include/generated/uapi -I$(srctree)/include/uapi -Iinclude/generated/uapi -include include/linux/autoconf.h' \
[   96s] 	M=/home/abuild/rpmbuild/BUILD/cray-lustre -o tmp_include_depends -o scripts -o \
[   96s] 	include/config/MARKER modules
[   96s] make[3]: Entering directory `/usr/src/linux-2.6.32-358.el6_1.0000.7630-obj/x86_64/cray_ari_s_cos'
[   96s] /usr/bin/make -C ../../../linux-2.6.32-358.el6_1.0000.7630 O=/usr/src/linux-2.6.32-358.el6_1.0000.7630-obj/x86_64/cray_ari_s_cos/. modules
[   97s]   CC [M]  /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/acl.o
[   97s]   CC [M]  /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/balloc.o
[   97s]   CC [M]  /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/bitmap.o
[   97s]   CC [M]  /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/block_validity.o
[   97s]   CC [M]  /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/dir.o
...
...
[   99s]   CC [M]  /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/super.o
[   99s] In file included from /usr/src/linux-2.6.32-358.el6_1.0000.7630/include/trace/ftrace.h:441,
[   99s]                  from /usr/src/linux-2.6.32-358.el6_1.0000.7630/include/trace/define_trace.h:74,
[   99s]                  from /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/trace/events/ldiskfs.h:905,
[   99s]                  from /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/super.c:57:
[   99s] /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_enable_ldiskfs_free_inode':
[   99s] /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/trace/events/ldiskfs.h:18: error: implicit declaration of function 'register_trace_ldiskfs_free_inode'
[   99s] In file included from /usr/src/linux-2.6.32-358.el6_1.0000.7630/include/trace/ftrace.h:441,
[   99s]                  from /usr/src/linux-2.6.32-358.el6_1.0000.7630/include/trace/define_trace.h:74,
[   99s]                  from /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/trace/events/ldiskfs.h:905,
[   99s]                  from /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/super.c:57:
[   99s] /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_disable_ldiskfs_free_inode':
[   99s] /home/abuild/rpmbuild/BUILD/cray-lustre/ldiskfs/trace/events/ldiskfs.h:18: error: implicit declaration of function 'unregister_trace_ldiskfs_free_inode'
[   99s] In file included from /usr/src/linux-2.6.32-358.el6_1.0000.7630/include/trace/ftrace.h:441,
[   99s]                  from /usr/src/linux-2.6.32-358.el6_1.0000.7630/include/trace/define_trace.h:74,
[...

Comment by Wally Wang (Inactive) [ 09/Dec/13 ]

LU-4266 fixed our problem.

Generated at Sat Feb 10 01:33:29 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.