[LU-4266] fix lbuild script to work with OFED 3.5-x Created: 18/Nov/13  Updated: 22/May/14  Resolved: 16/Dec/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Minor
Reporter: Minh Diep Assignee: Minh Diep
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
Severity: 3
Rank (Obsolete): 11719

 Description   

Building Lustre against OFED 3.5-1 fails:

++ echo 3.5-1
++ sed re 's/([0-9]-[rR][cC][0-9])$//'
+ ofed_version=3.5-1
++ ls '/mnt/build/build/b25/BUILD/RPMS//compat-rdma-devel-3.5-1-2.6.32_358.18.1.el6.x86_64..rpm'
ls: cannot access /mnt/build/build/b25/BUILD/RPMS//compat-rdma-devel-3.5-1-2.6.32_358.18.1.el6.x86_64..rpm: No such file or directory
+ local rpm=
+ rpm2cpio
+ cpio -id



 Comments   
Comment by Minh Diep [ 19/Nov/13 ]

http://review.whamcloud.com/#/c/8319/

However, the build failed with the following errors, which are unrelated to OFED. Interestingly, the build passes with the in-kernel OFED.

In file included from include/trace/ftrace.h:440,
from include/trace/define_trace.h:73,
from /mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:904,
from /mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/super.c:56:
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_enable_ldiskfs_free_inode':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:18: error: implicit declaration of function 'register_trace_ldiskfs_free_inode'
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_disable_ldiskfs_free_inode':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:18: error: implicit declaration of function 'unregister_trace_ldiskfs_free_inode'
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_enable_ldiskfs_request_inode':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:47: error: implicit declaration of function 'register_trace_ldiskfs_request_inode'
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_disable_ldiskfs_request_inode':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:47: error: implicit declaration of function 'unregister_trace_ldiskfs_request_inode'
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_enable_ldiskfs_allocate_inode':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:69: error: implicit declaration of function 'register_trace_ldiskfs_allocate_inode'
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_disable_ldiskfs_allocate_inode':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:69: error: implicit declaration of function 'unregister_trace_ldiskfs_allocate_inode'
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h: In function 'ftrace_profile_enable_ldiskfs_write_begin':
/mnt/build/build/master-ofed/BUILD/BUILD/lustre-2.5.51/ldiskfs/trace/events/ldiskfs.h:121: error: implicit declaration of function 'register_trace_ldiskfs_write_begin'
....

This failure was mentioned in LU-3462, but there was no further follow-up.

Comment by Dmitry Eremin (Inactive) [ 26/Nov/13 ]

The patch http://review.whamcloud.com/8404 should fix this issue.

Comment by Minh Diep [ 26/Nov/13 ]

The above patch did not solve the issue. There seems to be something else going on; I am investigating.

Comment by Shuichi Ihara (Inactive) [ 27/Nov/13 ]

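I fixed it by patching the OFED code. The failure occurs because tracepoint events are backported into the RHEL6 kernel, while the OFED compat header shown below disables tracing for all kernels up to 2.6.32.

TRACE_EVENT and DEFINE_EVENT are defined in ofa_kernel/default/include/linux/tracepoint.h: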
#if (LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,32))
/*
 * Disable all tracing for older kernels
 * < 2.6.27             had no tracing
 * 2.6.27               had broken tracing
 * 2.6.28-2.6.32        didn't have anything like DECLARE_EVENT_CLASS
 *                      and faking it would be extremely difficult
 */
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,28))
/*
 * For 2.6.28+ include the original tracepoint.h but override
 * the defines new code uses to disable tracing completely.
 */
#include_next <linux/tracepoint.h>
#endif
#undef TRACE_EVENT
#define TRACE_EVENT(name, proto, ...) \
static inline void trace_ ## name(proto) {}
#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(...)
#undef DEFINE_EVENT
#define DEFINE_EVENT(evt_class, name, proto, ...) \
static inline void trace_ ## name(proto) {}

A more proper fix may be needed, but a quick workaround is to add "!defined(CONFIG_COMPAT_RHEL_6_4)" to the version check, so that the compat header no longer stubs out trace events on the patched Lustre kernel:

--- ofa_kernel/default/include/linux/tracepoint.h.orig    2013-11-05 03:11:23.000000000 -0800
+++ ofa_kernel/default/include/linux/tracepoint.h    2013-11-05 03:13:08.000000000 -0800
@@ -3,7 +3,7 @@

 #include <linux/version.h>

-#if (LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,32))
+#if (LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,32) && !defined(CONFIG_COMPAT_RHEL_6_4))
 /*
  * Disable all tracing for older kernels
  * < 2.6.27        had no tracing
@@ -35,6 +35,6 @@ static inline void trace_ ## name(proto)
 /* since 2.6.33, tracing hasn't changed, so just include the kernel's file */
 #include_next <linux/tracepoint.h>

-#endif /* (LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,32)) */
+#endif /* (LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,32) && !defined(CONFIG_COMPAT_RHEL_6_4)) */

 #endif    /* _COMPAT_LINUX_TRACEPOINT_H */
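
The effect of the changed guard can be checked outside the kernel tree. The following is a minimal user-space sketch, not part of the OFED sources: it assumes a RHEL 6.4 build in which LINUX_VERSION_CODE is 2.6.32 and CONFIG_COMPAT_RHEL_6_4 is defined, and it evaluates both forms of the #if condition from the diff above. KERNEL_VERSION is redefined locally with the same encoding as <linux/version.h>; the helper function names are hypothetical.

```c
#include <assert.h>

/* Same encoding as the kernel's <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: model a RHEL 6.4 build (2.6.32 base kernel, compat flag set). */
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 32)
#define CONFIG_COMPAT_RHEL_6_4 1

/* Original guard: version check only. */
#if (LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 32))
#define ORIGINAL_GUARD_STUBS_TRACING 1
#else
#define ORIGINAL_GUARD_STUBS_TRACING 0
#endif

/* Patched guard: version check plus the RHEL 6.4 exclusion. */
#if (LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 32) && !defined(CONFIG_COMPAT_RHEL_6_4))
#define PATCHED_GUARD_STUBS_TRACING 1
#else
#define PATCHED_GUARD_STUBS_TRACING 0
#endif

/* Returns 1 if the original guard would replace TRACE_EVENT with stubs. */
int original_guard_stubs_tracing(void)
{
    return ORIGINAL_GUARD_STUBS_TRACING;
}

/* Returns 1 if the patched guard would replace TRACE_EVENT with stubs. */
int patched_guard_stubs_tracing(void)
{
    return PATCHED_GUARD_STUBS_TRACING;
}

/* Sanity check for the modelled configuration; call it from main(). */
void check_guards(void)
{
    assert(original_guard_stubs_tracing() == 1);
    assert(patched_guard_stubs_tracing() == 0);
}
```

With the original guard the stub branch is taken and TRACE_EVENT loses its register/unregister helpers, which matches the implicit-declaration errors in ldiskfs; with the patched guard the kernel's own tracepoint.h is used instead.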

Comment by Dmitry Eremin (Inactive) [ 28/Nov/13 ]

I can confirm that with this patch I was able to compile OFED-3.5-1 and OFED-3.5-2-rc2. However, we have an issue with the OFED patches specified in "contrib/patches/ofed": the script "01-remove-mlx4-erroneous-modprobe-config-file:rhel6.ed" is not applied to the OFED spec, and because of this the other patches are not applied either.

Comment by Shuichi Ihara (Inactive) [ 28/Nov/13 ]

That's a different problem. Please use the OFED-3.5.2-daily build. The previous version doesn't work on RHEL6.4.

Comment by Minh Diep [ 01/Dec/13 ]

Ihara, could you explain why OFED-3.5-1 doesn't work on RHEL6.4?

I have a patch here that patches OFED and builds OFED-3.5-1 successfully.

http://review.whamcloud.com/#/c/8319/

Ihara, Dmitry,

Please review the patch. I will do another round to clean up the comments and remove the hack to build OFED 3.5 (since our lab doesn't do that at the moment).

Comment by Dmitry Eremin (Inactive) [ 02/Dec/13 ]

Actually, I think we adopted OFED into our build the wrong way. Why do we use OFED headers for the ldiskfs build? We should probably use OFED headers only for the sources that require them; otherwise such backport issues will keep affecting us. Could we change the build scripts to avoid using OFED headers for all sources?

Comment by Dmitry Eremin (Inactive) [ 02/Dec/13 ]

Including the OFED backport headers in all compile commands was introduced in commit 70eca9ed3d7408bebeda59ad65ea4fc1ca9bf57b with the following comment:

+# it's ugly to be doing anything with OFED outside of the lnet module, but
+# this has to be done here so that the backports path is set before all of
+# the LN_PROG_LINUX checks are done
+LB_CONFIG_OFED_BACKPORTS

Does anybody know why this was done? Why do we need the backport headers in all compile commands?

Comment by Dmitry Eremin (Inactive) [ 02/Dec/13 ]

I just removed the OFED backport headers for all components except lnet and moved all OFED-related autoconf testing into lnet/autoconf in patch http://review.whamcloud.com/8451

Comment by parinay v kondekar (Inactive) [ 03/Dec/13 ]

Hello Dmitry,
I tested your patch http://review.whamcloud.com/#/c/8451/1 to build OFED-3.5-2-rc2 (for another issue, LU-3389, reported by Wally). The build completed successfully. I did remove the hard-coded value of OFED_VERSION.
Let me know if I should be updating the logs for your reference.

Thanks

Comment by Dmitry Eremin (Inactive) [ 03/Dec/13 ]

Thanks for testing. But it looks like we have an issue with OFED-1.5.3.1:

/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds/o2iblnd/o2iblnd.c: In function 'kiblnd_dev_need_failover':
/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds/o2iblnd/o2iblnd.c:2565: error: too few arguments to function 'rdma_create_id'
/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds/o2iblnd/o2iblnd.c: In function 'kiblnd_dev_failover':
/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds/o2iblnd/o2iblnd.c:2639: error: too few arguments to function 'rdma_create_id'
make[7]: *** [/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
make[6]: *** [/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds/o2iblnd] Error 2
make[5]: *** [/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet/klnds] Error 2
make[4]: *** [/mnt/build/build/lu4266/BUILD/BUILD/lustre-2.5.52/lnet] Error 2
make[4]: *** Waiting for unfinished jobs....

I need to fix this.
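
The "too few arguments" error means the call sites in o2iblnd.c pass fewer arguments than the rdma_create_id prototype visible in that build. One common way to keep a single call site working against both prototypes is a wrapper macro selected by a configure-time result. The following is a minimal user-space sketch of that pattern, not the actual change in the patch: HAVE_RDMA_CREATE_ID_4ARG and compat_rdma_create_id are assumed names used for illustration, and the types and rdma_create_id itself are mocks rather than the real OFED declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: pretend configure detected the four-argument prototype. */
#define HAVE_RDMA_CREATE_ID_4ARG 1

/* Mock stand-ins for the RDMA CM types; not the real OFED definitions. */
struct rdma_cm_id {
    int ps;
    int qp_type;
};
typedef int (*cm_handler_t)(struct rdma_cm_id *id, void *event);

static struct rdma_cm_id mock_id;

#ifdef HAVE_RDMA_CREATE_ID_4ARG
/* Mock of a prototype that takes an extra QP type argument. */
struct rdma_cm_id *rdma_create_id(cm_handler_t cb, void *ctx, int ps, int qp_type)
{
    assert(cb != NULL);
    (void)ctx;
    mock_id.ps = ps;
    mock_id.qp_type = qp_type;
    return &mock_id;
}
/* Pass all four arguments through. */
#define compat_rdma_create_id(cb, ctx, ps, qpt) rdma_create_id(cb, ctx, ps, qpt)
#else
/* Mock of a prototype without the QP type argument. */
struct rdma_cm_id *rdma_create_id(cm_handler_t cb, void *ctx, int ps)
{
    assert(cb != NULL);
    (void)ctx;
    mock_id.ps = ps;
    mock_id.qp_type = -1; /* not supplied by this prototype */
    return &mock_id;
}
/* Drop the QP type so the same call site still compiles. */
#define compat_rdma_create_id(cb, ctx, ps, qpt) rdma_create_id(cb, ctx, ps)
#endif

/* Trivial event handler used only to exercise the wrapper. */
int dummy_handler(struct rdma_cm_id *id, void *event)
{
    (void)id;
    (void)event;
    return 0;
}
```

A caller always writes compat_rdma_create_id(dummy_handler, NULL, ps, qpt); which underlying prototype is used depends only on the configure result, so the detection has to run against the same headers the module is compiled with.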

Comment by Dmitry Eremin (Inactive) [ 03/Dec/13 ]

The Patch Set 2 works for all OFED versions.

Comment by James A Simmons [ 09/Dec/13 ]

Same here. Your patch addresses my issues as well. I have the same issues on the 2.5 branch, so this will need to be backported.

Comment by Cory Spitz [ 19/Dec/13 ]

Yes, let's update the fix version to 2.5.1

Comment by James A Simmons [ 04/Feb/14 ]

Patch for b2_5 is at http://review.whamcloud.com/#/c/9109

Generated at Sat Feb 10 01:41:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.