[LU-5363] kernel update [SLES11 SP3 3.0.101-0.35] Created: 17/Jul/14  Updated: 25/Aug/14  Resolved: 19/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.7.0, Lustre 2.5.3

Type: Improvement Priority: Minor
Reporter: Bob Glossman (Inactive) Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 14958

 Description   

New kernel version for SLES11 SP3, 3.0.101-0.35. Need to update the lustre config and build files. If the upstream fix for loop devices is present, need to re-enable the sanity test that was previously skipped.



 Comments   
Comment by Bob Glossman (Inactive) [ 17/Jul/14 ]

The new update for sles11sp3 has broken the lustre build. It looks to be due to changes in the linux #include files. Example errors:

  CC [M]  /home/bogl/lustre-release/lustre/fid/fid_request.o
In file included from /home/bogl/lustre-release/lustre/include/linux/lustre_compat25.h:44,
                 from /home/bogl/lustre-release/lustre/include/linux/obd_support.h:54,
                 from /home/bogl/lustre-release/lustre/include/obd_support.h:44,
                 from /home/bogl/lustre-release/lustre/include/linux/obd.h:44,
                 from /home/bogl/lustre-release/lustre/include/obd.h:44,
                 from /home/bogl/lustre-release/lustre/fid/fid_request.c:47:
/home/bogl/lustre-release/lustre/include/linux/lustre_patchless_compat.h:100:1: error: "d_count" redefined
In file included from /usr/src/linux-3.0.101-0.35/include/linux/fs.h:396,
                 from /usr/src/linux-3.0.101-0.35/include/linux/pagemap.h:8,
                 from /home/bogl/lustre-release/libcfs/include/libcfs/linux/linux-mem.h:54,
                 from /home/bogl/lustre-release/libcfs/include/libcfs/linux/libcfs.h:52,
                 from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:47,
                 from /home/bogl/lustre-release/lustre/fid/fid_request.c:46:
/usr/src/linux-3.0.101-0.35/include/linux/dcache.h:153:1: error: this is the location of the previous definition
make[8]: *** [/home/bogl/lustre-release/lustre/fid/fid_request.o] Error 1
make[7]: *** [/home/bogl/lustre-release/lustre/fid] Error 2
make[6]: *** [/home/bogl/lustre-release/lustre] Error 2
make[5]: *** [_module_/home/bogl/lustre-release] Error 2
make[4]: *** [sub-make] Error 2
make[3]: *** [all] Error 2
make[3]: Leaving directory `/usr/src/linux-3.0.101-0.35-obj/x86_64/default'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2

This problem appears to be due to new unconditional #defines in kernel <linux/dcache.h> of

#define d_lock d_lockref.lock
#define d_count d_lockref.count

These conflict with the #define of d_count() (a function-like macro, not a simple object-like #define) in the lustre header lustre/include/linux/lustre_patchless_compat.h.
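The collision can be reproduced outside the kernel tree. The following is a minimal user-space sketch, assuming mock versions of struct lockref and struct dentry; the helper mock_refcount() is hypothetical and exists only for illustration. It shows that direct field access keeps working through the object-like macro, and where the function-like redefinition would clash.

```c
#include <assert.h>
#include <stddef.h>

/* User-space mock of the updated layout; not the real kernel definitions. */
struct lockref { int lock; unsigned int count; };
struct dentry  { struct lockref d_lockref; };

/* Object-like macros, as quoted above from the updated dcache.h. */
#define d_lock  d_lockref.lock
#define d_count d_lockref.count

/*
 * Enabling the line below redefines d_count as a function-like macro. That is
 * an incompatible redefinition and triggers the "redefined" diagnostic seen
 * in the build log.
 */
/* #define d_count(d) ((d)->d_count) */

/*
 * Hypothetical helper: code that touches the field directly still compiles,
 * because de->d_count expands to de->d_lockref.count.
 */
static unsigned int mock_refcount(const struct dentry *de)
{
	assert(de != NULL);
	return de->d_count;
}
```

Calling mock_refcount() on a dentry whose d_lockref.count is 3 returns 3, which is why code that reads d_count as a plain field is unaffected by the update while the d_count(d) macro is not.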

There is already some autoconf for HAVE_D_COUNT, but it looks for an existing d_count() and doesn't find one in linux #includes. Not sure how to change things so it operates correctly. Not sure any distro we still support ever has HAVE_D_COUNT #define'd.

Think I may need some help solving this build problem. The kernel update is blocked until I have a working solution.

Comment by Jeff Mahoney [ 17/Jul/14 ]

This is due to the inclusion of lockref in a performance update. It's binary compatible with previous releases and defines d_count so that existing code that uses d_count directly works as expected. The naming collision is obviously causing problems, so we'll need to work around that a bit.

This should work, but is untested:

diff --git a/lustre/autoconf/lustre-core.m4 b/lustre/autoconf/lustre-core.m4
index 0891fd4..eeba701 100644
--- a/lustre/autoconf/lustre-core.m4
+++ b/lustre/autoconf/lustre-core.m4
@@ -1383,6 +1383,24 @@ d_count, [
 ]) # LC_HAVE_DCOUNT
 
 #
+# LC_HAVE_D_LOCKREF
+#
+# SLES 11 SP3's 3.0.101-0.35 update adds lockref, but defines a d_count
+# that is different from later kernel versions.
+#
+AC_DEFUN([LC_HAVE_D_LOCKREF], [
+LB_CHECK_COMPILE([if 'd_lockref' exists],
+d_lockref, [
+	#include <linux/dcache.h>
+],[
+	struct dentry de;
+	int x = de.d_lockref.count;
+],[
+	AC_DEFINE(HAVE_D_LOCKREF, 1, [d_lockref exists])
+])
+]) # LC_HAVE_D_LOCKREF
+
+#
 # LC_OLDSIZE_TRUNCATE_PAGECACHE
 #
 # 3.12 truncate_pagecache without oldsize parameter
@@ -1595,6 +1613,7 @@ AC_DEFUN([LC_PROG_LINUX], [
 	LC_HAVE_DIR_CONTEXT
 	LC_D_COMPARE_5ARGS
 	LC_HAVE_DCOUNT
+	LC_HAVE_D_LOCKREF
 
 	# 3.12
 	LC_OLDSIZE_TRUNCATE_PAGECACHE
diff --git a/lustre/include/linux/lustre_patchless_compat.h b/lustre/include/linux/lustre_patchless_compat.h
index 5b7bab6..d119f75 100644
--- a/lustre/include/linux/lustre_patchless_compat.h
+++ b/lustre/include/linux/lustre_patchless_compat.h
@@ -96,8 +96,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
 #ifdef HAVE_DCACHE_LOCK
 #  define dget_dlock(d)			dget_locked(d)
 #  define d_count(d)			atomic_read(&(d)->d_count)
-#elif !defined(HAVE_D_COUNT)
+#elif !defined(HAVE_D_COUNT) && !defined(HAVE_D_LOCKREF)
 #  define d_count(d)			((d)->d_count)
+#elif !defined(HAVE_D_COUNT) && defined(HAVE_D_LOCKREF)
+#undef d_count
+#define d_count(d)			((d)->d_lockref.count)
 #endif /* HAVE_DCACHE_LOCK */
 
 #ifdef ATTR_OPEN

Comment by Bob Glossman (Inactive) [ 17/Jul/14 ]

Jeff, thanks for the suggestion. I'm currently working on an alternate approach suggested by Yang Sheng: adding a wrapper, ll_d_count(), and using autoconf conditionals to translate it correctly for all cases. Lustre would then use ll_d_count() instead of d_count() to avoid name collisions. If that doesn't work out I may try your way.
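A rough sketch of what such a wrapper could look like, written as a user-space mock. This is an illustration of the idea only, not the change under review: the mapping of each conditional to an expansion is an assumption, and MOCK_LOCKREF and mock_dentry_refs() are hypothetical names introduced here. HAVE_DCACHE_LOCK and HAVE_D_COUNT are the existing autoconf symbols mentioned earlier in this ticket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch for this sketch: 1 mimics the updated SLES11 SP3 layout. */
#define MOCK_LOCKREF 1

#if MOCK_LOCKREF
struct lockref { int lock; unsigned int count; };
struct dentry  { struct lockref d_lockref; };
#define d_count d_lockref.count		/* as in the updated dcache.h */
#else
struct dentry  { unsigned int d_count; };
#endif

/*
 * Wrapper with a Lustre-private name, so it never collides with a kernel
 * d_count, whether that is a field, an object-like macro, or a function.
 * Only the final branch is compiled in this mock; the other two assume
 * kernel-provided atomic_read() and d_count() respectively.
 */
#if defined(HAVE_DCACHE_LOCK)
#  define ll_d_count(d)		atomic_read(&(d)->d_count)
#elif defined(HAVE_D_COUNT)
#  define ll_d_count(d)		d_count(d)
#else
#  define ll_d_count(d)		((d)->d_count)
#endif

/* Hypothetical caller: reads the reference count through the wrapper only. */
static unsigned int mock_dentry_refs(const struct dentry *de)
{
	assert(de != NULL);
	return ll_d_count(de);
}
```

In this sketch the fallback branch expands to (d)->d_count, which the object-like macro then rewrites to (d)->d_lockref.count, so the same wrapper definition covers both the old and the updated layout without a new autoconf check.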

Comment by Bob Glossman (Inactive) [ 17/Jul/14 ]

http://review.whamcloud.com/11133

Comment by Bob Glossman (Inactive) [ 18/Jul/14 ]

in b2_5:
http://review.whamcloud.com/11140

Comment by Peter Jones [ 19/Aug/14 ]

Landed for 2.7

Generated at Sat Feb 10 01:50:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.