Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.6.0
-
Any Linux environment that supports process namespaces and the external Mellanox OFED stack. In my case it was SLES11 SP3 with an external OFED stack.
-
3
-
14179
Description
The Mellanox OFED stack has a compatibility layer that allows it to be built across many kernel versions and distributions. The Linux process namespace is one of the features for which Mellanox provides wrappers, to handle the varying levels of kernel support.
For lustre, the libcfs layer does exactly the same thing to handle different levels of process namespace support. To do that, libcfs has to figure out which abstraction to wrap: Mellanox's or the native system's. Currently libcfs does not handle this case properly.
Issue Links
- duplicates LU-5194 fail to build lustre with OFED-3.12 (Closed)
Activity
Cray encountered the following build failure when building master with OFED 3.12; it is also fixed by James's patch:
[ 140s] CC [M] /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.o
[ 141s] In file included from /usr/src/packages/BUILD/cray-lustre/libcfs/include/libcfs/linux/linux-prim.h:66,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/libcfs/include/libcfs/linux/libcfs.h:53,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/libcfs/include/libcfs/libcfs.h:47,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.h:71,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.c:41:
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:137: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] cc1: warnings being treated as errors
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:137: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:139: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:139: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:142: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:142: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:144: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:144: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:146: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:146: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:148: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:148: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:152: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:152: error: function declaration isn't a prototype
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:151: error: static declaration of 'LINUX_BACKPORT' follows non-static declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:148: error: previous declaration of 'LINUX_BACKPORT' was here
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h: In function 'LINUX_BACKPORT':
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: 'from_kuid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: (Each undeclared identifier is reported only once
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: for each function it appears in.)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: 'ns' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: 'uid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: called object 'LINUX_BACKPORT(<erroneous-expression>)' is not a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h: At top level:
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:158: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:158: error: function declaration isn't a prototype
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:157: error: redefinition of 'LINUX_BACKPORT'
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:152: error: previous definition of 'LINUX_BACKPORT' was here
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h: In function 'LINUX_BACKPORT':
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: 'from_kgid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: 'ns' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: 'gid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: called object 'LINUX_BACKPORT(<erroneous-expression>)' is not a function
[ 141s] make[9]: *** [/usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
[ 141s] make[8]: *** [/usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd] Error 2
[ 141s] make[7]: *** [/usr/src/packages/BUILD/cray-lustre/lnet/klnds] Error 2
[ 141s] make[6]: *** [/usr/src/packages/BUILD/cray-lustre/lnet] Error 2
[ 141s] make[5]: *** [_module_/usr/src/packages/BUILD/cray-lustre] Error 2
[ 141s] make[4]: *** [sub-make] Error 2
[ 141s] make[3]: *** [all] Error 2
[ 141s] make[3]: Leaving directory `/usr/src/linux-3.0.101-0.21.1_1.0000.8135-obj/x86_64/cray_ari_s'
[ 141s] make[2]: *** [modules] Error 2
[ 141s] make[2]: Leaving directory `/usr/src/packages/BUILD/cray-lustre'
[ 141s] make[1]: *** [all-recursive] Error 1
[ 141s] make[1]: Leaving directory `/usr/src/packages/BUILD/cray-lustre'
[ 141s] make: *** [all] Error 2
[ 141s] error: Bad exit status from /var/tmp/rpm-tmp.71744 (%build)
[ 141s]
[ 141s]
[ 141s] RPM build errors:
[ 141s] Bad exit status from /var/tmp/rpm-tmp.71744 (%build)
Maloo failed to run for patch http://review.whamcloud.com/#/c/10571. Could someone please start the tests for this patch? Thank you.
The question becomes the order of precedence for the uidgid definitions. There are three possible sources: the distro, OFED, and libcfs. For all code outside the o2ib LND driver the order is a no-brainer: we use the distro's definitions if present and the libcfs ones otherwise, since the OFED definitions don't show up outside the o2ib LND driver. Within the o2ib LND driver, which order do we want, from most to least important:
compact-rdma.h -> uidgid.h -> libcfs
uidgid.h -> compact-rdma.h -> libcfs
As for defining _LINUX_UIDGID_H, I really can't see a way around it unless we involve OFED testing in the libcfs autoconf, and that would be too messy and ugly.
That additional change to curproc.h does repair the build with recent OFED on CentOS 6.5, but I can't speak to all the other variations. Even in the CentOS build I'm worried that it may result in definitions from the OFED uidgid.h being used in some places and the local definitions from curproc.h in others, depending on who includes what exactly.
Looks like we will have to use _LINUX_UIDGID_H as well. Both the Linux kernel header and compact-rdma.h define it to avoid potential conflicts with each other. I don't have an OFED 12 setup right now, so try this:
diff --git a/libcfs/include/libcfs/curproc.h b/libcfs/include/libcfs/curproc.h
index e5e3d57..aa144e7 100644
--- a/libcfs/include/libcfs/curproc.h
+++ b/libcfs/include/libcfs/curproc.h
@@ -45,6 +45,9 @@
 #if !defined(HAVE_UIDGID_HEADER) || !defined(__KERNEL__)
+#ifndef _LINUX_UIDGID_H
+#define _LINUX_UIDGID_H
+
 typedef uid_t kuid_t;
 typedef gid_t kgid_t;
@@ -106,6 +109,8 @@ static inline bool gid_valid(kgid_t gid)
 { return (gid != INVALID_GID); }
+#endif /* _LINUX_UIDGID_H */
+
 #endif
 int cfs_get_environ(const char *key, char *value, int *val_len);
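To illustrate what sharing the _LINUX_UIDGID_H guard buys, here is a minimal user-space sketch, not the actual kernel or OFED code. Both blocks are stand-ins written into a single file, their contents are abbreviated, and the UIDGID_PROVIDER macro and uidgid_provider() helper are hypothetical names added only to show which block was compiled in. It assumes the OFED compat header is seen first, as in the o2iblnd include chain.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in for the OFED compat uidgid.h (illustrative contents only). */
#ifndef _LINUX_UIDGID_H
#define _LINUX_UIDGID_H
typedef uid_t kuid_t;
#define INVALID_UID ((kuid_t)-1)
static inline bool uid_valid(kuid_t uid) { return uid != INVALID_UID; }
#define UIDGID_PROVIDER "ofed-compat"	/* hypothetical marker */
#endif /* _LINUX_UIDGID_H */

/*
 * Stand-in for the libcfs fallback in curproc.h, wrapped in the same
 * guard as in the proposed patch. Because the guard is already defined
 * above, the preprocessor skips this block, so nothing is redefined.
 */
#ifndef _LINUX_UIDGID_H
#define _LINUX_UIDGID_H
typedef uid_t kuid_t;
#define INVALID_UID ((kuid_t)-1)
static inline bool uid_valid(kuid_t uid) { return uid != INVALID_UID; }
#define UIDGID_PROVIDER "libcfs"	/* hypothetical marker */
#endif /* _LINUX_UIDGID_H */

/* Hypothetical helper: reports which block supplied the definitions. */
static inline const char *uidgid_provider(void)
{
	return UIDGID_PROVIDER;
}
```

Whichever block comes first wins; swapping the two blocks would make the libcfs definitions the active ones. That is why the include order inside the o2ib LND driver matters.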
It doesn't look like the proposed solution works universally. Builds in our autotest framework are passing, but those are with older OFED versions.
I still see failures when trying to build with a recent OFED (OFED-3.12-rc3) on CentOS 6.5. In this case there is no uidgid.h in the kernel source, but there is one in OFED. Because the kernel lacks one, lustre's config.h has #undef HAVE_UIDGID_HEADER. That causes the local definitions to be used, which in turn conflict at compile time with uidgid.h from OFED. Example errors:
. . .
LD [M] /home/bogl/lustre-release/libcfs/libcfs/libcfs.o
CC [M] /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.o
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:48: error: redefinition of typedef ‘kuid_t’
/usr/src/compat-rdma/include/linux/uidgid.h:50: note: previous declaration of ‘kuid_t’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:49: error: redefinition of typedef ‘kgid_t’
/usr/src/compat-rdma/include/linux/uidgid.h:51: note: previous declaration of ‘kgid_t’ was here
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:51:1: error: "INVALID_UID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:71:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:52:1: error: "INVALID_GID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:72:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:54:1: error: "GLOBAL_ROOT_UID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:68:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:55:1: error: "GLOBAL_ROOT_GID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:69:1: error: this is the location of the previous definition
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:65: error: redefinition of ‘__kuid_val’
/usr/src/compat-rdma/include/linux/uidgid.h:53: note: previous definition of ‘__kuid_val’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:70: error: redefinition of ‘__kgid_val’
/usr/src/compat-rdma/include/linux/uidgid.h:58: note: previous definition of ‘__kgid_val’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:75: error: static declaration of ‘backport_make_kuid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:137: note: previous declaration of ‘backport_make_kuid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:80: error: static declaration of ‘backport_make_kgid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:139: note: previous declaration of ‘backport_make_kgid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:85: error: static declaration of ‘backport_from_kuid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:142: note: previous declaration of ‘backport_from_kuid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:90: error: static declaration of ‘backport_from_kgid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:144: note: previous declaration of ‘backport_from_kgid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:95: error: redefinition of ‘uid_eq’
/usr/src/compat-rdma/include/linux/uidgid.h:74: note: previous definition of ‘uid_eq’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:100: error: redefinition of ‘uid_valid’
/usr/src/compat-rdma/include/linux/uidgid.h:124: note: previous definition of ‘uid_valid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:105: error: redefinition of ‘gid_valid’
/usr/src/compat-rdma/include/linux/uidgid.h:129: note: previous definition of ‘gid_valid’ was here
make[7]: *** [/home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
make[6]: *** [/home/bogl/lustre-release/lnet/klnds/o2iblnd] Error 2
make[5]: *** [/home/bogl/lustre-release/lnet/klnds] Error 2
make[4]: *** [/home/bogl/lustre-release/lnet] Error 2
make[3]: *** [_module_/home/bogl/lustre-release] Error 2
make[3]: Leaving directory `/home/bogl/rb/BUILD/kernel-2.6.32.431.17.1.l0508'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2
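The failure above can be summarized as a small decision model. This is only a sketch of the CentOS 6.5 case described in this comment, where the kernel has no uidgid.h, so HAVE_UIDGID_HEADER is undefined. The enum, the function name, and its parameters are all hypothetical, and it assumes the OFED header is included before libcfs.h, as the include chain in the log suggests (o2iblnd.h:70 before o2iblnd.h:79). It does not model the Cray SLES11 SP3 failure.

```c
#include <stdbool.h>

/* Hypothetical labels for where the kuid_t/kgid_t definitions come from. */
enum uidgid_source {
	UIDGID_FROM_LIBCFS,	/* local fallback in curproc.h */
	UIDGID_FROM_OFED,	/* uidgid.h shipped with the OFED compat layer */
	UIDGID_CONFLICT,	/* both define the same names: build fails */
};

/*
 * Model of a kernel WITHOUT its own uidgid.h.
 *
 * in_o2iblnd:      compiling the o2ib LND driver, the only place where
 *                  OFED headers are pulled in
 * ofed_has_uidgid: the OFED stack ships its own uidgid.h
 * libcfs_guarded:  the libcfs fallback is wrapped in _LINUX_UIDGID_H
 */
static inline enum uidgid_source
uidgid_source_without_kernel_header(bool in_o2iblnd, bool ofed_has_uidgid,
				    bool libcfs_guarded)
{
	/* Without an OFED uidgid.h in play, libcfs is the only provider. */
	if (!in_o2iblnd || !ofed_has_uidgid)
		return UIDGID_FROM_LIBCFS;

	/* OFED is seen first; an unguarded libcfs fallback redefines it. */
	return libcfs_guarded ? UIDGID_FROM_OFED : UIDGID_CONFLICT;
}
```

Under this model, the log above corresponds to the call uidgid_source_without_kernel_header(true, true, false).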
Checking back, I see that SP3 does have uidgid.h too. I didn't notice because the relevant CONFIG_ setting is turned off by default there, so it always built correctly before we had the namespace changes in lustre.
I ran into this issue using an external Mellanox stack on a SLES11 SP3 system (Cray CLE5.2 environment). I'm going to test the current 2.6 on my Cray test bed system today.
Patch has landed. This ticket can be closed.