[LU-5140] Mellanox backport header conflicts with newer kernels Created: 03/Jun/14  Updated: 18/Jun/14  Resolved: 18/Jun/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Minor
Reporter: James A Simmons Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: build
Environment:

Any linux environment that supports process namespace and the external Mellanox OFED stack. In my case it was SLES11 SP3 with a external OFED stack.


Issue Links:
Duplicate
duplicates LU-5194 fail to build lustre with OFED-3.12 Closed
Severity: 3
Rank (Obsolete): 14179

 Description   

The Mellanox OFED stack has a compatibility layer to allow it be build across many kernel versions and many distributions. The linux process namespace is one of the things Mellanox creates wrappers to handle various levels of support of this feature.

For lustre the libcfs layer also does the same exact thing to handle different levels of support of process namespace. In order to do that libcfs has to figure out which abstract to wrap around, Mellanox or the native system. Currently libcfs doesn't not handle this case properly.



 Comments   
Comment by James A Simmons [ 03/Jun/14 ]

Here is the fix.

http://review.whamcloud.com/#/c/10571

I think this ticket should be linked to LU-4416.

Comment by Bob Glossman (Inactive) [ 03/Jun/14 ]

I ran into the same problem with a recent OFED distro and rhel7. It looks to me like MLNX/OFED chose to keep a local version of uidgid.h to solve this problem. This solution conflicts with the wrappers used in lustre code to address the issue. It's not clear to me how to avoid this conflict. We should adapt correctly to building in environments both with and without any namespace support, with kernel IB stack, with MLNX IB stack, with OFED IB stack.

Comment by Bob Glossman (Inactive) [ 03/Jun/14 ]

James, you are ahead of me again. I added my comment before I saw your posted solution. How many environments have you tested your fix in? Our build and autotest will only test current distros, which only exercise the case of no namespace support.

Comment by James A Simmons [ 03/Jun/14 ]

I ran into this issue using a external Mellanox stack on an SLES11 SP3 system (Cray CLE5.2 environment). I'm going to test the current 2.6 on my test bed Cray system today.

Comment by Bob Glossman (Inactive) [ 03/Jun/14 ]

Checking back I see that SP3 does have uidgid.h in it too. I didn't notice because it has the relevant CONFIG_ setting turned off by default there, so it always built correctly before we had namespace changes in lustre.

Comment by Bob Glossman (Inactive) [ 03/Jun/14 ]

Doesn't look like the proposed solution works universally. Builds in our autotest framework are passing, but those are with older OFED versions.

I still see failures when trying to build a with a recent OFED; OFED-3.12-rc3; in Centos 6.5. In this case there is no uidgid.h present in kernel source but there is one present in OFED. Due to a lack of one in the kernel, lustre's config.h has #undef HAVE_UIDGID_HEADER. This leads to trying to use local #defines, which in turn leads to compile conflicts with uidgid.h from OFED. example errors:

    .
    .
    .
  LD [M]  /home/bogl/lustre-release/libcfs/libcfs/libcfs.o
  CC [M]  /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.o
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:48: error: redefinition of typedef ‘kuid_t’
/usr/src/compat-rdma/include/linux/uidgid.h:50: note: previous declaration of ‘kuid_t’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:49: error: redefinition of typedef ‘kgid_t’
/usr/src/compat-rdma/include/linux/uidgid.h:51: note: previous declaration of ‘kgid_t’ was here
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:51:1: error: "INVALID_UID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:71:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:52:1: error: "INVALID_GID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:72:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:54:1: error: "GLOBAL_ROOT_UID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:68:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:55:1: error: "GLOBAL_ROOT_GID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
                 from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:69:1: error: this is the location of the previous definition
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:65: error: redefinition of ‘__kuid_val’
/usr/src/compat-rdma/include/linux/uidgid.h:53: note: previous definition of ‘__kuid_val’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:70: error: redefinition of ‘__kgid_val’
/usr/src/compat-rdma/include/linux/uidgid.h:58: note: previous definition of ‘__kgid_val’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:75: error: static declaration of ‘backport_make_kuid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:137: note: previous declaration of ‘backport_make_kuid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:80: error: static declaration of ‘backport_make_kgid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:139: note: previous declaration of ‘backport_make_kgid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:85: error: static declaration of ‘backport_from_kuid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:142: note: previous declaration of ‘backport_from_kuid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:90: error: static declaration of ‘backport_from_kgid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:144: note: previous declaration of ‘backport_from_kgid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:95: error: redefinition of ‘uid_eq’
/usr/src/compat-rdma/include/linux/uidgid.h:74: note: previous definition of ‘uid_eq’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:100: error: redefinition of ‘uid_valid’
/usr/src/compat-rdma/include/linux/uidgid.h:124: note: previous definition of ‘uid_valid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:105: error: redefinition of ‘gid_valid’
/usr/src/compat-rdma/include/linux/uidgid.h:129: note: previous definition of ‘gid_valid’ was here
make[7]: *** [/home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
make[6]: *** [/home/bogl/lustre-release/lnet/klnds/o2iblnd] Error 2
make[5]: *** [/home/bogl/lustre-release/lnet/klnds] Error 2
make[4]: *** [/home/bogl/lustre-release/lnet] Error 2
make[3]: *** [_module_/home/bogl/lustre-release] Error 2
make[3]: Leaving directory `/home/bogl/rb/BUILD/kernel-2.6.32.431.17.1.l0508'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2
Comment by James A Simmons [ 03/Jun/14 ]

Looks like we will have to use _LINUX_UIDGID_H as well. Both the linux kernel header and the compact-rdma.h define this to avoid potential conflicts with each other. I don't have a OFED 12 setup right now so try this:

diff --git a/libcfs/include/libcfs/curproc.h b/libcfs/include/libcfs/curproc.h
index e5e3d57..aa144e7 100644
— a/libcfs/include/libcfs/curproc.h
+++ b/libcfs/include/libcfs/curproc.h
@@ -45,6 +45,9 @@

#if !defined(HAVE_UIDGID_HEADER) || !defined(_KERNEL_)

+#ifndef _LINUX_UIDGID_H
+#define _LINUX_UIDGID_H
+
typedef uid_t kuid_t;
typedef gid_t kgid_t;

@@ -106,6 +109,8 @@ static inline bool gid_valid(kgid_t gid)

{ return (gid != INVALID_GID); }

+#endif /* _LINUX_UIDGID_H */
+
#endif

int cfs_get_environ(const char *key, char *value, int *val_len);

Comment by Bob Glossman (Inactive) [ 03/Jun/14 ]

That additional change to curproc.h does repair the build with recent OFED on Centos 6.5, but can't speak to all other variations. Even in the Centos build I'm worried that it may have the effect of using defns from OFED uidgid.h in some places and local defns from curproc.h in others, depending on who includes what exactly.

Comment by James A Simmons [ 04/Jun/14 ]

The question becomes the order of importance for the uidgid defines. We have the possible combos of distro, ofed, and libcfs. The order for all code outside of the o2ib LND driver is a no brainier. We use the distro if present and the libcfs if not present. The definitions for OFED don't show up outside the o2ib LND driver. Now in the o2ib LND driver do we want in order of most to least importance:

compact-rdma.h -> uidgid.h -> libcfs

uidgid.h -> compact-rdma.h -> libcfs

As for defining _LINUX_UIDGID_H I really can't see a way around this unless we involve the OFED testing in libcfs autoconf and that would be to messy and ugly.

Comment by James A Simmons [ 09/Jun/14 ]

Maloo failed to run for patch http://review.whamcloud.com/#/c/10571. Could some one please start the test for this patch. Thank you.

Comment by Bob Glossman (Inactive) [ 09/Jun/14 ]

James, I've called for a retest. It's back in the queue.

Comment by Patrick Farrell (Inactive) [ 13/Jun/14 ]

When we tried to build master with OFED 3.12, Cray encountered the following build failure, which is also fixed by Jame's patch:

[ 140s] CC [M] /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.o
[ 141s] In file included from /usr/src/packages/BUILD/cray-lustre/libcfs/include/libcfs/linux/linux-prim.h:66,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/libcfs/include/libcfs/linux/libcfs.h:53,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/libcfs/include/libcfs/libcfs.h:47,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.h:71,
[ 141s] from /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.c:41:
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:137: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] cc1: warnings being treated as errors
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:137: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:139: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:139: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:142: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:142: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:144: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:144: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:146: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:146: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:148: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:148: error: parameter names (without types) in function declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:152: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:152: error: function declaration isn't a prototype
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:151: error: static declaration of 'LINUX_BACKPORT' follows non-static declaration
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:148: error: previous declaration of 'LINUX_BACKPORT' was here
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h: In function 'LINUX_BACKPORT':
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: 'from_kuid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: (Each undeclared identifier is reported only once
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: for each function it appears in.)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: 'ns' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: 'uid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:153: error: called object 'LINUX_BACKPORT(<erroneous-expression>)' is not a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h: At top level:
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:158: error: 'LINUX_BACKPORT' declared as function returning a function
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:158: error: function declaration isn't a prototype
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:157: error: redefinition of 'LINUX_BACKPORT'
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:152: error: previous definition of 'LINUX_BACKPORT' was here
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h: In function 'LINUX_BACKPORT':
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: 'from_kgid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: 'ns' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: 'gid' undeclared (first use in this function)
[ 141s] /usr/src/kernel-modules-ofed/x86_64/cray_ari_s/include/linux/uidgid.h:159: error: called object 'LINUX_BACKPORT(<erroneous-expression>)' is not a function
[ 141s] make[9]: *** [/usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
[ 141s] make[8]: *** [/usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd] Error 2
[ 141s] make[7]: *** [/usr/src/packages/BUILD/cray-lustre/lnet/klnds] Error 2
[ 141s] make[6]: *** [/usr/src/packages/BUILD/cray-lustre/lnet] Error 2
[ 141s] make[5]: *** [_module_/usr/src/packages/BUILD/cray-lustre] Error 2
[ 141s] make[4]: *** [sub-make] Error 2
[ 141s] make[3]: *** [all] Error 2
[ 141s] make[3]: Leaving directory `/usr/src/linux-3.0.101-0.21.1_1.0000.8135-obj/x86_64/cray_ari_s'
[ 141s] make[2]: *** [modules] Error 2
[ 141s] make[2]: Leaving directory `/usr/src/packages/BUILD/cray-lustre'
[ 141s] make[1]: *** [all-recursive] Error 1
[ 141s] make[1]: Leaving directory `/usr/src/packages/BUILD/cray-lustre'
[ 141s] make: *** [all] Error 2
[ 141s] error: Bad exit status from /var/tmp/rpm-tmp.71744 (%build)
[ 141s]
[ 141s]
[ 141s] RPM build errors:
[ 141s] Bad exit status from /var/tmp/rpm-tmp.71744 (%build)

Comment by James A Simmons [ 16/Jun/14 ]

Patch has landed. This ticket can be closed.

Generated at Sat Feb 10 01:48:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.