[LU-5140] Mellanox backport header conflicts with newer kernels Created: 03/Jun/14 Updated: 18/Jun/14 Resolved: 18/Jun/14 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James A Simmons | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | build | ||
| Environment: |
Any linux environment that supports process namespace and the external Mellanox OFED stack. In my case it was SLES11 SP3 with a external OFED stack. |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 14179 | ||||||||
| Description |
|
The Mellanox OFED stack has a compatibility layer to allow it be build across many kernel versions and many distributions. The linux process namespace is one of the things Mellanox creates wrappers to handle various levels of support of this feature. For lustre the libcfs layer also does the same exact thing to handle different levels of support of process namespace. In order to do that libcfs has to figure out which abstract to wrap around, Mellanox or the native system. Currently libcfs doesn't not handle this case properly. |
| Comments |
| Comment by James A Simmons [ 03/Jun/14 ] |
|
Here is the fix. http://review.whamcloud.com/#/c/10571 I think this ticket should be linked to |
| Comment by Bob Glossman (Inactive) [ 03/Jun/14 ] |
|
I ran into the same problem with a recent OFED distro and rhel7. It looks to me like MLNX/OFED chose to keep a local version of uidgid.h to solve this problem. This solution conflicts with the wrappers used in lustre code to address the issue. It's not clear to me how to avoid this conflict. We should adapt correctly to building in environments both with and without any namespace support, with kernel IB stack, with MLNX IB stack, with OFED IB stack. |
| Comment by Bob Glossman (Inactive) [ 03/Jun/14 ] |
|
James, you are ahead of me again. I added my comment before I saw your posted solution. How many environments have you tested your fix in? Our build and autotest will only test current distros, which only exercise the case of no namespace support. |
| Comment by James A Simmons [ 03/Jun/14 ] |
|
I ran into this issue using a external Mellanox stack on an SLES11 SP3 system (Cray CLE5.2 environment). I'm going to test the current 2.6 on my test bed Cray system today. |
| Comment by Bob Glossman (Inactive) [ 03/Jun/14 ] |
|
Checking back I see that SP3 does have uidgid.h in it too. I didn't notice because it has the relevant CONFIG_ setting turned off by default there, so it always built correctly before we had namespace changes in lustre. |
| Comment by Bob Glossman (Inactive) [ 03/Jun/14 ] |
|
Doesn't look like the proposed solution works universally. Builds in our autotest framework are passing, but those are with older OFED versions. I still see failures when trying to build a with a recent OFED; OFED-3.12-rc3; in Centos 6.5. In this case there is no uidgid.h present in kernel source but there is one present in OFED. Due to a lack of one in the kernel, lustre's config.h has #undef HAVE_UIDGID_HEADER. This leads to trying to use local #defines, which in turn leads to compile conflicts with uidgid.h from OFED. example errors: .
.
.
LD [M] /home/bogl/lustre-release/libcfs/libcfs/libcfs.o
CC [M] /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.o
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:48: error: redefinition of typedef ‘kuid_t’
/usr/src/compat-rdma/include/linux/uidgid.h:50: note: previous declaration of ‘kuid_t’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:49: error: redefinition of typedef ‘kgid_t’
/usr/src/compat-rdma/include/linux/uidgid.h:51: note: previous declaration of ‘kgid_t’ was here
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:51:1: error: "INVALID_UID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:71:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:52:1: error: "INVALID_GID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:72:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:54:1: error: "GLOBAL_ROOT_UID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:68:1: error: this is the location of the previous definition
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:56,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:79,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:55:1: error: "GLOBAL_ROOT_GID" redefined
In file included from /usr/src/compat-rdma/include/linux/compat-2.6.h:19,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.h:70,
from /home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.c:41:
/usr/src/compat-rdma/include/linux/uidgid.h:69:1: error: this is the location of the previous definition
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:65: error: redefinition of ‘__kuid_val’
/usr/src/compat-rdma/include/linux/uidgid.h:53: note: previous definition of ‘__kuid_val’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:70: error: redefinition of ‘__kgid_val’
/usr/src/compat-rdma/include/linux/uidgid.h:58: note: previous definition of ‘__kgid_val’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:75: error: static declaration of ‘backport_make_kuid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:137: note: previous declaration of ‘backport_make_kuid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:80: error: static declaration of ‘backport_make_kgid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:139: note: previous declaration of ‘backport_make_kgid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:85: error: static declaration of ‘backport_from_kuid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:142: note: previous declaration of ‘backport_from_kuid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:90: error: static declaration of ‘backport_from_kgid’ follows non-static declaration
/usr/src/compat-rdma/include/linux/uidgid.h:144: note: previous declaration of ‘backport_from_kgid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:95: error: redefinition of ‘uid_eq’
/usr/src/compat-rdma/include/linux/uidgid.h:74: note: previous definition of ‘uid_eq’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:100: error: redefinition of ‘uid_valid’
/usr/src/compat-rdma/include/linux/uidgid.h:124: note: previous definition of ‘uid_valid’ was here
/home/bogl/lustre-release/libcfs/include/libcfs/curproc.h:105: error: redefinition of ‘gid_valid’
/usr/src/compat-rdma/include/linux/uidgid.h:129: note: previous definition of ‘gid_valid’ was here
make[7]: *** [/home/bogl/lustre-release/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
make[6]: *** [/home/bogl/lustre-release/lnet/klnds/o2iblnd] Error 2
make[5]: *** [/home/bogl/lustre-release/lnet/klnds] Error 2
make[4]: *** [/home/bogl/lustre-release/lnet] Error 2
make[3]: *** [_module_/home/bogl/lustre-release] Error 2
make[3]: Leaving directory `/home/bogl/rb/BUILD/kernel-2.6.32.431.17.1.l0508'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2
|
| Comment by James A Simmons [ 03/Jun/14 ] |
|
Looks like we will have to use _LINUX_UIDGID_H as well. Both the linux kernel header and the compact-rdma.h define this to avoid potential conflicts with each other. I don't have a OFED 12 setup right now so try this: diff --git a/libcfs/include/libcfs/curproc.h b/libcfs/include/libcfs/curproc.h #if !defined(HAVE_UIDGID_HEADER) || !defined(_KERNEL_) +#ifndef _LINUX_UIDGID_H @@ -106,6 +109,8 @@ static inline bool gid_valid(kgid_t gid) { return (gid != INVALID_GID); }+#endif /* _LINUX_UIDGID_H */ int cfs_get_environ(const char *key, char *value, int *val_len); |
| Comment by Bob Glossman (Inactive) [ 03/Jun/14 ] |
|
That additional change to curproc.h does repair the build with recent OFED on Centos 6.5, but can't speak to all other variations. Even in the Centos build I'm worried that it may have the effect of using defns from OFED uidgid.h in some places and local defns from curproc.h in others, depending on who includes what exactly. |
| Comment by James A Simmons [ 04/Jun/14 ] |
|
The question becomes the order of importance for the uidgid defines. We have the possible combos of distro, ofed, and libcfs. The order for all code outside of the o2ib LND driver is a no brainier. We use the distro if present and the libcfs if not present. The definitions for OFED don't show up outside the o2ib LND driver. Now in the o2ib LND driver do we want in order of most to least importance: compact-rdma.h -> uidgid.h -> libcfs uidgid.h -> compact-rdma.h -> libcfs As for defining _LINUX_UIDGID_H I really can't see a way around this unless we involve the OFED testing in libcfs autoconf and that would be to messy and ugly. |
| Comment by James A Simmons [ 09/Jun/14 ] |
|
Maloo failed to run for patch http://review.whamcloud.com/#/c/10571. Could some one please start the test for this patch. Thank you. |
| Comment by Bob Glossman (Inactive) [ 09/Jun/14 ] |
|
James, I've called for a retest. It's back in the queue. |
| Comment by Patrick Farrell (Inactive) [ 13/Jun/14 ] |
|
When we tried to build master with OFED 3.12, Cray encountered the following build failure, which is also fixed by Jame's patch: [ 140s] CC [M] /usr/src/packages/BUILD/cray-lustre/lnet/klnds/o2iblnd/o2iblnd.o |
| Comment by James A Simmons [ 16/Jun/14 ] |
|
Patch has landed. This ticket can be closed. |