[LU-5628] Dealing with kernels that have lustre enabled already Created: 15/Sep/14  Updated: 23/Nov/17  Resolved: 20/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Oleg Drokin Assignee: Dmitry Eremin (Inactive)
Resolution: Duplicate Votes: 1
Labels: usk

Issue Links:
Related
is related to LU-6083 IB with Ubuntu 14.04 client Resolved
is related to LU-7042 config.h header conflict with OFED 3.18 Resolved
is related to LU-6215 Sync Lustre external tree with lustre... Resolved
is related to LU-6547 build errors building against recent ... Resolved
Rank (Obsolete): 15743

 Description   

Now that the number of kernels that include lustre (from the staging tree at the moment) is growing, and more distributions ship it, we need to do something about all the problems this creates for us.

Currently we cannot build our external lustre against such a kernel due to a clash in config defines, e.g.:

make[1]: Entering directory `/home/green/bk/x86'
  CC [M]  /home/green/git/lustre-current/libcfs/libcfs/linux/linux-tracefile.o
In file included from <command-line>:0:0:
/home/green/git/lustre-current/config.h:26:0: error: "CONFIG_LNET_MAX_PAYLOAD" redefined [-Werror]
 #define CONFIG_LNET_MAX_PAYLOAD LNET_MTU
 ^
In file included from /home/green/bk/linux/include/linux/kconfig.h:4:0,
                 from <command-line>:0:
include/generated/autoconf.h:1571:0: note: this is the location of the previous definition
 #define CONFIG_LNET_MAX_PAYLOAD 1048576
 ^
cc1: all warnings being treated as errors
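
To find out in advance which macro names will collide for a given kernel build tree, the two headers named in the error above can be compared. The following is only a sketch: `defined_names` and `find_config_clashes` are hypothetical helpers, not part of the Lustre build system, and the sketch assumes both headers use plain `#define NAME value` lines.

```shell
#!/bin/sh
# Hypothetical helper: print the macro names defined in a header, one per line.
# Assumes lines of the form "#define NAME value" (no space after '#').
defined_names() {
    awk '$1 == "#define" { print $2 }' "$1" | sort -u
}

# Hypothetical helper: print macro names defined in BOTH headers.
#   $1 - config.h generated by the external lustre tree
#   $2 - include/generated/autoconf.h from the kernel build tree
find_config_clashes() {
    names_a=$(mktemp) || return 1
    names_b=$(mktemp) || return 1
    defined_names "$1" > "$names_a"
    defined_names "$2" > "$names_b"
    # comm -12 keeps only the lines common to both sorted inputs.
    comm -12 "$names_a" "$names_b"
    rm -f "$names_a" "$names_b"
}
```

It would be called as `find_config_clashes "$LUSTRE_SRC/config.h" "$KERNEL_BUILD/include/generated/autoconf.h"`, where both variables are placeholders for the local source and build directories. Any name it prints is a candidate for the same "redefined" error.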

Once lustre is moved out of the staging tree, another problem will be added: clashes with symbols from the lustre includes in the kernel tree (these are currently hidden in the secluded staging location, so this is not an immediate problem).

Once the config symbol clash is resolved, the next problem is the clash in module names between in-kernel lustre and out-of-kernel lustre. Because the in-kernel implementation is mostly geared towards clients and lacks our debugging aids and such, these modules are not really interchangeable, and we need to do something about that too - possibly consider renaming our out-of-tree modules? This will become a problem once distributions start to enable lustre by default in their kernels (so it is not a big problem yet either).

Finally, there are bound to be symbol clashes between in-kernel and out-of-kernel lustre modules, so I suspect we need to do something about that too, but I am not sure what so far. A wrapper to change the names a bit?



 Comments   
Comment by Robert Read (Inactive) [ 13/Oct/14 ]

We'll also need to enable distributions to be able to package the client utilities so users can actually use the modules.

Comment by Minh Diep [ 14/Jan/15 ]

also saw this

/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/linux/linux-mem.h: In function ‘set_shrinker’:
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/linux/linux-mem.h:135:10: error: ‘struct shrinker’ has no member named ‘shrink’
s->shrink = func;
^
In file included from /root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/linux/libcfs.h:53:0,
from /root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/libcfs.h:47,
from /root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/libcfs/linux/linux-tracefile.c:40:
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/linux/linux-prim.h: At top level:
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/linux/linux-prim.h:100:1: error: unknown type name ‘read_proc_t’
typedef read_proc_t cfs_read_proc_t;
^
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/linux/linux-prim.h:101:1: error: unknown type name ‘write_proc_t’
typedef write_proc_t cfs_write_proc_t;
^
In file included from /root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/libcfs.h:305:0,
from /root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/libcfs/linux/linux-tracefile.c:40:
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/params_tree.h: In function ‘LPROCFS_ENTRY_CHECK’:
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/params_tree.h:85:17: error: dereferencing pointer to incomplete type
spin_lock(&(dp)->pde_unload_lock);
^
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/params_tree.h:86:8: error: dereferencing pointer to incomplete type
if (dp->proc_fops == NULL)
^
/root/inkernel/debian/tmp/modules-deb/usr_src/modules/lustre/libcfs/include/libcfs/params_tree.h:88:19: error: dereferencing pointer to incomplete type
spin_unlock(&(dp)->pde_unload_lock);
^

Comment by James A Simmons [ 14/Jan/15 ]

Those errors are due to upstream procfs API changes, which should be resolved by the patches from LU-5275. Which Debian/Ubuntu release are you testing on?

Comment by Minh Diep [ 16/Jan/15 ]

I am testing on Ubuntu 14.04

Comment by James Beal [ 22/Apr/15 ]

Any news? I am seeing this with lustre 2.7 on Ubuntu 14.04.

Comment by James A Simmons [ 23/Apr/15 ]

I'm also working on Ubuntu 14.04 and just pushed some patches to make the intel branch of lustre functional. As for making it work alongside the upstream client that is included in the kernel, it will require a bit of work which I haven't had the time to do. So basically we have to do something along the lines of OFED. Besides handling CONFIG_LNET_MAX_PAYLOAD, we have to modify Module.symvers so that the correct lustre modules are picked up. Currently make debs places the lustre modules in kernel/fs instead of updates. That needs to be fixed first.
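
To check where an installed module actually ended up, something like the following could be used. It is only a sketch: `module_location` is a hypothetical helper, and it assumes the usual /lib/modules/<version>/ layout, in which depmod by default gives the updates subdirectory priority over the rest of the tree.

```shell
#!/bin/sh
# Hypothetical helper: classify a module file path by install location.
# The "updates" pattern is tested first because a path such as
# .../updates/kernel/fs/lustre/lustre.ko contains both directory names.
module_location() {
    case "$1" in
        */updates/*) echo "updates" ;;
        */kernel/*)  echo "kernel" ;;
        *)           echo "other" ;;
    esac
}
```

On a system with the modules installed, `module_location "$(modinfo -n lustre)"` reports which copy modprobe would resolve (`modinfo -n` prints the module's file name). With the packaging behaviour described above, this would be expected to print `kernel` rather than `updates`.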

Comment by James Beal [ 23/Apr/15 ]

Thanks for that. In our use case we use a redhat kernel on our servers with the user space being ubuntu, but for our clients we want to use the real client and the default kernel. We use dkms for our client modules, so that works for us. I could email you a link to the system we use to build things if that would help at all (it works in vagrant and aws).

Comment by James A Simmons [ 23/Apr/15 ]

Looking at the module-assistant man pages, it appears that KPKG_DEST_DIR can be used to place the lustre modules into the updates directory. Perhaps that is not the best solution, since I am not a Debian packaging expert by any means. Any Debian packaging gurus here?

Comment by Nathaniel Clark [ 17/Aug/15 ]

WORKAROUND:
Ubuntu 14.04 LTS
Linux kernel 3.13

./configure --disable-server --enable-quota --with-max-payload-mb=1

edit config.h to replace ((1)<<20) with 1048576
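
The manual edit in the workaround above could be scripted roughly as follows. This is a sketch, assuming that configure wrote the literal text `((1)<<20)` into config.h as the workaround describes; `fix_max_payload` is a hypothetical helper name, not part of the Lustre build system.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the shift expression that configure emitted
# into the plain integer used by the kernel's autoconf.h, so that both
# definitions of CONFIG_LNET_MAX_PAYLOAD are token-for-token identical.
#   $1 - path to the generated config.h
fix_max_payload() {
    config_h="$1"
    # Keep a backup copy next to the original before rewriting it.
    cp "$config_h" "$config_h.orig" || return 1
    # In a basic regular expression the parentheses and '<' are literal.
    sed 's/((1)<<20)/1048576/' "$config_h.orig" > "$config_h"
}
```

Run it as `fix_max_payload config.h` from the top of the lustre tree after configure has finished. The reason this helps is that the C preprocessor accepts a macro redefinition whose replacement text is identical to the previous one, so the "redefined" warning (fatal under -Werror) should no longer fire.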

Comment by Dmitry Eremin (Inactive) [ 14/Sep/15 ]

The patch http://review.whamcloud.com/16418 will also resolve this. The issue is common with LU-7042.

Comment by James A Simmons [ 15/Sep/15 ]

Patch http://review.whamcloud.com/16418 will resolve the config.h issues with the upstream kernel, but for OpenSFS/Intel lustre to run instead of the upstream client we need to modify Module.symvers to replace the symbols from the upstream client with those from the master branch, much like we do for the OFED external stacks.
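
To see which entries in a Module.symvers file would be affected, the symbols exported by a given module can be listed. This is a sketch only: `symbols_exported_by` is a hypothetical helper, and it assumes the tab-separated Module.symvers layout in which the second field is the symbol name and the third is the module path.

```shell
#!/bin/sh
# Hypothetical helper: list symbols that a Module.symvers file attributes
# to modules whose path contains a given substring.
#   $1 - path to Module.symvers
#   $2 - substring to look for in the module path (field 3)
symbols_exported_by() {
    symvers="$1"
    module_pattern="$2"
    # index() returns a non-zero position when the substring is present.
    awk -F '\t' -v pat="$module_pattern" \
        'index($3, pat) { print $2 }' "$symvers" | sort
}
```

For example, `symbols_exported_by Module.symvers staging/lustre` would list what the upstream client exports, assuming its modules are built under a path containing `staging/lustre`.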

Comment by Dmitry Eremin (Inactive) [ 16/Sep/15 ]

Why do we need this? Are we assuming somebody will link with our modules? What is the reason to provide our symbol versions to others?

Comment by James A Simmons [ 16/Sep/15 ]

For LNet this is the case. External kernel modules that use LNet, like DVS from Cray, exist in the wild.

Have you tried Intel Lustre on a distro with upstream Lustre enabled? I have newer Ubuntu versions on the IBM PowerPC, but for some mysterious reason Lustre is disabled there, unlike on other Ubuntu systems.

Comment by Robert Read (Inactive) [ 16/Sep/15 ]

Amazon Linux includes upstream Lustre, and I've heard you can install the el6 lustre-client rpm to be able to mount a filesystem. Haven't heard how well it actually works, though.

Comment by Dmitry Eremin (Inactive) [ 17/Sep/15 ]
  • I agree we need to provide Module.symvers for DVS from Cray. But we don't need to modify any file from the kernel.
  • I checked that the compiled Lustre client for Ubuntu 14.04 works fine and overrides the in-kernel version, as the OFED package does.

So, this is an additional ticket to provide Module.symvers for external programs that would like to link with our modules. I need more info about those programs. How do they link now? What API do they use?

Comment by James A Simmons [ 18/Sep/15 ]

I know DVS from Cray is closed source, so no one can help much there. Thinking about it, it should be up to the external packages to handle the Module.symvers issue themselves. A better way to handle this is to sync up libcfs/lnet upstream with master. After patch 16418 lands we can close this ticket.

Comment by Dmitry Eremin (Inactive) [ 19/Sep/15 ]

Patch landed to master.

Comment by Peter Jones [ 20/Sep/15 ]

Fix tracked under LU-7042
