[LU-94] llite_lloop should not be part of the client build
Created: 23/Feb/11  Updated: 14/Mar/11  Resolved: 14/Mar/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Minor
Reporter: Christopher Morrone
Assignee: Brian Murrell (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 10455

 Description   

llite_lloop should really not be part of the client build, since (as far as I know) it is only used by servers. Let's move it to the server build, and make it so that "--disable-server" does NOT build llite_lloop.

I have an added impetus for doing this because llite_lloop does not build on systems with 64k pages, such as RHEL6 on ppc64. But that is a story for another bug.

I made a patch and submitted to gerrit:

http://review.whamcloud.com/#change,261

but the build breaks for Debian systems. I'm not clear on why that is. If someone could look at the debian build, that would be great.



 Comments   
Comment by Jinshan Xiong (Inactive) [ 23/Feb/11 ]

Maybe I missed something here, but lloop really depends on llite.ko so it must be working at client side.

Comment by Oleg Drokin [ 23/Feb/11 ]

Indeed.
lloop is a client-side code. The primary goal initially was to allow clients to swap to Lustre.
It's not used on servers at all.

Comment by Andreas Dilger [ 23/Feb/11 ]

Chris, the lloop driver allows Lustre files/objects to be efficiently exported as block devices on the client, similar in nature to ZFS ZVOL devices. It was also the basis for swapping on Lustre clients, since it avoided many of the complexities of swap files and normal loop drivers.

Unfortunately, there was no funding to complete this work, and it had fallen somewhat into disrepair, since it borrowed heavily from the original loop device and depends on the block driver API.

I'd really prefer someone to fix up the biology of this device (and ideally make it a first-class supported feature) but there are a lot of other priorities these days.

Comment by Brian Murrell (Inactive) [ 24/Feb/11 ]

I guess it's moot now, but it looks like the Debian (Ubuntu in fact) build is failing because the "SERVER_TRUE" guard is not working for some reason. You can see that in llite, there is still a dependency on llite_lloop.ko for some reason:

Making all in llite
make[6]: Entering directory `/var/lib/hudson/workspace/reviews-ubuntu/BUILD/lustre-2.0.59/debian/tmp/modules-deb/usr_src/modules/lustre/lustre/llite'
make[6]: *** No rule to make target `llite_lloop.ko', needed by `all-am'. Stop.

Comment by Christopher Morrone [ 24/Feb/11 ]

Ah, whoops! I clearly made a bad assumption about what llite_lloop was. Is there a way to change issue titles in jira, like with bugzilla? I am not seeing it...

So I either still need an option to avoid building llite_lloop.ko, or it needs to be fixed to work with >=64k pages. Assuming that the latter will be hard, I'll fix up my patch to trigger off of page size (and maybe a new llite_lloop configure option).

Comment by Christopher Morrone [ 24/Feb/11 ]

I made the build of llite_lloop.ko conditional based on kernel configured page size:

http://review.whamcloud.com/266

It builds under RHEL6 on PPC64 (skips llite_lloop.ko), and under RHEL6 x86_64 (includes llite_lloop.ko) for me. But the build is failing under Whamcloud's Hudson/Jenkins for all platforms. I have no idea why.

I was using these lustre configure parameters:

--disable-server --disable-quilt --disable-liblustre --disable-docs --disable-snmp --enable-panic_dumplog --disable-tests

I wasn't able to find what configure options you guys are using for the patchless-centos5 build, but I assume that --disable-server is in there, and I wouldn't think that the others would matter...

If you could get me the configure options and perhaps the contents of lustre/llite/Makefile after the configure step, that might help me narrow down the problem.

Is it possible that old configure files are left over from previous builds? Maybe the Jenkins setup needs a "git clean -xfd" in there somewhere?

Comment by Niu Yawei (Inactive) [ 24/Feb/11 ]

Hi, Peter

I think that, as a short-term solution, disabling the lloop build on systems with a large page size is enough. I think we'd better reassign this bug to a build expert.

For the long-term solution, as Andreas said, lloop needs heavy changes to support large page sizes, and we can continue that discussion in LU-96. (Xiong has done some initial investigation of it.)

Comment by Peter Jones [ 28/Feb/11 ]

ok, thanks Niu.

Brian, are you able to handle the build changes outlined?

Chris, it looks like the ticket title can be altered using the Edit tab. If you are not able to do this then let me know the changes that you want to make and I can take care of it for you.

Comment by Brian Murrell (Inactive) [ 28/Feb/11 ]

Chris, you can see what the build process for a Jenkins job is by clicking on the Configure link for the job, i.e. http://build.whamcloud.com/job/reviews-patchless-centos5/configure, but it's entirely possible that we have ACL'd that. In any case, for the reviews-patchless-centos5 job for example, we first configure with:

./configure --disable-modules

and then create a tarball with:

make dist

and then use lbuild to build the packages:

build/lbuild --ccache --patchless --kerneltree=$bld/kerneltree --kernelrpm=$bld/kernelrpm --reusebuild=$bld/reusebuild --tag=foo --target=2.6-rhel5 --target-archs=x86_64 --distro=rhel5 --lustre=../lustre-$ver.tar.gz

The actual build work can be seen in the rpm spec's %build scriptlet which can be found in the console log (http://build.whamcloud.com/job/reviews-patchless-centos5/62/console):

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.12226
...
+ eval ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-linux=/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64 --with-linux-obj=/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64 --disable-server --enable-liblustre --enable-liblustre-tests --with-release=2.6.18_194.17.1.el5_g69f43da --enable-tests --enable-liblustre-tests
++ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-linux=/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64 --with-linux-obj=/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64 --disable-server --enable-liblustre --enable-liblustre-tests --with-release=2.6.18_194.17.1.el5_g69f43da --enable-tests --enable-liblustre-tests

So ultimately you can see that the configure args are:

 --with-linux=.../2.6.18-194.17.1.el5-x86_64 --with-linux-obj=.../2.6.18-194.17.1.el5-x86_64 --disable-server --enable-liblustre --enable-liblustre-tests --with-release=2.6.18_194.17.1.el5_g69f43da --enable-tests --enable-liblustre-tests

once we strip out the extraneous args that rpmbuild adds.
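
The stripping step above can be sketched as a small shell filter. This is only an illustration: the function name lustre_configure_args is hypothetical, and the rule it applies (keep only --with-*, --enable-* and --disable-* options) is an assumption that happens to match the reduced list shown above.

```shell
# Hypothetical helper: keep only the --with-*, --enable-* and --disable-*
# options from a configure command line, dropping the directory and
# build/host/target options that rpmbuild adds.
lustre_configure_args() {
    for arg in "$@"; do
        case "$arg" in
            --with-*|--enable-*|--disable-*) printf '%s\n' "$arg" ;;
        esac
    done
}

# Example with a shortened version of the logged command line.
lustre_configure_args \
    --build=x86_64-redhat-linux-gnu --prefix=/usr --libdir=/usr/lib64 \
    --disable-server --enable-liblustre --enable-tests
```

The filter prints one surviving option per line, which makes it easy to diff the options of two build jobs.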

As for the lustre/llite/Makefile contents, in the dir where the initial configure is run (i.e. in prep for make dist) that file has:

MODULES := lustre
#MODULES += llite_lloop
lustre-objs := dcache.o dir.o file.o llite_close.o llite_lib.o llite_nfs.o
lustre-objs += rw.o lproc_llite.o namei.o symlink.o llite_mmap.o
lustre-objs += xattr.o remote_perm.o llite_rmtacl.o llite_capa.o
lustre-objs += rw26.o super25.o statahead.o
lustre-objs += ../lclient/glimpse.o ../lclient/lcommon_cl.o ../lclient/lcommon_misc.o
lustre-objs += vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o

#llite_lloop-objs := lloop.o

EXTRA_DIST := $(lustre-objs:.o=.c) llite_internal.h rw26.c super25.c
EXTRA_DIST += $(llite_lloop-objs:.o=.c)
EXTRA_DIST += vvp_internal.h

include /build/hudson/workspace/reviews-patchless-centos5/Rules

However, in the directory where rpmbuild is working (i.e. "%_topdir/BUILD/lustre-2.0.59/lustre/llite"), the Makefile is:

MODULES := lustre
MODULES += llite_lloop
lustre-objs := dcache.o dir.o file.o llite_close.o llite_lib.o llite_nfs.o
lustre-objs += rw.o lproc_llite.o namei.o symlink.o llite_mmap.o
lustre-objs += xattr.o remote_perm.o llite_rmtacl.o llite_capa.o
lustre-objs += rw26.o super25.o statahead.o
lustre-objs += ../lclient/glimpse.o ../lclient/lcommon_cl.o ../lclient/lcommon_misc.o
lustre-objs += vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o

llite_lloop-objs := lloop.o

EXTRA_DIST := $(lustre-objs:.o=.c) llite_internal.h rw26.c super25.c
EXTRA_DIST += $(llite_lloop-objs:.o=.c)
EXTRA_DIST += vvp_internal.h

include /build/hudson/workspace/reviews-patchless-centos5/BUILD/BUILD/lustre-2.0.59/Rules

The Makefile.in in that same dir is:

MODULES := lustre
@LLITE_LLOOP_TRUE@MODULES += llite_lloop
lustre-objs := dcache.o dir.o file.o llite_close.o llite_lib.o llite_nfs.o
lustre-objs += rw.o lproc_llite.o namei.o symlink.o llite_mmap.o
lustre-objs += xattr.o remote_perm.o llite_rmtacl.o llite_capa.o
lustre-objs += rw26.o super25.o statahead.o
lustre-objs += ../lclient/glimpse.o ../lclient/lcommon_cl.o ../lclient/lcommon_misc.o
lustre-objs += vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o

@LLITE_LLOOP_TRUE@llite_lloop-objs := lloop.o

EXTRA_DIST := $(lustre-objs:.o=.c) llite_internal.h rw26.c super25.c
EXTRA_DIST += $(llite_lloop-objs:.o=.c)
EXTRA_DIST += vvp_internal.h

@INCLUDE_RULES@

So your new code is being included in the make dist tarball.
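
The two generated Makefiles above differ only in how the @LLITE_LLOOP_TRUE@ marker from Makefile.in was expanded. For an automake conditional, the _TRUE marker is replaced with an empty string when the condition holds and with "#" when it does not. A minimal sketch of that expansion follows; it uses sed purely to simulate the effect (the real substitution is performed by config.status), and the function name expand_lloop_conditional is hypothetical.

```shell
# Simulate how an automake conditional marker in Makefile.in is expanded.
# $1 is "yes" when the condition is true, anything else when it is false.
expand_lloop_conditional() {
    if [ "$1" = "yes" ]; then
        # Condition true: the marker disappears and the line is active.
        sed 's/@LLITE_LLOOP_TRUE@//'
    else
        # Condition false: the marker becomes a comment character.
        sed 's/@LLITE_LLOOP_TRUE@/#/'
    fi
}

printf '%s\n' '@LLITE_LLOOP_TRUE@MODULES += llite_lloop' | expand_lloop_conditional yes
printf '%s\n' '@LLITE_LLOOP_TRUE@MODULES += llite_lloop' | expand_lloop_conditional no
```

The first call yields the active line seen in the rpmbuild directory, and the second yields the commented-out line seen in the directory where the initial configure was run.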

I do notice that the configure output in the %build scriptlet reports:

checking whether to enable llite_lloop module... yes

So somehow the test is failing to operate properly in that build environment. The config.log for that test reports:

configure:9283: checking whether to enable llite_lloop module
configure:9308: cp conftest.c build && make -d modules  CC=gcc -f /build/hudson/workspace/reviews-patchless-centos5/BUILD/BUILD/lustre-2.0.59/build/Makefile LUSTRE_LINUX_CONFIG=/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64/.config LINUXINCLUDE= -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64/include -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64/arch/x86/include -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64/include -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64/include -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64/include2 -include include/linux/autoconf.h -o tmp_include_depends -o scripts -o include/config/MARKER -C /build/hudson/workspace/reviews-patchless-centos5/BUILD/reused/usr/src/kernels/2.6.18-194.17.1.el5-x86_64 EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/BUILD/lustre-2.0.59/libcfs/include -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/BUILD/lustre-2.0.59/lnet/include -I/build/hudson/workspace/reviews-patchless-centos5/BUILD/BUILD/lustre-2.0.59/lustre/include  M=/build/hudson/workspace/reviews-patchless-centos5/BUILD/BUILD/lustre-2.0.59/build
configure:9311: $? = 0
configure:9313: test -s build/conftest.o
configure:9316: $? = 0
configure:9320: result: yes

I'm really not sure why though. Maybe you can add some diagnostics to your test to display why it's finding the results it is.

Comment by Christopher Morrone [ 28/Feb/11 ]

Which patch are you looking at? I closed the original patch and replaced it with:

http://review.whamcloud.com/266

This patch detects if page size is >=64k and then disables llite_lloop.ko. As far as I know, all of your systems are 4k, so llite_lloop.ko SHOULD be enabled. The build fails on your systems, but works fine under RHEL6.

Comment by Christopher Morrone [ 28/Feb/11 ]

Oh, and I get "Access Denied" when I try to look at the configuration that you pointed me to. After creating the 4th or 5th account at *.whamcloud.com...

Comment by Christopher Morrone [ 28/Feb/11 ]

Peter, I don't think that there is an Edit tab in my view. I can only edit comments (my own) as far as I can tell.

Comment by Peter Jones [ 28/Feb/11 ]

ok Chris, let me know what you want to change the title to and I will take care of it

Comment by Brian Murrell (Inactive) [ 01/Mar/11 ]

I am indeed looking at http://review.whamcloud.com/266. I would agree that the patch does look like it's supposed to detect the page size but clearly, it's failing for some reason. Can you make the test more deterministic, like have the test program actually print the page size and then interpret the output from the program? The problem with a test based on a compile pass/fail is that it can be tripped up by environmental problems which cause the program to fail to compile for unrelated issues.
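
A rough sketch of the "print the value, then interpret it" approach suggested above. Everything here is an assumption for illustration: the function name lloop_enabled_for_page_size is hypothetical, the 65536-byte threshold is taken from the ">=64k" condition discussed in this ticket, and getconf PAGESIZE only reports the build host's userland page size, which is a stand-in for the page size of the target kernel configuration that the real check would need.

```shell
# Hypothetical decision helper: takes a page size in bytes, as printed by
# some probe, and prints "yes" if llite_lloop should be built, "no" if it
# should be skipped, or "unknown" if the probe output is not a plain number.
lloop_enabled_for_page_size() {
    case "$1" in
        ''|*[!0-9]*) echo unknown; return 1 ;;
    esac
    if [ "$1" -ge 65536 ]; then
        echo no
    else
        echo yes
    fi
}

# Stand-in probe: the build host's userland page size.
page_size=$(getconf PAGESIZE 2>/dev/null || echo unknown)
echo "page size: $page_size -> build llite_lloop: $(lloop_enabled_for_page_size "$page_size")"
```

Because the probed value is logged next to the decision, an unexpected result can be told apart from a probe that failed for unrelated reasons.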

Comment by Christopher Morrone [ 01/Mar/11 ]

Maybe I misunderstand, but you seem to be claiming that this is incorrect:

checking whether to enable llite_lloop module... yes

But that is exactly correct. You are building on x86_64 systems where the kernel page size is almost certainly 4k, so llite_lloop.ko should indeed be built. The check is working.

I think we really need to see more of the build procedure in the Jenkins log. Why isn't that initial "./configure --disable-modules" logged?

It would also be great to change this:

  /bin/bash /tmp/hudson1869624119457289468.sh

to this:

  /bin/bash -x /tmp/hudson1869624119457289468.sh

so we can all see what commands it is running.
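
For reference, a small sketch of what the -x flag adds to a log. The script contents below are a made-up stand-in for the Jenkins job script, not its real contents.

```shell
# Write a tiny stand-in for the Jenkins job script (hypothetical contents),
# then run it with tracing enabled.
script=$(mktemp)
cat > "$script" <<'EOF'
ver=2.0.59
echo "building lustre-$ver"
EOF

# With -x, bash writes each command to stderr, prefixed with "+",
# after expansion and before running it.
trace=$(bash -x "$script" 2>&1)
printf '%s\n' "$trace"
rm -f "$script"
```

The trace interleaves the "+"-prefixed commands with their normal output, so the log shows which command produced which line.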

My guess at this point is that the "./configure --disable-modules" is to blame, since the later check is clearly working fine. I suspect the test is failing because there is no path to a kernel specified, so it can't include the kernel header.

I probably need another check to always enable llite_lloop.ko when no kernel source is available just to get everything into the dist.
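
The decision order described above might look roughly like the following. This is a sketch under stated assumptions, not the code from the patch: the function name want_llite_lloop and its two arguments are hypothetical, and the 65536-byte threshold comes from the ">=64k" condition mentioned earlier in this ticket.

```shell
# Hypothetical sketch of the decision order.
#   $1: path to the kernel source tree (may be empty)
#   $2: kernel page size in bytes, only consulted when a tree is present
want_llite_lloop() {
    linux_dir=$1
    page_size=$2
    # No kernel tree available (for example "./configure --disable-modules"
    # run before "make dist"): enable unconditionally so that the module
    # is not left out of the dist.
    if [ -z "$linux_dir" ] || [ ! -d "$linux_dir" ]; then
        echo yes
        return 0
    fi
    # Kernel tree present: skip the module on large-page kernels.
    if [ "$page_size" -ge 65536 ]; then
        echo no
    else
        echo yes
    fi
}

want_llite_lloop "" 65536    # no kernel tree given
want_llite_lloop / 65536     # "/" stands in for an existing kernel tree
```

Checking for the kernel tree first keeps the dist step independent of the machine that happens to run it.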

Comment by Christopher Morrone [ 01/Mar/11 ]

Fixed! Although this was basically what I did in the Makefile.in for http://review.whamcloud.com/261, and that failed on ubuntu.

But it works now, so I'm not going to touch it.

Comment by Brian Murrell (Inactive) [ 02/Mar/11 ]

Ahh. Glad you figured it out.

Comment by Brian Murrell (Inactive) [ 07/Mar/11 ]

Chris,

I was going to reassign this back to you to carry through to landing since this is your patch and the build issues are resolved but it seems I cannot. You are not on the list of people I can assign to. If you think this is in error, please don't hesitate to let me know. If not, perhaps the easiest thing is for you to just take the issue back.

Comment by Christopher Morrone [ 07/Mar/11 ]

So who do we ask to review http://review.whamcloud.com/266?

Comment by Brian Murrell (Inactive) [ 08/Mar/11 ]

"So who do we ask to review http://review.whamcloud.com/266?"

Anyone you want. I've added a review, but as I mentioned, it would be good to get somebody who understands why the module is not working on the given architectures to review also. Perhaps Andreas.

Comment by Peter Jones [ 14/Mar/11 ]

It looks like this is all landed. Please reopen if there is anything further needed
