[LU-8852] Add build option to avoid weak module updates Created: 18/Nov/16 Updated: 11/May/17 Resolved: 08/May/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Minh Diep |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The use of weak module updates starting in RHEL 7.2 is causing me no end of grief. I cannot get RPMs to install anymore starting in RHEL 7.2 because of this feature. Can a build option be added to build the RPMs as we did before RHEL 7.2 and avoid weak module updates? |
| Comments |
| Comment by James A Simmons [ 18/Nov/16 ] |
|
What exactly is the problem? |
| Comment by Doug Oucharek (Inactive) [ 18/Nov/16 ] |
|
When I tried to load the kmod-based RPM (where lnet.ko is), it complained that I am missing a path in my /lib/modules directory. I have tried tricks like symbolic links to "fake" the existence of the path, but then I get a whole bunch of missing-symbol errors. So, it appears that my build environment is not properly set up for building the RPMs with weak module updates. I am following the Wiki page for building Lustre from source. It would seem those instructions are not valid as of RHEL 7.2. To be honest, I am not grokking the whole weak module update feature so I don't have a basis on which I can start debugging this problem. Nor am I seeing the value of that feature and question why we are using it. Given that "make install" is broken for Lustre (and has been for years), I have been forced to rely on "make rpms" and now thanks to weak module updates, I can't get that working anymore either. That leaves the possibility I have to use lbuild. Please don't ask me to go down that rat hole! I just want "make rpms" to produce something I can actually install again, or even better, have "make install" work again so I can avoid all the packaging issues altogether when I am doing development work. |
| Comment by Patrick Farrell (Inactive) [ 18/Nov/16 ] |
|
I cannot agree more with Doug, this feature is a huge pain given the lack of documentation around how to actually build with it. Doug, for what it's worth, I was able to get the RPMs I built for CentOS 7.X (I think 7.2?) to work by installing them with --nodeps (I installed all of the produced RPMs from make rpms). But that's not right either, obviously, and it sounds like your problem is worse...? |
| Comment by Micah Bhakti (Inactive) [ 02/Dec/16 ] |
|
The purpose of weak module support is to remove the need for end customers to rebuild when performing minor kernel updates (CVEs) by maintaining ABI compatibility. It sounds like documentation is a problem, as well as the larger lbuild vs make conflicts. I would suggest that removing kernel patching (in progress) and providing documentation and support for the use of makefiles would address this. Please let me know if you believe that would address the problem. |
| Comment by Doug Oucharek (Inactive) [ 02/Dec/16 ] |
|
What you say, Micah, sounds good in theory. In practice this feature has made me turn to other "tricks" when I need to change a kernel module (in my case, lnet.ko). Rather than make it easier to install modules as you have indicated, this feature has stopped me entirely from being able to do so (via RPMs). The best advice I have been given is to avoid RPMs entirely as they are a pain for developers who need to change code frequently. Apparently, the best approach here is to NFS-mount your local build tree on the cluster node, change to the tests directory in that build tree, and execute: LOAD=only sh llmount.sh. Apparently, llmount.sh has the ability to load modules manually for you depending on where you run it. Hopefully other developers will see this ticket and learn this trick to avoid RPMs as well. |
| Comment by Minh Diep [ 13/Dec/16 ] |
Please file a new ticket if this is supported but not working. Please describe exactly the steps you used to build the Lustre rpms or to install the rpms from https://build.hpdd.intel.com |
| Comment by Minh Diep [ 13/Dec/16 ] |
|
Hi paf Have you looked at Let me know if any issue you found. I am happy to look into it |
| Comment by Christopher Morrone [ 13/Dec/16 ] |
|
Doug, can you elaborate on your issue? We use RHEL, and the weak modules stuff is working great for us. At what point in the process are you seeing complaints about /lib/modules, and what exactly is the error? I suspect that your issue is fixable without any drastic steps to avoid weak-modules. And yes, the hpdd wiki build instructions are probably not correct. They weren't really very correct even before weak modules came along. |
| Comment by Doug Oucharek (Inactive) [ 14/Dec/16 ] |
|
I'm able to build the Lustre RPMs just fine. My problem comes when I try to load them via rpm -Uvh. I get complaints of a missing directory under /lib/modules. I've tried using the "nodeps" and "force" options and the kmods will still not load. I've run into this problem trying to load the kmods on the system I have built them on and on a system I have copied them to. I'm sure I am doing something wrong, but have no idea what that is. This feature is not documented and I really don't understand it at all. What I would like to see is a screencast where someone builds Lustre RPMs on a RHEL 7.3 build system, transfers them to a production system which is already running Lustre, and successfully installs the updated Lustre modules. |
| Comment by Christopher Morrone [ 14/Dec/16 ] |
|
I rather suspect that there is something else wrong in your build environment before starting the Lustre build that is causing you the trouble. But knowing what exactly you are doing would make it easier to help you. If you take a clean RHEL7.2/RHEL7.3 system, install ZFS devel rpms and kernel devel rpms (and the kernel debuginfo package), then take the lustre source and do basically: sh autogen.sh && ./configure && make rpms you will almost certainly get lustre rpms that cleanly install on those RHEL systems. We do this all the time here, and upgrading packages works just fine as well. We even actually take advantage of the weak-modules parts by installing a Lustre that was built against a different version of the kernel (but one, of course, that offers the same symbol versions). We don't need to tightly couple the kernel and Lustre builds. And the symbol dependencies are all correctly expressed in the rpms! Finally, rpm packages that work the way they are supposed to! Nirvana! Of course, if you want to run the ldiskfs tests, you also need a patched kernel until
If you don't want to build rpms of Lustre's prerequisites, then it isn't terribly reasonable to expect to be able to build Lustre rpms. The rpm system needs to be honored all the way through to be able to use rpm correctly. So the solution is not to add a build option to avoid weak modules; it is just to learn the correct way to build things in the first place. I'm willing to help by sharing what I know. I don't think I'll make a screencast, or write the documentation myself, but I'll be happy to engage in conversation, and mentor someone who does want to do those things. |
| Comment by Patrick Farrell (Inactive) [ 04/Jan/17 ] |
|
Minh, I've managed to sort out ways to get things working - I would be curious to know how Intel's developers build for themselves these days, since I'm stuck ignoring module dependencies... But this is just in my test environment, Cray's not having any trouble in our larger build systems. So, I'm fine. Thanks for looking at this. |
| Comment by Minh Diep [ 04/Jan/17 ] |
|
Patrick, |
| Comment by Patrick Farrell (Inactive) [ 04/Jan/17 ] |
|
Sure, Minh. It's CentOS 6 (various versions as time goes by, but including very recent ones). I build the kernel (from source) and Lustre using the process on the wiki, then install the kernel on another node. When installing Lustre, I get unmet ksyms dependencies. My kernel build doesn't seem to generate anything satisfying them. If I use --nodeps when installing the Lustre RPMs, Lustre works fine. |
| Comment by Minh Diep [ 04/Jan/17 ] |
|
Patrick, I assume you are referring to this wiki https://wiki.hpdd.intel.com/display/PUB/Building+Lustre+from+Source |
| Comment by Patrick Farrell (Inactive) [ 04/Jan/17 ] |
|
Yes, that's it. |
| Comment by Minh Diep [ 23/Jan/17 ] |
|
Hi Doug, Patrick, https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=52104622 is a new walk-through for CentOS/RHEL 7.3. Please review and see if this helps you resolve your issues. |
| Comment by Doug Oucharek (Inactive) [ 23/Jan/17 ] |
|
Hi Minh, Some things I have run into with your new page:
I was able to start with a clean CentOS 1611 build (same as RHEL 7.3) and got Lustre built, installed, and was able to run the basic test. I then tried to make a change to LNet and do a "make; make install" to see if I can just install changed modules. This worked, however, it did not install the lnet.ko file in the correct place so it was not being loaded when I did a "modprobe lnet". The original lnet.ko was installed in: ./3.10.0-514.2.2.el7_lustre.x86_64/extra/lustre/net/lnet.ko My new changed lnet.ko got installed here: ./3.10.0-514.2.2.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko Any idea how to fix this? |
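The two install locations described above can be checked with a small search over the kernel's module tree. The following is only a sketch: `find_module_copies` is a hypothetical helper name, and the module root is passed as an argument so it can be tried on a scratch directory as well as on /lib/modules.

```shell
# Hypothetical helper: list every copy of a module file under a modules root.
# Usage: find_module_copies <modules-root> <module-name-without-.ko>
find_module_copies() {
    root=$1
    name=$2
    # Print paths relative to the root, sorted so the output is stable.
    (cd "$root" && find . -type f -name "$name.ko" | sort)
}
```

For example, `find_module_copies "/lib/modules/$(uname -r)" lnet` would list both copies if an rpm install and a "make install" have each left one behind, and `modinfo -n lnet` reports which file modprobe would actually load.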
| Comment by James A Simmons [ 25/Apr/17 ] |
|
I started to look at this again. Doug can you post the results for [rpm -qp --scripts kmod-lustre-*.rpm] |
| Comment by Doug Oucharek (Inactive) [ 25/Apr/17 ] |
|
Where/when am I to issue that rpm command? I just did on my build VM and it came back with nothing. |
| Comment by Christopher Morrone [ 26/Apr/17 ] |
|
In the directory where the package files exist that match the glob "kmod-lustre-*.rpm". The "q" option to rpm means you are doing a query, and the "p" option means that you are operating on a package file (as opposed to querying the system rpm database). |
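As a sketch of the query described above, the loop below runs it on each matching package file in a given directory and reports when the glob matches nothing, which is one way the command can appear to "come back with nothing". `show_kmod_scripts` is a hypothetical name, and the rpm call is only reached when a matching file exists.

```shell
# Print the install/uninstall scriptlets of each kmod-lustre package file
# found in a directory (default: the current directory).
show_kmod_scripts() {
    dir=${1:-.}
    found=0
    for pkg in "$dir"/kmod-lustre-*.rpm; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$pkg" ] || continue
        found=1
        echo "== $pkg"
        # -q: query, -p: operate on a package file, --scripts: show scriptlets
        rpm -qp --scripts "$pkg"
    done
    if [ "$found" -eq 0 ]; then
        echo "no kmod-lustre-*.rpm files in $dir" >&2
        return 1
    fi
}
```

Running `show_kmod_scripts /path/to/build/tree` from anywhere avoids depending on the current directory.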
| Comment by Christopher Morrone [ 26/Apr/17 ] |
|
Doug, the issue you describe is almost certainly explained by this comment in the lustre.spec.in:
Someone could, perhaps, work on adding system type detection into the autoconf configuration and making it capable of knowing the correct install path on various Linux distributions (and even various releases within a single distribution). Then the code could perhaps be removed from the spec file making it a bit simpler. But frankly what you are trying to do will almost always be fraught with peril. I think a good rule of thumb is this: Do not mix package-installed files and "make install" installed files on the same system. If you want to use "make install" (which makes lots of sense when you are doing rapid development install cycles), then you should first purge (remove) all of the lustre packages that are installed. When you overwrite files that rpm installed, you make rpm cry. Don't be mean to rpm. (Try running "rpm --verify" on a package that has a file you overwrote with changes and you will see the problem that you have introduced.) |
| Comment by Christopher Morrone [ 26/Apr/17 ] |
|
I should probably elaborate a bit. Even though I said "Someone could, perhaps, work on adding system type detection into the autoconf configuration", I am not actually proposing that someone should do that. I don't think that is currently the proper place to deal with the various differences in systems' packaging requirements. I would argue that "make install" is probably never something that anyone should use with Lustre during development. Instead it is usually better to use llmount.sh and work out of the development directory for rapid development cycles. Ultimately, this ticket has largely devolved into sharing good development practices, and sharing information about how the packaging system works. The work proposed in the original "Description" section of this ticket should not be done. I would propose that we close this ticket, and take further discussion to somewhere like the lustre-devel mailing list. Should work items be identified in the future, new tickets specific to those tasks can be created. |
| Comment by James A Simmons [ 26/Apr/17 ] |
|
Actually the reason I brought up the rpm query for the scripts is that I'm seeing incorrect names for the /lib/modules path. The post install scripts believe the modules to be in /lib/modules/`(uname).x86_64.x86_64.x86_64. I have no idea why it's doing this. This is something we saw while loading rpms into an image for PFL testing. I was wondering if this is what Doug was seeing. |
| Comment by James A Simmons [ 26/Apr/17 ] |
|
Exact output:
rpm -qp --scripts kmod-lustre-2.9.56_10_g588831e_dirty-1.el7.x86_64.rpm
modules=( $(find /lib/modules/3.10.0-514.10.2.el7.x86_64.x86_64.x86_64/extra/lustre | grep '\.ko$') )
modules=( $(cat /var/run/rpm-kmod-lustre-modules) ) |
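A repeated architecture suffix like the one in this output can be spotted mechanically. The filter below is a sketch only: `find_repeated_arch` is a hypothetical name, it reads scriptlet text on standard input, and it relies on `grep -o`, which is widely available but not required by POSIX.

```shell
# Hypothetical check: read scriptlet text on stdin and print any
# /lib/modules path whose architecture suffix appears twice in a row.
# Exits non-zero when no such path is found.
find_repeated_arch() {
    arch=${1:-x86_64}
    # First pull out each /lib/modules/... path, then keep the suspicious ones.
    grep -o "/lib/modules/[^ )\"']*" | grep -F ".$arch.$arch"
}
```

It could be fed from the query used in this thread, for example `rpm -qp --scripts kmod-lustre-*.rpm | find_repeated_arch`.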
| Comment by Doug Oucharek (Inactive) [ 26/Apr/17 ] |
|
Ok, here is what I get: [root@centos-7 lustre-release3]# rpm -qp --scripts kmod-lustre-*.rpm
postinstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" ]; then
/usr/sbin/depmod -aeF "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" "3.10.0-514.2.2.el7_lustre.x86_64" > /dev/null || :
fi
modules=( $(find /lib/modules/3.10.0-514.2.2.el7_lustre.x86_64/extra/lustre | grep '\.ko$') )
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --add-modules
fi
preuninstall scriptlet (using /bin/sh):
rpm -ql kmod-lustre-2.9.55_49_g0fe2577_dirty-1.el7.centos.x86_64 | grep '\.ko$' > /var/run/rpm-kmod-lustre-modules
postuninstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" ]; then
/usr/sbin/depmod -aeF "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" "3.10.0-514.2.2.el7_lustre.x86_64" > /dev/null || :
fi
modules=( $(cat /var/run/rpm-kmod-lustre-modules) )
rm /var/run/rpm-kmod-lustre-modules
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --remove-modules
fi
postinstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" ]; then
/usr/sbin/depmod -aeF "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" "3.10.0-514.2.2.el7_lustre.x86_64" > /dev/null || :
fi
modules=( $(find /lib/modules/3.10.0-514.2.2.el7_lustre.x86_64/extra/lustre-osd-ldiskfs | grep '\.ko$') )
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --add-modules
fi
preuninstall scriptlet (using /bin/sh):
rpm -ql kmod-lustre-osd-ldiskfs-2.9.55_49_g0fe2577_dirty-1.el7.centos.x86_64 | grep '\.ko$' > /var/run/rpm-kmod-lustre-osd-ldiskfs-modules
postuninstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" ]; then
/usr/sbin/depmod -aeF "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" "3.10.0-514.2.2.el7_lustre.x86_64" > /dev/null || :
fi
modules=( $(cat /var/run/rpm-kmod-lustre-osd-ldiskfs-modules) )
rm /var/run/rpm-kmod-lustre-osd-ldiskfs-modules
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --remove-modules
fi
postinstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" ]; then
/usr/sbin/depmod -aeF "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" "3.10.0-514.2.2.el7_lustre.x86_64" > /dev/null || :
fi
modules=( $(find /lib/modules/3.10.0-514.2.2.el7_lustre.x86_64/extra/lustre-tests | grep '\.ko$') )
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --add-modules
fi
preuninstall scriptlet (using /bin/sh):
rpm -ql kmod-lustre-tests-2.9.55_49_g0fe2577_dirty-1.el7.centos.x86_64 | grep '\.ko$' > /var/run/rpm-kmod-lustre-tests-modules
postuninstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" ]; then
/usr/sbin/depmod -aeF "/boot/System.map-3.10.0-514.2.2.el7_lustre.x86_64" "3.10.0-514.2.2.el7_lustre.x86_64" > /dev/null || :
fi
modules=( $(cat /var/run/rpm-kmod-lustre-tests-modules) )
rm /var/run/rpm-kmod-lustre-tests-modules
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --remove-modules
fi
[root@centos-7 lustre-release3]#
|
| Comment by James A Simmons [ 26/Apr/17 ] |
|
Okay, so my "x86_64.x86_64.x86_64" problem is something broken on our side. Now that we have an understanding of what the problem is for Doug, maybe we can have a workaround. First, he knows he has to remove any lustre kernel module rpms before doing a make install. I found that running depmod -v "kernel version" after make install helps resolve the "I don't see the module" issues. Maybe we can create a special make install that does a depmod afterwards in the case where DESTDIR is pointing to the standard /lib/modules location? |
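The workaround outlined in this comment might look like the sketch below. It is not part of the Lustre build system: `dev_install` and the `RUN` variable are hypothetical, the package names are taken from the scriptlet output earlier in this ticket, and `depmod -a` is used here in place of the `depmod -v` mentioned above. By default it only prints the steps.

```shell
# Sketch of a "make install" cycle that avoids mixing with rpm-installed
# modules. Dry run by default: each step is only printed.
# Set RUN to the empty string (RUN=) to execute the steps for real.
RUN=${RUN-echo}

# Usage: dev_install <kernel-version> [package-to-remove ...]
dev_install() {
    kver=$1
    shift
    # Remove package-installed Lustre modules first, if any were named.
    if [ "$#" -gt 0 ]; then
        $RUN rpm -e "$@" || return 1
    fi
    $RUN make install || return 1
    # Rebuild the module dependency data for the target kernel.
    $RUN depmod -a "$kver"
}
```

For example, `dev_install "$(uname -r)" kmod-lustre kmod-lustre-osd-ldiskfs kmod-lustre-tests` prints the three commands it would run from the top of the build tree. Any other installed lustre packages would need to be removed as well.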
| Comment by Dmitry Eremin (Inactive) [ 27/Apr/17 ] |
|
James, the "x86_64.x86_64.x86_64" problem is related to a kmodtool issue. We have a patch for this: contrib/lbuild/rhel7/kmodtool.patch. Actually, the following command: make install DESTDIR=/tmp/lustre will install all files under the root directory /tmp/lustre, but the module layout will be a little bit different. So, you can rearrange and use them later as you wish. |
| Comment by Dmitry Eremin (Inactive) [ 27/Apr/17 ] |
|
The real issue, which I cannot identify yet, is that I have a dependency issue when I build Lustre rpms for a kernel which is not installed on the developer's system. It looks like something is wrong with getting symbol versions from the installed kernel instead of the development Module.symvers. |
| Comment by James A Simmons [ 27/Apr/17 ] |
|
You mean the: ksym(snprintf) = 0x28318305 is needed by kmod-lustre-tests-2.9.56_37_g170658f_dirty-1.el7.x86_64 type errors. |
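To narrow down which symbol requirements go unmet, the package's requires list can be compared with a provides list. The helper below is a sketch: `unmet_ksyms` is a hypothetical name, and it assumes two text files with one dependency per line, such as the output of `rpm -qp --requires` on the kmod package and `rpm -qa --provides` on the target system. Whether the provides appear in the same `ksym(...)` form on a given distribution is an assumption.

```shell
# Hypothetical helper: print ksym(...) requirements that no provide satisfies.
# $1: file with one requirement per line
# $2: file with one provide per line
unmet_ksyms() {
    req=$(mktemp) || return 1
    prov=$(mktemp) || return 1
    # Keep only ksym entries; sort in the C locale so comm sees one ordering.
    grep '^ksym(' "$1" | LC_ALL=C sort -u > "$req"
    grep '^ksym(' "$2" | LC_ALL=C sort -u > "$prov"
    # comm -23 keeps lines that appear only in the first (requires) list.
    LC_ALL=C comm -23 "$req" "$prov"
    rm -f "$req" "$prov"
}
```

An empty result would suggest that the symbol versions match and that the failure lies elsewhere.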
| Comment by Dmitry Eremin (Inactive) [ 27/Apr/17 ] |
|
Yes, but after forceful install all works fine. |
| Comment by Doug Oucharek (Inactive) [ 27/Apr/17 ] |
|
How do you do a forceful install (via "make install")? |
| Comment by Christopher Morrone [ 28/Apr/17 ] |
|
First, the "x86_64.x86_64.x86_64" that is worked around in contrib/lbuild/rhel7/kmodtool.patch should not be described as a "kmodtool issue". kmodtool works correctly. The problem that kmodtool.patch addresses is one of lbuild inventing a non-standard (and ill-advised, in my opinion) build approach, so it needs hacked non-standard tools to match its non-standard approach. Hence the kmodtool.patch. James, I can't say if lbuild's issue is also your issue. I have only seen that as a result of lbuild's broken build process. What does your build procedure look like?
The answer is: Don't do that. You can't properly build a package if said package's build prerequisites are not also properly packaged and properly installed. That right there is a process bug. Fix the process, and you'll get properly built lustre packages. If you want to generate working Lustre packages for RHEL, you must first package and install the kernel against which you want to compile. It really is that simple. Most of these problems arise from trying to sidestep that rule. lbuild itself violates the rule, and therefore needs a bunch of very ugly hacks to make something approaching a properly built package.
I think our general advice for developers should also be not to use "make install" for testing. It is almost always going to be better to use "llmount.sh", or "LOAD=only sh llmount.sh". |
| Comment by Doug Oucharek (Inactive) [ 28/Apr/17 ] |
|
I don't agree with developers not being able to use "make install". Even the Linux kernel lets us do: "make menuconfig; make; make modules_install; make install". I currently find it easier to work on LNet in the staging area than I do in the community repo because of being pushed down this complex maze of Linux script-fu. LNet development work does not need nor want to mount Lustre. And I have been very unsuccessful at using llmount.sh for doing the basic things I need to do as a developer. It has gotten to a point where I am now relying on Gerrit and loadjenkinsbuild for even experimental work which I plan to throw away, just to get around the many blockades I am finding with working on LNet. I get the code...I don't at all get the build/install system being used by Lustre. It is hard enough to grok the complex code of a system like Lustre; we should not then have to tackle an equally complex system for building/installing it. |
| Comment by Christopher Morrone [ 06/May/17 ] |
|
> It is hard enough to grok the complex code of a system like Lustre, we should not then have to tackle an equally complex system for building/installing it. Well, that is an awfully nice dream land you seem to be living in over there. The fact is that building/installing is itself very complex in lustre. I don't know what "staging area" and "community repo" mean, or why they would be different. I'll assume for now that you mean pre-2.9 build system versus post-2.9 build system... Yes, the pre-2.9 build system was designed by, and for, core lustre developers. It flagrantly ignored the needs of the customer for proper packaging. Trying to make the post-2.9 build system work exactly the same for developers without any adjustment at all would have turned a very complex task (it took literally years to get the 2.9 build system into master) into one requiring superhuman abilities. But look, you really want to use "make install"? OK, fine. We've explained how to do that: Don't mix it with rpm packages. If you want to use make install, remove the packages that will be damaged by that command anyway. This isn't even a Lustre-specific kind of rule, it is the sort of thing that a developer who doesn't want to break his rpm database would do for any package under development. Is that really a blockade? Does it require any special script-fu knowledge? What am I missing here? I can commiserate with llmount.sh not doing everything you might want. It is certainly oriented primarily toward mounting. But if there are other features you would like to see, you might open tickets on those. I don't think it would be terribly difficult to add more scripts that allow more fine-grained module load/unload operations out of the build tree. So I hear that you are frustrated about having to change your process. I understand. Change is hard. I've tried to present a couple of simple rules that make it easier.
I also understand that you don't feel that you should need to understand the complexities of the build/packaging system. I understand that as well. You have work that you need to get done, and nobody has time to learn everything about everything. If you just want to vent, then go for it. I get that too. But if you want to make forward progress, then I think the best bet is to open tickets or start threads on lustre-devel about the specific use cases that make your life difficult. That way the people that have some knowledge of the build/package complexities can work to devise solutions that will work reasonably well for almost everyone. |
| Comment by Minh Diep [ 11/May/17 ] |
|
Sorry to join the party too late. I believe the rpm installed the modules under /lib/modules/<kversion>/extra/lustre, and when we do make install, it puts the modules under /lib/modules/<kversion>/extra/kernel. Something is going wrong here, I think. So, in lustre-build-linux.m4:
AC_MSG_CHECKING([for Linux kernel module package directory])
AC_ARG_WITH([kmp-moddir],
        AC_HELP_STRING([--with-kmp-moddir=string],
                [set the kmod updates or extra directory]),
        [KMP_MODDIR=$withval
         IN_KERNEL=''],[
        AS_IF([test x$RHEL_KERNEL = xyes], [KMP_MODDIR="extra/kernel"], <<<<<< here
         [test x$SUSE_KERNEL = xyes], [KMP_MODDIR="updates/kernel"])
        IN_KERNEL="${PACKAGE}"])
AC_MSG_RESULT($KMP_MODDIR)
This means that make install is putting the modules where we tell it to by default. And in the spec file:
%eval_configure $CONFIGURE_ARGS \
        %{?with_lustre_tests:--enable-tests}%{!?with_lustre_tests:--disable-tests} \
        %{?with_lustre_utils:--enable-utils}%{!?with_lustre_utils:--disable-utils} \
        %{?with_lustre_modules:--enable-modules}%{!?with_lustre_modules:--disable-modules} \
        %{!?with_shared:--disable-shared} \
        %{!?with_static:--disable-static} \
        %{!?with_lustre_iokit:--disable-iokit} \
        %{!?with_ldiskfs:--disable-ldiskfs} \
        %{!?with_servers:--disable-server} \
        %{!?with_zfs:--without-zfs} \
        %{!?with_lnet_dlc:--disable-dlc} \
        %{!?with_manpages:--disable-manpages} \
        --with-linux=%{kdir} \
        --with-linux-obj=%{kobjdir} \
        --with-kmp-moddir=%{kmoddir}/%{name} <<<<<
This means that building with the spec file will put the modules in .../extra/lustre. To make "make install" put them in the same directory so that modprobe can load them, we do ./configure --with-kmp-moddir=extra/lustre ... |
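The difference between the two install locations can be restated as a small shell sketch. Both function names are hypothetical; the default values mirror the AS_IF branch quoted above, and `extra/lustre` is what the spec file's `%{kmoddir}/%{name}` is said to resolve to in this comment.

```shell
# Default KMP_MODDIR as selected by the quoted lustre-build-linux.m4 logic:
# RHEL kernels get extra/kernel, SUSE kernels get updates/kernel.
default_kmp_moddir() {
    case $1 in
        rhel) echo "extra/kernel" ;;
        suse) echo "updates/kernel" ;;
        *)    return 1 ;;
    esac
}

# Base module directory for a kernel version and a KMP_MODDIR value.
kmod_install_dir() {
    echo "/lib/modules/$1/$2"
}
```

With a RHEL kernel, `kmod_install_dir "$(uname -r)" "$(default_kmp_moddir rhel)"` gives the plain "make install" location, while `kmod_install_dir "$(uname -r)" extra/lustre` gives the location used by the rpm build and by a tree configured with `--with-kmp-moddir=extra/lustre`.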