[LU-5614] use %kernel_module_package for weak-updates Created: 12/Sep/14 Updated: 06/Jun/17 Resolved: 18/Jul/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stephen Champion | Assignee: | Minh Diep |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | llnl, patch |
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15704 |
| Description |
|
The correct way to support weak-updates in rpm packages is the vendor-defined %kernel_module_package macro. This does the right thing on all distributions. We have used this feature in SGI Lustre for several years, and I plan to work this feature back into the master branch. |
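A minimal, hedged sketch of what using the macro looks like (names are illustrative, not Lustre's actual spec; the helper BuildRequires macro is available on recent RHEL and SLES):
# Generates a kmod/KMP subpackage per kernel flavor, including the
# /sbin/weak-modules (or weak-modules2) %post/%preun/%postun scriptlets.
BuildRequires: %kernel_module_package_buildreqs
%kernel_module_package -n %{name} default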
| Comments |
| Comment by Peter Jones [ 12/Sep/14 ] |
|
Minh Could you please review these patches when Steve supplies them? Thanks Peter |
| Comment by Stephen Champion [ 16/Sep/14 ] |
|
Don't hold your breath waiting for them ;^) I started to take a look, and realized I need to have a handle on what lbuild does before I work this out. |
| Comment by Brian Murrell (Inactive) [ 16/Sep/14 ] |
|
Can you just paste your complete lustre.spec here and we can see which side gets to it first? |
| Comment by Stephen Champion [ 16/Sep/14 ] |
|
Simplified version of the SGI spec |
| Comment by Stephen Champion [ 16/Sep/14 ] |
|
Anyone is very welcome to jump ahead of me. I just attached a stripped down version of the spec used for our 2.4.2 release. Essentially, the %kernel_module_package macro replaces the module package and all of the associated pre/post scripts. The tricky bit is that it expects to build for every installed kernel. You either have to override this or deal with it. There are also some hazards in the macro itself : ie, it expects that all installed kernels have the same version. All of this gets especially nasty dealing with the server packages, which is the default flavor with the version string twiddled (I dealt with this in SGI releases by building the server kernel as a distinct flavor). I suspect the answer is simple to not do weak-modules for servers, as there is limited benefit to it on servers. I was considering having a separate spec: either for clients vs servers, or for --with-weak-updates vs --without-weak-updates. |
| Comment by Stephen Champion [ 25/Sep/14 ] |
|
http://review.whamcloud.com/12063 Intended only as a conversation piece at this point. |
| Comment by Stephen Champion [ 30/Sep/14 ] |
|
The build above failed because there are multiple versions of the same kernel installed. %kernel_module_package normally uses rpm to query the installed kernel version, and packages modules for the first listed kernel version. That resulted in a mismatch between the kernel the modules were built for and the kernel the modules were (failed to be) packaged for. I'm exploring some ways around this. |
| Comment by Stephen Champion [ 02/Oct/14 ] |
|
With RedHat, we can define kernel_version to override the default selection of which kernel version to build for. SuSE does not have anything like that, but it looks like there is a mechanism in %suse_kernel_module_package (which is the underlying implementation of %kernel_module_package on SLES). I will investigate that, and perhaps engage SuSE to get the SLES %kernel_module_package to use that facility in a way which is compatible with RH's. In the meantime, I'm looking at the build failure from http://review.whamcloud.com/#/c/12063/2, which I could not reproduce. On my system, /usr/lib64/lustre/mount_osd_xyz.so is packaged with the osd xyz kmp. I suspect something odd about %{with lustre_utils}. |
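For reference, a hedged example of the RedHat-side override described above (the kernel version string is only illustrative):
# Pin which installed kernel %kernel_module_package builds and packages for on RHEL
rpmbuild -bb lustre.spec \
    --define 'kernel_version 2.6.32-504.23.4.el6.x86_64'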
| Comment by John Fuchs-Chesney (Inactive) [ 08/Oct/14 ] |
|
Any further news on this one Stephen? Thanks, |
| Comment by Stephen Champion [ 08/Oct/14 ] |
|
Building on local systems with './configure ; make rpms', the patch I have is able to build for redhat when multiple kernel package versions are installed. It works on SLES with a single version installed. Building with Intel's Jenkins, these files, which should be part of their respective osd module packages, are not picked up, and I was unable to identify why this difference arises or how to address it in the time I had available. |
| Comment by Gerrit Updater [ 26/Mar/15 ] |
|
Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/14191 |
| Comment by Minh Diep [ 30/Jun/15 ] |
|
I found that it needs to have kernel-devel installed on the builder. As we use lbuild to extract the kernel-devel onto the kdir, it doesn't seem to work:
++ /usr/bin/rpmbuild --target x86_64 -tb /mnt/build/lustre-release/lustre-2.7.55.tar.gz --without servers --define '__find_requires /mnt/build/lustre-release/BUILD/find-requires' --define 'configure_args --with-o2ib=no' --define 'kdir /mnt/build/lustre-release/BUILD/reused/usr/src/kernels/2.6.32-504.23.4.el6.x86_64' --define '_tmppath /var/tmp' --define '_topdir /mnt/build/lustre-release/BUILD' |
| Comment by Stephen Champion [ 01/Jul/15 ] |
|
lbuild and the environment it operates in are sufficiently nonstandard for host and rpm builds that we'll have difficulty getting it to use any of the standard kernel module package methods without addressing some of its deficiencies. Multiple kernel-devel packages can be installed concurrently, so there is no reason not to install the package if it is required by a build - and it is correctly required by any package building kernel modules. That said... the BuildRequires is automatically added by %kernel_module_package. It looks like it can be overridden with nobuildreqs=yes. |
| Comment by Minh Diep [ 01/Jul/15 ] |
|
even with your latest patch, it won't install without building with kernel-devel:
[root@onyx-24 x86_64]# rpm -hiv kmod-lustre-client-2.7.55-2.6.32_504.12.2.el6.x86_64_ge550f31.x86_64.rpm |
| Comment by Brian Murrell (Inactive) [ 06/Jul/15 ] |
I think you can use --nodeps with rpmbuild. This is of course something that needs to be done with care and attention to not paper over other missing dependency requirements. But those will float up to the surface pretty quickly.
These are not missing kernel-devel dependencies. These are missing "weak-module" ABI dependencies. Which kernel(s) were on the machine you tried to install kmod-lustre-client-2.7.55-2.6.32_504.12.2.el6.x86_64_ge550f31.x86_64.rpm on? The above errors are telling you that you don't have a kernel that is ABI (weak module) compatible with your kmod-lustre-client-2.7.55-2.6.32_504.12.2.el6.x86_64_ge550f31.x86_64.rpm |
| Comment by John Fuchs-Chesney (Inactive) [ 30/Oct/15 ] |
|
Minh or Steve, Just checking in to see where we are going with this ticket? Thanks, |
| Comment by Gerrit Updater [ 05/Dec/15 ] |
|
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/17489 |
| Comment by Christopher Morrone [ 05/Dec/15 ] |
|
I used a different Change-Id for my revision of Stephen's patch, and marked mine "fortestonly" to reduce confusion somewhat. Nothing too significantly different in mine yet, but I plan to continue work on mine next week. If folks like the changes and don't mind, I can update Stephen's original patch (if someone doesn't make the needed improvements before me). Stephen's work is an important step towards having Lustre sanely packaged for RHEL. LLNL is moving to the koji build farm for our RHEL7-based TOSS3 distribution, and having this patch that gets weak-updates working is a high priority for us. |
| Comment by Christopher Morrone [ 05/Dec/15 ] |
|
Once we have this weak-modules support, we can finally drop the kernel version string from Lustre's release string. That will enable even more cleanup. Yay! |
| Comment by Stephen Champion [ 07/Dec/15 ] |
|
I'm glad you are finding this work useful! You need to be careful about the build environment for the kmp packaging to work as expected. You can hack around limitations a bit, but you really need to have only a single version-release installed. This was never an issue for my build system, but the Jenkins environment does not enforce it. To deal with some of the problems of existing installations, crossbuilds, etc., I was looking at having multiple rpm specs, selected by configure options. |
| Comment by Stephen Champion [ 07/Dec/15 ] |
|
Christopher, I can't see my responses to your inline comments, so... You can do Obsoletes, Requires, etc. with %kernel_module_package (%kmp) by providing a preamble. This is a file which is expanded into the subpackage preamble. My use was:
Source5: lustre-modules.files
Source6: lustre-ldiskfs.files
Source7: lustre-modules.preamble
Source8: lustre-ldiskfs.preamble
[ ... ]
%kernel_module_package -p %SOURCE7 -n %{name} -f %SOURCE5 @TARGET_FLAVORS@
%kernel_module_package -p %SOURCE8 -n %{name}-ldiskfs -f %SOURCE6 @TARGET_FLAVORS@
lustre-modules.preamble:
Provides: %{name}-modules %{name}-modules-%1
Obsoletes: %{old_kmod_obsoletes}
%if %is_server
Requires: lustre-backend-fs-%1
%endif
lustre-ldiskfs.preamble:
Requires: %{name}-modules-%1
Provides: lustre-backend-fs-%1
On RH based distros, it is used to override the default method of identifying which kernel to build for. It is not needed if the build environment is strictly controlled. See my Oct 1 comment.
You would need to move the without_servers condition into kmp-lustre.files |
| Comment by Christopher Morrone [ 07/Dec/15 ] |
You have to click the "review" button to post your inline comments.
Yes, I understand. I think you are right to include that for now, because changing the way Lustre selects which kernel to build against would only complicate the patch further. I am only asking that a comment be added to the spec file so that it is clearer to everyone in the future. You can see I added a comment in my version of the patch.
I completely agree. Having one spec file seems like a nice idea at first, but once you get into all of the little details, it becomes clear that rpm just was not designed to support multiple distros (or really even multiple versions of the same distro) in one spec file. We really should switch to having many simpler spec files instead of one inscrutable and always broken spec file. I am not convinced yet whether configure should be involved. I would like to move towards having the build (compilation) system and the packaging system be more cleanly separated. But it is a possibility. |
| Comment by Christopher Morrone [ 10/Dec/15 ] |
|
Server packages build fine under CentOS7 with my version of the patch, but not in Intel's buildfarm. Sigh. Can we kill lbuild yet? |
| Comment by Christopher Morrone [ 10/Dec/15 ] |
|
I'd like to go back and answer John's question:
and I'll start by quoting Stephen Champion:
Stephen really hit the nail on the head. The nonstandard way that lbuild operates is the biggest obstacle to success in this ticket. It looks to me like Stephen's last revision of the patch was very close to complete. The remaining minor issues shouldn't take very long to work through, and I have already addressed some of them in my revision. Build systems are surprisingly complicated and difficult to get right for every revision of every distribution. Adding that inherent complication to the unnecessary complication of lbuild's non-standard approach leaves us with a problem that is practically insurmountable. I've been recommending that we ditch lbuild for something that uses more standard packaging approach for years now (LU-3956, but I'd been advocating to change it long before I opened that ticket). I have not seen much effort in that area, unfortunately. So, Intel, how do we make progress here? I would really like to see this and other build and packaging improvements land early in the 2.9 landing window. Is there some way to address the problems in Intel's build farm in that time frame? I would assume that setting up a full lbuild replacement in just a month or two won't happen, but we'll need some attention to improve the current situation enough to let sane changes happen. So it would be good to lay out a plan for both short and longer term changes to Intel's buildfarm situation. |
| Comment by Minh Diep [ 10/Dec/15 ] |
|
Hi Chris
I see that it passed EL6.7 but failed on EL7. Have you tried a local build on EL7? If so, please share the output log. I wonder what lbuild does in this case that worked on EL6.7 and didn't work on EL7. Thanks |
| Comment by Christopher Morrone [ 10/Dec/15 ] |
|
CentOS7 is the same as EL7 for our purposes. So yes, I did try it, and it works. I'm not sure which log you mean by "output log". But I'm also not seeing how that would help you, since your build farm with lbuild is so incredibly different from a normal build environment. Welcome to the club. We all wonder why lbuild works on one but not the other. In an lbuild-free environment, the builds appear to work on that platform. |
| Comment by Minh Diep [ 10/Dec/15 ] |
|
Regarding >> lbuild and the environment it operates in are sufficiently nonstandard for host and rpm builds that we'll have difficulty getting it to use any of the standard kernel module package methods without addressing some of its deficiencies. We are using lbuild just as anyone would use rpmbuild to build the package. I believe we addressed one of the issues, where the kernel module package requires kernel-devel to be installed on the builders, which we did. It would be great if we could understand more of the deficiencies that you mentioned. |
| Comment by Minh Diep [ 10/Dec/15 ] |
|
>>> I'm not sure which log you mean by "output log". But I'm also not seeing how that would help you since your build farm with lbuild is so incredibly different from a normal build environment. It would be nice if you could provide the commands and their output from your EL7 build. I assume you'll run something like: autogen.sh, configure ...., make.... |
| Comment by Christopher Morrone [ 10/Dec/15 ] |
|
Yes, I ran "autogen.sh && ./configure && make rpms". You have both successful and failed logs in lbuild paths for RHEL6.7 and RHEL7. Those are the logs and systems you need to be looking at in the short term. |
| Comment by Minh Diep [ 10/Dec/15 ] |
|
Yes, I am looking. I am mostly interested in rpmbuild output. BTW did you build locally with patched kernel? |
| Comment by Stephen Champion [ 11/Dec/15 ] |
|
The build systems I am familiar with have a few key characteristics we are lacking for using this macro.
Not all of the build systems do this, but ideally kernel module package builds also have:
|
| Comment by Alexander Boyko [ 14/Jan/16 ] |
|
I moved the patch for kmod forward, and all the redhat builds failed as expected. |
| Comment by Christopher Morrone [ 14/Jan/16 ] |
|
Peter Jones, could we get Intel to weigh in here? We are almost certainly going to need to get more attention from a person or persons at Intel to make any further progress. Alexander and Stephen have done some great work on this patch. The community has put in its effort to move Lustre leaps and bounds closer to proper packaging. Now we are all blocked by problems in Intel's build farm. We need to have a serious discussion about application of manpower and approach to resolving this problem. We need to make some firm plans now if we are going to get this done for 2.9. How do we move forward? Would a teleconference help? |
| Comment by Minh Diep [ 15/Jan/16 ] |
|
Hi Chris, I will start looking into this and will gather more manpower and resources to figure out the issue with Intel's build farm and come up with a sensible solution. |
| Comment by Christopher Morrone [ 15/Jan/16 ] |
|
Great, Minh! |
| Comment by Minh Diep [ 15/Jan/16 ] |
|
I started this by manually running configure and make rpms, but it failed at make rpms:
make[1]: Entering directory `/mnt/lustre-release'
This is similar to what Intel jenkins reported. Still looking, just FYI |
| Comment by Christopher Morrone [ 15/Jan/16 ] |
|
Minh, either use patch revision 22 for now or remove the "= %{kversion}" from the following line in the lustre.spec.in in patch revision 23: BuildRequires: kernel-devel = %{kversion}
|
| Comment by Minh Diep [ 15/Jan/16 ] |
|
ah, I see, revision 22 still failed on el7 due to the extra x86_64 string in the patch name |
| Comment by Christopher Morrone [ 16/Jan/16 ] |
|
Minh, while that is a problem that needs resolution, the bigger issue that we need you focused on is the kernel-devel package installation issue. For instance, while the Intel build farm says that the el6.7, inkernel, server build was successful for patch revision 22, that isn't really correct. The build farm only checks that something came out of the build; it doesn't really check if what came out is correct. The kernel symbol requirements in the resulting binary rpms are incorrect because the correct kernel-devel package was not installed at packaging time. |
| Comment by Minh Diep [ 16/Jan/16 ] |
|
Chris, kernel-devel is installed whenever a new version is released. I checked the builder node which built version 22 (onyx-8-sdf1-el6-x8664) and see kernel-devel installed:
[root@onyx-8-sdf1-el6-x8664 ~]# rpm -qa | grep kernel-devel |
| Comment by Christopher Morrone [ 16/Jan/16 ] |
|
The next question that you need to ask yourself is this: "Are these the kernel-devel packages for the correct kernel?" Keep in mind that Lustre is being built against a custom patched kernel. This kernel is built and packaged by lbuild during the lustre build process. You can see the packages for one distro here (this is linked off of the same patch revision 22, el6.7, inkernel, server build that I referenced earlier): And here are the relevant rpms:
kernel-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64.rpm
kernel-debuginfo-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64.rpm
kernel-debuginfo-common-x86_64-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64.rpm
kernel-devel-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64.rpm
kernel-firmware-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64.rpm
kernel-headers-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64.rpm
You will note that the kernels that lbuild patches and builds have "_lustre.g773a7e4" appended to their revision number. The kernel packages that you listed do not contain that string, so I would expect that none of the kernel-devel packages that you listed are from the correct kernel. |
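A quick hedged check on the builder (the version string is taken from the rpm list above):
# If only the unpatched kernel-devel is present, this will report "not installed"
rpm -q kernel-devel-2.6.32-573.8.1.el6_lustre.g773a7e4.x86_64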
| Comment by Minh Diep [ 16/Jan/16 ] |
|
gotcha! |
| Comment by Alexander Boyko [ 16/Jan/16 ] |
|
Minh Diep, to check kmod dependencies you need to do:
rpm -qp --requires kmod-xxx.rpm | grep vfree
rpm -qp --requires /lustre/lustre/kmod-lustre-client-tests-2.7.64-3.10.0_229.20.1.el7.x86_64_g773a7e4.x86_64.rpm | grep vfree
kernel(vfree) = 0x999e8297
Patch set 22 has ksym(vfree), and that is wrong. |
| Comment by Stephen Champion [ 17/Jan/16 ] |
|
Just a note: client builds are done using the distro kernel. So we need the distro -devel kernel packages for client builds and the _lustre.tag kernel packages for server builds. |
| Comment by Minh Diep [ 19/Jan/16 ] |
|
I noticed that spl and zfs also require kernel-devel*lustre*. Are they not having the same problem? |
| Comment by Alexander Boyko [ 20/Jan/16 ] |
|
I've tried to check kmod-zfs, but failed. The download requires authorization. What is the reason for separate access to the lustre packages and kmod-zfs? |
| Comment by Peter Jones [ 20/Jan/16 ] |
|
Limitations imposed (or at least perceived to be imposed) by the CDDL vs GPL licensing incompatibility. |
| Comment by Minh Diep [ 27/Jan/16 ] |
|
Hi, with my latest revision built under https://build.hpdd.intel.com/job/lustre-reviews/37084/arch=x86_64,build_type=server,distro=el6.7,ib_stack=inkernel/artifact/artifacts/RPMS/x86_64/ the kmod dependencies on el6.7 seem to be correct:
[root@onyx-24 ~]# rpm -qp --requires kmod-lustre-2.7.64-2.6.32_573.8.1.el6_lustre.g160868c.x86_64_g160868c.x86_64.rpm | grep vfree
I still need to figure out the failure on el7 though |
| Comment by Christopher Morrone [ 27/Jan/16 ] |
|
Minh, that is fantastic! Did you do any spot checking to see if the packages work? |
| Comment by Gerrit Updater [ 27/Jan/16 ] |
|
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18170 |
| Comment by Minh Diep [ 27/Jan/16 ] |
|
Chris, not yet. There will be more testing on this I am sure |
| Comment by Christopher Morrone [ 29/Jan/16 ] |
|
Any theories on that final |
| Comment by Minh Diep [ 29/Jan/16 ] |
|
We used to build the el7 server in a shorter build directory (i.e. /var/lib/jenkins/tmp/el7_top_dir) because somehow rpmbuild doesn't work in a long dir (i.e. /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD). I don't know why it failed with this kmod patch but not others. I changed it back to the long dir to build in the same directory as the git checkout; and this solved the issue. |
| Comment by Minh Diep [ 02/Feb/16 ] |
|
Installation test found some issues:
+ yum install -y kmod-lustre-osd-ldiskfs
Dependencies Resolved
================================================================================
Transaction Summary
Total download size: 2.8 M
and interestingly, installing lustre requires installing lustre-dkms.
Installing:
Transaction Summary |
| Comment by Aurelien Degremont (Inactive) [ 02/Feb/16 ] |
|
Just wanted to say I'm strongly supporting this feature. I very much want something similar to Chris' patch to be landed! I also think this is the right way to go. At CEA we are building our own kernel, (M)OFED and Lustre RPMS in a standard way, using standard tools. I would love to build lustre with such a patch. |
| Comment by Christopher Morrone [ 02/Feb/16 ] |
|
Minh, can you please elaborate on the "interestingly, install lustre require to install lustre-dkms" statement? Are you saying that the "yum install -y kmod-lustre-osd-ldiskfs" command was the one that pulled in the lustre-dkms package? |
| Comment by Minh Diep [ 02/Feb/16 ] |
|
sorry for not being clear, it's the 'yum install -y lustre' command that pulled in the lustre-dkms package |
| Comment by Minh Diep [ 11/Feb/16 ] |
|
so lustre requires lustre-osd:
[root@onyx-21vm4 ~]# rpm -qp --requires ./lustre-2.7.65-2.6.32_573.12.1.el6_lustre.gb872116.x86_64_gb872116.x86_64.rpm | head
which lustre-dkms provides:
[root@onyx-21vm4 ~]# rpm -qp --provides ./lustre-dkms-2.7.65-1.el6.noarch.rpm |
| Comment by Christopher Morrone [ 18/Feb/16 ] |
|
Let's take stock of where we stand. Obviously, we need a rebase of the patch. Next we have the interaction between the dkms packages and the non-dkms packages. I think the solution there is probably pretty straightforward: don't put both packages in the same yum repository. What else do we need to work through? |
| Comment by Minh Diep [ 18/Feb/16 ] |
|
I have updated the patch with more dependencies, please review. This resolved the conflict with dkms package. |
| Comment by Alexander Boyko [ 18/Feb/16 ] |
|
Could you explain why the next line was added?
Requires: %{requires_kmod_name} = %{requires_kmod_version}\n\
kmod adds requires on the symbols, so old-style package references look strange. |
| Comment by Minh Diep [ 19/Feb/16 ] |
|
you're right, we don't need that. |
| Comment by Minh Diep [ 19/Feb/16 ] |
|
I found that osd_ldiskfs.ko is not in the weak-updates/... directory.
[root@onyx-21vm4 ~]# uname -a
Is this expected, or is this not working on a lustre-patched server? |
| Comment by Alexander Boyko [ 19/Feb/16 ] |
|
There are some differences between the SLES and RHEL rpm kmod scripts.
rpm --scripts -qp kmod-lustre-osd-ldiskfs-2.8.50-2.6.32_573.12.1.el6_lustre.g33b2752.x86_64_gd0671d9.x86_64.rpm
postinstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-2.6.32-573.12.1.el6_lustre.g33b2752.x86_64" ]; then
/sbin/depmod -aeF "/boot/System.map-2.6.32-573.12.1.el6_lustre.g33b2752.x86_64" "2.6.32-573.12.1.el6_lustre.g33b2752.x86_64" > /dev/null || :
fi
modules=( $(find /lib/modules/2.6.32-573.12.1.el6_lustre.g33b2752.x86_64/extra/lustre-osd-ldiskfs | grep '\.ko$') )
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --add-modules
fi
preuninstall scriptlet (using /bin/sh):
rpm -ql kmod-lustre-osd-ldiskfs-2.8.50-2.6.32_573.12.1.el6_lustre.g33b2752.x86_64_gd0671d9.x86_64 | grep '\.ko$' > /var/run/rpm-kmod-lustre-osd-ldiskfs-modules
postuninstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-2.6.32-573.12.1.el6_lustre.g33b2752.x86_64" ]; then
/sbin/depmod -aeF "/boot/System.map-2.6.32-573.12.1.el6_lustre.g33b2752.x86_64" "2.6.32-573.12.1.el6_lustre.g33b2752.x86_64" > /dev/null || :
fi
modules=( $(cat /var/run/rpm-kmod-lustre-osd-ldiskfs-modules) )
rm /var/run/rpm-kmod-lustre-osd-ldiskfs-modules
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --remove-modules
fi
SLES:
rpm --scripts -qp lustre-osd-ldiskfs-kmp-default-2.8.50_3.0.101_0.47.71-3.0.101_0.47.71_lustre.g33b2752_default_gd0671d9.x86_64.rpm
postinstall scriptlet (using /bin/sh):
nvr=lustre-osd-ldiskfs-kmp-default-2.8.50_3.0.101_0.47.71-3.0.101_0.47.71_lustre.g33b2752_default_gd0671d9
wm2=/usr/lib/module-init-tools/weak-modules2
if [ -x $wm2 ]; then
/bin/bash -${-/e/} $wm2 --add-kmp $nvr
fi
preuninstall scriptlet (using /bin/sh):
nvr=lustre-osd-ldiskfs-kmp-default-2.8.50_3.0.101_0.47.71-3.0.101_0.47.71_lustre.g33b2752_default_gd0671d9
rpm -ql $nvr | sed -n '/\.ko$/p' > /var/run/rpm-$nvr-modules
postuninstall scriptlet (using /bin/sh):
nvr=lustre-osd-ldiskfs-kmp-default-2.8.50_3.0.101_0.47.71-3.0.101_0.47.71_lustre.g33b2752_default_gd0671d9
modules=( $(cat /var/run/rpm-$nvr-modules) )
rm -f /var/run/rpm-$nvr-modules
if [ ${#modules[*]} = 0 ]; then
echo "WARNING: $nvr does not contain any kernel modules" >&2
exit 0
fi
wm2=/usr/lib/module-init-tools/weak-modules2
if [ -x $wm2 ]; then
printf '%s\n' "${modules[@]}" | /bin/bash -${-/e/} $wm2 --remove-kmp $nvr
fi
The main point is that RHEL uses a modules directory equal to the rpm package name to find and add the *.ko files, for example extra/lustre-osd-ldiskfs. SLES instead uses the rpm package contents to find and add modules. We have different package names (lustre-osd-ldiskfs, lustre-tests, etc.) but use one directory, extra/lustre/fs/, to install all of them, and this could cause RHEL some kind of problem. Now, to your question, |
| Comment by Stephen Champion [ 19/Feb/16 ] |
|
FYI, after expanding %kernel_module_package, the correct module installation directory can be obtained via %kernel_module_package_moddir instead of the inline selection and assignment to %kmoddir introduced with the patch. As Alexander mentioned, I suspect a kernel symbol mismatch for osd_ldiskfs trying to install for a non-lustre kernel. |
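A sketch of how that could look in an %install section (assuming the distro defines the macro, as RHEL and SLES do; the make target and path are illustrative, not Lustre's actual install rules):
# %kernel_module_package_moddir expands to "extra" on RHEL and "updates" on SLES
make modules_install \
    INSTALL_MOD_DIR=%{kernel_module_package_moddir}/lustre \
    INSTALL_MOD_PATH=$RPM_BUILD_ROOT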
| Comment by Minh Diep [ 19/Feb/16 ] |
|
Did you see any error when you updated a kernel? It may have another version of a kernel symbol, and the weak-update should fail, especially for ldiskfs*.
>> No, I did not. However, we can't mount the lustre dev because of the missing osd-ldiskfs module (after upgrade) |
| Comment by Minh Diep [ 19/Feb/16 ] |
|
perhaps I did, here is the command:
yum install -y kernel-2.6.32-573.12.1.el6_lustre.gb872116.x86_64 kmod-lustre kmod-lustre-osd-ldiskfs
Loaded plugins: fastestmirror, security
Dependencies Resolved
================================================================================
Transaction Summary
Total download size: 59 M
WARNING: /lib/modules/2.6.32-573.12.1.el6_lustre.gb872116.x86_64/extra/lustre/fs/osd_ldiskfs.ko needs unknown symbol lu_cdebug_printer
WARNING: /lib/modules/2.6.32-573.12.1.el6_lustre.gb872116.x86_64/extra/lustre/fs/osd_ldiskfs.ko needs unknown symbol lprocfs_counter_sub |
| Comment by Stephen Champion [ 19/Feb/16 ] |
|
Since you are building the modules with a patched kernel, it is not surprising that it is unable to resolve symbols when it attempts weak-updates for the unpatched kernel. Anytime you change the KABI without changing the flavor, this is the result. What is surprising is that only osd_ldiskfs.ko has trouble - this demonstrates great progress removing dependencies on the patched kernel! To test weak updates on servers, try installing the server modules on a system with a different (and later) version #. Ie, build modules for 2.6.32-573.12.1.el6_lustre.foo and install them on 2.6.32-573.18.1.el6_lustre.fop This is curious: |
| Comment by Minh Diep [ 19/Feb/16 ] |
|
Ah, I was hoping that this ticket would let us use weak updates on an unpatched kernel. I will try your suggestion. The line above is coming from the post install script:
rpm --scripts -qp kmod-lustre-osd-ldiskfs-2.8.50-2.6.32_573.12.1.el6_lustre.g33b2752.x86_64_gd0671d9.x86_64.rpm
modules=( $(find /lib/modules/2.6.32-573.12.1.el6_lustre.g33b2752.x86_64/extra/lustre-osd-ldiskfs | grep '\.ko$') )
The path should be /lib/modules/2.6.32-573.12.1.el6_lustre.g33b2752.x86_64/extra/lustre/fs |
| Comment by Christopher Morrone [ 19/Feb/16 ] |
|
The patch does allow using weak modules with unpatched kernels. This patch doesn't care whether the kernel is patched or not; it works with both. However not all kernels offer identical symbols, so one build of lustre will not magically work with all kernels. When |
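For anyone following along, a hedged illustration of what a successful weak-modules run produces on RHEL (the kernel versions are made up):
# The kmod was built against kernel A but installed while a compatible kernel B
# is present; /sbin/weak-modules adds symlinks under B's weak-updates/ tree:
$ ls -l /lib/modules/2.6.32-573.18.1.el6.x86_64/weak-updates/lustre/fs/lustre.ko
... -> /lib/modules/2.6.32-573.12.1.el6.x86_64/extra/lustre/fs/lustre.ko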
| Comment by Christopher Morrone [ 02/Mar/16 ] |
|
Minh, are you working on splitting apart the yum repos for the dkms and standard lustre builds? What else is currently on our list of blockers to getting this ticket done? Anything we can help with? |
| Comment by Minh Diep [ 02/Mar/16 ] |
|
I am working on moving the inline Requires into a file; will upload a version shortly. I have verified that we do not need to split the repo. There is one perhaps minor issue, noticed above, about this error:
find: `/lib/modules/2.6.32-573.12.1.el6_lustre.gb872116.x86_64/extra/lustre-osd-ldiskfs': No such file or directory
when installing kmod-lustre-osd-ldiskfs. If you can help with this, that would be great. |
| Comment by Minh Diep [ 04/Mar/16 ] |
|
and when I installed: yum install kmod-lustre-osd-zfs, I see errors like these:
--> Processing Dependency: ksym(arc_add_prune_callback) = 0x5bd5668d for package: kmod-lustre-osd-zfs-2.8.50-2.6.32_431.29.2.el6_lustre.g57d7852.x86_64_g57d7852.x86_64 |
| Comment by Christopher Morrone [ 04/Mar/16 ] |
|
Is ZFS installed? |
| Comment by Minh Diep [ 04/Mar/16 ] |
|
yes, I installed zfs before that and verified that zfs, spl, kmod-zfs, kmod-spl were on the system |
| Comment by Christopher Morrone [ 04/Mar/16 ] |
|
You are hacking some internal script to get it to make the kernel's symbol dependencies be named "ksym" instead of "kernel", right? Did that hack go too far and change the name for symbols that are in external modules as well? |
| Comment by Minh Diep [ 04/Mar/16 ] |
|
I don't think so; all we do is edit the scripts to use the build path, not /usr/src/...
cp $RPM_HELPERS_DIR/{symset-table,find-requires{,.ksyms}} .
FIND_REQUIRES="$(pwd)/find-requires"
chmod 755 {symset-table,find-requires{,.ksyms}}
local tmp="$(pwd)"
tmp="${tmp//\//\\/}"
ed find-requires <<EOF
1a
set -x
.
/|.*find-requires.ksyms/s/|/| bash -x/
g/ [^ ]*\/\(find-requires\.ksyms\)/s// $tmp\/\1/g
wq
EOF
ed find-requires.ksyms <<EOF
1a
set -x
.
g/\/.*\/\(symset-table\)/s//$tmp\/\1/g
g/\(\/usr\/src\/kernels\/\)/s//$tmp\/reused\1/g
wq
EOF
ed symset-table <<EOF
1a
set -x
.
g/\(\/boot\/\)/s//$tmp\/reused\1/g
g/\(\/usr\/src\/kernels\/\)/s//$tmp\/reused\1/g
wq
EOF
|
| Comment by James A Simmons [ 18/Apr/16 ] |
|
I saw a refresh of this patch. Does this mean the issues of the Intel build farm have been resolved? |
| Comment by Christopher Morrone [ 18/Apr/16 ] |
|
There were just a large number of conflicts piling up so I did a refresh. No other changes, so any outstanding problems are probably still there. |
| Comment by Minh Diep [ 22/Apr/16 ] |
|
somehow build 38351 did not produce any kmod-lustre rpm |
| Comment by Christopher Morrone [ 22/Apr/16 ] |
|
Could you be more specific? It looks to me like the rhel7 builder for build 38351 produced kmod-lustre rpms:
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/kmod-lustre-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/kmod-lustre-osd-ldiskfs-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-osd-ldiskfs-mount-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/kmod-lustre-osd-zfs-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-osd-zfs-mount-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-source-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-tests-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/kmod-lustre-tests-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-iokit-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-jenkins-xBBcs7qm/RPMS/x86_64/lustre-debuginfo-2.8.51_36_g4782472-3.10.0_327.13.1.el7_lustre.x86_64.x86_64.rpm |
| Comment by Christopher Morrone [ 22/Apr/16 ] |
|
Oh, they were produced, but it looks like lbuild failed to copy them? Hmm, did I drop some change from lbuild in the rebase? I'll look into that. |
| Comment by Christopher Morrone [ 22/Apr/16 ] |
|
Oh, whew, I didn't change anything by accident. But the " |
| Comment by Christopher Morrone [ 25/Apr/16 ] |
|
It looks like patch set 30 dealt with the But testing is all failing. |
| Comment by Christopher Morrone [ 27/Apr/16 ] |
|
Minh, it looks like there are problems at package installation time. In one of the logs named "node-provisioning-1.node-provisioning_1.autotest.trevis-38vm1.log" I am seeing the following failure:
03:36:43:Attempt 3
03:36:44:yum install output: Loaded plugins: fastestmirror, security
Setting up Install Process
Loading mirror speeds from cached hostfile
No package lustre-osd-ldiskfs available.
Error: Nothing to do
03:36:44:Yum install was unsuccessful.
03:36:44:Exhausted installation attempts of package lustre-osd-ldiskfs.
03:36:44:File system install complete.
03:36:44:after_prepare_nodes complete for trevis-38vm3
03:36:44:Preparing nodes complete!
First of all, the scripts are failing to detect the failure of the installation process. The provisioning step is declared a success even though it is failing. Next, the scripts appear to be trying to install lustre-osd-ldiskfs, which is not a valid package name under this patch. We probably need to update the build farm scripts to understand the new package names. |
| Comment by Minh Diep [ 27/Apr/16 ] |
|
got it. I'll check it out |
| Comment by Minh Diep [ 29/Apr/16 ] |
|
actually the error was |
| Comment by Minh Diep [ 30/Apr/16 ] |
|
we need to rebase due to a bug in |
| Comment by Minh Diep [ 03/May/16 ] |
|
with the latest build, yum install kmod-lustre-osd-zfs still fails dependencies. I have installed zfs, kmod-zfs, spl, ...
---> Package lustre-osd-zfs-mount.x86_64 0:2.8.52_45_gbef8586-2.6.32_573.22.1.el6_lustre.x86_64 will be installed
here is what I have on the node:
[root@trevis-28vm2 yum.repos.d]# rpm -qa | grep lustre |
| Comment by Christopher Morrone [ 03/May/16 ] |
|
OK, so find the rpm for the kmod-zfs-2.6.32-573.22.1.el6_lustre.x86_64-0.6.4.2-1.el6.x86_64 package. Run this command on it:
rpm -qp --provides <rpm file name> | grep dmu_objset_disown
What symbol version does it say that dmu_objset_disown has in that package? If it is anything other than "0xde76f7a7", that will explain why you are getting this error. If the line matches, then we'll move on to other problems. |
| Comment by Minh Diep [ 03/May/16 ] |
|
right, that's the issue. kmod-zfs doesn't provide any symbol.
|
| Comment by Christopher Morrone [ 03/May/16 ] |
|
Great! That seems like the next Intel buildfarm problem that you need to tackle. Let me know if I can help in any way. |
| Comment by Minh Diep [ 03/May/16 ] |
|
I looked at http://zfsonlinux.org/generic-rpm.html and noticed the steps to build the zfs modules, as below:
$ cd spl-x.y.z
$ cd ../zfs-x.y.z
and we are using rpm build. I'll start there |
| Comment by Minh Diep [ 03/May/16 ] |
|
Chris, I have repeated the build manually (i.e. not using lbuild) and it still failed the dependency. I don't think this is an lbuild issue, but an issue with building kmod-lustre-osd-zfs against kmod-zfs. |
| Comment by Christopher Morrone [ 03/May/16 ] |
|
You are saying that the kmod-zfs package has no ksym symbols under the Provides even when you build it yourself? Alright, can you step me through the details of how you built zfs manually? |
| Comment by Minh Diep [ 04/May/16 ] |
|
here are the steps:
wget http://archive.zfsonlinux.org/downloads/zfsonlinux/spl/spl-0.6.4.2.tar.gz
tar zxvf spl-0.6.4.2.tar.gz
./autogen.sh
install all spl rpms
tar zxvf zfs-0.6.4.2.tar.gz
./autogen.sh
then clone and checkout the patch |
| Comment by Christopher Morrone [ 04/May/16 ] |
|
What distro is that on? What kernel version? What is the output of "rpm -qp --provides" for the kmod-zfs package that results? |
| Comment by Minh Diep [ 04/May/16 ] |
|
[root@onyx-21vm4 lustre-release]# uname -a
[root@onyx-21vm4 zfs-0.6.4.2]# rpm -qp --provides kmod-zfs-2.6.32-573.22.1.el6.x86_64-0.6.4.2-1.el6.x86_64.rpm |
| Comment by Christopher Morrone [ 06/May/16 ] |
|
Minh, you are going to want to add the configure option "--with-spec=redhat" when building both spl and zfs for the centos/rhel systems. That will get you the packages with the needed symbol dependencies. |
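A hedged sketch of that build sequence (versions follow Minh's earlier steps; the exact make targets may differ between zfs releases):
cd spl-0.6.4.2
./configure --with-spec=redhat && make rpm
# install the resulting spl/kmod-spl rpms, then:
cd ../zfs-0.6.4.2
./configure --with-spec=redhat && make rpm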
| Comment by Christopher Morrone [ 06/May/16 ] |
|
During my investigation of the previous package installation problem, I found another problem. Take a look at the postinstall script from the kmod-lustre-osd-zfs package generated under RHEL6.7:
postinstall scriptlet (using /bin/sh):
if [ -e "/boot/System.map-2.6.32-573.26.1.el6.x86_64" ]; then
/sbin/depmod -aeF "/boot/System.map-2.6.32-573.26.1.el6.x86_64" "2.6.32-573.26.1.el6.x86_64" > /dev/null || :
fi
modules=( $(find /lib/modules/2.6.32-573.26.1.el6.x86_64/extra/lustre-osd-zfs | grep '\.ko$') )
if [ -x "/sbin/weak-modules" ]; then
printf '%s\n' "${modules[@]}" | /sbin/weak-modules --add-modules
fi
Note the path for the find command. It is looking in /lib/modules/2.6.32-573.26.1.el6.x86_64/extra/lustre-osd-zfs for the module, but that directory doesn't exist. I think that the simplest solution would probably be to just make those subdirectories and move the modules in the %install section of the spec file. I'll give that a try. |
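Something like the following hypothetical %install fragment would implement that (module and directory names are illustrative):
# Give each kmod package its own directory under extra/ so the generated
# RHEL scriptlets' "find /lib/modules/<ver>/extra/<pkg>" can succeed.
mkdir -p $RPM_BUILD_ROOT/lib/modules/%{kversion}/extra/lustre-osd-zfs
mv $RPM_BUILD_ROOT/lib/modules/%{kversion}/extra/lustre/fs/osd_zfs.ko \
   $RPM_BUILD_ROOT/lib/modules/%{kversion}/extra/lustre-osd-zfs/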
| Comment by Christopher Morrone [ 09/May/16 ] |
|
FYI, I am about to push another update to the patch adding a couple of more Obsoletes entries. |
| Comment by Minh Diep [ 09/May/16 ] |
|
Chris, have you (or anyone) tried to install kmod-lustre-osd-zfs from your own build? |
| Comment by Christopher Morrone [ 09/May/16 ] |
|
OK, patch set 38 addresses every issue that I know about in the patch. I tested it through package installation on a RHEL6.7 VM, and all went well. The dependencies on the zfs package all matched up, and the symlinks from all of the modules looked like they were created correctly, even for the osd modules (that wasn't working on RHEL until one of my updates last week). The only problem that I know of now is that we need to get a version of ZFS built in Intel's build farm that lists all of the provided symbols in the package Provides: section. We know one way to fix that (add --with-spec=redhat to the configure line on RHEL-like systems). Barring undiscovered issues, I think the patch is pretty darn close to complete. |
| Comment by Minh Diep [ 09/May/16 ] |
|
that's great news! |
| Comment by Minh Diep [ 11/May/16 ] |
|
Chris, I noticed that when we use the 'generic' spec, we produce two kmod-spl-devel packages, e.g. kmod-spl-devel-0.6.4.2-1.el6.x86_64.rpm, but with the 'redhat' spec we only have one kmod-spl-devel-0.6.4.2-1.el6.x86_64.rpm |
| Comment by Christopher Morrone [ 11/May/16 ] |
|
I don't think there is anything to be concerned about there. |
| Comment by Minh Diep [ 11/May/16 ] |
|
is kmod-spl-devel-0.6.4.2-1.el6.x86_64.rpm the same between the 'generic' and 'redhat' spec? |
| Comment by Christopher Morrone [ 11/May/16 ] |
|
You can always rpm -qpl the package to find out. My guess is that there are differences in the contents. Why is this an issue? |
| Comment by Minh Diep [ 17/May/16 ] |
|
I am still looking into this. the 'redhat' spec is building against the builder's kernel, not the one we want for lustre. |
| Comment by Minh Diep [ 18/May/16 ] |
|
Chris, it seems to me that redhat/*-kmod.spec does not allow building against any kernel other than the one in /usr/src:
%define ksrc %{_usrsrc}/kernels/%{kverrel}
In order to use it for lustre, we need to install the lustre kernel onto the builder after we patch the kernel. Any advice? |
| Comment by Brian Murrell (Inactive) [ 18/May/16 ] |
You should be able to override the %{_usrsrc} macro (either at build invocation time, or in ~/.rpmmacros) to point at where you unpacked your kernel, no? |
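For example (hedged; the path is illustrative):
# at build invocation time:
rpmbuild --define '_usrsrc /tmp/usr/src' ...
# or persistently, in ~/.rpmmacros:
%_usrsrc /tmp/usr/src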
| Comment by Christopher Morrone [ 18/May/16 ] |
|
Yeah, getting the Intel buildfarm to do things normally (actually install the packages that are BuildRequires of lustre) is probably not going to happen any time soon. So in the meantime we are probably stuck with various variable overrides. I don't think that overriding _usrsrc will work on its own, because the zfs "redhat" zfs-kmod.spec file uses %kernel_module_package, and that will always look instead for a properly installed kernel. That can be overridden, but that will take modifications to the zfs-kmod.spec script. I am none too certain that the zfs community is going to accept patches that dirty up their spec file for a completely non-standard build environment. So maybe we should go back to the "generic" zfs-kmod.spec file. The "generic" spec file is kmods2-based and designed to work well in build environments like rpmfusion. It sounds like that spec file was easier to build in the Intel buildfarm. The problem with those packages is that they were failing to list Provides for the provided kernel symbols. I verified that issue on a clean RHEL6.7 image. The zfs community is going to be a lot more receptive to allowing a change to add those Provides. So I think we should put some effort into figuring out how to add the kernel symbol Provides to the generic zfs spec files. |
| Comment by Minh Diep [ 18/May/16 ] |
|
yes, I am able to build spl with spl-kmod.spec modified; only a few lines changed, like below:
31,34d30
< %if !%{defined kernels}
< %define kernels %{kverrel}
< %endif
<
37d32
< %if !%{defined ksrc}
39d33
< %endif
51d44
<
104c97
< %{__rm} -f %{buildroot}/lib/modules/%{kernels}/modules.*
---
> %{__rm} -f %{buildroot}/lib/modules/%{kverrel}/modules.*
106c99
< %{__chmod} u+x %{buildroot}/lib/modules/%{kernels}/extra/*/*/*
---
> %{__chmod} u+x %{buildroot}/lib/modules/%{kverrel}/extra/*/*/*
112d104
< /lib/modules/%{kernels}/extra/*/*/*
If you think the zfs community won't allow such a change for flexibility, then I will pursue the 'generic' spec |
| Comment by Christopher Morrone [ 18/May/16 ] |
|
Just a quick tip: always use unified diff format: "diff -u". A decent set of standard options would be "diff -uNp". Better yet, use git. The "git diff" command does the right thing by default. |
| Comment by Minh Diep [ 23/May/16 ] |
|
update: latest version with --define "_use_internal_dependency_generator 0" added worked!! |
| Comment by Minh Diep [ 23/May/16 ] |
|
Chris, I want to review the packages and provides, just to make sure:
[root@onyx-21vm7 build_type-server_distro-el6.7_arch-x86_64_ib_stack-inkernel]# rpm -qp --provides lustre-dkms-2.8.52_70_g3660338-1.el6.noarch.rpm
Should lustre-dkms also provide lustre-osd-ldiskfs? |
| Comment by Christopher Morrone [ 23/May/16 ] |
|
I have never reviewed what they did in the dkms packaging. |
| Comment by Minh Diep [ 23/May/16 ] |
|
I am investigating an sles12sp1 build failure that could be from this patch
d4d54bfba3e5f20ab369aa69b0f9b6dcfab1cdce
debuginfo(build-id) = d8c98b864c6e0ac2360a32da503bd15d7cbbc263
debuginfo(build-id) = da62063a63187395fb3c9b3aae6e3541ed005684
debuginfo(build-id) = da697fd1e5187bf2f6ddc0dc1528c9050b51f08d
debuginfo(build-id) = e46c8a226902868165db3894ee0450d05be00325
debuginfo(build-id) = f6cfd424f753677f8f61b05c6326af88cdafc5da
debuginfo(build-id) = f8134d77ab4d46f6c4cd994efab8518e9e8b8372
lustre-tests-debuginfo = 2.8.53_27_ga8b7e8c-3.12.57_60.35_lustre_default
lustre-tests-debuginfo(x86-64) = 2.8.53_27_ga8b7e8c-3.12.57_60.35_lustre_default |
| Comment by Christopher Morrone [ 27/May/16 ] |
|
It looks like there were problems with node provisioning step in the most recent revision of the patch. Are you working on a solution for that too? |
| Comment by Minh Diep [ 27/May/16 ] |
|
Yes, I am. I will make sure this patch get tested |
| Comment by Minh Diep [ 29/May/16 ] |
|
the latest issue is on zfs el7; can't load osd-zfs:
May 29 08:04:34 onyx-42vm3 mrshd[7705]: root@onyx-42vm5.onyx.hpdd.intel.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "mkdir -p /mnt/mds1; mount -t lustre #011#011 lustre-mdt1/mdt1 /mnt/mds1");echo XXRETCODE:$?'
I have checked manually with el6.7 but not el7. I'll check on el7 |
| Comment by Minh Diep [ 01/Jun/16 ] |
|
the reason is that on el7, somehow another 'x86_64' string was added to the path:
D: %post(kmod-lustre-osd-zfs-2.8.53_28_g7808b96-3.10.0_327.13.1.el7_lustre.x86_64.x86_64): scriptlet start
D: %post(kmod-lustre-osd-zfs-2.8.53_28_g7808b96-3.10.0_327.13.1.el7_lustre.x86_64.x86_64): execv(/bin/sh) pid 25714
+ '[' -e /boot/System.map-3.10.0-327.13.1.el7_lustre.x86_64.x86_64.x86_64 ']'
+ modules=($(find /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64.x86_64.x86_64/extra/lustre-osd-zfs | grep '\.ko$'))
++ grep '\.ko$'
++ find /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64.x86_64.x86_64/extra/lustre-osd-zfs
find: ‘/lib/modules/3.10.0-327.13.1.el7_lustre.x86_64.x86_64.x86_64/extra/lustre-osd-zfs’: No such file or directory |
| Comment by Christopher Morrone [ 01/Jun/16 ] |
|
Can you please provide a pointer to the failed test in maloo? |
| Comment by Minh Diep [ 01/Jun/16 ] |
|
actually I had to run rpm manually to get that error message. I thought your patch from |
| Comment by Minh Diep [ 01/Jun/16 ] |
|
I wonder if this is related to kmodtool. I think zfs/spl is using a later version than the one provided with the distro. should lustre do the same? |
| Comment by Christopher Morrone [ 01/Jun/16 ] |
|
It does not surprise me. With a stock rhel7.2 kernel with zfs, I cannot reproduce this problem. So I am guessing that this is another Intel build system problem. I think the first thing I would try is adding a %{warn} statement to the line after "%{!?kernel_version: %global kernel_version %kversion}" in the lustre spec file to print out the value of %kernel_version. That warn statement might show the weird value right there, and then you can work backward from that to find where it originates. |
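For example, the debugging line could look like this (hypothetical; drop it again once the source of the bad value is found):
%{!?kernel_version: %global kernel_version %kversion}
%{warn: kernel_version is %{kernel_version}}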
| Comment by Minh Diep [ 02/Jun/16 ] |
|
yeah, so the kernel_version print out is "3.10.0-327.18.2.el7.x86_64" |
| Comment by Christopher Morrone [ 02/Jun/16 ] |
|
I think it is too early to give up on this approach. Lets keep digging. |
| Comment by Christopher Morrone [ 02/Jun/16 ] |
|
On RHEL7, /usr/lib/rpm/redhat/kmodtool has the following function: get_kernel_release ()
{
if [[ -z $1 ]]; then
uname -r
return
fi
local arch=$(arch)
local verrel=${1%.$arch}
local verprefix=${verrel%.*}
local versuffix=${verrel#$verprefix}
verrel=$(ls -Ud /usr/src/kernels/$verprefix*$versuffix.$arch | sort -V | tail -n 1)
verrel=${verrel##*/}
[[ -z $verrel ]] && verrel=$1.$arch
echo "$verrel"
}
This is no doubt where the problem arises for lbuild on rhel7. lbuild, as we all know, employs a very non-standard build environment. Since the kernel is not in the correct location during packaging, this function is falling back to setting verrel to "$1.$arch". That is almost certainly where that extra ".x86_64" is coming from, but only under lbuild on rhel7. The trick here will be to come up with more hackery for lbuild that allows the lustre.spec file to remain relatively clean for proper build environments. |
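A quick illustration of that fallback (hedged; the version string is taken from the failing scriptlet above):
# With nothing matching under /usr/src/kernels, verrel stays empty and the
# function falls back to "$1.$arch", appending a second ".x86_64":
$ arch=x86_64; verrel=""
$ [[ -z $verrel ]] && verrel=3.10.0-327.13.1.el7_lustre.x86_64.$arch
$ echo $verrel
3.10.0-327.13.1.el7_lustre.x86_64.x86_64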
| Comment by Christopher Morrone [ 02/Jun/16 ] |
|
Meanwhile, Minh, do you have a plan to fix the Intel buildfarm to install the correct packages when this patch is in use? The node provisioning step still seems to always install the old packages, which winds up pulling in part kmod and part dkms packages. |
| Comment by Minh Diep [ 02/Jun/16 ] |
|
Chris, actually I have fixed the one under Onyx cluster. We are rolling out to the rest of the clusters. I checked and the installation has worked on Onyx. Unfortunately, most of the testing went to Trevis cluster in the last few days (week) due to Onyx being loaded with master testing. |
| Comment by Christopher Morrone [ 06/Jun/16 ] |
|
OK, great. If you have that figured out, then we could be in good shape. I believe that I have hacked around lbuild's latest problem on RHEL7. I tested my hack in http://review.whamcloud.com/20572, and it seemed to work. No triple ".x86_64" in the scripts, and it passed testing...but that was on trevis. I don't know why it passed there despite failing ldiskfs package installation. Very strange. I squashed my work-around into Patch Set 44 of this ticket's patch, http://review.whamcloud.com/12063. Are there any other known problems? |
| Comment by Minh Diep [ 07/Jun/16 ] |
|
actually, I also have a version and I am testing right now. http://review.whamcloud.com/#/c/18426/ |
| Comment by Christopher Morrone [ 08/Jun/16 ] |
|
The patch is apparently hung up on unrelated test failures now. It would still be good to get reviews in now, so we can get this landed as soon as reasonable. |
| Comment by Minh Diep [ 13/Jun/16 ] |
|
I found this in the build log:
+ echo 'The kernel ABI reference files (provided by kabi-whitelists) were not found.'
The kernel ABI reference files (provided by kabi-whitelists) were not found.
+ echo 'No compatibility check was performed. Please install the kABI reference files'
No compatibility check was performed. Please install the kABI reference files
+ echo 'and rebuild if you would like to verify compatibility with kernel ABI.'
and rebuild if you would like to verify compatibility with kernel ABI.
+ echo ''
We need to install the kabi-whitelists rpm, don't we? |
| Comment by Stephen Champion [ 13/Jun/16 ] |
|
It helps, but Lustre uses symbols not in the whitelist, so you will still get a warning. |
| Comment by Brian Murrell (Inactive) [ 13/Jun/16 ] |
|
Does it still? |
| Comment by Minh Diep [ 13/Jun/16 ] |
|
I don't think we'll get the warning after installing the kabi-whitelist rpm. I did it locally and didn't see that. We'll update the builders and see. |
| Comment by James A Simmons [ 15/Jun/16 ] |
|
I just tried the latest patch and I don't see lustre-modules-* anymore. Is this expected? I also tried installing the RPMs and got: rpm |
| Comment by Bob Glossman (Inactive) [ 15/Jun/16 ] |
|
James, |
| Comment by Stephen Champion [ 15/Jun/16 ] |
|
Thank you for picking this up and following through with it. My complaint is that I'm still the owner and can't give you a +1! Edit: I won't bother nitpicking changes to lbuild. |
| Comment by James A Simmons [ 15/Jun/16 ] |
|
Okay, for some reason my initial build didn't produce the kmod-* rpms. I had to delete the source tree and do it over again. Now I see:
ksym(snprintf) = 0x9edbecae is needed by kmod-lustre-2.8.54_60_g2a55f34_dirty-2.6.32_573.12.1.el6.head.x86_64.x86_64
when I go to install kmod-lustre-*.x86_64.rpm |
| Comment by Minh Diep [ 16/Jun/16 ] |
|
James, are you able to build and install it? Please let me know if you have any problems. If not, could you re-review that patch? |
| Comment by James A Simmons [ 16/Jun/16 ] |
|
Nope, but looking at how I build our kernel I might know why. When building the kernel rpm I used --without kabichk. Would that be the reason it doesn't work? This is all done using RHEL6.7. |
| Comment by James A Simmons [ 16/Jun/16 ] |
|
Removing --without kabichk still gives me the ksym issues. |
| Comment by Stephen Champion [ 17/Jun/16 ] |
|
Is the target kernel installed on the build system (or environment)? |
| Comment by James A Simmons [ 17/Jun/16 ] |
|
No, the target kernel rpm is not installed on the build system. That would be a big no-no, since the machine is used for other purposes. The target kernel source tree is installed in a special directory, which is not /usr/src, to build against. I will be looking into why the dependency generation is done against the wrong kernel. I wonder if we have to do an OFED-style build process in which we tell it the location of the symvers. |
| Comment by James A Simmons [ 17/Jun/16 ] |
|
So I created the most basic test to show the problem. This is with building just the patchless client. Currently our build machine is at RHEL6.7 and my test node is running RHEL6.8, so we have very different kernels running on both. On the build machine, as non-root, I installed the RHEL6.8 development tree in /tmp, so I have /tmp/usr/src/kernels/2.6.32-642.1.1.el6.x86_64. Then I went into my lustre tree containing this patch and did: cd lustre_release. Then I went to install the rpms into the image and got the ksyms errors still. This points to the current patch for |
| Comment by Minh Diep [ 17/Jun/16 ] |
|
it worked on el7. I'll verify on el6, as you did, next.
[root@onyx-21vm5 lustre-release]# uname -a
scp kmod-lustre-client-2.8.54_61_gcc7a8c9-3.10.0_327.18.2.el7.x86_64.x86_64.rpm root@onyx-24:/root
[root@onyx-24 ~]# uname -a |
| Comment by James A Simmons [ 17/Jun/16 ] |
|
Okay. Lets see if its a RHEL6.X issue. Especially since I don't have access to RHEL7 systems |
| Comment by Minh Diep [ 17/Jun/16 ] |
|
I don't see any issue on el6. I built on el6.7 pointing to the el6.8 kernel and installed kmod-lustre on an el6.8 system; no issue.
[root@onyx-21vm2 lustre-release]# uname -a
rpm -hiv ./kmod-lustre-client-2.8.54_61_gcc7a8c9-2.6.32_642.1.1.el6.x86_64.x86_64.rpm
I am not sure what's missing. I set up the builder without anything special: installed git, libtool, rpm-build. How did you put the kernel-devel on the builder? cpio or rpm install? |
| Comment by James A Simmons [ 18/Jun/16 ] |
|
Are you ssh'ing into the destination node and then installing the rpm on that node? For our systems we create the rpms and chroot into a directory that serves as the root of my image for the diskless test nodes.
[ management server ] -> /export-image/root -> diskless-node:/root
chroot /export-image
Could the chroot environment be causing the ksym issues? In our diskless setup it's not really possible to install rpms directly on the test nodes since they are essentially read-only. Can you try that setup please, Minh? |
| Comment by Minh Diep [ 18/Jun/16 ] |
|
"Are you ssh into the destination node and then installing the rpm on that node?" I am not familiar with chroot. For diskless node, I know that LLNL Chaos built the image with the set of rpms, then refresh/boot the node. I'll have to look into chroot more; but I am pretty sure that's the difference and causing the issue. Can you attach or upload your kmod-lustre-client rpm so I can check its content? |
| Comment by James A Simmons [ 20/Jun/16 ] |
|
I found the source of the problem. This change now requires the kabi-whitelist package, at least for RHEL6. Do you have this package for RHEL7 and SLES as well? Once I installed kabi-whitelist it appears to work. Well, ZFS still gives trouble, but it might be a simple case of rebuilding it. |
| Comment by Christopher Morrone [ 20/Jun/16 ] |
|
I'm back from work travel, so I can jump back into the conversation now. James, I don't believe that kernel-abi-whitelists is required on either RHEL6 or RHEL7. Yes, there will be a warning about it being missing at build time, but that doesn't hurt the packaging process. Clean RHEL6.7 and RHEL7.2 images have been my main testing platforms (with some SLES 12.X on occasion). I checked both of my RHEL images just now, and neither has kernel-abi-whitelists installed. The rpm packages generated under this patch install just fine, and appear to have all required kernel symbols expressed in the rpm --requires section. By the way, if your packages really list this:
ksym(snprintf) = 0x9edbecae
ksym(sscanf) = 0x42224298
then there is an additional problem that I didn't remember off the top of my head on the road. Those should not be "ksym" requirements; they should be "kernel" requirements. I would check the package with "rpm -qp --requires" and see how they look. Here are the requirements from the kmod-lustre package on my RHEL6.7:
kernel(snprintf) = 0x9edbecae
kernel(sscanf) = 0x42224298
A "kernel" requirement on RHEL means that the RH build system knows that the symbol is provided by the kernel package itself. Any other kernel symbol provided by some external package (e.g. zfs) will be a "ksym" requirement instead of "kernel". So this still implies that Lustre was not built on a system with the prerequisite kernel rpms properly installed. FYI, SUSE does not have a kernel-abi-whitelists package. They do not even employ the approach of having a symbol whitelist. |
| Comment by Christopher Morrone [ 20/Jun/16 ] |
|
Minh said:
That is not a requirement for this ticket. It probably won't hurt, but you can track that activity in a separate ticket if you like. It does not need to be linked to this issue. |
| Comment by Christopher Morrone [ 21/Jun/16 ] |
|
In reply to James from gerrit:
No one is requiring special things like mock. We just require that, when building rpms, the prerequisite BuildRequires must actually be installed. This is a completely normal thing to expect in the rpm packaging world, and it is what all of the behind-the-scenes scripts from the distro assume will be the case. Most people will be able to meet that requirement, I think. To avoid installing the prerequisites, you have to replace system-provided behind-the-scenes scripts with your own. We did that for lbuild, despite how icky I felt doing it. I just recommend mock for the places where you don't have access to a system where the BuildRequires are installed, or you don't have the permissions to install those packages. mock is designed exactly for those situations: it sets up a clean base chroot environment, and then installs only the packages required by the source rpm's BuildRequires. If none of those things will work for people, they can always just download prebuilt binaries. The prebuilt binaries are going to work on a much wider set of kernels once they are no longer tied to one specific kernel version. Yes, some adaptation of process will be necessary for some people. I know, change is hard. I totally feel your pain. The Lustre community has gotten very used to its oddball way of building and packaging. But that means that Lustre just flat out doesn't build through the standard tools of major distros like Fedora and RHEL. I think the advantage of supporting those standard methods vastly outweighs the short term pain we will feel as we adapt our personal processes to the new packaging methods. |
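A hedged example of the mock workflow (config name and paths are illustrative):
# Build a source rpm, then let mock install the BuildRequires in a clean chroot
mock -r epel-7-x86_64 --buildsrpm --spec lustre.spec --sources ./SOURCES
mock -r epel-7-x86_64 --rebuild \
    /var/lib/mock/epel-7-x86_64/result/lustre-*.src.rpm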
| Comment by James A Simmons [ 21/Jun/16 ] |
|
I discovered the kabi-whitelist issue with the following error during the build process:
Finding Provides: /usr/lib/rpm/redhat/find-provides
Finding Requires: /usr/lib/rpm/redhat/find-requires
Once I installed kabi-whitelists, the rpms started to work. Also 'rpm' now shows:
kernel(__wake_up) = 0x642e54ac
So it's working correctly now. |
| Comment by Gerrit Updater [ 27/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12063/ |
| Comment by Christopher Morrone [ 29/Jun/16 ] |
|
The patch landed to master, so I am closing this ticket. Good work everyone! If there is any fallout that we need to work through, we can open new tickets for that. |
| Comment by Minh Diep [ 29/Jun/16 ] |
|
yes, a small fix for suse12 is in |
| Comment by Cory Spitz [ 06/Jun/17 ] |
|
FYI: |