Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version: Lustre 2.4.0
- Component: Builds from the build server
- 3
- 7569
Description
While working on LU-3109 it became apparent that the ZFS version we package is old.
Christopher Morrone added a comment - 05/Apr/13 1:04 AM
rc10 is old; you should definitely upgrade to the latest rc of 0.6.0. 0.6.1 is a little TOO new, because packaging has changed there, and Lustre will need a little tweaking to find the new paths and things automatically. You can build by hand by giving spl and zfs paths, but the latest 0.6.0 rc will just be easier.
It seems we need to stay current with the ZFS version.
Attachments
Activity
> The dkms tool provides a way (in theory) to build kernel-specific kmod packages from a given dkms package via the command "dkms mkkms". That specific dkms sub-command is broken in the latest dkms release, but is fixed by the above linked patch (it may not say so in the comment, but it fixes "dkms mkkms", "dkms match", and "dkms uninstall").
> So given the above, this would let just spl-dkms and zfs-dkms be installed on the build machines; kmod-zfs-* and kmod-spl-* could then be generated for a given kernel, and Lustre could be built against those generated packages.
I don't understand that at all. In order to use the dkms packages from spl/zfs, you would need to both install the kernel that you care about and install the dkms packages from spl/zfs.
And all of this complexity that you are adding is expressly to deal with the fact that you can't install the rpm packages in your current build system. If you could install the correct kernel, you could also just build spl/zfs kmod packages directly like we do, instead of the more convoluted dkms-to-kmod route. And then you could install those packages, which would then let you build lustre.
But you guys haven't been able to build against installed rpms, so I'm not sure how any of that is relevant.
> I apologize if this muddies up the discussion, but I just want to make sure I understand the build system in place at Intel. For each build of Lustre, you guys build the kernel (to apply Lustre patches), then build ldiskfs/zfs against this new kernel (without actually installing or running the new kernel), and then build Lustre against the newly built ldiskfs/zfs/kernel? And all this is done without ever installing the RPMs that are built (i.e. you point the build system for the current package being built (e.g. lustre) at the working directory of each recently built dependent package (e.g. ldiskfs/zfs/etc))?
The build system is lbuild (in the lustre tree: lustre-release/contrib/lbuild/lbuild), so you can read exactly what it does, but as I understand it (skipping some reuse optimizations):
We build a patched kernel, place it in $BUILDROOT/usr/src/kernels/<KVER>
We build spl/zfs against that kernel and place them in $BUILDROOT/usr/src/spl-<SPLVER> and $BUILDROOT/usr/src/zfs-<ZFSVER>
Note that we also build the kmod rpms, which get unwrapped into
$BUILDROOT/usr/src/spl-<SPLVER>/<KVER>
$BUILDROOT/usr/src/zfs-<ZFSVER>/<KVER>
Then we build lustre against that kernel and those versions of spl and zfs.
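The staging layout those steps describe can be sketched as follows. This is a minimal simulation, not lbuild itself: the version strings are placeholders, and the configure invocation in the comments is only illustrative of how Lustre is pointed at the staged trees.

```shell
# Sketch of the lbuild staging layout described above (versions are
# hypothetical placeholders, not the values lbuild actually uses).
BUILDROOT=$(mktemp -d)
KVER=2.6.32-358.el6       # hypothetical patched-kernel version
SPLVER=0.6.0-rc14         # hypothetical spl version
ZFSVER=0.6.0-rc14         # hypothetical zfs version

# 1. Patched kernel sources go under the buildroot
mkdir -p "$BUILDROOT/usr/src/kernels/$KVER"
# 2. spl/zfs are built against that kernel and staged alongside it
mkdir -p "$BUILDROOT/usr/src/spl-$SPLVER" "$BUILDROOT/usr/src/zfs-$ZFSVER"
# 3. The kmod rpms are unwrapped into per-kernel object directories
mkdir -p "$BUILDROOT/usr/src/spl-$SPLVER/$KVER" \
         "$BUILDROOT/usr/src/zfs-$ZFSVER/$KVER"

# Lustre's configure would then be pointed at these trees, e.g.:
#   ./configure --with-linux="$BUILDROOT/usr/src/kernels/$KVER" \
#               --with-spl="$BUILDROOT/usr/src/spl-$SPLVER" \
#               --with-zfs="$BUILDROOT/usr/src/zfs-$ZFSVER"
ls "$BUILDROOT/usr/src"
```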
> The current build process builds zfs/spl from source every time. I would very much like to just install the latest zfs/spl dkms rpms (spl-dkms, zfs-dkms) on our build machines and build from those. The problem with this is that Lustre osd-zfs needs to build against the zfs binaries for the kernel we're building against. Once LU-20 (patchless server kernel) is completed, this becomes MUCH easier, but until then we build a kernel during the Lustre build process, thus we need to build kmod-spl-* and kmod-zfs-* during the Lustre build process after we build the kernel.
I apologize if this muddies up the discussion, but I just want to make sure I understand the build system in place at Intel. For each build of Lustre, you guys build the kernel (to apply Lustre patches), then build ldiskfs/zfs against this new kernel (without actually installing or running the new kernel), and then build Lustre against the newly built ldiskfs/zfs/kernel? And all this is done without ever installing the RPMs that are built (i.e. you point the build system for the current package being built (e.g. lustre) at the working directory of each recently built dependent package (e.g. ldiskfs/zfs/etc))?
Chris M, Let me step back and try to explain myself more clearly:
The current build process builds zfs/spl from source every time. I would very much like to just install the latest zfs/spl dkms rpms (spl-dkms, zfs-dkms) on our build machines and build from those. The problem with this is that Lustre osd-zfs needs to build against the zfs binaries for the kernel we're building against. Once LU-20 (patchless server kernel) is completed, this becomes MUCH easier, but until then we build a kernel during the Lustre build process, thus we need to build kmod-spl-* and kmod-zfs-* during the Lustre build process after we build the kernel.
The dkms tool provides a way (in theory) to build kernel-specific kmod packages from a given dkms package via the command "dkms mkkms". That specific dkms sub-command is broken in the latest dkms release, but is fixed by the above linked patch (it may not say so in the comment, but it fixes "dkms mkkms", "dkms match", and "dkms uninstall").
So given the above, this would let just spl-dkms and zfs-dkms be installed on the build machines; kmod-zfs-* and kmod-spl-* could then be generated for a given kernel, and Lustre could be built against those generated packages.
The next problem you would run into is that spl-dkms and zfs-dkms do not include the required spec files: they include the pre-configure spl-kmod.spec.in and zfs-kmod.spec.in, but not the post-configure spl-kmod.spec and zfs-kmod.spec. So even with a working "dkms mkkms" command, kmod-spl-* and kmod-zfs-* can't be built.
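The intended dkms-to-kmod flow, and the spec-file blocker just described, can be sketched like this. The dkms and yum commands are shown only as comments (their exact flags and package versions here are assumptions); the executable part merely simulates a dkms source tree that ships the .spec.in template but not the generated .spec (the rpm/ subdirectory path is also illustrative).

```shell
# Hypothetical dkms-to-kmod flow on a build machine (commands not executed,
# versions illustrative):
#   yum install spl-dkms zfs-dkms            # install the dkms source packages
#   dkms mkkmp -m zfs -v 0.6.0 -k "$KVER"    # generate kmod-zfs-* for one kernel
#
# The blocker: the dkms trees ship only the pre-configure spec template,
# not the generated spec, so the kmod build has nothing to feed rpmbuild.
# Simulating that layout:
src=$(mktemp -d)/zfs-0.6.0
mkdir -p "$src/rpm"
touch "$src/rpm/zfs-kmod.spec.in"   # what the dkms package ships
# The post-configure zfs-kmod.spec is absent, so the kmod build cannot start:
if [ -f "$src/rpm/zfs-kmod.spec" ]; then
    echo "kmod spec present"
else
    echo "kmod spec missing: only zfs-kmod.spec.in is shipped"
fi
```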
To address some of your specific concerns:
> ZFS is being packaged into many Linux distributions. You guys really need to get comfortable with building against other people's packages.
As I described, I've tried, but there are more barriers than can be quickly fixed for a 2.4 release. Going forward, I would be happy to work with you and Chris G to get a more robust and community-friendly build system in place.
> The route you are following is just going to make it even harder for most users of Lustre on ZFS to properly build Lustre with ZFS. You're making it harder (read "impossible") for LLNL to use your build system, and we're the first production users of Lustre on ZFS. I don't want to get into a big rant about the problems with the Intel build methodology.
I'm pretty new, so I don't have any vested interest in keeping around the lbuild system; I'm all for a set of spec files that just cleanly build Lustre. I'm just trying to get spl and zfs up to date without blowing up the existing build system. I think currently building Lustre with ZFS is pretty straightforward, but I don't use the lbuild command to build and test locally.
> I will just again encourage you to find a way to use the spl/zfs DKMS packages straight from zfsonlinux.org, unmolested.
I believe I have tried, but given the current issues and the structure of the build system, that's not feasible quickly.
> You should look into using Fedora's mock tool. "Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot."
That looks like a useful tool for LU-1199 which seems to have a lively discussion under way.
> The dkms tool has a bug:
> dkms mkkmp never runs; this is fixed upstream in dkms master.
I'm not sure what "mkkmp" is, and the commit that you point to doesn't say anything about mkkmp. Could you please elaborate?
> This prevents installing dkms packages on the build servers and then just building the kmod packages cleanly from those.
What kmod packages exactly? Do you mean building lustre kmod packages? Because if you're using the spl/zfs DKMS packages you don't get involved with the spl/zfs kmod packages.
> There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec.
I am really lost on this one. What kmod packages are you trying to mix with dkms, and why? And a spec file is used to create an rpm, so why would you need the rpm to contain a spec file?
Can you please document the commands that you are trying to run, and explain the approach in a bit more depth?
> The lbuild system essentially builds everything under a buildroot; this includes the kernel, spl, and zfs sources.
ZFS is being packaged into many Linux distributions. You guys really need to get comfortable with building against other people's packages.
The route you are following is just going to make it even harder for most users of Lustre on ZFS to properly build Lustre with ZFS. You're making it harder (read "impossible") for LLNL to use your build system, and we're the first production users of Lustre on ZFS. I don't want to get into a big rant about the problems with the Intel build methodology.
I will just again encourage you to find a way to use the spl/zfs DKMS packages straight from zfsonlinux.org, unmolested.
I've added Chris Gearing to this issue so he can follow along. Chris: this is what we need to change about the Intel Lustre build system. You should look into using Fedora's mock tool.
"Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot."
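A mock-driven rebuild along the lines suggested above might look like the sketch below. The chroot config name and srpm filename are assumptions, and the mock commands are shown as comments rather than executed.

```shell
# Hypothetical mock workflow (EL6 chroot config name and srpm are assumptions):
CFG=epel-6-x86_64
#   mock -r "$CFG" --init                          # populate a clean chroot
#   mock -r "$CFG" --rebuild lustre-2.4.0.src.rpm  # build inside the chroot
echo "would rebuild the src.rpm inside a '$CFG' chroot"
```

The appeal for a build farm is that the chroot, not the host, carries the kernel-devel and spl/zfs dependencies, so nothing outside the chroot is modified.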
Brian,
The dkms tool has a bug:
dkms mkkmp never runs; this is fixed upstream in dkms master. This prevents installing dkms packages on the build servers and then just building the kmod packages cleanly from those.
There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec.
I ran into another issue of zfs failing to build (in our build system) because of how kmodtool looks for kernel versions: I wasn't specifying which version to build, so it built from the default search list, which didn't include the kernel I was building against.
The lbuild system essentially builds everything under a buildroot; this includes the kernel, spl, and zfs sources. dkms and the zfs and spl spec files don't lend themselves to building in an environment like this. The build process is absolutely forbidden from changing anything outside of the buildroot, so the issue boils down to this: we build a kernel almost every time we do a Lustre build, so we need zfs and spl to build against that kernel as well, but the default place to look for those objects is /usr/src/spl-<splversion>/<kernelversion>/, which violates the "don't touch anything outside of buildroot" rule. So I need a way to change where we look for the kernel-specific zfs and spl objects, and I also need to convince spl and zfs to build against a kernel that isn't present in /usr/src/kernels or in /lib/modules.
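One way to express that requirement is via the --with-* configure options that the spl/zfs/lustre autoconf scripts provide for pointing at a kernel tree outside /usr/src/kernels and /lib/modules. The paths and versions below are hypothetical, and the configure invocations are shown as comments; this is a sketch of the approach, not lbuild's actual commands.

```shell
# Sketch: building spl/zfs/lustre against a kernel staged under a buildroot,
# never touching anything outside it. All paths/versions are illustrative.
BUILDROOT=$(mktemp -d)                            # hypothetical staging area
KSRC=$BUILDROOT/usr/src/kernels/2.6.32-358.el6    # staged patched-kernel source
mkdir -p "$KSRC"

# spl against the staged kernel:
#   (cd spl && ./configure --with-linux="$KSRC" --with-linux-obj="$KSRC")
# zfs against the staged kernel and the staged spl build:
#   (cd zfs && ./configure --with-linux="$KSRC" \
#                          --with-spl="$BUILDROOT/usr/src/spl-0.6.0")
# lustre against all of the above:
#   (cd lustre && ./configure --with-linux="$KSRC" \
#                             --with-spl="$BUILDROOT/usr/src/spl-0.6.0" \
#                             --with-zfs="$BUILDROOT/usr/src/zfs-0.6.0")
echo "KSRC=$KSRC"
```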
Landed for 2.4