Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version: Lustre 2.4.0
- Component: Builds from the build server
- 3
- 7569
Description
While working on LU-3109 it became apparent that the ZFS version we package is old.
Christopher Morrone added a comment - 05/Apr/13 1:04 AM
rc10 is old; you should definitely upgrade to the latest rc of 0.6.0. 0.6.1 is a little TOO new, because packaging has changed there, and Lustre will need a little tweaking to find the new paths and things automatically. You can build by hand by giving spl and zfs paths, but the latest 0.6.0 rc will just be easier.
It seems we need to stay current with the ZFS version.
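The "build by hand by giving spl and zfs paths" route mentioned above can be sketched as follows. The paths and versions are illustrative assumptions, and `echo` makes this a dry run; drop it to actually configure a Lustre source tree.

```shell
# Dry-run sketch: point Lustre's configure at hand-built spl/zfs trees.
# Paths and versions are assumptions, not the exact ones from this ticket.
SPL_SRC=/usr/src/spl-0.6.0
ZFS_SRC=/usr/src/zfs-0.6.0
echo ./configure --with-spl="$SPL_SRC" --with-zfs="$ZFS_SRC"
```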
Attachments
Activity
> The dkms tool provides a way (in theory) to build kernel-specific kmod packages from a given dkms package via the command "dkms mkkmp". That specific dkms sub-command is broken in the latest dkms release, but is fixed by the above linked patch (it may not say so in the comment, but it fixes "dkms mkkmp", "dkms match", and "dkms uninstall").
> So, given the above, this would let just spl-dkms and zfs-dkms be installed on the build machines; kmod-zfs-* and kmod-spl-* could then be generated for a given kernel, and Lustre could be built against those generated packages.
I don't understand that at all. In order to use the dkms packages from spl/zfs, you would need to both install the kernel that you care about and install the dkms packages from spl/zfs.
And all of this complexity that you are adding is expressly to deal with the fact that you can't install the rpm packages in your current build system. If you could install the correct kernel, you could also just build spl/zfs kmod packages directly like we do, instead of the more convoluted dkms-to-kmod route. And then you could install those packages, which would then let you build lustre.
But you guys haven't been able to build against installed rpms, so I'm not sure how any of that is relevant.
> I apologize if this muddies up the discussion, but I just want to make sure I understand the build system in place at Intel. For each build of Lustre, you guys build the kernel (to apply Lustre patches), then build ldiskfs/zfs against this new kernel (without actually installing or running the new kernel), and then build Lustre against the newly built ldiskfs/zfs/kernel? And all this is done without ever installing the RPMs that are built (i.e. you point the build system for the current package being built (e.g. lustre) at the working directory of each recently built dependent package (e.g. ldiskfs/zfs/etc))?
The build system is lbuild (in the lustre tree: lustre-release/contrib/lbuild/lbuild), so you can read exactly what it does, but as I understand it (skipping some reuse optimizations):
We build a patched kernel, place it in $BUILDROOT/usr/src/kernels/<KVER>
We build spl/zfs against that kernel and place them in $BUILDROOT/usr/src/spl-<SPLVER> and $BUILDROOT/usr/src/zfs-<ZFSVER>
Note that we also build the kmod rpms, which get unwrapped into
$BUILDROOT/usr/src/spl-<SPLVER>/<KVER>
$BUILDROOT/usr/src/zfs-<ZFSVER>/<KVER>
Then we build lustre against that kernel and those versions of spl and zfs.
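The buildroot layout from the steps above, spelled out as a runnable sketch. The version strings are placeholder assumptions; lbuild derives the real values.

```shell
# Recreate the directory layout lbuild populates (versions are assumptions).
BUILDROOT=$(mktemp -d)
KVER=2.6.32-358.el6.x86_64
SPLVER=0.6.0
ZFSVER=0.6.0

mkdir -p "$BUILDROOT/usr/src/kernels/$KVER"        # patched kernel
mkdir -p "$BUILDROOT/usr/src/spl-$SPLVER/$KVER"    # spl source + unwrapped kmod objects
mkdir -p "$BUILDROOT/usr/src/zfs-$ZFSVER/$KVER"    # zfs source + unwrapped kmod objects

find "$BUILDROOT/usr/src" -mindepth 2 -type d | sort
```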
> The current build process builds zfs/spl from source every time. I would very much like to just install the latest zfs/spl dkms rpms (spl-dkms, zfs-dkms) on our build machines and build from those. The problem with this is that Lustre osd-zfs needs to build against the zfs binaries for the kernel we're building against. Once LU-20 (patchless server kernel) is completed, this becomes MUCH easier, but until then we build a kernel during the Lustre build process, so we need to build kmod-spl-* and kmod-zfs-* during the Lustre build process after we build the kernel.
I apologize if this muddies up the discussion, but I just want to make sure I understand the build system in place at Intel. For each build of Lustre, you guys build the kernel (to apply Lustre patches), then build ldiskfs/zfs against this new kernel (without actually installing or running the new kernel), and then build Lustre against the newly built ldiskfs/zfs/kernel? And all this is done without ever installing the RPMs that are built (i.e. you point the build system for the current package being built (e.g. lustre) at the working directory of each recently built dependent package (e.g. ldiskfs/zfs/etc))?
Chris M, Let me step back and try to explain myself more clearly:
The current build process builds zfs/spl from source every time. I would very much like to just install the latest zfs/spl dkms rpms (spl-dkms, zfs-dkms) on our build machines and build from those. The problem with this is that Lustre osd-zfs needs to build against the zfs binaries for the kernel we're building against. Once LU-20 (patchless server kernel) is completed, this becomes MUCH easier, but until then we build a kernel during the Lustre build process, so we need to build kmod-spl-* and kmod-zfs-* during the Lustre build process after we build the kernel.
The dkms tool provides a way (in theory) to build kernel-specific kmod packages from a given dkms package via the command "dkms mkkmp". That specific dkms sub-command is broken in the latest dkms release, but is fixed by the above linked patch (it may not say so in the comment, but it fixes "dkms mkkmp", "dkms match", and "dkms uninstall").
So, given the above, this would let just spl-dkms and zfs-dkms be installed on the build machines; kmod-zfs-* and kmod-spl-* could then be generated for a given kernel, and Lustre could be built against those generated packages.
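The flow described above can be sketched as a dry run (echo prints the commands instead of running them; module versions are assumptions, and the exact mkkmp flags vary by dkms release, so consult "man dkms" before running this for real):

```shell
# Dry-run sketch: install the dkms source packages, then generate
# kernel-specific kmod packages from them (versions/flags are assumptions).
MODVER=0.6.0
for mod in spl zfs; do
    echo yum install -y "${mod}-dkms"          # dkms source package
    echo dkms mkkmp -m "$mod" -v "$MODVER"     # build kmod-${mod}-* packages
done
```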
The next problem you would run into is that spl-dkms and zfs-dkms do not include the required spec files: they include the pre-configure spl-kmod.spec.in and zfs-kmod.spec.in, but not the post-configure spl-kmod.spec and zfs-kmod.spec. So even with a working "dkms mkkmp" command, kmod-spl-* and kmod-zfs-* can't be built.
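For context, the missing spl-kmod.spec / zfs-kmod.spec files are normally produced by running configure in the module source tree, which substitutes the *.spec.in templates. A dry-run sketch (paths and versions are assumptions):

```shell
# Dry-run sketch: configure substitutes zfs-kmod.spec.in -> zfs-kmod.spec.
# Paths are illustrative assumptions, not taken from this ticket.
SRC=/usr/src/zfs-0.6.0
KSRC=/usr/src/kernels/2.6.32-358.el6.x86_64
echo "cd $SRC && ./configure --with-linux=$KSRC"   # generates zfs-kmod.spec
```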
To address some of your specific concerns:
> ZFS is being packaged into many Linux distributions. You guys really need to get comfortable with building against other people's packages.
As I described, I've tried, but there are more barriers than can be quickly fixed for a 2.4 release. Going forward, I would be happy to work with you and Chris G to get a more robust and community-friendly build system in place.
> The route you are following is just going to make it even harder for most users of Lustre on ZFS to properly build Lustre with ZFS. You're making it harder (read "impossible") for LLNL to use your build system, and we're the first production users of Lustre on ZFS. I don't want to get into a big rant about the problems with the Intel build methodology.
I'm pretty new, so I don't have any vested interest in keeping around the lbuild system; I'm all for a set of spec files that just cleanly build Lustre. I'm just trying to get spl and zfs up to date without blowing up the existing build system. I think currently building Lustre with ZFS is pretty straightforward, but I don't use the lbuild command to build and test locally.
> I will just again encourage you to find a way to use the spl/zfs DKMS packages straight from zfsonlinux.org, unmolested.
I believe I have tried, but given the current issues and the structure of the build system, that's not feasible quickly.
> You should look into using Fedora's mock tool. "Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot."
That looks like a useful tool for LU-1199 which seems to have a lively discussion under way.
> The dkms tool has a bug:
> dkms mkkmp never runs, this is fixed upstream in dkms master.
I'm not sure what "mkkmp" is, and the commit that you point to doesn't say anything about mkkmp. Could you please elaborate?
> This prevents installing dkms packages on the build servers and then just building the kmod packages cleanly from those.
What kmod packages exactly? Do you mean building lustre kmod packages? Because if you're using the spl/zfs DKMS packages you don't get involved with the spl/zfs kmod packages.
> There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec.
I am really lost on this one. What kmod packages are you trying to mix with dkms, and why? And a spec file is used to create an rpm, so why would you need the rpm to contain a spec file?
Can you please document the commands that you are trying to run, and explain the approach in a bit more depth?
> The lbuild system essentially builds everything under a buildroot; this includes the kernel sources and the spl and zfs sources.
ZFS is being packaged into many Linux distributions. You guys really need to get comfortable with building against other people's packages.
The route you are following is just going to make it even harder for most users of Lustre on ZFS to properly build Lustre with ZFS. You're making it harder (read "impossible") for LLNL to use your build system, and we're the first production users of Lustre on ZFS. I don't want to get into a big rant about the problems with the Intel build methodology.
I will just again encourage you to find a way to use the spl/zfs DKMS packages straight from zfsonlinux.org, unmolested.
I've added Chris Gearing to this issue so he can follow along. Chris: this is what we need to change about the Intel Lustre build system. You should look into using Fedora's mock tool.
"Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot."
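A dry-run sketch of how mock could slot into this build (the chroot config name and srpm filename are illustrative assumptions):

```shell
# Dry-run sketch of a mock-based rebuild (config/srpm names are assumptions).
CFG=epel-6-x86_64
echo mock -r "$CFG" --init                              # populate a clean chroot
echo mock -r "$CFG" --rebuild lustre-2.4.0-1.src.rpm    # build entirely inside it
```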
Brian,
The dkms tool has a bug:
dkms mkkmp never runs; this is fixed upstream in dkms master. This prevents installing dkms packages on the build servers and then just building the kmod packages cleanly from those.
There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec.
I ran into another issue of zfs failing to build (in our build system) because of how kmodtool looks for kernel versions: I wasn't specifying which version to build, so it was building from the default search list, which didn't include the kernel I was building against.
The lbuild system essentially builds everything under a buildroot; this includes the kernel sources and the spl and zfs sources. dkms and the zfs and spl spec files don't lend themselves to building in an environment like this. The build process is absolutely forbidden from changing anything outside of the buildroot, so the issue boils down to this: we build a kernel almost every time we do a Lustre build, so we need zfs and spl to build against that kernel as well, but the default place to look for those objects is /usr/src/spl-<splversion>/<kernelversion>/, which violates the "touch nothing outside of the buildroot" rule. So I need a way to change where we look for the kernel-specific zfs and spl objects, and I also need to convince spl and zfs to build against a kernel that isn't present in /usr/src/kernels or in /lib/modules.
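One way to build against a kernel that lives only under the buildroot: the spl and zfs configure scripts accept an explicit kernel location, so they need not search /usr/src/kernels or /lib/modules. A dry-run sketch (paths are assumptions):

```shell
# Dry-run sketch: point spl/zfs configure at a buildroot-only kernel tree.
# All paths below are illustrative assumptions.
BUILDROOT=/tmp/buildroot
KVER=2.6.32-358.el6.x86_64
KDIR="$BUILDROOT/usr/src/kernels/$KVER"
echo ./configure --with-linux="$KDIR" --with-linux-obj="$KDIR"
```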
Nathaniel, you lost me. Can you walk me through the remaining issues?
> zfs has a bug in it that prevents it from building kmod packages at all.
Can you point me to a specific bug for this? I've test-built the entire spl+zfs+lustre stack in the following configurations, and it works just fine.
*) spl+zfs+lustre dkms packages. These are the style of packages we're hosting in the ZoL EPEL repository; they build cleanly for us, and I've heard they work just fine for other sites. You can find the latest patches against 2.6.32 at https://github.com/chaos/lustre/commits/v2_3_64-dkms.
*) spl+zfs+lustre kmod packages. These are the style of packages we use internally at LLNL and include in our CHAOS/TOSS distribution. You can find the full set of patches at https://github.com/chaos/lustre/commits/2.3.64-llnl.
*) spl+zfs dkms packages and a lustre kmod package. I just verified this style of building also works as expected; you just need to add my original LU-3117 patch. This option should be the easiest for you to get going: just install the official packages into your image and build lustre like usual.
> zfs doesn't build cleanly
Can you be more specific? The ZFS code builds reliably for us and many, many other people. The only known issue which might be causing your problem, and which has already been fixed, is addressed by making sure you're using the dkms-2.2.0.3-2.zfs1.el6.noarch package provided by the ZoL EPEL repository. It includes a fix to ensure that the SPL is always built before the ZFS code. Your version of dkms will be automatically updated when you install the ZFS packages.
> locally to accept an spldir argument to configure
I still don't understand why this is needed. Perhaps you can explain how you're trying to build things, and we can come up with a clean way to resolve the outstanding issues.
After spinning my wheels on dkms, I figured out that the dkms that comes with zfs (latest official release) has a bug in it that prevents it from building kmod packages at all. Once that's straightened out, spl and zfs don't lend themselves to building kmod packages via dkms, because they need to update the spec files during configure, while dkms just wants to build from the source provided by the {zfs,spl}-dkms packages. And even when that's worked around, zfs doesn't build cleanly because it's missing some files for spl in the dkms-built tree (Module.symvers). So everything can be overcome, but it's looking like the shortest path is to munge the zfs spec files locally to accept an spldir argument to configure, and then work on fixing things up post-2.4.
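One way to supply the spldir described above: zfs's configure accepts explicit spl source and object directories, which is where it looks for spl's Module.symvers. A dry-run sketch (paths and versions are assumptions):

```shell
# Dry-run sketch: tell zfs's configure where the built spl tree lives so it
# can find Module.symvers (paths/versions are illustrative assumptions).
SPL_SRC=/usr/src/spl-0.6.0
KVER=2.6.32-358.el6.x86_64
echo ./configure --with-spl="$SPL_SRC" --with-spl-obj="$SPL_SRC/$KVER"
```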
Nathaniel, I'm all for improving the build system and installing extra DKMS packages, but the existing ZFS binary packages cannot be removed until your patch is landed, or it will break other patches in flight. There is a non-zero risk that this change would cause the existing builds or tests to fail in some way, so this change needs to be coordinated with Chris Gearing. I would suggest opening a new bug instead of reusing this old one.
Landed for 2.4