[LU-3117] Build: ZFS version is old Created: 05/Apr/13 Updated: 07/May/13 Resolved: 07/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Keith Mannthey (Inactive) | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch, zfs | ||
| Environment: |
Builds from the build server |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 7569 | ||||
| Description |
|
While working on Christopher Morrone added a comment - 05/Apr/13 1:04 AM rc10 is old, you should definitely upgrade to the latest rc of 0.6.0. 0.6.1 is a little TOO new, because packaging has changed there, and lustre will need a little tweaking to find the new paths and things automatically. You can build by hand by giving spl and zfs paths, but the latest 0.6.0 rc will just be easier. It seems we need to stay current with the ZFS version. |
| Comments |
| Comment by Jodi Levi (Inactive) [ 05/Apr/13 ] |
|
Nathaniel, |
| Comment by Andreas Dilger [ 05/Apr/13 ] |
|
Chris, Brian, Are there any fixes after the 0.6.1 tag that we should include? |
| Comment by Brian Behlendorf [ 05/Apr/13 ] |
|
The packaging changes which affect the Lustre build system are the only concern. My intention is to push a patch today which addresses these issues. If we can get this merged before 2.4 is tagged then people will be able to run ZFS 0.6.1 with Lustre 2.4.0 easily. The intention is that RHEL/CentOS users can add ZFS+Lustre support as follows:
# Install the ZFS EPEL repository and install the ZFS DKMS packages.
sudo yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release-1-2.el6.noarch.rpm
sudo yum install zfs zfs-devel
# Build and install Lustre as usual, it will automatically detect the ZFS packages and enable OSD support.
cd lustre
sh autogen.sh
./configure
make rpm
|
| Comment by Brian Behlendorf [ 05/Apr/13 ] |
|
Two more thoughts related to this:
|
| Comment by Brian Behlendorf [ 05/Apr/13 ] |
| Comment by Nathaniel Clark [ 15/Apr/13 ] |
|
I've worked with Chris to update the zfs & spl versions in the build system to 0.6.1. This will probably break builds until this patch is landed with a fix to the lbuild script for the new versions of zfs and spl. The change in the new version that caused lbuild to fail is the removal of all the autotools products. The fix in the patch is to run autogen.sh before trying to call configure. |
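The fix described above (run autogen.sh before calling configure) can be sketched as a small shell helper. This is only an illustration and not the actual lbuild change: the function name `ensure_configure` is hypothetical, and it assumes the source tree ships an `autogen.sh` at its top level.

```shell
#!/bin/sh
# Hypothetical helper (not taken from lbuild): make sure a source tree has a
# configure script, regenerating it with autogen.sh when the tree no longer
# ships the autotools products.
ensure_configure() {
    src_dir=$1
    if [ -x "$src_dir/configure" ]; then
        echo "configure already present"
        return 0
    fi
    if [ ! -f "$src_dir/autogen.sh" ]; then
        echo "no configure and no autogen.sh in $src_dir" >&2
        return 1
    fi
    # Run autogen.sh from inside the tree so it generates ./configure there.
    (cd "$src_dir" && sh ./autogen.sh) || return 1
    # Succeed only if autogen.sh really produced an executable configure.
    [ -x "$src_dir/configure" ]
}
```

A caller would invoke `ensure_configure` on the spl and zfs source directories before running `./configure` in each; trees that already contain a configure script are left untouched.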
| Comment by Nathaniel Clark [ 18/Apr/13 ] |
|
Structure of the zfs spec files makes it hard to override the location of spl directory during build in zfs 0.6.1 |
| Comment by Nathaniel Clark [ 18/Apr/13 ] |
|
Set up a pull request for ZFS, https://github.com/zfsonlinux/zfs/pull/1413, to add the ability to override the spl directory passed to configure during rpm creation. |
| Comment by Christopher Morrone [ 22/Apr/13 ] |
|
Nathaniel, think about it this way: You are modifying an rpm spec file, which means that you are in an rpm environment. However, your patch is explicitly to subvert the rpm way of building packages. I understand why you are trying to do this, and I can certainly commiserate. The fundamental problem is that the Intel Lustre build farm lacks any system to recognize and honor rpm dependencies. But while I understand, I don't feel like we should condone that bad behavior of the build farm by adjusting zfs to make it easier to behave badly. By Lustre 2.5, I very much hope to see the build farm improved to handle rpms in a more reasonable fashion. In the meantime, perhaps you can put the workaround at the source of the problem (i.e. the build farm). Why do you guys want to build spl/zfs at all? Why not simply install the spl/zfs packages? DKMS versions of the spl/zfs packages are available. |
| Comment by Andreas Dilger [ 22/Apr/13 ] |
|
Chris, in the past we didn't have DKMS packages for ZFS, so we had to build our own. Since we cannot distribute the binary packages, it makes sense to use the DKMS packages if we can use those in our testbed. |
| Comment by Nathaniel Clark [ 22/Apr/13 ] |
|
Chris, I'm all for having the builders install zfs from a remote repository and just build against it. I think that would be a much better solution. (This is my first foray into the world of lbuild). Andreas, Do you think we should reopen TT-391 (zfs packages installed on builders), have zfs-dkms-0.6.1 installed on all builders and then add a link to the zfs yum repository (mirroring it locally for installers)? |
| Comment by Brian Behlendorf [ 22/Apr/13 ] |
|
I agree. From my point of view it would be ideal if you just added the ZFS repository to your builder image. The benefits would be: 1) ZFS won't need to be rebuilt for each Lustre build. The original patch I posted has support for building against the DKMS style packages from the repository so you should just need to patch Lustre, install the packages in the images, and disable the lbuild ZFS infrastructure. The one gotcha I can think of offhand is that we're not distributing packages for 32-bit systems until some ZFS specific issues get resolved. |
| Comment by Andreas Dilger [ 23/Apr/13 ] |
|
Nathaniel, I'm all for improving the build system, and installing extra DKMS packages, but the existing ZFS binary packages cannot be removed until your patch is landed, or it will break other patches in flight. There is a non-zero risk that this change would cause the existing builds or tests to fail in some way, so this change needs to be coordinated with Chris Gearing. I would suggest to open a new bug instead of reusing the old one. |
| Comment by Nathaniel Clark [ 01/May/13 ] |
|
After spinning my wheels on dkms, I figured out that the dkms that comes with zfs (latest official release) has a bug in it that prevents it from building kmod packages at all. Once that's straightened out, spl and zfs don't lend themselves to building kmod packages via dkms, because they need to update the spec files during configure, while dkms just wants to build from source provided by {zfs,spl}-dkms packages. Even when that's worked around, zfs doesn't build cleanly because it's missing some files for spl in the dkms built tree (Module.symvers). So everything can be overcome, but it's looking like the shortest path is to munge the zfs spec files locally to accept an spldir argument to configure, and then work on fixing things up post 2.4. |
| Comment by Brian Behlendorf [ 02/May/13 ] |
|
Nathaniel, you lost me. Can you walk me through the remaining issues?
> zfs has a bug in it that prevents it from building kmod packages at all.
Can you point me to a specific bug for this? I've test built the entire spl+zfs+lustre stack in the following configurations and it works just fine.
*) spl+zfs+lustre dkms packages. These are the style packages we're hosting in the ZoL EPEL repository, they build cleanly for us and I've heard they work just fine for other sites. You can find the latest patches against 2.6.32 at https://github.com/chaos/lustre/commits/v2_3_64-dkms.
*) spl+zfs+lustre kmod packages. These are the style packages we are using internally at LLNL and include in our CHAOS/TOSS distribution. You can find the full set of patches at https://github.com/chaos/lustre/commits/2.3.64-llnl
*) spl+zfs dkms packages and a lustre kmod package. I just verified this style of building also works as expected, you just need to add my original
> zfs doesn't build cleanly
Can you be more specific? The ZFS code builds reliably for us and many many many other people. The only known issue which might be causing your problem, and which has already been fixed, is addressed by making sure you're using the dkms-2.2.0.3-2.zfs1.el6.noarch package provided by the ZoL EPEL repository. It includes a fix to ensure that the SPL is always built before the ZFS code. Your version of dkms will be automatically updated when you install the ZFS packages.
> locally to accept an spldir argument to configure
I still don't understand why this is needed. Perhaps you can explain how you're trying to build things and we can come up with a clean way to resolve the outstanding issues. |
| Comment by Nathaniel Clark [ 03/May/13 ] |
|
Brian, The dkms tool has a bug: There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec. I ran into another issue of zfs failing to build (in our build system), because of how kmodtool looks for kernel versions, and I wasn't specifying which version to build so it was building from the default search list, which didn't include the kernel I was building against. The lbuild system essentially builds everything under a buildroot; this includes the kernel sources and the spl and zfs sources. dkms and the zfs and spl spec files don't lend themselves to building in an environment like this. The build process is absolutely forbidden from changing anything outside of the buildroot, so the issue boils down to this: We build a kernel almost every time we do a lustre build, so we need zfs and spl to build against that kernel also, but the default place to look for those objects is /usr/src/spl-<splversion>/<kernelversion>/ which violates the rule against touching anything outside of the buildroot, so I need a way to change where we look for the kernel specific zfs and spl objects. I also need to convince spl and zfs to build against a kernel that isn't present in /usr/src/kernels or in /lib/modules. |
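The constraint above amounts to pointing the zfs build at kernel and spl trees that live under the buildroot. A minimal sketch follows; it only prints a configure command line and does not run it. The `--with-linux`, `--with-linux-obj`, `--with-spl` and `--with-spl-obj` options are ones the zfs configure script accepts when building by hand; the function name and the directory layout under the buildroot are assumptions for illustration, not what lbuild actually does.

```shell
#!/bin/sh
# Hypothetical sketch: compose a zfs configure command line that references
# only paths inside the buildroot. The layout below is assumed: it mirrors the
# default /usr/src locations, relocated under the buildroot.
zfs_configure_cmd() {
    buildroot=$1   # top of the build sandbox
    kver=$2        # kernel version being built against
    splver=$3      # spl version being built against

    linux_src="$buildroot/usr/src/kernels/$kver"
    spl_src="$buildroot/usr/src/spl-$splver"
    spl_obj="$spl_src/$kver"   # kernel specific spl build objects

    echo "./configure" \
        "--with-linux=$linux_src" \
        "--with-linux-obj=$linux_src" \
        "--with-spl=$spl_src" \
        "--with-spl-obj=$spl_obj"
}
```

Calling `zfs_configure_cmd "$BUILDROOT" "$KVER" 0.6.1` prints the command for inspection. This covers only a by-hand configure run; the difficulty discussed in this ticket is that the packaged spec files do not expose these paths in the same way.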
| Comment by Christopher Morrone [ 03/May/13 ] |
I'm not sure what "mkkmp" is, and the commit that you point to doesn't say anything about mkkmp. Could you please elaborate?
What kmod packages exactly? Do you mean building lustre kmod packages? Because if you're using the spl/zfs DKMS packages you don't get involved with the spl/zfs kmod packages.
I am really lost on this one. What kmod packages are you trying to mix with dkms, and why? And a spec file is used to create an rpm, so why would you need the rpm to contain a spec file? Can you please document the commands that you are trying to run, and explain the approach in a bit more depth?
ZFS is being packaged into many Linux distributions. You guys really need to get comfortable with building against other people's packages. The route you are following is just going to make it even harder for most users of Lustre on ZFS to properly build Lustre with ZFS. You're making it harder (read "impossible") for LLNL to use your build system, and we're the first production users of Lustre on ZFS. I don't want to get into a big rant about the problems with the Intel build methodology. I will just again encourage you to find a way to use the spl/zfs DKMS packages straight from zfsonlinux.org, unmolested. I've added Chris Gearing to this issue so he can follow along. Chris: this is what we need to change about the Intel Lustre build system. You should look into using Fedora's mock tool. "Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot." |
| Comment by Nathaniel Clark [ 06/May/13 ] |
|
Chris M, Let me step back and try to explain myself more clearly: The current build process builds zfs/spl from source every time. I would very much like to just install the latest zfs/spl dkms rpms (spl-dkms, zfs-dkms) on our build machines and build from those. The problem with this is that Lustre osd-zfs needs to build against the zfs binaries for the kernel we're building against. Once The dkms tool provides a way (in theory) to build kernel specific kmod packages from a given dkms package via the command "dkms mkkmp". That specific dkms sub-command is broken in the latest dkms release, but is fixed by the above linked patch (it may not say so in the comment but it fixes "dkms mkkmp" and "dkms match" and "dkms uninstall"). So given the above, only spl-dkms and zfs-dkms would need to be installed on the build machines; kmod-zfs-* and kmod-spl-* could then be generated for a given kernel and Lustre could be built against those generated packages. The next problem that you would run into is that spl-dkms and zfs-dkms do not include the required spec files: they include the pre-configure spl-kmod.spec.in and zfs-kmod.spec.in but not the post-configure spl-kmod.spec and zfs-kmod.spec, so even with a working "dkms mkkmp" command, kmod-spl-* and kmod-zfs-* can't be built. To address some of your specific concerns:
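The missing spec file problem described above can be checked for mechanically. The sketch below is an illustration under stated assumptions: the function name `kmod_spec_state` is hypothetical, and it takes the directory to inspect as an argument instead of assuming where a dkms package unpacks its sources.

```shell
#!/bin/sh
# Hypothetical check: report whether a source tree contains the generated
# <name>-kmod.spec, only the template <name>-kmod.spec.in, or neither.
kmod_spec_state() {
    src_dir=$1
    name=$2   # e.g. spl or zfs
    if [ -n "$(find "$src_dir" -name "$name-kmod.spec" -type f)" ]; then
        echo "configured"
    elif [ -n "$(find "$src_dir" -name "$name-kmod.spec.in" -type f)" ]; then
        echo "template-only"
    else
        echo "missing"
    fi
}
```

A result of `template-only` corresponds to the situation reported here: configure has not been run in that tree, so the spec file needed to build the kmod package does not exist yet.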
As I described, I've tried, but there are more barriers than can be quickly fixed for a 2.4 release, going forward I would be happy to work with you and Chris G to get a more robust and community friendly build system in place.
I'm pretty new, so I don't have any vested interest in keeping around the lbuild system; I'm all for a set of spec files that just cleanly build Lustre. I'm just trying to get spl and zfs up to date without blowing up the existing build system. I think currently building Lustre with ZFS is pretty straightforward, but I don't use the lbuild command to build and test locally.
I believe I have tried, but given the current issues and the structure of the build system, that's not feasible quickly.
That looks like a useful tool for |
| Comment by Prakash Surya (Inactive) [ 07/May/13 ] |
I apologize if this muddies up the discussion, but I just want to make sure I understand the build system in place at Intel. For each build of Lustre, you guys build the kernel (to apply Lustre patches), then build ldiskfs/zfs against this new kernel (without actually installing or running the new kernel), and then build Lustre against the newly built ldiskfs/zfs/kernel? And all this is done without ever installing the RPMs that are built (i.e. you point the build system for the current package being built (e.g. lustre) at the working directory of each recently built dependent package (e.g. ldiskfs/zfs/etc))? |
| Comment by Nathaniel Clark [ 07/May/13 ] |
The build system is lbuild (in the lustre tree: lustre-release/contrib/lbuild/lbuild), so you can read exactly what it does, but as I understand it (skipping some reuse optimizations): We build a patched kernel, place it in $BUILDROOT/usr/src/kernels/<KVER> |
| Comment by Christopher Morrone [ 07/May/13 ] |
I don't understand that at all. In order to use the dkms packages from spl/zfs, you would need to both install the kernel that you care about and install the dkms packages from spl/zfs. And all of this complexity that you are adding is expressly to deal with the fact that you can't install the rpm packages in your current build system. If you could install the correct kernel, you could also just build spl/zfs kmod packages directly like we do, instead of the more convoluted dkms-to-kmod route. And then you could install those packages, which would then let you build lustre. But you guys haven't been able to build against installed rpms, so I'm not sure how any of that is relevant. |
| Comment by Nathaniel Clark [ 07/May/13 ] |
|
Landed for 2.4 |