Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0
    • Lustre 2.4.0
    • Builds from the build server
    • 3
    • 7569

    Description

      While working on LU-3109 it seems the ZFS we package is old.

      Christopher Morrone added a comment - 05/Apr/13 1:04 AM
      
      rc10 is old, you should definitely upgrade to the latest rc of 0.6.0. 0.6.1 is a little TOO new, because packaging has changed there, and lustre will need a little tweaking to find the new paths and things automatically. You can build by hand by giving spl and zfs paths, but the latest 0.6.0 rc will just be easier.
      
      

      It seems we need to stay current with the ZFS version.

        Activity

          [LU-3117] Build: ZFS version is old

          > The dkms tool has a bug:
          > dkms mkkmp never runs, this is fixed upstream in dkms master.

          I'm not sure what "mkkmp" is, and the commit that you point to doesn't say anything about mkkmp. Could you please elaborate?

          > This prevents installing dkms packages on the build servers and then just building the kmod packages cleanly from those.

          What kmod packages exactly? Do you mean building lustre kmod packages? Because if you're using the spl/zfs DKMS packages you don't get involved with the spl/zfs kmod packages.

          > There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec.

          I am really lost on this one. What kmod packages are you trying to mix with dkms, and why? And a spec file is used to create an rpm, so why would you need the rpm to contain a spec file?

          Can you please document the commands that you are trying to run, and explain the approach in a bit more depth?

          > The lbuild system essentially builds everything under a buildroot, this includes kernel sources spl and zfs sources.

          ZFS is being packaged into many Linux distributions. You guys really need to get comfortable with building against other people's packages.

          The route you are following is just going to make it even harder for most users of Lustre on ZFS to properly build Lustre with ZFS. You're making it harder (read "impossible") for LLNL to use your build system, and we're the first production users of Lustre on ZFS. I don't want to get into a big rant about the problems with the Intel build methodology.

          I will just again encourage you to find a way to use the spl/zfs DKMS packages straight from zfsonlinux.org, unmolested.

          I've added Chris Gearing to this issue so he can follow along. Chris: this is what we need to change about the Intel Lustre build system. You should look into using Fedora's mock tool.

          "Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot."

          Comment above added by Christopher Morrone (morrone, Inactive).
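As a rough illustration of the mock suggestion above, the sketch below only prints the command that would rebuild a source rpm inside a mock chroot, where the package's BuildRequires are resolved by mock rather than by the build farm. The function name, the source rpm file name, and the result directory are hypothetical; `epel-6-x86_64` is assumed to be an available mock configuration on the builder.

```shell
#!/bin/sh
# Sketch: print a mock invocation that rebuilds a source rpm in a clean chroot.
# Nothing is executed here; review the printed command before running it.
# The srpm name and result directory used in examples are hypothetical.

mock_rebuild_cmd() {
    cfg=$1         # mock chroot configuration, e.g. epel-6-x86_64 (assumed to exist)
    srpm=$2        # source rpm to rebuild (hypothetical file name)
    resultdir=$3   # where mock should leave the built rpms and logs

    printf 'mock -r %s --resultdir %s --rebuild %s\n' "$cfg" "$resultdir" "$srpm"
}
```

For example, `mock_rebuild_cmd epel-6-x86_64 lustre.src.rpm /tmp/results` prints one command line. Because mock populates the chroot from the spec file's build dependencies, nothing outside the chroot needs to be modified.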

          Brian,

          The dkms tool has a bug:
          dkms mkkmp never runs; this is fixed upstream in dkms master. The bug prevents installing dkms packages on the build servers and then just building the kmod packages cleanly from those.

          There's also an issue building kmod packages from the spl/zfs-dkms rpms even after applying the upstream fix. The rpms only provide spl/zfs-kmod.spec.in and not spl/zfs-kmod.spec.

          I ran into another issue of zfs failing to build (in our build system) because of how kmodtool looks for kernel versions. I wasn't specifying which version to build, so it was building from the default search list, which didn't include the kernel I was building against.

          The lbuild system essentially builds everything under a buildroot; this includes the kernel, spl, and zfs sources. dkms and the zfs and spl spec files don't lend themselves to building in an environment like this. The build process is absolutely forbidden from changing anything outside of the buildroot, so the issue boils down to this: we build a kernel almost every time we do a lustre build, so we need zfs and spl to build against that kernel too. But the default place to look for those objects is /usr/src/spl-<splversion>/<kernelversion>/, and writing there violates the rule against touching anything outside of the buildroot. So I need a way to change where we look for the kernel-specific zfs and spl objects. I also need to convince spl and zfs to build against a kernel that isn't present in /usr/src/kernels or in /lib/modules.

          Comment above added by Nathaniel Clark (utopiabound).
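One way to picture the out-of-tree build described above: the sketch below only prints a configure command line that points ZFS at a kernel and an SPL tree living under a private buildroot. The function name, the buildroot layout, and the example versions are hypothetical. The `--with-linux`, `--with-linux-obj`, `--with-spl`, and `--with-spl-obj` options are assumed to behave as in the ZFS 0.6.x configure script, so check them against `./configure --help` before relying on this.

```shell
#!/bin/sh
# Sketch: print the configure invocation for building ZFS against a kernel
# and an SPL tree that both live under a private buildroot.
# Nothing is executed and nothing outside the buildroot is touched.
# The directory layout below is a hypothetical mirror of the default
# /usr/src/spl-<splversion>/<kernelversion>/ location mentioned in the thread.

zfs_configure_cmd() {
    buildroot=$1   # top of the private build area
    kver=$2        # kernel version built in this run
    splver=$3      # SPL version unpacked under the buildroot

    linux_src="$buildroot/usr/src/kernels/$kver"
    spl_src="$buildroot/usr/src/spl-$splver"

    # Kernel objects are assumed to sit in the kernel source tree itself;
    # SPL objects are assumed to sit in a per-kernel subdirectory.
    printf './configure --with-linux=%s --with-linux-obj=%s --with-spl=%s --with-spl-obj=%s\n' \
        "$linux_src" "$linux_src" "$spl_src" "$spl_src/$kver"
}
```

For example, `zfs_configure_cmd /tmp/buildroot 2.6.32-358.el6.x86_64 0.6.1` prints a single command line that can be reviewed and then run from the ZFS source directory.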

          Nathaniel, you lost me. Can you walk me through the remaining issues?

          > zfs has a bug in it that prevents it from building kmod packages at all.

          Can you point me to a specific bug for this? I've test built the entire spl+zfs+lustre stack in the following configurations and it works just fine.

          *) spl+zfs+lustre dkms packages. These are the style of packages we're hosting in the ZoL EPEL repository; they build cleanly for us and I've heard they work just fine for other sites. You can find the latest patches against 2.6.32 at https://github.com/chaos/lustre/commits/v2_3_64-dkms.

          *) spl+zfs+lustre kmod packages. These are the style of packages we're using internally at LLNL and include in our CHAOS/TOSS distribution. You can find the full set of patches at https://github.com/chaos/lustre/commits/2.3.64-llnl

          *) spl+zfs dkms packages and a lustre kmod package. I just verified this style of building also works as expected; you just need to add my original LU-3117 patch. This option should be the easiest for you to get going: just install the official packages into your image and build lustre like usual.

          > zfs doesn't build cleanly

          Can you be more specific? The ZFS code builds reliably for us and many, many other people. The only known issue that might be causing your problem has already been fixed: make sure you're using the dkms-2.2.0.3-2.zfs1.el6.noarch package provided by the ZoL EPEL repository. It includes a fix to ensure that the SPL is always built before the ZFS code. Your version of dkms will be automatically updated when you install the ZFS packages.

          > locally to accept an spldir argument to configure

          I still don't understand why this is needed. Perhaps you can explain how you're trying to build things and we can come up with a clean way to resolve the outstanding issues.

          Comment above added by Brian Behlendorf (behlendorf).
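To check whether a builder already has the dkms build mentioned above, a small helper along these lines may be enough. It assumes that the dkms builds from the ZoL EPEL repository carry a `zfsN` tag in their release field, which is inferred only from the single package name quoted in this thread (dkms-2.2.0.3-2.zfs1.el6.noarch). The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: decide from a package name (as printed by `rpm -q dkms`) whether it
# looks like the dkms build from the ZoL EPEL repository.
# Assumption: those builds carry a ".zfsN" tag in the release field.

is_zol_dkms() {
    case "$1" in
        dkms-*.zfs[0-9]*) return 0 ;;   # e.g. dkms-2.2.0.3-2.zfs1.el6.noarch
        *)                return 1 ;;   # stock or otherwise unrecognised build
    esac
}
```

On a builder this could be used as `is_zol_dkms "$(rpm -q dkms)" || echo "dkms is not the ZoL build"`. The check looks only at the release tag, not at whether the version is new enough.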

          After spinning my wheels on dkms, I figured out that the dkms that comes with zfs (latest official release) has a bug in it that prevents it from building kmod packages at all. Once that's straightened out, spl and zfs don't lend themselves to building kmod packages via dkms, because they need to update the spec files during configure, while dkms just wants to build from source provided by {zfs,spl}-dkms packages. Even when that's worked around, zfs doesn't build cleanly because it's missing some files for spl in the dkms built tree (Module.symvers). So everything is surmountable, but it's looking like the shortest path is to munge the zfs spec files locally to accept an spldir argument to configure, and then work on fixing things up post 2.4.

          Comment above added by Nathaniel Clark (utopiabound).

          Nathaniel, I'm all for improving the build system, and installing extra DKMS packages, but the existing ZFS binary packages cannot be removed until your patch is landed, or it will break other patches in flight. There is a non-zero risk that this change would cause the existing builds or tests to fail in some way, so this change needs to be coordinated with Chris Gearing. I would suggest to open a new bug instead of reusing the old one.

          Comment above added by Andreas Dilger (adilger).

          I agree. From my point of view it would be ideal if you just added the ZFS repository to your builder image. The benefits would be:

          1) ZFS won't need to be rebuilt for each Lustre build.
          2) ZFS will be automatically rebuilt for kernel upgrades in the image.
          3) ZFS will get updated via 'yum update' as stable releases are tagged.
          (It would be nice if you could set up a proxy to cache the packages locally,
          but since you shouldn't need to build them often it's not critical.)

          The original patch I posted has support for building against the DKMS style packages from the repository so you should just need to patch Lustre, install the packages in the images, and disable the lbuild ZFS infrastructure. The one gotcha I can think of offhand is that we're not distributing packages for 32-bit systems until some ZFS specific issues get resolved.

          Comment above added by Brian Behlendorf (behlendorf).
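The builder-image approach suggested above could look roughly like the sketch below. It defaults to a dry run that only prints the commands. The function names and the `DRY_RUN` switch are hypothetical, the repository release rpm is passed in as a placeholder argument rather than a real URL, and the `spl-dkms` and `zfs-dkms` package names are assumed from the names used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: steps for adding the ZFS repository to a builder image and
# installing the DKMS packages from it. Defaults to printing the commands;
# set DRY_RUN=0 to execute them (requires root on an rpm-based builder).

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"      # dry run: show the command only
    else
        "$@"           # real run: execute it
    fi
}

setup_builder_image() {
    repo_rpm=$1        # placeholder: path or URL of the repository release rpm

    run yum -y localinstall "$repo_rpm"     # register the repository
    run yum -y install spl-dkms zfs-dkms    # package names assumed from the thread
    run dkms status                         # confirm the modules were built
}
```

Called as `setup_builder_image ./zfs-release.rpm`, the dry run prints three commands for review. After that, later 'yum update' runs on the image would pick up new stable releases, which is benefit 3 in the list above.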

          Chris, I'm all for having the builders install zfs from a remote repository and just build against it. I think that would be a much better solution. (This is my first foray into the world of lbuild).

          Andreas, Do you think we should reopen TT-391 (zfs packages installed on builders), have zfs-dkms-0.6.1 installed on all builders and then add a link to the zfs yum repository (mirroring it locally for installers)?

          Comment above added by Nathaniel Clark (utopiabound).

          Chris, in the past we didn't have DKMS packages for ZFS, so we had to build our own. Since we cannot distribute the binary packages, it makes sense to use the DKMS packages if we can use those in our testbed.

          Comment above added by Andreas Dilger (adilger).

          Nathaniel, think about it this way: You are modifying an rpm spec file, which means that you are in an rpm environment. However, your patch is explicitly to subvert the rpm way of building packages.

          I understand why you are trying to do this, and I can certainly commiserate. The fundamental problem is that the Intel Lustre build farm lacks any system to recognize and honor rpm dependencies.

          But while I understand, I don't feel like we should condone the bad behavior of the build farm by adjusting zfs to make it easier to behave badly.

          By Lustre 2.5, I very much hope to see the build farm improved to handle rpms in a more reasonable fashion. In the meantime, perhaps you can put the workaround at the source of the problem (i.e. the build farm).

          Why do you guys want to build spl/zfs at all? Why not simply install the spl/zfs packages? DKMS versions of the spl/zfs packages are available.

          Comment above added by Christopher Morrone (morrone, Inactive).

          Set up a pull request for ZFS, https://github.com/zfsonlinux/zfs/pull/1413, to add the ability to override the spl directory passed to configure during rpm creation.

          Comment above added by Nathaniel Clark (utopiabound).

          The structure of the zfs spec files makes it hard to override the location of the spl directory during the build in zfs 0.6.1.

          Comment above added by Nathaniel Clark (utopiabound).

          People

            utopiabound Nathaniel Clark
            keith Keith Mannthey (Inactive)