Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.5.0
    • None
    • 8659

    Description

      ldiskfs is a prerequisite of lustre. But instead of building the prerequisite before lustre, we have chosen to stop and perform the build of in the middle of the lustre. That introduces a great deal of unnecessary complexity to the lustre build system. The complexity is so great, that we have allowed the normal* way of building RPMs to remain broken for years. Our .src.rpm files remain unbuildable under many normal situations.

      To make further progress on LU-1199, this major problem must be addressed. As I see it, there are two main solutions we could choose:

      1. Move ldiskfs out of the lustre source tree. Build it first.
      2. Keep ldiskfs in-tree, but eliminate its independant build system and spec file. The ldiskfs binary rpm becomes a sub-module of lustre.spec.

      I have been moving us towards option 1. I have gotten enough change landed to allow ldiskfs to be built out of tree. It will not take much change to allow Lustre to use the externally pre-build ldiskfs packages. We already do this at LLNL, so I'm pretty clear on what needs to be done there.

      I think option 1 would also position us nicely is James Simmons is successful in getting all of the ldiskfs into the upstream kernel at some point. We will already be comfortable at that point building against an ldiskfs that is out of tree.

      It is also nicely symmetric with how we build against an external zfs today (and perhaps btrfs in the future). Even if we wind up needed to make an "lbtrfs" in the future, I don't think we would want to repeat the choice to try to merge that into the lustre tree.

      I would like to see a quick decision from Intel on which path we should take for Lustre 2.5, so that we can get the change introduced early in the development cycle. That will give us some time to adjust all of our personal processes, and get the kinks worked out well in advance of release.

      * I am defining "normal" to be the way that a distro like fedore would do it: first build ldiskfs rpms, including devel packages. Then build lustre against the already built ldiskfs packages.

      Attachments

        Issue Links

          Activity

            [LU-3462] Eliminate ldiskfs recursive & independent rpm packaging
            pjones Peter Jones added a comment -

            I am delighted to note that this patch has landed for 2.5

            pjones Peter Jones added a comment - I am delighted to note that this patch has landed for 2.5

            Ah, thanks! I clearly missed a makefile change for lustre-iokit. I remember making that change, so I must have misplaced it. Good catch.

            The second problem I can't reproduce. Some more details about your configuration might help. But it may not matter in the end.

            It looks like there is an existing bug not necessarily related to my patch, that the "srpm-real" make target does "rpmbuild -ta", which is incorrect. It should be doing "rpmbuild -ts". In other words, if you run "make srpm", there should really be no attempt to build ldiskfs, and you should not have seen that error. Then again, that doesn't entirely explain why the build did fail. Hopefully ldiskfs just was not configured correctly under that path (which is really should not need to be). But it is possible that the problem will just move to the rpm case.

            Lets start with those two changes, and you can tell me if the problem moves to another make target. I just pushed revision 17 of the patch with those fixes.

            By the way, don't hesitate to review the patch -1 when you find problems!

            morrone Christopher Morrone (Inactive) added a comment - Ah, thanks! I clearly missed a makefile change for lustre-iokit. I remember making that change, so I must have misplaced it. Good catch. The second problem I can't reproduce. Some more details about your configuration might help. But it may not matter in the end. It looks like there is an existing bug not necessarily related to my patch, that the "srpm-real" make target does "rpmbuild -ta", which is incorrect. It should be doing "rpmbuild -ts". In other words, if you run "make srpm", there should really be no attempt to build ldiskfs, and you should not have seen that error. Then again, that doesn't entirely explain why the build did fail. Hopefully ldiskfs just was not configured correctly under that path (which is really should not need to be). But it is possible that the problem will just move to the rpm case. Lets start with those two changes, and you can tell me if the problem moves to another make target. I just pushed revision 17 of the patch with those fixes. By the way, don't hesitate to review the patch -1 when you find problems!
            mdiep Minh Diep added a comment -

            Hi Chris,

            make srpm results in this error

            [mpiuser@client-1 lu3462.org]$ make srpm
            make -C lustre-iokit srpm
            make[1]: Entering directory `/mnt/build/build/lu3462.org/lustre-iokit'
            make[1]: *** No rule to make target `srpm'. Stop.
            make[1]: Leaving directory `/mnt/build/build/lu3462.org/lustre-iokit'
            make: *** [srpm] Error 2

            After I comment out the make iokit in srpm, it continue but fails here.

            ...
            Type 'make' to build Lustre.
            + make -j 4 -s
            make[2]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53'
            make[3]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53'
            Making all in ldiskfs
            Making all in .
            make[4]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53'
            make[5]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/lustre'
            make[5]: Leaving directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/lustre'
            make[5]: Entering directory `/usr/src/kernels/2.6.32-358.11.1.el6.x86_64'
            make[7]: *** No rule to make target `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs/dynlocks.o', needed by `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs/ldiskfs.o'. Stop.
            make[7]: *** Waiting for unfinished jobs....
            make[6]: *** [/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs] Error 2
            make[6]: *** Waiting for unfinished jobs....

            mdiep Minh Diep added a comment - Hi Chris, make srpm results in this error [mpiuser@client-1 lu3462.org] $ make srpm make -C lustre-iokit srpm make [1] : Entering directory `/mnt/build/build/lu3462.org/lustre-iokit' make [1] : *** No rule to make target `srpm'. Stop. make [1] : Leaving directory `/mnt/build/build/lu3462.org/lustre-iokit' make: *** [srpm] Error 2 After I comment out the make iokit in srpm, it continue but fails here. ... Type 'make' to build Lustre. + make -j 4 -s make [2] : Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53' make [3] : Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53' Making all in ldiskfs Making all in . make [4] : Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53' make [5] : Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/lustre' make [5] : Leaving directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/lustre' make [5] : Entering directory `/usr/src/kernels/2.6.32-358.11.1.el6.x86_64' make [7] : *** No rule to make target `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs/dynlocks.o', needed by `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs/ldiskfs.o'. Stop. make [7] : *** Waiting for unfinished jobs.... make [6] : *** [/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs] Error 2 make [6] : *** Waiting for unfinished jobs....

            Status update:

            Patch 7054 landed.

            The remaining patch, 6850 is in pretty good shape. Barring problems found by reviewers, I consider it ready to land. LLNL is already using it on our 2.4.0 branch.

            6850 had a +1 review from Brian Murrell at Patch Set 14. I rebased it to address conflicts from newly landed patches, but master seems to be destabilized at the moment, and Maloo testing failed for reasons unrelated to the patch.

            morrone Christopher Morrone (Inactive) added a comment - - edited Status update: Patch 7054 landed. The remaining patch, 6850 is in pretty good shape. Barring problems found by reviewers, I consider it ready to land. LLNL is already using it on our 2.4.0 branch. 6850 had a +1 review from Brian Murrell at Patch Set 14. I rebased it to address conflicts from newly landed patches, but master seems to be destabilized at the moment, and Maloo testing failed for reasons unrelated to the patch.

            It took a couple of more revisions, but things are now building. There are now two patches for this change:

            http://review.whamcloud.com/7054
            http://review.whamcloud.com/6850

            6850 failed maloo on one test, but I believe that it was unrelated to the patch.

            I am now done with work on these patches until more work arises from reviews and/or rebases.

            morrone Christopher Morrone (Inactive) added a comment - It took a couple of more revisions, but things are now building. There are now two patches for this change: http://review.whamcloud.com/7054 http://review.whamcloud.com/6850 6850 failed maloo on one test, but I believe that it was unrelated to the patch. I am now done with work on these patches until more work arises from reviews and/or rebases.

            I pulled a thread and unravelled the sweater. But I think I put it back together now. Hopefully patch set 12 will compile and be near final.

            Patch set 12 depends on a new smaller patch that repairs some recent (and possibly some long standing) breakage with the Lustre "make dist" target.

            morrone Christopher Morrone (Inactive) added a comment - I pulled a thread and unravelled the sweater. But I think I put it back together now. Hopefully patch set 12 will compile and be near final. Patch set 12 depends on a new smaller patch that repairs some recent (and possibly some long standing) breakage with the Lustre "make dist" target.

            Patch set 6 of http://review.whamcloud.com/6850 is now available.

            I cleaned up the m4 files (among other things). The main task with the m4 cleanup was to merge the two ways that lustre and ldiskfs were determining the kernel version.

            With that done, I think it is now ready for wider review.

            morrone Christopher Morrone (Inactive) added a comment - Patch set 6 of http://review.whamcloud.com/6850 is now available. I cleaned up the m4 files (among other things). The main task with the m4 cleanup was to merge the two ways that lustre and ldiskfs were determining the kernel version. With that done, I think it is now ready for wider review.

            Patch set 3 of http://review.whamcloud.com/6850 is now available.

            I cleaned up the things that Minh noted in the last patch. The m4 files still have some obvious cleanup work needed before they are ready though.

            The big change in this version is the elimination of the ldiskfs/ldiskfs subdirectory. I realized that it was pretty much superfluous once the ldiskfs build system was eliminated, so I moved the make files from that directory up one level and removed it.

            Looks like it needs a rebase after today's landings too.

            But I think things are looking pretty good otherwise.

            morrone Christopher Morrone (Inactive) added a comment - Patch set 3 of http://review.whamcloud.com/6850 is now available. I cleaned up the things that Minh noted in the last patch. The m4 files still have some obvious cleanup work needed before they are ready though. The big change in this version is the elimination of the ldiskfs/ldiskfs subdirectory. I realized that it was pretty much superfluous once the ldiskfs build system was eliminated, so I moved the make files from that directory up one level and removed it. Looks like it needs a rebase after today's landings too. But I think things are looking pretty good otherwise.

            Patch set 2 of http://review.whamcloud.com/6850 built in Jenkins, and appears to be running through testing pretty well. I assume any problems probably would have been evident pretty quickly since this is just packaging change. And it has been running fine for hours now.

            Minh, I think it would be worthwhile to get your initial impressions of the approach, keeping in mind that there is still cleanup to do.

            morrone Christopher Morrone (Inactive) added a comment - Patch set 2 of http://review.whamcloud.com/6850 built in Jenkins, and appears to be running through testing pretty well. I assume any problems probably would have been evident pretty quickly since this is just packaging change. And it has been running fine for hours now. Minh, I think it would be worthwhile to get your initial impressions of the approach, keeping in mind that there is still cleanup to do.

            It didn't sounds like there was much endorsement of the plan to move patch application into configure, so I will avoid that for the first iteration of the patch.

            This is not at all finished, but I pushed a patch just now to get a peak at how the Intel build system is going to react to the change. And to demonstrate that work is progressing.

            http://review.whamcloud.com/6850

            morrone Christopher Morrone (Inactive) added a comment - It didn't sounds like there was much endorsement of the plan to move patch application into configure, so I will avoid that for the first iteration of the patch. This is not at all finished, but I pushed a patch just now to get a peak at how the Intel build system is going to react to the change. And to demonstrate that work is progressing. http://review.whamcloud.com/6850

            That is a good guess! But from what I'm reading, I don't think that it really works that way.

            It looks like the only dependency is on the series file. So patch modifications won't cause a "rm -rf" of the patched directory unless there is also a series file change.

            It is not clear to me that there is much value in continuing that. It will lead to a false sense of security.

            And doing it right would be very difficult, because we are combining the kernel's build system with another pass of a non-kernel autotools system. It looks to me like it would be very difficult to get this right (which is why it never has been). So I'm not sure there is even value in preserving the current system in the hopes that someone someday will do all of the work to make it correct.

            I would vote for simplicity.

            morrone Christopher Morrone (Inactive) added a comment - That is a good guess! But from what I'm reading, I don't think that it really works that way. It looks like the only dependency is on the series file. So patch modifications won't cause a "rm -rf" of the patched directory unless there is also a series file change. It is not clear to me that there is much value in continuing that. It will lead to a false sense of security. And doing it right would be very difficult, because we are combining the kernel's build system with another pass of a non-kernel autotools system. It looks to me like it would be very difficult to get this right (which is why it never has been). So I'm not sure there is even value in preserving the current system in the hopes that someone someday will do all of the work to make it correct. I would vote for simplicity.

            People

              mdiep Minh Diep
              morrone Christopher Morrone (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: