Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.5.0
    • None
    • 8659

    Description

      ldiskfs is a prerequisite of lustre. But instead of building the prerequisite before lustre, we have chosen to stop and perform the build of ldiskfs in the middle of the lustre build. That introduces a great deal of unnecessary complexity to the lustre build system. The complexity is so great that we have allowed the normal* way of building RPMs to remain broken for years. Our .src.rpm files remain unbuildable under many normal situations.

      To make further progress on LU-1199, this major problem must be addressed. As I see it, there are two main solutions we could choose:

      1. Move ldiskfs out of the lustre source tree. Build it first.
      2. Keep ldiskfs in-tree, but eliminate its independent build system and spec file. The ldiskfs binary rpm becomes a sub-module of lustre.spec.

      I have been moving us towards option 1. I have gotten enough change landed to allow ldiskfs to be built out of tree. It will not take much change to allow Lustre to use the externally pre-built ldiskfs packages. We already do this at LLNL, so I'm pretty clear on what needs to be done there.

      I think option 1 would also position us nicely if James Simmons is successful in getting all of ldiskfs into the upstream kernel at some point. We will already be comfortable at that point building against an ldiskfs that is out of tree.

      It is also nicely symmetric with how we build against an external zfs today (and perhaps btrfs in the future). Even if we wind up needing to make an "lbtrfs" in the future, I don't think we would want to repeat the choice to try to merge that into the lustre tree.

      I would like to see a quick decision from Intel on which path we should take for Lustre 2.5, so that we can get the change introduced early in the development cycle. That will give us some time to adjust all of our personal processes, and get the kinks worked out well in advance of release.

      * I am defining "normal" to be the way that a distro like Fedora would do it: first build ldiskfs rpms, including devel packages. Then build lustre against the already built ldiskfs packages.
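
      As a rough illustration of the "normal" ordering described in the footnote above, the following dry-run sketch prints the three steps instead of executing them. The helper names (run, build_in_order), the source rpm file names, and the RPMS directory are hypothetical placeholders, not names taken from the Lustre tree; only rpmbuild --rebuild and rpm -Uvh are real commands.

```shell
#!/bin/sh
# Dry-run sketch: print each command rather than executing it.
# All file names and directories passed in are hypothetical placeholders.
run() {
    echo "+ $*"
}

build_in_order() {
    ldiskfs_srpm=$1   # source rpm of the prerequisite
    lustre_srpm=$2    # source rpm of lustre
    rpm_dir=$3        # directory where rpmbuild leaves binary rpms

    # Step 1: build the prerequisite, including its devel package.
    run rpmbuild --rebuild "$ldiskfs_srpm"
    # Step 2: install the freshly built ldiskfs packages.
    # The glob is quoted so the dry run prints the pattern literally;
    # a real invocation would leave it unquoted so the shell expands it.
    run rpm -Uvh "$rpm_dir/ldiskfs-*.rpm"
    # Step 3: build lustre against the already installed ldiskfs packages.
    run rpmbuild --rebuild "$lustre_srpm"
}

build_in_order ldiskfs.src.rpm lustre.src.rpm "$HOME/rpmbuild/RPMS/x86_64"
```

      The point of the ordering is that the second rpmbuild never has to stop and build ldiskfs itself; it only consumes packages that already exist.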

          Activity

            [LU-3462] Eliminate ldiskfs recursive & independent rpm packaging
            mdiep Minh Diep added a comment -

            James,

            When you saw the ftrace compile issue above, were you using any external OFED?

            mdiep Minh Diep added a comment - http://review.whamcloud.com/#/c/7639/ landed in 2.5.0

            morrone Christopher Morrone (Inactive) added a comment -

            "Chris, since this patch has landed I'm seeing the following errors."

            James, could you start a new ticket for that? I'll forget about this in no time if it is on a closed ticket.

            Include which kernel you are using. Maybe configuration options too. I am not seeing that error, and I don't have any ideas off the top of my head.
            mdiep Minh Diep added a comment - one minor fix http://review.whamcloud.com/#/c/7639/ added.
            simmonsja James A Simmons added a comment - - edited

            Chris, since this patch has landed I'm seeing the following errors.

            lustre-2.4.92-broke/ldiskfs/trace/events/ldiskfs.h: In function ‘ftrace_profile_enable_ldiskfs_free_inode’:
            lustre-2.4.92-broke/ldiskfs/trace/events/ldiskfs.h:18: error: implicit declaration of function ‘register_trace_ldiskfs_free_inode’
            lustre-2.4.92-broke/ldiskfs/trace/events/ldiskfs.h: In function ‘ftrace_profile_disable_ldiskfs_free_inode’:
            lustre-2.4.92-broke/ldiskfs/trace/events/ldiskfs.h:18: error: implicit declaration of function ‘unregister_trace_ldiskfs_free_inode’
            lustre-2.4.92-broke/ldiskfs/trace/events/ldiskfs.h: In function ‘ftrace_profile_enable_ldiskfs_request_inode’

            I have looked over the code in detail to see what is wrong but can't figure it out. Do you have any ideas?

            pjones Peter Jones added a comment -

            I am delighted to note that this patch has landed for 2.5


            morrone Christopher Morrone (Inactive) added a comment -

            Ah, thanks! I clearly missed a makefile change for lustre-iokit. I remember making that change, so I must have misplaced it. Good catch.

            The second problem I can't reproduce. Some more details about your configuration might help. But it may not matter in the end.

            It looks like there is an existing bug, not necessarily related to my patch: the "srpm-real" make target does "rpmbuild -ta", which is incorrect. It should be doing "rpmbuild -ts". In other words, if you run "make srpm", there should really be no attempt to build ldiskfs, and you should not have seen that error. Then again, that doesn't entirely explain why the build did fail. Hopefully ldiskfs just was not configured correctly under that path (which it really should not need to be). But it is possible that the problem will just move to the rpm case.

            Let's start with those two changes, and you can tell me if the problem moves to another make target. I just pushed revision 17 of the patch with those fixes.

            By the way, don't hesitate to review the patch -1 when you find problems!
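
            To make the "rpmbuild -ta" versus "rpmbuild -ts" distinction in the comment above concrete: -ta builds both binary and source packages from a tarball, while -ts builds only the source package, so nothing is compiled. The sketch below only prints the command that would correspond to each target. The function name rpmbuild_cmd_for, the target name "rpms", and the tarball name are hypothetical, and this is not the actual rule from the Lustre makefiles.

```shell
#!/bin/sh
# Hypothetical helper: print the rpmbuild invocation suited to a make target.
# It echoes the command instead of running it, so it is safe to try anywhere.
rpmbuild_cmd_for() {
    target=$1
    tarball=$2
    case "$target" in
        # Source package only: no compilation, so ldiskfs is never built.
        srpm) echo "rpmbuild -ts $tarball" ;;
        # Binary and source packages: this is where compilation belongs.
        rpms) echo "rpmbuild -ta $tarball" ;;
        *)    echo "unknown target: $target" >&2; return 1 ;;
    esac
}

# The tarball name is a placeholder.
rpmbuild_cmd_for srpm lustre.tar.gz
rpmbuild_cmd_for rpms lustre.tar.gz
```

            With -ts in the srpm path, a failure while compiling ldiskfs could only show up under the binary rpm target, which is the "problem will just move to the rpm case" possibility mentioned above.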
            mdiep Minh Diep added a comment -

            Hi Chris,

            make srpm results in this error

            [mpiuser@client-1 lu3462.org]$ make srpm
            make -C lustre-iokit srpm
            make[1]: Entering directory `/mnt/build/build/lu3462.org/lustre-iokit'
            make[1]: *** No rule to make target `srpm'. Stop.
            make[1]: Leaving directory `/mnt/build/build/lu3462.org/lustre-iokit'
            make: *** [srpm] Error 2

            After I comment out the make iokit in srpm, it continues but fails here.

            ...
            Type 'make' to build Lustre.
            + make -j 4 -s
            make[2]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53'
            make[3]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53'
            Making all in ldiskfs
            Making all in .
            make[4]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53'
            make[5]: Entering directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/lustre'
            make[5]: Leaving directory `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/lustre'
            make[5]: Entering directory `/usr/src/kernels/2.6.32-358.11.1.el6.x86_64'
            make[7]: *** No rule to make target `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs/dynlocks.o', needed by `/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs/ldiskfs.o'. Stop.
            make[7]: *** Waiting for unfinished jobs....
            make[6]: *** [/localhome/mpiuser/rpmbuild/BUILD/lustre-2.4.53/ldiskfs] Error 2
            make[6]: *** Waiting for unfinished jobs....


            morrone Christopher Morrone (Inactive) added a comment - - edited

            Status update:

            Patch 7054 landed.

            The remaining patch, 6850, is in pretty good shape. Barring problems found by reviewers, I consider it ready to land. LLNL is already using it on our 2.4.0 branch.

            6850 had a +1 review from Brian Murrell at Patch Set 14. I rebased it to address conflicts from newly landed patches, but master seems to be destabilized at the moment, and Maloo testing failed for reasons unrelated to the patch.

            morrone Christopher Morrone (Inactive) added a comment -

            It took a couple more revisions, but things are now building. There are now two patches for this change:

            http://review.whamcloud.com/7054
            http://review.whamcloud.com/6850

            6850 failed Maloo on one test, but I believe that it was unrelated to the patch.

            I am now done with work on these patches until more work arises from reviews and/or rebases.

            morrone Christopher Morrone (Inactive) added a comment -

            I pulled a thread and unravelled the sweater. But I think I put it back together now. Hopefully patch set 12 will compile and be near final.

            Patch set 12 depends on a new smaller patch that repairs some recent (and possibly some long-standing) breakage with the Lustre "make dist" target.

            People

              mdiep Minh Diep
              morrone Christopher Morrone (Inactive)