Details

    • 5154

    Description

      I think that we need to take a step back and address some fundamental problems with the lustre spec file.

      My number one complaint is that lustre source rpms are not buildable (using rpmbuild) in any sane fashion. This has been a constant source of work for folks here at Livermore, because our entire automated build infrastructure is built around the concept of taking source rpms as input and generating binary rpms.

      Because the lustre srpms are not buildable, no one tries to do that (except for us). So we fix things for us, but then almost every time the spec file is touched, it breaks something for us again.

      I think we need to get on the same page, and finally clean up the lustre spec file. I think the requirements here are:

      1) lustre srpm must be rebuildable
      2) spec file must take sane options, to allow configuration of lustre at srpm rebuild time
      3) must make no attempt to parse options out of configure_args

      I think the greatest area of pain will be dealing with "make rpms". The current way that the build system passes option from the initial configuration step down through make rpms and into the spec file is, at best, jerry-rigged. Passing a single giant "configure_args" define into rpmbuild and then parsing out options to set rpm variables is just backwards from the way that that an rpm spec file SHOULD work.

      A spec file should take options (rpmbuild -with/-without options when applicable), and then assemble a configure line based on THAT.

      Another example of the spec file having it the wrong-way around is "is_client". In order to perform a build that leaves out the server code and rpm dependencies, you set the package NAME! Thats crazy, for not the least reason that now no one can choose to name the package differently when they want a client-only build.

      A better approach is to have an rpmbuild "--without servers" option. That sets a variable in the spec file. The spec file uses that flag to conditionally include Requires directives and such, and the lustre package name is changed to "lustre-client" IFF the lustre_name variable has not already been set. This would allow me, for instance, to generate a set of binary rpms with a package name of "lustre-orion", and also make it a client-only build. I cannot accomplish that with the current spec file.

      So now the big question: Do we need "make rpms" at all? Where is it really needed? I suspect that Whamcloud's build farm uses it, but I think that you guys should really switch to generating binary rpms from a single source rpm once we get that working reliably.

      If we do, I think there are ways that we can make that work with a spec file and source rpm that are done the Right Way, but if we can just drop it altogether that would certainly be easier.

      Attachments

        Issue Links

          Activity

            [LU-1199] lustre build system overhaul

            I think part of the problem is that there were patches landed to both 2.4.0 and 2.5.0 using LU-1199 as the tracking bug, instead of a subtask. I agree that the cloned subtasks were a mistake, and that was just a side-effect of how Jira clones a ticket with subtasks rather than the intended outcome.

            I guess the lessons to be learned here are:

            • patches shouldn't be landed directly against the tracking bug
            • don't clone a tracking bug with subtasks

            The reason that we want to close the ticket when it has patches landed under a particular release is largely to ensure that tickets with the 2.5.0 "Fix Version" field are tracked properly. Since this is the first release that has used "Fix Version" in such a manner, it is not unexpected that there will be some mistakes made and learning to be done. To be honest, I think the importance of this issue is being overemphasized. The old and new bugs are linked together, so I don't think it will be too confusing to track the issues.

            adilger Andreas Dilger added a comment - I think part of the problem is that there were patches landed to both 2.4.0 and 2.5.0 using LU-1199 as the tracking bug, instead of a subtask. I agree that the cloned subtasks were a mistake, and that was just a side-effect of how Jira clones a ticket with subtasks rather than the intended outcome. I guess the lessons to be learned here are: patches shouldn't be landed directly against the tracking bug don't clone a tracking bug with subtasks The reason that we want to close the ticket when it has patches landed under a particular release is largely to ensure that tickets with the 2.5.0 "Fix Version" field are tracked properly. Since this is the first release that has used "Fix Version" in such a manner, it is not unexpected that there will be some mistakes made and learning to be done. To be honest, I think the importance of this issue is being overemphasized. The old and new bugs are linked together, so I don't think it will be too confusing to track the issues.
            spitzcor Cory Spitz added a comment -

            Sounds like a topic for the OpenSFS CDWG. Can we move the discussion to cdwg@lists.opensfs.org?

            spitzcor Cory Spitz added a comment - Sounds like a topic for the OpenSFS CDWG. Can we move the discussion to cdwg@lists.opensfs.org?

            I see little value in having the tracking ticket marked fixed in a release where the work is clearly unfinished. Further, what bothers me more than duplicating the tracking ticket is the cloning of all of the subtasks, which serves no clear purpose. The subtask cloning just makes everything very hard to understand in jira.

            The subtask is usually much more simple and clean. There is no good reason to have it appear under two different tracking tickets. We have two main subtask states: finished and unfinished.

            Finished subtasks should not be cloned forward to the new tracker. There is no value there. They were finished and closed under the old tracker, no need to clone.

            Then there are the unfinished tasks, there is no reason for them to remain connected to the old tracker since they were not completed there. There is very little value (none that I can see) in closing the unfinished ticket under the old tracker and having an new clone ticket opened under the new tracker.

            Ultimately I don't really understand the desire to close the tracking ticket against a particular Fix Version when the work is incomplete. I would argue that simply leaving the Fix Version field empty for tracking tickets that are long lived makes more sense.

            But if I can't get you folks to agree to that, hopefully you can change your procedures to handle the subtasks more sanely next time.

            morrone Christopher Morrone (Inactive) added a comment - - edited I see little value in having the tracking ticket marked fixed in a release where the work is clearly unfinished. Further, what bothers me more than duplicating the tracking ticket is the cloning of all of the subtasks, which serves no clear purpose. The subtask cloning just makes everything very hard to understand in jira. The subtask is usually much more simple and clean. There is no good reason to have it appear under two different tracking tickets. We have two main subtask states: finished and unfinished. Finished subtasks should not be cloned forward to the new tracker. There is no value there. They were finished and closed under the old tracker, no need to clone. Then there are the unfinished tasks, there is no reason for them to remain connected to the old tracker since they were not completed there. There is very little value (none that I can see) in closing the unfinished ticket under the old tracker and having an new clone ticket opened under the new tracker. Ultimately I don't really understand the desire to close the tracking ticket against a particular Fix Version when the work is incomplete. I would argue that simply leaving the Fix Version field empty for tracking tickets that are long lived makes more sense. But if I can't get you folks to agree to that, hopefully you can change your procedures to handle the subtasks more sanely next time.

            Chris, I don't think there is a huge problem if patches that have LU-1199 land in 2.6 (assuming that they do not need to be updated between now and when they land), since we can just mark this ticket with a fix version for 2.6.0 as well. What is trying to be avoided is releasing 2.5.0 with open tickets. Conversely, we don't want to move tickets with landed patches out from fix version 2.5.0 since that makes it harder to track patches/bugs that were fixed in 2.5.0.

            I think cloning the ticket and linking it as a duplicate ensures both that tickets landed in either release can be tracked properly, and doesn't require changing all of the patches in flight.

            adilger Andreas Dilger added a comment - Chris, I don't think there is a huge problem if patches that have LU-1199 land in 2.6 (assuming that they do not need to be updated between now and when they land), since we can just mark this ticket with a fix version for 2.6.0 as well. What is trying to be avoided is releasing 2.5.0 with open tickets. Conversely, we don't want to move tickets with landed patches out from fix version 2.5.0 since that makes it harder to track patches/bugs that were fixed in 2.5.0. I think cloning the ticket and linking it as a duplicate ensures both that tickets landed in either release can be tracked properly, and doesn't require changing all of the patches in flight.

            I meant to suggest that we undo the cloned tickets and restore LU-1199 and its subtasks to active duty.

            morrone Christopher Morrone (Inactive) added a comment - I meant to suggest that we undo the cloned tickets and restore LU-1199 and its subtasks to active duty.

            Ok Chris. Will do.

            jlevi Jodi Levi (Inactive) added a comment - Ok Chris. Will do.

            I'm not a fan of the whole ticket cloning thing that happened here. The LU-1199 is well known as the "random little build system cleanup" ticket number for minor patches. I assume this was done because LU-1199 has Fix Version/s listed for 2.5.0, and you want to close as many 2.5.0 tickets as possible.

            I would recommend instead that we remove all Fix Version/s from LU-1199, and just let the sub-tasks be more targeted to specific versions and closed as they are complete.

            We also have the problem that in-flight patches now have the wrong jira ticket numbers. For instance, Minh's "Make lustre-iokit a subpackage of lustre" patch. I'd rather not blow away all of the reviews and testing on that patch just to correct the ticket number...but we really should make the ticket number in the patch match the ticket that it goes with.

            morrone Christopher Morrone (Inactive) added a comment - I'm not a fan of the whole ticket cloning thing that happened here. The LU-1199 is well known as the "random little build system cleanup" ticket number for minor patches. I assume this was done because LU-1199 has Fix Version/s listed for 2.5.0, and you want to close as many 2.5.0 tickets as possible. I would recommend instead that we remove all Fix Version/s from LU-1199 , and just let the sub-tasks be more targeted to specific versions and closed as they are complete. We also have the problem that in-flight patches now have the wrong jira ticket numbers. For instance, Minh's "Make lustre-iokit a subpackage of lustre" patch. I'd rather not blow away all of the reviews and testing on that patch just to correct the ticket number...but we really should make the ticket number in the patch match the ticket that it goes with.

            Remaining subtasks and patches for this ticket will be tracked under LU-3953 for 2.6. Any additional patches for this work should be tracked under the new ticket.

            jlevi Jodi Levi (Inactive) added a comment - Remaining subtasks and patches for this ticket will be tracked under LU-3953 for 2.6. Any additional patches for this work should be tracked under the new ticket.

            Minh, could you please start to refresh the lustre.spec.in cleanup patches starting with http://review.whamcloud.com/5486.

            adilger Andreas Dilger added a comment - Minh, could you please start to refresh the lustre.spec.in cleanup patches starting with http://review.whamcloud.com/5486 .

            Yes, I had already started that split, since this ticket has kind of evolved into a build system catch-all, and I don't want to lose sight of my main goal.

            morrone Christopher Morrone (Inactive) added a comment - Yes, I had already started that split, since this ticket has kind of evolved into a build system catch-all, and I don't want to lose sight of my main goal.

            People

              mdiep Minh Diep
              morrone Christopher Morrone (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              26 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: