    Description

      I think that we need to take a step back and address some fundamental problems with the lustre spec file.

      My number one complaint is that lustre source rpms are not buildable (using rpmbuild) in any sane fashion. This has been a constant source of work for folks here at Livermore, because our entire automated build infrastructure is built around the concept of taking source rpms as input and generating binary rpms.

      Because the lustre srpms are not buildable, no one tries to do that (except for us). So we fix things for us, but then almost every time the spec file is touched, it breaks something for us again.

      I think we need to get on the same page, and finally clean up the lustre spec file. I think the requirements here are:

      1) lustre srpm must be rebuildable
      2) spec file must take sane options, to allow configuration of lustre at srpm rebuild time
      3) must make no attempt to parse options out of configure_args

      I think the greatest area of pain will be dealing with "make rpms". The current way that the build system passes options from the initial configuration step down through make rpms and into the spec file is, at best, jerry-rigged. Passing a single giant "configure_args" define into rpmbuild and then parsing out options to set rpm variables is just backwards from the way that an rpm spec file SHOULD work.

      A spec file should take options (rpmbuild --with/--without options when applicable), and then assemble a configure line based on THAT.
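
      To make this concrete, here is a minimal sketch of the kind of spec fragment I have in mind (the option and macro names are illustrative, not a finished patch):

          # Build servers by default; "rpmbuild --without servers" turns them off.
          %bcond_without servers

          %build
          ./configure \
              %{?with_servers:--enable-server} \
              %{!?with_servers:--disable-server} \
              --prefix=%{_prefix}
          make %{?_smp_mflags}

      With something like that in place, "rpmbuild --rebuild --without servers lustre-*.src.rpm" just works, and no configure_args parsing is needed.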

      Another example of the spec file having it the wrong way around is "is_client". In order to perform a build that leaves out the server code and rpm dependencies, you set the package NAME! That's crazy, not least because now no one can choose to name the package differently when they want a client-only build.

      A better approach is to have an rpmbuild "--without servers" option. That sets a variable in the spec file. The spec file uses that flag to conditionally include Requires directives and such, and the lustre package name is changed to "lustre-client" IFF the lustre_name variable has not already been set. This would allow me, for instance, to generate a set of binary rpms with a package name of "lustre-orion", and also make it a client-only build. I cannot accomplish that with the current spec file.
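
      A rough sketch of how that could look in the spec file (macro names illustrative; the "Requires: lustre-osd" line is just a stand-in for the server-only dependencies):

          # Build servers by default; let --without servers disable them.
          %bcond_without servers
          # Default the package name only if the builder didn't already set one.
          %{!?lustre_name: %global lustre_name %{?with_servers:lustre}%{!?with_servers:lustre-client}}

          Name: %{lustre_name}
          %if %{with servers}
          Requires: lustre-osd
          %endif

      Then "rpmbuild --rebuild --without servers --define 'lustre_name lustre-orion' lustre-*.src.rpm" would produce exactly the client-only lustre-orion packages described above.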

      So now the big question: Do we need "make rpms" at all? Where is it really needed? I suspect that Whamcloud's build farm uses it, but I think that you guys should really switch to generating binary rpms from a single source rpm once we get that working reliably.

      If we do, I think there are ways that we can make that work with a spec file and source rpm that are done the Right Way, but if we can just drop it altogether that would certainly be easier.

          Activity

            [LU-1199] lustre build system overhaul

            I meant to suggest that we undo the cloned tickets and restore LU-1199 and its subtasks to active duty.

            morrone Christopher Morrone (Inactive) added a comment

            Ok Chris. Will do.

            jlevi Jodi Levi (Inactive) added a comment

            I'm not a fan of the whole ticket cloning thing that happened here. LU-1199 is well known as the "random little build system cleanup" ticket number for minor patches. I assume this was done because LU-1199 has Fix Version/s listed for 2.5.0, and you want to close as many 2.5.0 tickets as possible.

            I would recommend instead that we remove all Fix Version/s from LU-1199, and just let the sub-tasks be more targeted to specific versions and closed as they are complete.

            We also have the problem that in-flight patches now have the wrong jira ticket numbers. For instance, Minh's "Make lustre-iokit a subpackage of lustre" patch. I'd rather not blow away all of the reviews and testing on that patch just to correct the ticket number...but we really should make the ticket number in the patch match the ticket that it goes with.

            morrone Christopher Morrone (Inactive) added a comment

            Remaining subtasks and patches for this ticket will be tracked under LU-3953 for 2.6. Any additional patches for this work should be tracked under the new ticket.

            jlevi Jodi Levi (Inactive) added a comment

            Minh, could you please start to refresh the lustre.spec.in cleanup patches starting with http://review.whamcloud.com/5486.

            adilger Andreas Dilger added a comment

            Yes, I had already started that split, since this ticket has kind of evolved into a build system catch-all, and I don't want to lose sight of my main goal.

            morrone Christopher Morrone (Inactive) added a comment

            We track all of the bugs that have patches landed in each release so that it is possible to query in Jira. If you want to split this bug into smaller sub-tasks it would be easier to track what parts landed in each release.

            adilger Andreas Dilger added a comment

            While some early patches landed for 2.4.0, the main thrust of this ticket was not addressed for Lustre 2.4.0. I'm not sure that adding a Fix Version of Lustre 2.4.0 is appropriate.

            morrone Christopher Morrone (Inactive) added a comment

            I have broken some of the remaining work out into subtasks.

            I very much want to see LU-3462 completed for Lustre 2.5.0. I also think it is reasonable to get LU-3464 and LU-3466 done in the same time frame.

            I would also like to see LU-3463 begun as soon as possible, but I am much less optimistic that it can be completed in time for 2.5.0. Perhaps it can stand next to the current buildfarm and be in an experimental usage mode at that time.

            morrone Christopher Morrone (Inactive) added a comment

            You've been running with a separate ldiskfs repo for some time, so you know better whether it is practical, from a code maintenance POV, to coordinate changes between lustre and ldiskfs. Are they always modified and installed in lockstep? If yes, then is there a benefit to having them in separate repos?

            Yes, it is practical to coordinate changes between lustre and ldiskfs. With the changes that James and I worked on to have ldiskfs explicitly list its features in an exported header file, lustre can now easily check to see if the ldiskfs that it is compiling against has the required features.
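
            For illustration only (the header path and feature macro below are made up, not the actual contents of the exported header), the compile-time check can be as simple as:

                # Fail early if the installed ldiskfs doesn't advertise a needed feature.
                if grep -q '#define LDISKFS_HAS_SOME_FEATURE 1' \
                        /usr/include/ldiskfs/ldiskfs_config.h; then
                    echo "ldiskfs provides SOME_FEATURE"
                else
                    echo "error: installed ldiskfs lacks SOME_FEATURE" >&2
                    exit 1
                fi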

            Our experience is that the problems with maintaining an external ldiskfs stemmed almost entirely from having to maintain a solution that upstream makes no effort to support. Those problems go away when we're able to use the upstream solution.

            No, lustre and ldiskfs are not always modified and installed in lockstep. There are a great deal more changes made to lustre than there are to ldiskfs. We probably build lustre dozens of times for each build of ldiskfs.

            Off the top of my head, some reasons to separate them are:

            • Current recursive spec file usage is not compatible with standard mock-based rpm build systems.
            • Trying to build lustre's prerequisites (e.g. ldiskfs) in the middle of the lustre build process instead of before building lustre is conceptually confusing and, more importantly, makes the build system several times more complicated than it needs to be.
            • ldiskfs is not properly versioned. We've been stuck at the same version since 2010! Granted, this is just developer sloppiness, and isn't something that is forced by leaving ldiskfs in the lustre tree. But I believe that mistakes like that would be more difficult to miss if ldiskfs were in its own stand-alone tree.
            • We've had frequent layering violations, with the lustre build system reaching into ldiskfs and vice versa. Leaving ldiskfs in-tree enables that kind of bad practice. Granted, I've got most of that cleaned up, with help from James most recently with the creation of lustre_config.h. But a decade of evidence says that we're not able to keep things clean with ldiskfs in tree.

            As for naming, we also have lustre-iokit, and other related packages, even if they are not built by the same .spec file. This is because these packages are largely only useful as part of lustre, and it (IMHO) makes life easier for the admin to know what a package is used for.

            Yes, lustre-iokit is on my mental list of things to fix. Unlike ldiskfs, lustre-iokit is not a lustre prerequisite, so it is less of a complication. Its configure system and spec file are pretty simple, so I was thinking that its separate build system should just be gutted and merged into the lustre build system properly, i.e., no independent autogen.sh, configure, or spec file.

            morrone Christopher Morrone (Inactive) added a comment

            I don't completely understand the resistance to improving things. You need to break an egg to make an omelet.
            Help me understand the scope of the resistance.

            I'm all for cleaning up the build system, but in this case I don't see what functional benefit renaming this package provides that would outweigh the inconvenience to users.

            I have made it clear that I think that ldiskfs needs to move into its own source tree, in its own git repo. Am I going to get resistance on that as well? If not, do you envision that repository name being lustre-ldiskfs?

            Probably not, and yes, respectively. You've been running with a separate ldiskfs repo for some time, so you know better whether it is practical, from a code maintenance POV, to coordinate changes between lustre and ldiskfs. Are they always modified and installed in lockstep? If yes, then is there a benefit to having them in separate repos?

            As for naming, we also have lustre-iokit, and other related packages, even if they are not built by the same .spec file. This is because these packages are largely only useful as part of lustre, and it (IMHO) makes life easier for the admin to know what a package is used for.

            Why is the osd package named lustre-osd-ldiskfs instead of lustre-osd-lustre-ldiskfs? After all, that would be more consistent with the lustre-osd-zfs naming, and make it lustre-osd-{backend package name}.

            The name is "lustre-osd-

            {filesystem name}

            ", which is consistent. If we needed to modify ZFS for Lustre, we might indeed have a lustre-zfs or lustre-lzfs package.

            And then there is ldiskfsprogs. Is that incorrectly named? Should that be lustre-ldiskfsprogs?

            IMHO, I think you are making too big a deal of the RPM package names. The naming convention (or occasional lack thereof) is somewhat historical, but the "lustre- prefix means it was built from the lustre.spec file" convention probably only matters to a very small number of people (i.e. people working with the packaging) and not at all for end users. Conversely, (IMHO) the "lustre- prefix means this package is used for Lustre" convention is useful for every user.

            I know you guys don't use ldiskfsprogs upstream, but we think you should. After all, you are making it quite clear that ldiskfs should never be used without lustre. Why then should we taint the system-wide ext tools with lustre-only changes?

            I'm not wholly against moving over to a separate ldiskfsprogs, but it would also need some way to transition. That said, IMHO some of the changes in the Lustre e2fsprogs are useful for regular ext4 users (e.g. ibadness or your lost+found changes), but I've had difficulty getting them included upstream for various reasons.

            Here's a case in point: Some gentoo guys have started packaging Lustre. But the ldiskfs stuff is so horrible that the gentoo packages will be zfs-only.

            I think this is entirely to do with the fact that ldiskfs needs many kernel-specific patches, while ZFS does not. I am not aware that this has anything to do with the packaging (which would likely be completely different for Gentoo anyway).

            So if you guys are going to dig in your heels against the ldiskfs rename...well, fine. That is not the end of the world. As long as this isn't foreshadowing a larger resistance to continued reworking of the ldiskfs packaging, I can live with the current poorly chosen package name.

            I've stated repeatedly that I'm in favour of packaging cleanup, and am very appreciative of your efforts in this area. I personally don't see much real benefit of this rename, and didn't realize it was such a big deal for you. I wanted to make sure I'm not reading things incorrectly from the end-user perspective here, which is why I also asked John and James to comment on it.

            While I definitely have opinions on this and other topics, they are very rarely set in stone, and I'm always open to reasonable arguments.

            It would be possible to rename the ldiskfs package and have a "Provides: lustre-ldiskfs" and/or "Obsoletes: lustre-ldiskfs" for upgrades, I just don't see the end-user benefit in this case.
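
            A minimal sketch of that rename-with-compatibility approach (version bounds assumed):

                # Renamed package; keep upgrades from the old name working.
                Name: ldiskfs
                Provides: lustre-ldiskfs = %{version}-%{release}
                Obsoletes: lustre-ldiskfs < %{version}-%{release}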

            adilger Andreas Dilger added a comment

            People

              Assignee: mdiep Minh Diep
              Reporter: morrone Christopher Morrone (Inactive)