lustre build system improvments (LU-3953)

[LU-7699] Overhaul lustre's versioning Created: 22/Jan/16  Updated: 06/Nov/19  Resolved: 03/May/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Technical task Priority: Critical
Reporter: Christopher Morrone Assignee: Christopher Morrone
Resolution: Done Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-7642 Allow lustre source build without git... Resolved
is blocking LU-7643 Remove kernel version string from Lus... Resolved
is blocking LU-7645 Stop controlling the RPM Release fiel... Closed
Related
is related to LU-7976 sles builds severely impacted by rece... Resolved
is related to LU-474 fix release build issues Resolved
Rank (Obsolete): 9223372036854775807

 Description   

There are numerous oddities about Lustre's system for applying version numbers in the configuration and packaging system. At this point, it is looking strongly like an overhaul is in order.



 Comments   
Comment by Christopher Morrone [ 23/Jan/16 ]

Why do we have both lustre_ver.h and lustre_build_version.h? So silly.

Comment by Gerrit Updater [ 23/Jan/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18107
Subject: LU-7699 build: Replace version_tag.pl with LUSTRE-VERSION-GEN
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e0a4a49958df15608615533c1e6f67bf3aca9f26

Comment by Christopher Morrone [ 23/Jan/16 ]

I have a first rough version of the first overhaul patch. It is a net reduction in almost 300 lines, so I must be doing something right.

I think the patch is already decent, but I haven't really considered how lbuild is going to react to the changes, or looked at how the rpm or dpkg packaging systems are going to handle it. It may need further work to get along with all of that. But it is worth running this version through the build farm as a fortestonly patch to see what blows up first.

There is quite a bit of additional cleanup that can be done, but I want to try to avoid putting too much change in one patch. I'll add additional cleanup in a follow-on patch.

Some things that I'll look into (just so I don't forget):

  • Get rid of lustre_build_version.h
  • Fix how Lustre embeds the version into its binaries
  • Further cleanup autoconf variables and m4 macros pertaining to versioning that are no longer necessary if we use standard ones like {AC_}

    PACKAGE_VERSION

  • git grep for "META" and deal with its obsolescence
Comment by Christopher Morrone [ 23/Jan/16 ]

Two more revisions to fix easy problems. The first major issue is now what I expected: I need to teach lbuild about the new versioning scheme.

It also looks like the ppc64 builder isn't happy either. I'm not sure that it provides enough information to do anything about it though...maybe I can make a good enough guess to get it working.

But that will all have to wait until next week.

Comment by Gerrit Updater [ 23/Jan/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18108
Subject: LU-7699 build: Eliminate lustre_build_version.h
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49e8ccfea6d6845ca13e9a2dd8d0a5635c9bc8c0

Comment by Gerrit Updater [ 23/Jan/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18110
Subject: LU-7699 build: Replace LUSTRE_VERSION_STRING with PACKAGE_VERSION
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4ace235aef94e2182bc62a28e5cdb479ac3c2fcb

Comment by Gerrit Updater [ 24/Jan/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18111
Subject: LU-7699 build: Remove unnecessary AC_LUSTRE_

{MAJOR,MINOR,PATCH,FIX}

Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b28a363a54d2b45a8f64b6ec81f0d0abedc220f7

Comment by Gerrit Updater [ 24/Jan/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18112
Subject: LU-7699 build: debug intel buildfarm
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 073eb66d36756f2f498d4c9aceb982a25c97e364

Comment by Christopher Morrone [ 25/Jan/16 ]

OK, I think I'm ready for the first round of reviews on the current four patches:

Comment by Christopher Morrone [ 17/Feb/16 ]

It is now just three patches:

Comment by Christopher Morrone [ 15/Mar/16 ]

In a http://review.whamcloud.com/18107 review, Oleg pointed out a potential issue with lack of a special-case for RC releases. I admit that I totally missed how RC releases are currently being done. To make a long story short, I am going to argue that we really shouldn't do it that way any more.

Correct me if I am wrong, but as I now understand it, even though a tag might have a name of "2.8.0-RC1", the build system is currently stripping off the "-RC1" and building the code as if it is a final "2.8.0" release. As Oleg explained it, the intention here has been that when an RC is tested and agreed to be the final release, the packages from the RC build are already named with the final release version and can be released as-is. The advantage here is that we don't need to trust our build procedures to be consistent, we can simply release the packages from the most recent RC.

The down side should be pretty clear though too: RC releases are things that we explicitly want to release and have the general community test and vet. So takeing 2.8.0 as an example, which to date has five RC releases (-RC1 through -RC5), there are exactly five different versions of the code now out there in the wild with no way to tell them apart.

I would argue that this downside of having multiple different codes out in the wild with exactly the same version vastly out weighs the possibility that we might not compile and generate packages the same way twice.

So I would argue that not special-casing RC releases is a feature of change 18017, not a defect. We will need to come up with a new way of versioning RC releases. I would suggest that we use the obvious method already established (but not actually fully employeed):

Release candidates will use the previous release's number, but have the thrid number start at 90. For example, release candidates for the 2.8.0 release would have version numbers like: 2.7.90, 2.7.91, 2.7.92, 2.7.93, etc.

This also has the advantage that it works well with packaging systems like RPM. Many rpm-based distros explicitly advise against using substrings like "RC1" in version strings, because rpm just doesn't handle them, and assiciated package upgrades in any particularly intelligent way. Granted, we somewhat avoided that problem by stripping off the RC altogether.

But I think it is very bad form to have multiple different versions of the code to have the same version string.

Comment by Oleg Drokin [ 15/Mar/16 ]

2.7.90, 2.7.91, ... are already used. they mean code drops in code freeze - i.e. when only blockers remain.

RCs (release candidates) on the other hand are generated when there are no more blockers left (At the time of tagging, anyway). I guess it's possible to continue the versioning as is, but we might run our of digits quite soon in some cases.

The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key. Also we don't widely distribute these RCs and so nobody could get them. We can also destroy interim RCs from download site if needed (or it happens automatically over a relatively short time). And we can tell them apart by a build date too.
The actual drawbacks I encountered are: The changelog could only list an estimated "release date" since who knows how long the testing and all the approvals would take.

The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever), a glitch during building for other reasons producing a dud (ram error, disk error), somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...
All in all what it really means is we would really need to test the final release just like an another RC just to be sure it's still viable, and if it's not then what?

In short, there's certain appeal to release code that was actually tested vs code that was only built even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.

Comment by Christopher Morrone [ 15/Mar/16 ]

2.7.90, 2.7.91, ... are already used

OK, fine, then they could start at .120, or .500, or whatever. I don't care so much about the details as long as we follow some simple rules that make the versioning work well for the various packaging systems like RPM. For instance, each release (including release candidate releases) should have a unique version number. The version number should be mostly numerical, and atomically increasing. Letters should be used sparingly, and preferably not at all, in the main part of the version string. (Where main part is the first four period-separated numbers. We allow for alphanumeric strings for per-organization branding after that)

The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key.

With all due respect, I do no think that will be an acceptable solution. The version string is specifically designed to make it easy to tell different builds and packages apart. Unique version numbers are something packaging systems like rpm, and dpkg understand. The relationship between different keys and how they effect install precedence, is not. You say that there are other reasonable methods, but I do not really think we are likely to find one. The version string is very explicitly the universal solution for this problem. We are introducing larger problems by trying to avoid that universal solution.

Also we don't widely distribute these RCs and so nobody could get them.

Right, and no doubt that is how this has flown under the radar for so long. Because this part of your development phase is kept pretty secret.

But this is pretty much the opposite of how a good open source development project, with an open development process, should work. If we had a more open process with good communication, then every RC release would have an announcement on mailing lists letting people know where to get the packages and asking for help in testing. This announcement shouldn't just go to lustre-devel, it should also got to lustre-discuss. We want adventurous users/sysadmins that can carve out some time to help with testing to do so. They should not have to go to git to do that, they should be able to download packages.

If we make those packages reasonably well known and easy to access (which we should definitely do for 2.9.0 RCs and into the future), then it will not be at all acceptable to have them all use exactly the same version number. They have to be uniquely and clearly versioned.

The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever)

I agree, that is a more reasonable concern. But it is one that is pretty much entirely manageable with a small amount of process and good communication. When RC releases are about to start, we send an announcment out to the build farm admins saying "RCs are about to begin. Please freeze all changes to builders for branch X until further notice". If you are really worried, you call them on the phone and get a verbal confirmation that they understand. Many, many, many software projects are able to handle that level of release management in the real world. I believe that we can on Lustre as well. I'm putting my money where my mouth is too, by spending LLNL manpower on setting up a new buildfarm for Lustre. We'll present the prototype at LUG this year.

Furthermore, there are those that would argue that if Lustre can't compile against multiple different libraries with the same ABI, then Lustre is just broken. Remember that Lustre does not own the entire OS. It really needs to work correctly anywhere the packages install without being forced.

But I think keeping the builders static during the RC phase is totally reasonable.

a glitch during building for other reasons producing a dud (ram error, disk error),

Thats a one-in-a-billion type of problem. I don't think that in even remotely as much of a problem as releasing several packages with different contents, but versioning them identically.

somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...

That isn't a problem specific to release condidates, so I don't think it is relevant to the conversation.

All in all what it really means is we would really need to test the final release just like an another RC just to be sure it's still viable, and if it's not then what?

I actually think it is fairly reasonable to build the same commit with no other change but the version string and not do any additional testing. But if people want to do more testing, that is certainly fine with me. I would argue that 8-24 hours of automated-only testing would be more than sufficient as a sanity check that nothing went horribly wrong between the two tags (of the exact same commit).

In short, there's certain appeal to release code that was actually tested vs code that was only built even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.

Yes, I understand the appeal and your points. But I don't think they are strong enough to justify the offsetting bad packing practice that you have introduced.

We need to stop having multiple packages of different contents with the same exact version. That is simply not acceptable.

Comment by Gerrit Updater [ 23/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18107/
Subject: LU-7699 build: Replace version_tag.pl with LUSTRE-VERSION-GEN
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ee813dbaa2a2b86f4873c4c289f62a0243aa9809

Comment by James A Simmons [ 11/Apr/16 ]

I tried a spin with the latest lustre on Ubuntu 15.04 and I get this error:

dpkg-buildpackage: warning: debian/changelog(l1): version '2.8.51_17_g5bd8f72-1' is invalid: version number contains illegal character '_'
LINE: lustre (2.8.51_17_g5bd8f72-1) unstable; urgency=low
dpkg-buildpackage: source package lustre
dpkg-buildpackage: source version 2.8.51_17_g5bd8f72-1
dpkg-buildpackage: error: version number contains illegal character '_'
autoMakefile:1136: recipe for target 'debs' failed
make: *** [debs] Error 25

So I did some digging and I found that '_' is actually an invalid character for fedora as well. See this link:

https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Separators

So basically we will need to fix the lustre version not to contain any '_' for proper packaging on rpm and deb systems.

Comment by Christopher Morrone [ 11/Apr/16 ]

So I did some digging and I found that '_' is actually an invalid character for fedora as well. See this link:

Actually, that page says the following:

When naming packages for Fedora, the maintainer must use the dash '-' as the delimiter for name parts. The maintainer must NOT use an underscore '_', a plus '+', or a period '.' as a delimiter.

We are not using underscores in the name parts, i.e. the Name field. Underscores are only used in the Version.

But yes, it looks like we'll need to work something out for dpkg systems.

The main reason that we use underscores in the version is because Lustre currently has a variable number of version fields, which makes it difficult to tell the delineation between the upstream version and the following part of the version, whether third-party or development (git describe) information.

We can't use a dash (-) in the version string on rpm systems. That pretty much leaves a period as the only other sane option if we decide to stop using underscores.

But maybe we stick with underscores on rpm systems, and munge the string in some acceptable way for dpkg?

Comment by James A Simmons [ 11/Apr/16 ]

I discovered the is needed on rpms systems for the version field the hard way today. For dpkgs both Name and Version fields can not have '_'. I need to rethink this. I did see many sites stating you should avoid '-' at all cost for rpms Also I found when you have a top level .git directory if you remove the LUSTRE-VERSION-FILE it will not regenerate and it causes make rpms to fail.

Comment by Christopher Morrone [ 11/Apr/16 ]

Also I found when you have a top level .git directory if you remove the LUSTRE-VERSION-FILE it will not regenerate and it causes make rpms to fail.

Yeah, that is true. For now, you will just need to rerun autogen.sh to make any version changes. This could be automated in the future, but we pretty much need to fix autoreconf (LU-7700) to do it sanely.

Comment by James A Simmons [ 12/Apr/16 ]

I tracked down what needs to be changed for make debs. We have:

lversion=$$(sed -ne 's/^#define VERSION "(.*)"$$/\1/p' config.h);

The sed command needs to be modified to change '_' to '-'. I'm no sed master so Chris what needs to be done to make that so.

Comment by Christopher Morrone [ 12/Apr/16 ]

Thanks for the legwork on that, James!

We don't even need sed any more; I'll just use tr.

Comment by Gerrit Updater [ 12/Apr/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/19488
Subject: LU-7699 build: Convert version underscores to dashes for dpkg
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 368150054a379f1da81dce8b2e5016016f126655

Comment by James A Simmons [ 12/Apr/16 ]

Yeah more simplification. Debian platforms are back in business. Thanks Chris.

Comment by Giuseppe Di Natale (Inactive) [ 13/Apr/16 ]

Already mentioned this to Chris, but this may have broken SUSE builds on the Lustre Buildbot. The error I'm seeing on builds is "error: line 86: Empty tag: Release:". Example log is here.

Comment by Christopher Morrone [ 13/Apr/16 ]

Already mentioned this to Chris, but this may have broken SUSE builds on the Lustre Buildbot.

I don't think so. I think that this is a problem in the buildbot configuration. I will open a github ticket shortly.

Comment by Gerrit Updater [ 17/Apr/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18108/
Subject: LU-7699 build: Eliminate lustre_build_version.h
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9c710162e5acebd860f1d3f0e1bf204ac1ba98c1

Comment by Gerrit Updater [ 21/Apr/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18111/
Subject: LU-7699 build: Convert lustre_ver.h.in into a static .h file
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6e84dbce05959e854d886772841a5d6d0690eb65

Comment by Gerrit Updater [ 02/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19488/
Subject: LU-7699 build: Convert version underscores to dashes for dpkg
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f2d28899a2c3b5a427f4322623ba138887fe6adc

Comment by James A Simmons [ 03/May/16 ]

Chris is everything for this ticket landed. Can it be closed?

Comment by Christopher Morrone [ 03/May/16 ]

Yes, I think now is a good time to close this one.

Generated at Sat Feb 10 02:11:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.