Details
Technical task
Resolution: Done
Critical
Description
There are numerous oddities about Lustre's system for applying version numbers in the configuration and packaging system. At this point, it is looking strongly like an overhaul is in order.
Activity
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18107/
Subject: LU-7699 build: Replace version_tag.pl with LUSTRE-VERSION-GEN
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ee813dbaa2a2b86f4873c4c289f62a0243aa9809
2.7.90, 2.7.91, ... are already used
OK, fine, then they could start at .120, or .500, or whatever. I don't care so much about the details as long as we follow some simple rules that make the versioning work well for the various packaging systems like RPM. For instance, each release (including release candidate releases) should have a unique version number. The version number should be mostly numerical and monotonically increasing. Letters should be used sparingly, and preferably not at all, in the main part of the version string. (The main part is the first four period-separated numbers. We allow alphanumeric strings for per-organization branding after that.)
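The rules above can be sketched as a small check. This is only an illustration, not part of any Lustre tooling: the names `main_part` and `check_release_sequence` are hypothetical, and the reading of "main part" as up to four leading period-separated integers is an assumption taken from the comment.

```python
import re

# Assumption: the "main part" is up to four leading period-separated integers;
# anything after it (for example per-organization branding) is ignored here.
MAIN_PART = re.compile(r"^([0-9]+(?:\.[0-9]+){0,3})(.*)$")


def main_part(version):
    """Return the numeric main part of a version string as a tuple of ints."""
    match = MAIN_PART.match(version)
    if match is None:
        raise ValueError(f"version does not start with a number: {version!r}")
    return tuple(int(piece) for piece in match.group(1).split("."))


def check_release_sequence(versions):
    """Report every release whose main part does not increase over the previous one."""
    problems = []
    previous = None
    for version in versions:
        current = main_part(version)
        if previous is not None and current <= previous:
            problems.append(f"{version} does not sort after the release before it")
        previous = current
    return problems


print(check_release_sequence(["2.7.90", "2.7.91", "2.8.0"]))
print(check_release_sequence(["2.8.0", "2.8.0", "2.8.0"]))
```

The first sequence yields no problems; the second flags each repeated 2.8.0, which is the situation the RC builds are in today.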
The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key.
With all due respect, I do not think that will be an acceptable solution. The version string is specifically designed to make it easy to tell different builds and packages apart. Unique version numbers are something that packaging systems like rpm and dpkg understand. The relationship between different keys, and how they affect install precedence, is not. You say that there are other reasonable methods, but I do not really think we are likely to find one. The version string is very explicitly the universal solution for this problem. We are introducing larger problems by trying to avoid that universal solution.
Also we don't widely distribute these RCs and so nobody could get them.
Right, and no doubt that is how this has flown under the radar for so long: this part of your development phase is kept pretty secret.
But this is pretty much the opposite of how a good open source development project, with an open development process, should work. If we had a more open process with good communication, then every RC release would have an announcement on mailing lists letting people know where to get the packages and asking for help in testing. This announcement shouldn't just go to lustre-devel, it should also go to lustre-discuss. We want adventurous users/sysadmins that can carve out some time to help with testing to do so. They should not have to go to git to do that; they should be able to download packages.
If we make those packages reasonably well known and easy to access (which we should definitely do for 2.9.0 RCs and into the future), then it will not be at all acceptable to have them all use exactly the same version number. They have to be uniquely and clearly versioned.
The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever)
I agree, that is a more reasonable concern. But it is one that is pretty much entirely manageable with a small amount of process and good communication. When RC releases are about to start, we send an announcement out to the build farm admins saying "RCs are about to begin. Please freeze all changes to builders for branch X until further notice". If you are really worried, you call them on the phone and get a verbal confirmation that they understand. Many, many, many software projects are able to handle that level of release management in the real world. I believe that we can on Lustre as well. I'm putting my money where my mouth is too, by spending LLNL manpower on setting up a new buildfarm for Lustre. We'll present the prototype at LUG this year.
Furthermore, there are those that would argue that if Lustre can't compile against multiple different libraries with the same ABI, then Lustre is just broken. Remember that Lustre does not own the entire OS. It really needs to work correctly anywhere the packages install without being forced.
But I think keeping the builders static during the RC phase is totally reasonable.
a glitch during building for other reasons producing a dud (ram error, disk error),
That's a one-in-a-billion type of problem. I don't think that is even remotely as much of a problem as releasing several packages with different contents, but versioning them identically.
somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...
That isn't a problem specific to release candidates, so I don't think it is relevant to the conversation.
All in all what it really means is we would really need to test the final release just like another RC just to be sure it's still viable, and if it's not then what?
I actually think it is fairly reasonable to build the same commit with no other change but the version string and not do any additional testing. But if people want to do more testing, that is certainly fine with me. I would argue that 8-24 hours of automated-only testing would be more than sufficient as a sanity check that nothing went horribly wrong between the two tags (of the exact same commit).
In short, there's a certain appeal to release code that was actually tested vs code that was only built even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.
Yes, I understand the appeal and your points. But I don't think they are strong enough to justify the offsetting bad packaging practice that you have introduced.
We need to stop having multiple packages of different contents with the same exact version. That is simply not acceptable.
2.7.90, 2.7.91, ... are already used. They mean code drops in code freeze, i.e. when only blockers remain.
RCs (release candidates) on the other hand are generated when there are no more blockers left (at the time of tagging, anyway). I guess it's possible to continue the versioning as is, but we might run out of digits quite soon in some cases.
The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key. Also we don't widely distribute these RCs and so nobody could get them. We can also destroy interim RCs from the download site if needed (or it happens automatically over a relatively short time). And we can tell them apart by a build date too.
The actual drawbacks I encountered are: The changelog could only list an estimated "release date" since who knows how long the testing and all the approvals would take.
The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever), a glitch during building for other reasons producing a dud (ram error, disk error), somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...
All in all what it really means is we would really need to test the final release just like another RC just to be sure it's still viable, and if it's not then what?
In short, there's a certain appeal to release code that was actually tested vs code that was only built even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.
In the review of http://review.whamcloud.com/18107, Oleg pointed out a potential issue with the lack of a special case for RC releases. I admit that I totally missed how RC releases are currently being done. To make a long story short, I am going to argue that we really shouldn't do it that way any more.
Correct me if I am wrong, but as I now understand it, even though a tag might have a name of "2.8.0-RC1", the build system is currently stripping off the "-RC1" and building the code as if it is a final "2.8.0" release. As Oleg explained it, the intention here has been that when an RC is tested and agreed to be the final release, the packages from the RC build are already named with the final release version and can be released as-is. The advantage here is that we don't need to trust our build procedures to be consistent; we can simply release the packages from the most recent RC.
The downside should be pretty clear though too: RC releases are things that we explicitly want to release and have the general community test and vet. So taking 2.8.0 as an example, which to date has five RC releases (-RC1 through -RC5), there are exactly five different versions of the code now out there in the wild with no way to tell them apart.
I would argue that this downside of having multiple different versions of the code out in the wild with exactly the same version vastly outweighs the possibility that we might not compile and generate packages the same way twice.
So I would argue that not special-casing RC releases is a feature of change 18107, not a defect. We will need to come up with a new way of versioning RC releases. I would suggest that we use the obvious method already established (but not actually fully employed):
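To make the collision concrete, here is a minimal sketch of what stripping the RC suffix does to the five 2.8.0 tags. The function `strip_rc` is a hypothetical stand-in for the behaviour described above; it is not the actual code in the Lustre build system.

```python
import re
from collections import defaultdict


def strip_rc(tag):
    """Hypothetical stand-in: drop a trailing "-RC<n>" from a tag name."""
    return re.sub(r"-RC[0-9]+$", "", tag)


def group_by_package_version(tags):
    """Map each resulting package version to the tags that produce it."""
    groups = defaultdict(list)
    for tag in tags:
        groups[strip_rc(tag)].append(tag)
    return dict(groups)


tags = [f"2.8.0-RC{n}" for n in range(1, 6)]
collisions = group_by_package_version(tags)
for version, sources in collisions.items():
    print(version, "<-", ", ".join(sources))
```

All five tags land on the single package version 2.8.0, so a package manager has no basis for preferring one build over another.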
Release candidates will use the previous release's number, but have the third number start at 90. For example, release candidates for the 2.8.0 release would have version numbers like: 2.7.90, 2.7.91, 2.7.92, 2.7.93, etc.
This also has the advantage that it works well with packaging systems like RPM. Many rpm-based distros explicitly advise against using substrings like "RC1" in version strings, because rpm just doesn't handle them, or the associated package upgrades, in any particularly intelligent way. Granted, we somewhat avoided that problem by stripping off the RC altogether.
But I think it is very bad form for multiple different versions of the code to have the same version string.
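The ordering argument can be illustrated with a rough sketch. The comparison below is a deliberately simplified approximation of segment-wise version comparison as rpm is generally understood to do it (runs of digits compared numerically, digits ranked above letters, leftover segments counting as newer). It is not rpm's implementation, it ignores special characters such as the tilde, and the name `simple_vercmp` is made up for this example.

```python
import re
from functools import cmp_to_key
from itertools import zip_longest


def segments(version):
    """Split a version into runs of digits and runs of letters."""
    return re.findall(r"[0-9]+|[A-Za-z]+", version)


def simple_vercmp(a, b):
    """Return -1, 0 or 1 if a is older than, equal to, or newer than b."""
    for x, y in zip_longest(segments(a), segments(b)):
        if x is None:
            return -1  # b has segments left over, so b counts as newer
        if y is None:
            return 1   # a has segments left over, so a counts as newer
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # a numeric segment beats letters
        elif x != y:
            return 1 if x > y else -1
    return 0


candidates = ["2.8.0", "2.7.91", "2.8.0RC1", "2.7.90"]
print(sorted(candidates, key=cmp_to_key(simple_vercmp)))
```

Under these simplified rules the 2.7.9x candidates sort before 2.8.0 as intended, while a string like 2.8.0RC1 sorts after the final 2.8.0, so an upgrade from the RC to the release would not look like an upgrade.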
OK, I think I'm ready for the first round of reviews on the current four patches:
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18112
Subject: LU-7699 build: debug intel buildfarm
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 073eb66d36756f2f498d4c9aceb982a25c97e364
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18111
Subject: LU-7699 build: Remove unnecessary AC_LUSTRE_
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b28a363a54d2b45a8f64b6ec81f0d0abedc220f7
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18110
Subject: LU-7699 build: Replace LUSTRE_VERSION_STRING with PACKAGE_VERSION
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4ace235aef94e2182bc62a28e5cdb479ac3c2fcb
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18108
Subject: LU-7699 build: Eliminate lustre_build_version.h
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49e8ccfea6d6845ca13e9a2dd8d0a5635c9bc8c0
I tried a build with the latest Lustre on Ubuntu 15.04 and I get this error:
dpkg-buildpackage: warning: debian/changelog(l1): version '2.8.51_17_g5bd8f72-1' is invalid: version number contains illegal character '_'
LINE: lustre (2.8.51_17_g5bd8f72-1) unstable; urgency=low
dpkg-buildpackage: source package lustre
dpkg-buildpackage: source version 2.8.51_17_g5bd8f72-1
dpkg-buildpackage: error: version number contains illegal character '_'
autoMakefile:1136: recipe for target 'debs' failed
make: *** [debs] Error 25
So I did some digging and I found that '_' is actually an invalid character for Fedora as well. See this link:
https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Separators
So basically we will need to fix the Lustre version so that it does not contain any '_', for proper packaging on rpm and deb systems.
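A minimal sketch of such a check, assuming the Debian rule that an upstream version may contain only alphanumerics and the characters `.`, `+`, `-` and `~`. The helper names are hypothetical, and replacing '_' with '.' is only one possible choice, not a decision recorded in this ticket.

```python
import re


def illegal_chars(upstream_version):
    """Return the characters not allowed in a Debian upstream version."""
    return sorted(set(re.findall(r"[^A-Za-z0-9.+~-]", upstream_version)))


def replace_underscores(upstream_version, separator="."):
    """Hypothetical fix: swap each '_' for a separator the packaging tools accept."""
    return upstream_version.replace("_", separator)


reported = "2.8.51_17_g5bd8f72"  # upstream part of the version in the error above
print(illegal_chars(reported))
fixed = replace_underscores(reported)
print(fixed, illegal_chars(fixed))
```

The reported version is flagged for '_' only, and the rewritten string 2.8.51.17.g5bd8f72 passes the same check.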