Details
-
Technical task
-
Resolution: Done
-
Critical
Description
There are numerous oddities about Lustre's system for applying version numbers in the configuration and packaging system. At this point, it is looking strongly like an overhaul is in order.
Activity
Thanks for the legwork on that, James!
We don't even need sed any more; I'll just use tr.
I tracked down what needs to be changed for make debs. We have:
lversion=$$(sed -ne 's/^#define VERSION "\(.*\)"$$/\1/p' config.h);
The sed command needs to be modified to change '_' to '-'. I'm no sed master, so Chris, what needs to be done to make that so?
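A minimal sketch of the conversion being asked for, assuming config.h carries a line of the form #define VERSION "..." and that piping the extracted string through tr is acceptable. The function name dpkg_version is hypothetical and is not taken from the Lustre build system. This is written for a plain shell, so the '$$' that a Makefile recipe requires appears here as a single '$'.

```shell
#!/bin/sh
# Hypothetical helper (not from the Lustre tree): read config.h-style
# text on stdin, pull out the VERSION string, and replace every '_'
# with '-' so that dpkg accepts the result.
dpkg_version() {
    sed -ne 's/^#define VERSION "\(.*\)"$/\1/p' | tr '_' '-'
}

# Example input modelled on the version string in the dpkg error log.
printf '#define VERSION "2.8.51_17_g5bd8f72"\n' | dpkg_version
```

With the sample input above this prints 2.8.51-17-g5bd8f72. The rpm side would keep using the unmodified string, since only dpkg objects to underscores.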
Also I found when you have a top level .git directory if you remove the LUSTRE-VERSION-FILE it will not regenerate and it causes make rpms to fail.
Yeah, that is true. For now, you will just need to rerun autogen.sh to make any version changes. This could be automated in the future, but we pretty much need to fix autoreconf (LU-7700) to do it sanely.
I discovered the hard way today that '_' is needed on rpm systems for the version field. For dpkg, neither the Name nor the Version field can contain '_'. I need to rethink this. I did see many sites stating you should avoid '-' at all costs for rpms.
Also I found when you have a top level .git directory if you remove the LUSTRE-VERSION-FILE it will not regenerate and it causes make rpms to fail.
So I did some digging and I found that '_' is actually an invalid character for fedora as well. See this link:
Actually, that page says the following:
When naming packages for Fedora, the maintainer must use the dash '-' as the delimiter for name parts. The maintainer must NOT use an underscore '_', a plus '+', or a period '.' as a delimiter.
We are not using underscores in the name parts, i.e. the Name field. Underscores are only used in the Version.
But yes, it looks like we'll need to work something out for dpkg systems.
The main reason that we use underscores in the version is because Lustre currently has a variable number of version fields, which makes it difficult to tell the delineation between the upstream version and the following part of the version, whether third-party or development (git describe) information.
We can't use a dash (-) in the version string on rpm systems. That pretty much leaves a period as the only other sane option if we decide to stop using underscores.
But maybe we stick with underscores on rpm systems, and munge the string in some acceptable way for dpkg?
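The two constraints can be sketched as a pair of checks. This is only an illustration under stated assumptions: the function names are hypothetical, the rpm check covers only the no-dash rule discussed here, and the dpkg check uses the character set that Debian Policy allows in a version (alphanumerics plus '.', '+', '-' and '~') without handling epochs or checking that the string starts with a digit.

```shell
#!/bin/sh
# Hypothetical checks, not part of the Lustre build system.

# rpm: the Version field must not contain a dash.
rpm_version_ok() {
    case "$1" in
        ''|*-*) return 1 ;;
        *)      return 0 ;;
    esac
}

# dpkg: only alphanumerics and . + - ~ are accepted, so '_' is rejected.
dpkg_version_ok() {
    case "$1" in
        ''|*[!A-Za-z0-9.+~-]*) return 1 ;;
        *)                     return 0 ;;
    esac
}

v=2.8.51_17_g5bd8f72
rpm_version_ok "$v"  && echo "rpm accepts $v"
dpkg_version_ok "$v" || echo "dpkg rejects $v"
```

The same string passes the rpm check and fails the dpkg check, which is why keeping underscores for rpm and munging them only for dpkg is one workable option.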
I tried a spin with the latest lustre on Ubuntu 15.04 and I get this error:
dpkg-buildpackage: warning: debian/changelog(l1): version '2.8.51_17_g5bd8f72-1' is invalid: version number contains illegal character '_'
LINE: lustre (2.8.51_17_g5bd8f72-1) unstable; urgency=low
dpkg-buildpackage: source package lustre
dpkg-buildpackage: source version 2.8.51_17_g5bd8f72-1
dpkg-buildpackage: error: version number contains illegal character '_'
autoMakefile:1136: recipe for target 'debs' failed
make: *** [debs] Error 25
So I did some digging and I found that '_' is actually an invalid character for fedora as well. See this link:
https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Separators
So basically we will need to fix the lustre version not to contain any '_' for proper packaging on rpm and deb systems.
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18107/
Subject: LU-7699 build: Replace version_tag.pl with LUSTRE-VERSION-GEN
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ee813dbaa2a2b86f4873c4c289f62a0243aa9809
2.7.90, 2.7.91, ... are already used
OK, fine, then they could start at .120, or .500, or whatever. I don't care so much about the details as long as we follow some simple rules that make the versioning work well for the various packaging systems like RPM. For instance, each release (including release candidate releases) should have a unique version number. The version number should be mostly numerical and monotonically increasing. Letters should be used sparingly, and preferably not at all, in the main part of the version string. (The main part is the first four period-separated numbers. We allow alphanumeric strings for per-organization branding after that.)
The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key.
With all due respect, I do not think that will be an acceptable solution. The version string is specifically designed to make it easy to tell different builds and packages apart. Unique version numbers are something packaging systems like rpm and dpkg understand. The relationship between different keys, and how they affect install precedence, is not. You say that there are other reasonable methods, but I do not really think we are likely to find one. The version string is very explicitly the universal solution for this problem. We are introducing larger problems by trying to avoid that universal solution.
Also we don't widely distribute these RCs and so nobody could get them.
Right, and no doubt that is how this has flown under the radar for so long. Because this part of your development phase is kept pretty secret.
But this is pretty much the opposite of how a good open source development project, with an open development process, should work. If we had a more open process with good communication, then every RC release would have an announcement on mailing lists letting people know where to get the packages and asking for help in testing. This announcement shouldn't just go to lustre-devel, it should also go to lustre-discuss. We want adventurous users/sysadmins that can carve out some time to help with testing to do so. They should not have to go to git to do that, they should be able to download packages.
If we make those packages reasonably well known and easy to access (which we should definitely do for 2.9.0 RCs and into the future), then it will not be at all acceptable to have them all use exactly the same version number. They have to be uniquely and clearly versioned.
The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever)
I agree, that is a more reasonable concern. But it is one that is pretty much entirely manageable with a small amount of process and good communication. When RC releases are about to start, we send an announcement out to the build farm admins saying "RCs are about to begin. Please freeze all changes to builders for branch X until further notice". If you are really worried, you call them on the phone and get a verbal confirmation that they understand. Many, many, many software projects are able to handle that level of release management in the real world. I believe that we can on Lustre as well. I'm putting my money where my mouth is too, by spending LLNL manpower on setting up a new buildfarm for Lustre. We'll present the prototype at LUG this year.
Furthermore, there are those that would argue that if Lustre can't compile against multiple different libraries with the same ABI, then Lustre is just broken. Remember that Lustre does not own the entire OS. It really needs to work correctly anywhere the packages install without being forced.
But I think keeping the builders static during the RC phase is totally reasonable.
a glitch during building for other reasons producing a dud (ram error, disk error),
That's a one-in-a-billion type of problem. I don't think that is even remotely as much of a problem as releasing several packages with different contents, but versioning them identically.
somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...
That isn't a problem specific to release candidates, so I don't think it is relevant to the conversation.
All in all, what it really means is we would really need to test the final release just like another RC just to be sure it's still viable, and if it's not then what?
I actually think it is fairly reasonable to build the same commit with no other change but the version string and not do any additional testing. But if people want to do more testing, that is certainly fine with me. I would argue that 8-24 hours of automated-only testing would be more than sufficient as a sanity check that nothing went horribly wrong between the two tags (of the exact same commit).
In short, there's certain appeal to release code that was actually tested vs code that was only built even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.
Yes, I understand the appeal and your points. But I don't think they are strong enough to justify the offsetting bad packaging practice that you have introduced.
We need to stop having multiple packages of different contents with the same exact version. That is simply not acceptable.
2.7.90, 2.7.91, ... are already used. They mean code drops in code freeze, i.e. when only blockers remain.
RCs (release candidates) on the other hand are generated when there are no more blockers left (at the time of tagging, anyway). I guess it's possible to continue the versioning as is, but we might run out of digits quite soon in some cases.
The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key. Also we don't widely distribute these RCs and so nobody could get them. We can also destroy interim RCs from download site if needed (or it happens automatically over a relatively short time). And we can tell them apart by a build date too.
The actual drawbacks I encountered are: The changelog could only list an estimated "release date" since who knows how long the testing and all the approvals would take.
The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever), a glitch during building for other reasons producing a dud (ram error, disk error), somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...
All in all, what it really means is we would really need to test the final release just like another RC just to be sure it's still viable, and if it's not then what?
In short, there's certain appeal to release code that was actually tested vs code that was only built even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.
In a http://review.whamcloud.com/18107 review, Oleg pointed out a potential issue with lack of a special-case for RC releases. I admit that I totally missed how RC releases are currently being done. To make a long story short, I am going to argue that we really shouldn't do it that way any more.
Correct me if I am wrong, but as I now understand it, even though a tag might have a name of "2.8.0-RC1", the build system is currently stripping off the "-RC1" and building the code as if it is a final "2.8.0" release. As Oleg explained it, the intention here has been that when an RC is tested and agreed to be the final release, the packages from the RC build are already named with the final release version and can be released as-is. The advantage here is that we don't need to trust our build procedures to be consistent, we can simply release the packages from the most recent RC.
The downside should be pretty clear too, though: RC releases are things that we explicitly want to release and have the general community test and vet. So taking 2.8.0 as an example, which to date has five RC releases (-RC1 through -RC5), there are exactly five different versions of the code now out there in the wild with no way to tell them apart.
I would argue that this downside of having multiple different codes out in the wild with exactly the same version vastly outweighs the possibility that we might not compile and generate packages the same way twice.
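The collision can be illustrated with a small sketch. The strip_rc function below is a hypothetical stand-in for whatever the build system does to drop the "-RC" suffix; it is not the actual Lustre code, and the tag names simply follow the 2.8.0 example above.

```shell
#!/bin/sh
# Hypothetical stand-in for the suffix stripping described above:
# remove a trailing "-RC<n>" from a tag name, leave other names alone.
strip_rc() {
    printf '%s\n' "${1%-RC*}"
}

# Five distinct tags go in; sort -u shows how many distinct package
# versions come out.
for tag in 2.8.0-RC1 2.8.0-RC2 2.8.0-RC3 2.8.0-RC4 2.8.0-RC5; do
    strip_rc "$tag"
done | sort -u
```

The loop prints a single line, 2.8.0, so every RC build ends up carrying the version of the final release.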
So I would argue that not special-casing RC releases is a feature of change 18107, not a defect. We will need to come up with a new way of versioning RC releases. I would suggest that we use the obvious method already established (but not actually fully employed):
Release candidates will use the previous release's number, but have the third number start at 90. For example, release candidates for the 2.8.0 release would have version numbers like: 2.7.90, 2.7.91, 2.7.92, 2.7.93, etc.
This also has the advantage that it works well with packaging systems like RPM. Many rpm-based distros explicitly advise against using substrings like "RC1" in version strings, because rpm just doesn't handle them, and the associated package upgrades, in any particularly intelligent way. Granted, we somewhat avoided that problem by stripping off the RC altogether.
But I think it is very bad form for multiple different versions of the code to have the same version string.
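The numbering rule suggested above could be sketched roughly as follows. The function name rc_version and its argument order are hypothetical, and the sketch assumes the target release has a minor number greater than zero so that a "previous release" number exists; it does not address the objection that the .90 range may already be in use.

```shell
#!/bin/sh
# Hypothetical helper: given a target release (e.g. 2.8.0) and an RC
# ordinal starting at 1, print the version under the suggested scheme:
# previous minor number, third field starting at 90.
rc_version() {
    release=$1
    n=$2
    major=${release%%.*}      # text before the first '.'
    rest=${release#*.}        # text after the first '.'
    minor=${rest%%.*}         # second field
    if [ "$minor" -le 0 ]; then
        echo "rc_version: no previous minor for $release" >&2
        return 1
    fi
    printf '%s.%s.%s\n' "$major" "$((minor - 1))" "$((89 + n))"
}

rc_version 2.8.0 1
rc_version 2.8.0 4
```

For the 2.8.0 example this prints 2.7.90 and 2.7.93. Each RC gets a distinct, purely numeric version that sorts below the final release.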
Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/19488
Subject: LU-7699 build: Convert version underscores to dashes for dpkg
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 368150054a379f1da81dce8b2e5016016f126655