[LU-7699] Overhaul lustre's versioning - Whamcloud Community JIRA

Details

Type: Technical task
Resolution: Done
Priority: Critical
Fix Version/s: Lustre 2.9.0
Affects Version/s: None
Labels:
None

Description

There are numerous oddities about Lustre's system for applying version numbers in the configuration and packaging system. At this point, it is looking strongly like an overhaul is in order.

Issue Links

is blocking

LU-7642 Allow lustre source build without git working directory (Resolved)
LU-7643 Remove kernel version string from Lustre release field (Resolved)
LU-7645 Stop controlling the RPM Release field from Lustre's build system (Closed)

is related to

LU-7976 sles builds severely impacted by recent naming changes (Resolved)
LU-474 fix release build issues (Resolved)

Activity

Gerrit Updater added a comment - 12/Apr/16 6:48 PM

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/19488
Subject: LU-7699 build: Convert version underscores to dashes for dpkg
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 368150054a379f1da81dce8b2e5016016f126655

Christopher Morrone (Inactive) added a comment - 12/Apr/16 6:41 PM

Thanks for the legwork on that, James!

We don't even need sed any more; I'll just use tr.

James A Simmons added a comment - 12/Apr/16 5:52 PM

I tracked down what needs to be changed for make debs. We have:

lversion=$$(sed -ne 's/^#define VERSION "\(.*\)"$$/\1/p' config.h);

The sed command needs to be modified to change '_' to '-'. I'm no sed master, so Chris, what needs to be done to make that so?
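One way to do the conversion in sed alone is to transliterate the matched line before printing it. The sketch below is only an illustration: it runs in a plain shell rather than inside make (so `$` is written once instead of `$$`), and the config.h content is a made-up sample, not the real generated file.

```shell
# Write an illustrative config.h into a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' '#define PACKAGE "lustre"' \
              '#define VERSION "2.8.51_17_g5bd8f72"' > "$tmpdir/config.h"

# On the VERSION line only: keep the quoted value, turn every '_' into '-',
# then print. All other lines are suppressed by -n.
lversion=$(sed -n '/^#define VERSION ".*"$/{s/^#define VERSION "\(.*\)"$/\1/;y/_/-/;p;}' "$tmpdir/config.h")

printf '%s\n' "$lversion"   # prints 2.8.51-17-g5bd8f72
rm -rf "$tmpdir"
```

The `y/_/-/` command is sed's built-in transliteration, so no second process is needed; piping the original sed output through `tr '_' '-'` would give the same result.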

Christopher Morrone (Inactive) added a comment - 11/Apr/16 8:18 PM

> Also I found that when you have a top-level .git directory, if you remove the LUSTRE-VERSION-FILE it will not regenerate, and that causes make rpms to fail.

Yeah, that is true. For now, you will just need to rerun autogen.sh to make any version changes. This could be automated in the future, but we pretty much need to fix autoreconf (LU-7700) to do it sanely.

James A Simmons added a comment - 11/Apr/16 7:45 PM - edited

I discovered the hard way today that '_' is needed on rpm systems for the version field. For dpkg, neither the Name nor the Version field can contain '_'. I need to rethink this. I did see many sites stating that you should avoid '-' at all costs for rpms. Also, I found that when you have a top-level .git directory, if you remove the LUSTRE-VERSION-FILE it will not regenerate, and that causes make rpms to fail.

Christopher Morrone (Inactive) added a comment - 11/Apr/16 5:55 PM

> So I did some digging and I found that '_' is actually an invalid character for fedora as well. See this link: https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Separators

Actually, that page says the following:

> When naming packages for Fedora, the maintainer must use the dash '-' as the delimiter for name parts. The maintainer must NOT use an underscore '_', a plus '+', or a period '.' as a delimiter.

We are not using underscores in the name parts, i.e. the Name field. Underscores are only used in the Version.

But yes, it looks like we'll need to work something out for dpkg systems.

The main reason that we use underscores in the version is because Lustre currently has a variable number of version fields, which makes it difficult to tell the delineation between the upstream version and the following part of the version, whether third-party or development (git describe) information.

We can't use a dash (-) in the version string on rpm systems. That pretty much leaves a period as the only other sane option if we decide to stop using underscores.

But maybe we stick with underscores on rpm systems, and munge the string in some acceptable way for dpkg?
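The idea of keeping underscores for rpm while munging the string for dpkg can be sketched as a small dispatcher. The function name `package_version` and its arguments are hypothetical and are not part of the Lustre build system; the rpm branch checks only the dash restriction mentioned above, not every rule rpm applies to the Version field.

```shell
# Hypothetical helper: print the version string in a form acceptable to the
# named packaging system.
#   $1 = packaging system ("rpm" or "dpkg")
#   $2 = version string as generated by the build, e.g. 2.8.51_17_g5bd8f72
package_version() {
    case $1 in
        rpm)
            # rpm uses '-' to separate Name, Version and Release, so a dash
            # inside the Version itself is rejected here.
            case $2 in
                *-*) echo "rpm Version must not contain '-': $2" >&2
                     return 1 ;;
            esac
            printf '%s\n' "$2" ;;
        dpkg)
            # dpkg rejects '_', so underscores become dashes.
            printf '%s\n' "$2" | tr '_' '-' ;;
        *)
            echo "unknown packaging system: $1" >&2
            return 1 ;;
    esac
}

package_version rpm  2.8.51_17_g5bd8f72   # prints 2.8.51_17_g5bd8f72
package_version dpkg 2.8.51_17_g5bd8f72   # prints 2.8.51-17-g5bd8f72
```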

James A Simmons added a comment - 11/Apr/16 1:44 AM

I tried a spin with the latest lustre on Ubuntu 15.04 and I get this error:

dpkg-buildpackage: warning: debian/changelog(l1): version '2.8.51_17_g5bd8f72-1' is invalid: version number contains illegal character '_'
LINE: lustre (2.8.51_17_g5bd8f72-1) unstable; urgency=low
dpkg-buildpackage: source package lustre
dpkg-buildpackage: source version 2.8.51_17_g5bd8f72-1
dpkg-buildpackage: error: version number contains illegal character '_'
autoMakefile:1136: recipe for target 'debs' failed
make: *** [debs] Error 25

So I did some digging and I found that '_' is actually an invalid character for fedora as well. See this link:

https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Separators

So basically we will need to fix the lustre version not to contain any '_' for proper packaging on rpm and deb systems.
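The dpkg failure above could be caught before running dpkg-buildpackage with a simple character check. This is a rough sketch based on the Debian version rules as I understand them (alphanumerics plus '.', '+', '-' and '~'); it deliberately ignores epochs (the ':' separator) and the recommendation that the version start with a digit, and the function name is hypothetical.

```shell
# Hypothetical pre-flight check: succeed only if the string is non-empty and
# contains nothing but characters dpkg accepts in a version
# (letters, digits, '.', '+', '~' and '-').
dpkg_version_chars_ok() {
    case $1 in
        ''|*[!A-Za-z0-9.+~-]*) return 1 ;;
        *)                     return 0 ;;
    esac
}

for v in 2.8.51_17_g5bd8f72-1 2.8.51-17-g5bd8f72-1; do
    if dpkg_version_chars_ok "$v"; then
        echo "$v: ok"
    else
        echo "$v: contains a character dpkg rejects"
    fi
done
```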

Gerrit Updater added a comment - 23/Mar/16 6:01 AM

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18107/
Subject: LU-7699 build: Replace version_tag.pl with LUSTRE-VERSION-GEN
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ee813dbaa2a2b86f4873c4c289f62a0243aa9809

Christopher Morrone (Inactive) added a comment - 15/Mar/16 9:05 PM

> 2.7.90, 2.7.91, ... are already used

OK, fine, then they could start at .120, or .500, or whatever. I don't care so much about the details as long as we follow some simple rules that make the versioning work well for the various packaging systems like RPM. For instance, each release (including release candidate releases) should have a unique version number. The version number should be mostly numerical, and monotonically increasing. Letters should be used sparingly, and preferably not at all, in the main part of the version string. (The main part is the first four period-separated numbers; we allow alphanumeric strings for per-organization branding after that.)
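The "unique and increasing" rule lends itself to an automated check. The sketch below assumes GNU sort, whose `-V` version ordering is only an approximation of how rpm and dpkg compare versions, so treat it as a sanity check rather than a definitive test. The function name is hypothetical.

```shell
# Hypothetical check: read one version per line on stdin, in release order,
# and succeed only if each version sorts strictly after the previous one.
# -V = version sort, -C = check quietly, -u = require strict ordering
# (so a repeated version number is an error). Requires GNU sort.
versions_strictly_increasing() {
    sort -V -C -u
}

if printf '%s\n' 2.7.90 2.7.91 2.8.0 | versions_strictly_increasing; then
    echo "release sequence is unique and increasing"
fi

if ! printf '%s\n' 2.8.0 2.8.0 | versions_strictly_increasing; then
    echo "duplicate or out-of-order version detected"
fi
```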

> The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key.

With all due respect, I do not think that will be an acceptable solution. The version string is specifically designed to make it easy to tell different builds and packages apart. Unique version numbers are something packaging systems like rpm and dpkg understand. The relationship between different keys, and how they affect install precedence, is not. You say that there are other reasonable methods, but I do not really think we are likely to find one. The version string is very explicitly the universal solution for this problem. We are introducing larger problems by trying to avoid that universal solution.

> Also we don't widely distribute these RCs and so nobody could get them.

Right, and no doubt that is how this has flown under the radar for so long. Because this part of your development phase is kept pretty secret.

But this is pretty much the opposite of how a good open source development project, with an open development process, should work. If we had a more open process with good communication, then every RC release would have an announcement on mailing lists letting people know where to get the packages and asking for help in testing. This announcement shouldn't just go to lustre-devel, it should also go to lustre-discuss. We want adventurous users/sysadmins that can carve out some time to help with testing to do so. They should not have to go to git to do that, they should be able to download packages.

If we make those packages reasonably well known and easy to access (which we should definitely do for 2.9.0 RCs and into the future), then it will not be at all acceptable to have them all use exactly the same version number. They have to be uniquely and clearly versioned.

> The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever)

I agree, that is a more reasonable concern. But it is one that is pretty much entirely manageable with a small amount of process and good communication. When RC releases are about to start, we send an announcement out to the build farm admins saying "RCs are about to begin. Please freeze all changes to builders for branch X until further notice". If you are really worried, you call them on the phone and get a verbal confirmation that they understand. Many, many, many software projects are able to handle that level of release management in the real world. I believe that we can on Lustre as well. I'm putting my money where my mouth is too, by spending LLNL manpower on setting up a new buildfarm for Lustre. We'll present the prototype at LUG this year.

Furthermore, there are those that would argue that if Lustre can't compile against multiple different libraries with the same ABI, then Lustre is just broken. Remember that Lustre does not own the entire OS. It really needs to work correctly anywhere the packages install without being forced.

But I think keeping the builders static during the RC phase is totally reasonable.

> a glitch during building for other reasons producing a dud (ram error, disk error),

That's a one-in-a-billion type of problem. I don't think that is even remotely as much of a problem as releasing several packages with different contents, but versioning them identically.

> somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...

That isn't a problem specific to release candidates, so I don't think it is relevant to the conversation.

> All in all what it really means is we would really need to test the final release just like another RC just to be sure it's still viable, and if it's not then what?

I actually think it is fairly reasonable to build the same commit with no other change but the version string and not do any additional testing. But if people want to do more testing, that is certainly fine with me. I would argue that 8-24 hours of automated-only testing would be more than sufficient as a sanity check that nothing went horribly wrong between the two tags (of the exact same commit).

> In short, there's a certain appeal to releasing code that was actually tested vs code that was only built, even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.

Yes, I understand the appeal and your points. But I don't think they are strong enough to justify the offsetting bad packaging practice that you have introduced.

We need to stop having multiple packages of different contents with the same exact version. That is simply not acceptable.

Oleg Drokin added a comment - 15/Mar/16 5:30 AM

2.7.90, 2.7.91, ... are already used. They mean code drops in code freeze, i.e. when only blockers remain.

RCs (release candidates) on the other hand are generated when there are no more blockers left (at the time of tagging, anyway). I guess it's possible to continue the versioning as is, but we might run out of digits quite soon in some cases.

The "five versions of code" issue is real, I guess, but on the other hand it could be solved with other means. E.g. once we transition to signed packages, the real releases might be signed with a different key than the unsigned (or signed with a "build bot" only) key. Also we don't widely distribute these RCs and so nobody could get them. We can also destroy interim RCs from download site if needed (or it happens automatically over a relatively short time). And we can tell them apart by a build date too.
The actual drawbacks I encountered are: The changelog could only list an estimated "release date" since who knows how long the testing and all the approvals would take.

The "we don't need to trust our build procedures to be consistent" is a real risk on the other hand on many levels. There might have been a system update between the latest RC and the release that updated some library breaking something in process (updating gcc, whatever), a glitch during building for other reasons producing a dud (ram error, disk error), somebody might have compromised the build server meanwhile (or inserted some code into the git tree), ...
All in all what it really means is we would really need to test the final release just like another RC just to be sure it's still viable, and if it's not then what?

In short, there's a certain appeal to releasing code that was actually tested vs code that was only built, even when it's supposedly the same as the one that was tested because it was supposedly built from the same source.

Christopher Morrone (Inactive) added a comment - 15/Mar/16 1:27 AM - edited

In a http://review.whamcloud.com/18107 review, Oleg pointed out a potential issue with lack of a special-case for RC releases. I admit that I totally missed how RC releases are currently being done. To make a long story short, I am going to argue that we really shouldn't do it that way any more.

Correct me if I am wrong, but as I now understand it, even though a tag might have a name of "2.8.0-RC1", the build system is currently stripping off the "-RC1" and building the code as if it is a final "2.8.0" release. As Oleg explained it, the intention here has been that when an RC is tested and agreed to be the final release, the packages from the RC build are already named with the final release version and can be released as-is. The advantage here is that we don't need to trust our build procedures to be consistent, we can simply release the packages from the most recent RC.

The downside should be pretty clear too, though: RC releases are things that we explicitly want to release and have the general community test and vet. So taking 2.8.0 as an example, which to date has five RC releases (-RC1 through -RC5), there are exactly five different versions of the code now out there in the wild with no way to tell them apart.
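The collision is easy to see in a few lines of shell. This sketch assumes the suffix is dropped with a simple pattern like the one below; the real build logic may well differ, and `strip_rc` is a hypothetical name used only for illustration.

```shell
# Hypothetical stand-in for the build system's behaviour: drop a trailing
# "-RC<n>" from a tag name to obtain the package version.
strip_rc() {
    printf '%s\n' "${1%-RC*}"
}

# Five distinct tags, potentially five distinct source trees...
for tag in 2.8.0-RC1 2.8.0-RC2 2.8.0-RC3 2.8.0-RC4 2.8.0-RC5; do
    printf '%s -> %s\n' "$tag" "$(strip_rc "$tag")"
done
# ...and every line ends in the same package version, 2.8.0.
```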

I would argue that this downside of having multiple different codes out in the wild with exactly the same version vastly outweighs the possibility that we might not compile and generate packages the same way twice.

So I would argue that not special-casing RC releases is a feature of change 18107, not a defect. We will need to come up with a new way of versioning RC releases. I would suggest that we use the obvious method already established (but not actually fully employed):

Release candidates will use the previous release's number, but have the third number start at 90. For example, release candidates for the 2.8.0 release would have version numbers like: 2.7.90, 2.7.91, 2.7.92, 2.7.93, etc.
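As a rough illustration of that numbering rule, the helper below maps an upcoming release and an RC number to the suggested version. The function name is hypothetical, and the sketch assumes a three-part x.y.0 release with y of at least 1; it does not handle any other case.

```shell
# Hypothetical helper implementing the suggested scheme:
#   $1 = upcoming release, e.g. 2.8.0
#   $2 = release candidate number, starting at 1
# RC n of release X.Y.0 becomes X.(Y-1).(89+n), so RC1 is X.(Y-1).90.
rc_version() {
    major=${1%%.*}          # text before the first '.'
    rest=${1#*.}            # text after the first '.'
    minor=${rest%%.*}       # second field
    printf '%s.%s.%s\n' "$major" "$((minor - 1))" "$((89 + $2))"
}

rc_version 2.8.0 1   # prints 2.7.90
rc_version 2.8.0 4   # prints 2.7.93
```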

This also has the advantage that it works well with packaging systems like RPM. Many rpm-based distros explicitly advise against using substrings like "RC1" in version strings, because rpm just doesn't handle them, or the associated package upgrades, in any particularly intelligent way. Granted, we somewhat avoided that problem by stripping off the RC altogether.

But I think it is very bad form for multiple different versions of the code to have the same version string.

People

Assignee: Christopher Morrone (Inactive)

Reporter: Christopher Morrone (Inactive)

Votes: 0

Watchers: 13

Dates

Created: 22/Jan/16 8:06 PM

Updated: 06/Nov/19 6:02 AM

Resolved: 03/May/16 5:45 PM