lustre build system improvments (LU-3953)

[LU-7642] Allow lustre source build without git working directory Created: 08/Jan/16  Updated: 06/Dec/16  Resolved: 06/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Technical task Priority: Critical
Reporter: Christopher Morrone Assignee: Christopher Morrone
Resolution: Fixed Votes: 1
Labels: patch

Issue Links:
Blocker
is blocked by LU-7699 Overhaul lustre's versioning Closed
Rank (Obsolete): 9223372036854775807

 Description   

Right now lustre will fail to compile if one attempts to build from a source tree pulled from git, but not actually in a git working directory. The error will contain the string:

"I have no idea how to create a META file"

I can only guess at the reasons this was put in place originally, and none of the reasons that I am coming up with are particularly convincing.

There are, on the other hand, pretty strong counter examples for why this restriction should not exist. For instance, an automated build-and-test farm might reasonably want to use a command like the following to retrieve a tree for a specific commit:

 git archive --format=tar.gz --remote=$URL $GIT_TREEISH

As it stands, that tar ball will not allow lustre to be built.

This is a long standing problem, and we really need to get it fixed.

I have already hacked the code to stop the problem for LLNL's local 2.5-llnl branch, but I think a little more extensive overhaul is needed to make things clean. I think it is pretty reasonable for me to get this done in time for Lustre 2.9.



 Comments   
Comment by Brian Murrell (Inactive) [ 08/Jan/16 ]

It is done to take advantage of the metadata that is in git, like the tag (plus how ever many commits forward of the last tag made) where the code was taken from.

The META file is supposed to be created before you remove the code from git, and should be done when you do make dist. The META file is used to store the git metadata and supply it to the next configure run (i.e. outside of git).

make dist is the supported way of exporting a copy of the code outside of git.

Comment by Robert Read (Inactive) [ 08/Jan/16 ]

In the absence of the META file could we use a default for the build tag like "custom" or "$BUILDTAG"?

Comment by Christopher Morrone [ 08/Jan/16 ]

Right, and once I remove that restriction, we can support some completely reasonable build models that the current code prevents. Such as the one I listed in the description.

None of that git info is really required, and it is time to finally fix this well intentioned, but overly restrictive regulation.

I think we should probably also revisit how we do versioning in Lustre too. I was part of creating --downstream-release, but now I think that has outlived its usefullness, especially since the core Intel team has taken most of their branches and versions private. It probably time to rethink that and make it alot simpler. For instance, we probably just need to accept that it takes a commit to change the version string. I think that is reasonable. But I intend to open a separate ticket on that at some point. This one should be restricted to just allowing builds without a git working directory.

Comment by Christopher Morrone [ 08/Jan/16 ]

Robert: I don't think so, I think something like that should be an option. If you don't have the git working directory, you don't know what tag it came from. The code should just have the version included. It avoids soooooo many problems.

Now, I'm not at all saying that we can't also have Lustre add version info if it is in a git working directory. That part can certainly remain.

Comment by Brian Murrell (Inactive) [ 08/Jan/16 ]

rread Sure, we could throw any kinds of "placeholder" (i.e. like "custom") values into the missing metadata but the point of getting and using that metadata was to convey accurate information about exactly where the code came from.

Prior to all of that, as you will recall we used to just have the version/build information in a file that somebody had to update, but it often got overlooked when doing version updates/releases, etc., so we automated getting that from git. It also never conveyed "between release" information as well as a tag+fwd commits, or even a git hash could.

So yes, it does complicate and restrict, but there's no free lunch.

Comment by Christopher Morrone [ 08/Jan/16 ]

Prior to all of that, as you will recall we used to just have the version/build information in a file that somebody had to update, but it often got overlooked when doing version updates/releases, etc., so we automated getting that from git.

I don't think that is quite accurate. We still need to make commits to update the version information. It would be quite bad to fail to do so even now. It is mostly the release field that is effected by all of this.

And frankly, this use of the release field is a complete violation of what the release field is supposed to be for in rpm packaging. Anyone using a build system like koji (what fedora and related distros use), the release field must be reserved for packaging purposes. Lustre trying to reuse it for its own means causes a lot of complications. That just isn't what the Release field is for.

We are using the Release field for Versioning, of Lustre, whereas the Release field should only be used for packing purposes. Granted at the moment the concepts or building and packaging are pretty confused and incestuously interwined in Lustre, but I am working hard to fix that.

And further: There is minimal (no?) cost to you to allowing other people in the world to build without a working git repo. It is about as close to a free lunch as one can get. Other versioning issues will probably require tradeoffs, I agree, but I'm not going to go into those in this ticket.

Comment by Robert Read (Inactive) [ 08/Jan/16 ]

morrone Yes, that makes sense. Care to submit your patch for master? Code talks louder than words.

Comment by Christopher Morrone [ 08/Jan/16 ]

Absolutely! That is why I assigned this ticket to myself; I plan to do this work. This and other tickets will help me track what I need to get done, and also help begin socializing my vision for the build system in the 2.9 time frame.

And FYI, I opened LU-7645 to start planning what needs to change about our use of the RPM Release field.

Comment by Gerrit Updater [ 20/Jan/16 ]

Christopher J. Morrone (morrone2@llnl.gov) uploaded a new patch: http://review.whamcloud.com/18071
Subject: LU-7642 build: Allow Lustre to build in absence of git
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6d88d4018191a5ef3c5543f05e60bbfa3523794b

Comment by Christopher Morrone [ 21/Jan/16 ]

It is a bit confusing at this point how the Lustre version is set under what conditions. I'll keep some notes here as I go along, and maybe turn it into wiki doc later. The document under lustre/doc/VERSIONING is a bit out of date.

lustre/autoconf/lustre-version.ac contains m4 macros named LUSTRE_

{MAJOR,MINOR,PATCH,FIX}

. They are combine in a single LUSTRE_VERSION m4 macro that contains only the first three numbers if LUSTRE_FIX is zero, and all four numbers otherwise.

LUSTRE_VERSION is assigned to AC_LUSTRE_VERSION_STRING which is in turn made available more broadly with AC_SUBST. However AC_LUSTRE_VERSION_STRING is only currently used in one place, in lustre/include/lustre_ver.h.in and used to set a CPP define named LUSTRE_VERSION_STRING.

Partly m4 macros are used so that the LUSTRE_VERSION can be used in AC_INIT in configure.ac.

But it is not immediately clear to me why we use the extra steps of using AC_SUBST and later manually setting CPP macros with slightly different names when they could have just used AC_DEFINE directly.

Next, lets consider places that try to query the local git repo for information.

In config/lustre-build.m4, the LB_CONFIGURE macro calls LB_BUILDID.

LB_BUILDID first checks if we are in a local git repo by running "git branch" and ignoring the output. If that command returns success, the macro tries to use "git describe", only looking at tags that match the pattern of the old cvs-style tags with underscores in their name (why are we still doing that??), and pulling out hash portion to use as the buildid.

Alternatively, if "git branch" is unsuccessful LB_BUILDID tries to pull the buildid from a file named META. If that doesn't exist either, LB_BUILDID prints a scary warning but does not actually abort.

There are a number of strange things here. For instance, the code will warn if the starting part of the tag doesn't match $VERSION (where does that one get set?), but doesn't actually abort using the hash. That is kind of strange. Also this whole thing only happens once when configure is run the first time, and then cached. So if the developer makes 15 commits after running configure, presumably the hash that lustre gets built with will be completely wrong. That seems odd.

Also strange is that this code to parse git describe output is more or less duplicated, but not exactly, in lustre/scripts/version_tag-git.pl. For instance, version_tag-git.pl fails to employ the "--match" option with git describe the way that the configure script does.

lustre/scripts/version_tag-git.pl is one of the support scripts in the overly-complicated-for-what-it-does script named lustre/scripts/version_tag.pl. version_tag.pl is called from the special automake dist-hook target (or more specifically the dist-hook target calls the module-dist-hook target which runs version_tag.pl). So this script is called at "make dist" time to record the custom buildid that is partially assempled from the output of git --describe, and put that information into a file named "META" so that it is available in the canonical distribution tarball. When the tarball is used in the future, access to a local git repo will no longer be available.

This all seems much more convoluted than it needs to be. Open questions:

  • Why don't we just do the parsing of "git --describe"'s output in one place instead of 2 or 3?
  • Why don't we use the git --describe output in the version option of AC_INIT (we could use m4_esyscmd_s to call the same script that would be used elsewhere).
Comment by Christopher Morrone [ 21/Jan/16 ]

Look at this:

$ sh autogen.sh
configure.ac:12: installing 'config/config.guess'
configure.ac:12: installing 'config/config.sub'
configure.ac:14: installing 'config/install-sh'
configure.ac:14: installing 'config/missing'
libcfs/libcfs/autoMakefile.am: installing 'config/depcomp'
$ ./configure --version
Lustre configure LUSTRE_VERSION
generated by GNU Autoconf 2.69
[cut]
$ ./configure --help
`configure' configures Lustre LUSTRE_VERSION to adapt to many kinds of systems.
[cut]

This is because we do, in configure.ac, the following:

AC_INIT([Lustre], [LUSTRE_VERSION], [https://jira.hpdd.intel.com/], [lustre])

We are setting the version to the string literal "LUSTRE_VERSION". That is clearly not correct.

Comment by Christopher Morrone [ 25/Jan/16 ]

FYI, I have a series of patches under LU-7699 that overhaul lustre versioning. This problem is eliminated in the first patch of that series (http://review.whamcloud.com/18107).

Comment by Christopher Morrone [ 11/Apr/16 ]

Change 18107 landed, so this is now fixed.

Generated at Sat Feb 10 02:10:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.