[LU-3956] Eliminate lbuild's nonstandard build process Created: 16/Sep/13  Updated: 20/Jul/16

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Christopher Morrone Assignee: Minh Diep
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5614 use %kernel_module_package for weak-u... Closed
is related to LU-3463 Create mock-based build farm capability Closed
Rank (Obsolete): 10526

 Description   

lbuild's nonstandard build process is making improvements to lustre's build and packaging systems far more difficult than they should be. We really need to move beyond lbuild's nonstandard approach.

To support proper build procedures for rpm-based distributions, we need a build farm that is capable of building rpms the way that standard distros like Fedora do it. One approach would be to use mock. Another approach would be to boot a new VM instance for every build on every single version of each Linux distribution. The VM instances would be launched from known clean, basic images.

Each piece of software that we want to build should have a .src.rpm. That .src.rpm should have all proper dependencies expressed to allow it to be built under mock and other similar processes.

We'll need something that sits on top of mock to coordinate the build of dependent packages. At LLNL we use a home-grown script that, honestly, has probably outlived its useful lifetime.

I would suggest that we again evaluate Koji, which is Fedora's solution. I haven't used it personally, so I don't know how suitable Koji would be.



 Comments   
Comment by Minh Diep [ 09/Jun/15 ]

I am starting this conversation about using Fedora mock to build lustre. First off, I tried the mock command pulling lustre-release from git. However, since the git repo has lustre.spec.in, mock doesn't understand that and expects lustre.spec. Any ideas?

Comment by Christopher Morrone [ 09/Jun/15 ]

I recommend building the .src.rpm, and then feeding that to mock.
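
As a minimal sketch of that last step, the helper below hands an already-built source RPM to mock using its `-r` (chroot configuration) and `--rebuild` options. The function name is hypothetical, and the configuration name and .src.rpm file name in the example invocation are placeholders, not values taken from this ticket.

```shell
# Hypothetical helper: rebuild a source RPM inside a clean mock chroot.
#   $1 = mock chroot configuration name (placeholder, e.g. epel-7-x86_64)
#   $2 = path to the .src.rpm to rebuild
rebuild_srpm_in_mock() {
    mock -r "$1" --rebuild "$2"
}

# Example invocation (not run here; both arguments are placeholders):
# rebuild_srpm_in_mock epel-7-x86_64 lustre.src.rpm
```

Because mock installs only what the .src.rpm declares as build requirements, a rebuild like this also acts as a check that those dependencies are expressed correctly.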

Comment by Christopher Morrone [ 09/Jun/15 ]

Perhaps I should give a bit more of an explanation so it is clear what is going on.

By design, Lustre cannot be built from its state in the git repository without first taking a prerequisite action. The build system needs to be bootstrapped into place before any configuration, compilation, and/or packaging can take place.

Autotools (autoconf, automake, libtool) form the basis of Lustre's build system. The bootstrapping step is a requirement for any system built on autotools. When it comes to storing your source code in a source repository, there are two schools of thought on what to do with the products of the bootstrapping process: refresh and commit the bootstrap products to the source repository every time a build system file is modified, or never commit the products and just bootstrap before building. There are advantages and disadvantages to both approaches. Lustre chose the latter; we do not check the bootstrap products into the repository.

So when we go to build lustre, the first step is always to bootstrap the build system. We named the bootstrapping script autogen.sh, which is a commonly chosen name. From the base of the source tree, run:

sh autogen.sh

Once autogen.sh has been run, it is possible to configure the build for the specific target platform (by running "configure" and choosing any desired configuration options), and then compile Lustre. That is what developers commonly do during development. But when we are planning to package Lustre for distribution, we want to take a slightly different approach for the second step.

After bootstrapping, we want to generate the canonical full tarball of the source code, ready to be used for further distro packaging, or for end users to untar and compile directly.

To make the canonical tarball, we employ the "make dist" target. That target is created by autotools. The Makefiles themselves are generated by autotools as well. So even though we are not ready to configure the build tree for a specific target environment, we need to run the configure script to create the Makefiles that will allow us to create the tarball.

It is very important at this step not to make any decisions during configuration that would change the contents of the tarball. That tarball must include the full sources. (The decisions about kernel choice, which OSD, etc. will be made later when the tarball is unpacked and configure is run again to actually test the environment and make decisions about how the sources need to be built.) Since we are not making decisions about the environment yet, we can bypass most of the normal checks that we do when we run configure. In lustre we do that like so:

./configure --enable-dist

It is a struggle to keep the build system clean, so there are probably still some checks in there that are unnecessary, but it should be much better now on master than it was several years ago.

And now, we can finally generate the canonical tarball:

make dist

That tarball now contains the .spec files as well. Before the bootstrapping and configuration steps, the files were .spec.in as you noted. We employ autotools to fill in a very small number of variables in the spec files. They are: VERSION, DOWNSTREAM_RELEASE, and PACKAGE. All other use of autotools variables in the spec files is forbidden, for reasons I won't go into.
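
To illustrate what "filling in" means here: autoconf replaces `@NAME@` placeholders in `.in` templates with the configured values. The sketch below imitates that substitution with sed for the three variables named above. The helper name, the template lines, and the values are invented for the example; they are not copied from lustre.spec.in.

```shell
# Illustrative only: imitate autoconf's @VAR@ substitution for a .spec.in
# template read from stdin. Arguments: PACKAGE, VERSION, DOWNSTREAM_RELEASE.
fill_spec() {
    sed -e "s/@PACKAGE@/$1/g" \
        -e "s/@VERSION@/$2/g" \
        -e "s/@DOWNSTREAM_RELEASE@/$3/g"
}

# Made-up template lines and values, for demonstration only.
printf 'Name: @PACKAGE@\nVersion: @VERSION@\nRelease: @DOWNSTREAM_RELEASE@\n' |
    fill_spec lustre 2.7.0 1
```

In the real build the substitution is performed by configure itself, so nothing like this helper needs to exist in the tree.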

So now you have the canonical tarball. Ideally this one tarball would be the input into an entire build system. Different OS build environments would all use that one-time generated source tarball, and not start over from the git repo for each.
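
Put together, the three commands above can be sketched as one shell function that is run once per source revision. The function name and the idea of passing the checkout path as an argument are assumptions made for illustration; only the three commands themselves come from the explanation above.

```shell
# Hypothetical wrapper around the steps described above.
#   $1 = path to a lustre-release checkout
make_dist_tarball() {
    srcdir="$1"
    (
        cd "$srcdir" || exit 1
        # 1. Bootstrap the autotools build system.
        # 2. Generate Makefiles without making environment-specific decisions.
        # 3. Produce the canonical source tarball.
        sh autogen.sh &&
            ./configure --enable-dist &&
            make dist
    )
}

# Example invocation (not run here; the path is a placeholder):
# make_dist_tarball /path/to/lustre-release
```

The subshell keeps the caller's working directory unchanged, and chaining with `&&` stops at the first failing step so that a broken bootstrap never yields a tarball.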

For rpm-based systems, there is no longer just one .spec file. That is worth keeping in mind. I would not be surprised if we pushed that model further. We might have different spec files for RHEL-like systems and SUSE-like systems, for instance. Already we have a separate spec file for DKMS builds of lustre.

Hopefully that provides some illumination on why the spec files have the ".in" extension straight from the git repository.

Comment by Minh Diep [ 10/Jun/15 ]

Thanks Chris, that is the most comprehensive explanation I have seen.

Comment by Christopher Morrone [ 04/Dec/15 ]

FYI, I have more experience with koji now, because we are using it to build the TOSS distro here. It would not really be appropriate for building just lustre.

Comment by Christopher Morrone [ 10/Dec/15 ]

I changed the summary and description for this ticket to make it a little more high level. The main goal is really to eliminate the non-standard way that lbuild currently builds things. The specific solution is less important to me.

Another approach that may or may not employ mock is to have every build on every version of each distribution start with a cleanly started VM instance. buildbot makes it pretty easy to generate new VM instances from known clean images on the fly as needed. If we went the mock-less approach, we would still need to keep in mind that we want to have a process that ensures that our .src.rpm remains functional.

Comment by Minh Diep [ 11/Dec/15 ]

Perhaps we need to understand more about the build standard, and what is so 'non-standard' about the way lbuild does things.
Lbuild is just a script to build the lustre ecosystem: it builds zfs and ofed, patches the kernel, and builds lustre.
Lbuild can build for CentOS/Red Hat/SUSE/Ubuntu and can also save the built kernel so that the next patch doesn't have to rebuild the kernel.
Lbuild uses rpmbuild, which is not so 'non-standard' IMHO.

We will need a script to build lustre regardless of whether it's called lbuild or not. Perhaps we should look at how we can improve the script rather than declaring that it must be eliminated.

In our lab, we are using Jenkins to build not only lustre but many other tools (rpm and non-rpm). Jenkins is flexible enough to allow us to achieve that. We have talked about using chroot or generating a new VM for every build. However, that won't necessarily be an improvement and might add overhead to build times.

However, if you have a specific change to lbuild (or whatever it ends up being called), we would be pleased to see a patch.

Comment by Christopher Morrone [ 11/Dec/15 ]

First, read Stephen Champion's comment in LU-5614. He is exactly right.

Everything lbuild does before rpmbuild is non-standard. Here is the most egregious thing that lbuild does:

    rpm2cpio $rpm | cpio -id

It repeatedly sidesteps all standard packaging practices by manually tearing apart the packages and putting their contents in random non-standard locations. lbuild's foundations are unsound. We can't fix that with simple patches; it needs a complete overhaul.

You can't reasonably address the requirements that Stephen rightly states with a system that repeatedly sidesteps the very packaging systems it is targeting.
