[LU-1032] Add dkms support for kernel modules Created: 25/Jan/12  Updated: 13/Feb/19  Resolved: 19/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.6.0, Lustre 2.8.0, Lustre 2.10.0

Type: Improvement Priority: Blocker
Reporter: Guy Coates Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: prz

Attachments: File diff     Text File dkms.patch     File lustre.spec    
Issue Links:
Duplicate
is duplicated by LU-6393 dkms build of client modules requires... Resolved
Related
is related to LU-2391 warning messages for missing symbols ... Resolved
is related to LU-5465 Enhancements to Lustre DKMS RPM Resolved
Rank (Obsolete): 5847

 Description   

dkms is a cross-distro mechanism for building and maintaining out-of-tree kernel modules (http://linux.dell.com/dkms/). The attached patch adds dkms support to the debian/ubuntu packaging infrastructure.

The current debian/ubuntu kernel module package uses module-assistant, which produces a deb tied to a specific kernel version, requiring the user to manually rebuild packages whenever the kernel is changed.

The dkms package allows modules for multiple kernel versions to be packaged together into a single deb. Once installed, the package contains triggers so that when a new kernel is installed for which no pre-built lustre modules exist, dkms will automatically build and install them. This reduces maintenance overhead on client machines.

dkms also works on Red Hat and SLES; it should be possible to fold dkms support into the rpm build process.
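For illustration, a DKMS package is driven by a dkms.conf file shipped with the module source. A minimal sketch of the shape such a file could take for the Lustre client modules (all names, versions, and module locations here are hypothetical, not the actual packaging):

# Hypothetical dkms.conf sketch -- not the shipped configuration.
PACKAGE_NAME="lustre-client-modules"
PACKAGE_VERSION="2.1.54"
# Configure the copied source tree against the target kernel before building.
PRE_BUILD="./configure --with-linux=${kernel_source_dir}"
MAKE[0]="make"
CLEAN="make clean"
# One entry per built module; lustre.ko is shown as an example.
BUILT_MODULE_NAME[0]="lustre"
BUILT_MODULE_LOCATION[0]="lustre/llite"
DEST_MODULE_LOCATION[0]="/updates/kernel/fs/lustre"
# This is the trigger: rebuild automatically whenever a new kernel is installed.
AUTOINSTALL="yes"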



 Comments   
Comment by Peter Jones [ 25/Jan/12 ]

Hi Guy

Thanks for this suggestion. I do not see an attached patch - will you be uploading the patch into gerrit so that we can review and hopefully land it?

Peter

Comment by Guy Coates [ 25/Jan/12 ]

Hi Peter,

There is a diff attached to the ticket.

http://jira.whamcloud.com/secure/attachment/10778/diff

I can do the upload to gerrit if required, but it will involve some
negotiations to get git through our firewall; it might take a while...

Cheers,

Guy



Comment by Peter Jones [ 25/Jan/12 ]

Ah yes. I see it now - thanks! We do need to accept code contributions via gerrit, I'm afraid, so if you could upload it then that would be great.

Comment by Andreas Dilger [ 25/Jan/12 ]

Guy, as a starting point can you please attach a patch produced by "git format-patch HEAD~1" with a proper Signed-off-by: line, so that if you are unable to use Git to push to Gerrit then someone could take over the patch.

Note that Git/Gerrit can tunnel over either http or ssh, so there is no need to open up a dedicated git port to submit patches.

Comment by Guy Coates [ 26/Jan/12 ]

Great. I'll do a git-push via gerrit.

Cheers,

Guy



Comment by Guy Coates [ 26/Jan/12 ]

Hi,

Code has been pushed:

http://review.whamcloud.com/2021



Comment by Peter Jones [ 26/Jan/12 ]

thanks Guy!

Comment by Guy Coates [ 07/Feb/12 ]

I've updated the patchset to move the dkms deb to the correct place after build and to include some modules that were missing from the build.

Comment by Brian Murrell (Inactive) [ 14/Mar/12 ]

So I tried building this and the build does produce the lustre-client-modules-dkms package, but when I try to install and build that on a Lucid machine it fails due to a missing autogen.sh in the top-level dir (or anywhere else). Note that the "dist" tarball doesn't have the autogen.sh stuff in it, so when you make the copy for dkms you might have to cherry pick that out of the source tree.

Also, you didn't address the trailing whitespace in debian/README.Debian.

Comment by Andreas Dilger [ 12/Dec/12 ]

Does this DKMS package also work for installation on RHEL or SLES clients, or does it need to be packaged differently there?

Comment by Prakash Surya (Inactive) [ 12/Dec/12 ]

> So I tried building this and the build does produce the lustre-client-modules-dkms package, but when I try to install and build that on a Lucid machine it fails due to a missing autogen.sh in the top-level dir (or anywhere else). Note that the "dist" tarball doesn't have the autogen.sh stuff in it, so when you make the copy for dkms you might have to cherry pick that out of the source tree.

I haven't tested the patch myself, but Brian's comment raises some red flags with me. Why is autogen.sh needed? The DKMS package should only contain the make dist sources, and just enough to rebuild the package against the kernel version it is targeting (i.e. configure). If you make a dependency on autogen.sh, that also means the user must have all of the autotools "stuff" installed in order to build the package (which is not a "normal" requirement for a DKMS package, AFAIK). So definitely don't copy autogen.sh into the dist; fix the underlying issue with the DKMS build process.

Comment by Guy Coates [ 14/Dec/12 ]

Hi,

In principle, the dkms patch should work on SLES and RHEL, but I have
not tested that yet.

autogen.sh is needed to work around a bug in dkms, which tramples on file timestamps in source trees. This renders the source tree unbuildable after "make clean" is run:

cd lustre-source-tree
./configure ; make ; make clean
(works as expected)

./configure ; make
cd . && /bin/bash /var/lib/dkms/lustre-client-modules/2.1.54/build/missing --run automake-1.11 --foreign
automake-1.11: no `Makefile.am' found for any configure output
make: *** [autoMakefile.in] Error 1

The fix to dkms is trivial, but still has not made it upstream.

http://comments.gmane.org/gmane.linux.kernel.dkms.devel/821
https://bugs.launchpad.net/dkms/+bug/952817

Even with the autogen.sh workaround, my experience with running the
patch as it stands inside Sanger is that the build is still rather
fragile. I think progressing this depends on dkms being fixed upstream.
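The gist of the needed fix, sketched from the bug reports above (the real patch lives in the launchpad ticket): the dkms script's copy of the source tree into its build area must preserve timestamps.

# What effectively happens today -- every file's mtime becomes "now":
cp -r /usr/src/lustre-client-modules-2.1.54 /var/lib/dkms/lustre-client-modules/2.1.54/build
# What is needed -- keep the relative mtime ordering intact (-a preserves timestamps):
cp -a /usr/src/lustre-client-modules-2.1.54 /var/lib/dkms/lustre-client-modules/2.1.54/build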

Comment by Bruno Faccini (Inactive) [ 14/Dec/12 ]

Anyway, thanks for your work already, Guy!!
I am trying to see how we can re-use/integrate it to make our RHEL/SLES builds aware ...

Comment by Bruno Faccini (Inactive) [ 14/Dec/12 ]

I mean DKMS-aware.

Comment by Bruno Faccini (Inactive) [ 06/Feb/13 ]

Pushed a patch to make the Lustre-Client RPM DKMS-aware:
http://review.whamcloud.com/5284

I did not integrate autogen.sh in the dist sources, so currently it can only work if the "dkms" main script is modified (see attached dkms.patch) to bypass the bug already described and reported by Guy.

So we are currently stuck with the following choice:

1. push to get the bug fixed in DKMS.
2. ship autogen.sh and thus Require autotools.

Even providing a tar-ball will not help/work around it ...

This patch version can allow future patchless-Server DKMS rpm builds; we just need to remove the %if %{is_client} / %endif tests in lustre.spec.in.

Comment by Prakash Surya (Inactive) [ 06/Feb/13 ]

Bruno, I don't fully understand why the DKMS bug is affecting us. Is there a simple reproducer I can run to investigate it in a VM some more? Or can you provide some more explanation? I don't really follow Guy's comment above.

Comment by Andreas Dilger [ 06/Feb/13 ]

Bruno, I'm not sure why you are attaching a patch here instead of Gerrit?

Comment by Bruno Faccini (Inactive) [ 07/Feb/13 ]

Andreas, the patch is for DKMS main script, it should be pushed to DKMS git repository.

Prakash, the only reproducer I know of, actually, is to install the DKMS-aware RPM I was able to generate with my patch, in my local builds/tests and also in the Maloo/Hudson build at http://build.whamcloud.com/job/lustre-reviews/12980/.

I think the problem is that when DKMS copies without preserving timestamps, we then trigger Makefile rules that involve autotools and files not packaged in the dist. In a case I was able to debug and trace, running the "rpm -i lustre-client-modules-dkms..." command you get the same errors/msgs as Guy:

<configure outputs>

Type 'make' to build Lustre.
++ make
CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /var/lib/dkms/lustre-client-modules/2.3.57/build/missing --run aclocal-1.11
 cd . && /bin/sh /var/lib/dkms/lustre-client-modules/2.3.57/build/missing --run automake-1.11 --foreign
automake-1.11: no `Makefile.am' found for any configure output
make: *** [autoMakefile.in] Error 1

according to the following rules in autoMakefile :

.....

am__aclocal_m4_deps = $(top_srcdir)/lustre/autoconf/lustre-version.ac \
        $(top_srcdir)/configure.ac
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
        $(ACLOCAL_M4)
......

$(srcdir)/autoMakefile.in:  $(srcdir)/autoMakefile.am $(srcdir)/build/autoMakefile.am.toplevel $(am__configure_deps)
        @for dep in $?; do \
          case '$(am__configure_deps)' in \
            *$$dep*) \
              echo ' cd $(srcdir) && $(AUTOMAKE) --foreign'; \
              $(am__cd) $(srcdir) && $(AUTOMAKE) --foreign \
                && exit 0; \
              exit 1;; \
          esac; \

which is triggered because the no-preserve copy causes the "lustre/autoconf/lustre-version.ac" mtime to become newer than autoMakefile.in's:

[root@CentOS631 build]# pwd
/var/lib/dkms/lustre-client-modules/2.3.57/build
[root@CentOS631 build]# stat autoMakefile.in lustre/autoconf/lustre-version.ac
  File: `autoMakefile.in'
  Size: 37616           Blocks: 80         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 1191534     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-01-25 19:01:35.022204638 +0100
Modify: 2013-01-25 19:00:10.249163807 +0100
Change: 2013-01-25 19:00:10.249163807 +0100
  File: `lustre/autoconf/lustre-version.ac'
  Size: 1418            Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 1192557     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-01-25 19:01:48.809277591 +0100
Modify: 2013-01-25 19:00:10.295164811 +0100
Change: 2013-01-25 19:00:10.295164811 +0100
[root@CentOS631 build]# 

If you run with the patched "dkms" main script this can no longer happen. BTW, this problem could only impact us due to our build system ... And this may be why DKMS developers do not seem to take care of it.

And finally, too bad, the build fails for Client sles11[sp2]+fc18 distros ... I think I need to rebase (as also pointed out/commented by C. Morrone) and re-push ...

Comment by Prakash Surya (Inactive) [ 07/Feb/13 ]

Out of the two options posted above:

1. push to get bug fixed in DKMS.
2. ship autogen.sh and thus Require autotools.

Neither is viable, IMO. (1) would require buy-in from the upstream DKMS folks, and then buy-in from the distro folks, and then buy-in from the users to update their OS to pull the new package. (2) Requiring autotools to be installed in the user's environment is just wrong. Other packages get around the "bug" just fine, so why not us?

Comment by Prakash Surya (Inactive) [ 07/Feb/13 ]

Bruno, I was able to reproduce the issue locally, and I think we're just doing things wrong and/or nonstandard. The build system as a whole needs a lot of clean up, some of which might be a prerequisite to any of the DKMS patches landing. For example, I was able to get your client DKMS patch working once I applied this patch on top of it: http://review.whamcloud.com/5301. I'm not sure that patch is entirely correct, but it demonstrates a way we can tweak the build system to get around the DKMS "bug" without autogen.sh.

Comment by Andreas Dilger [ 08/Feb/13 ]

Bruno, please push the DKMS patch upstream ASAP, and also submit it to RHEL and Jeff Mahoney @ SLES.

I don't think it is terrible to Require autotools to be installed in order to build, until this bug is fixed. Just make a clear comment that this is a workaround.

Comment by Bruno Faccini (Inactive) [ 08/Feb/13 ]

Prakash, could it be that the DKMS "patch" was a way to hide some of our build specifics, and yours is a 1st step to address a possibly long future list of changes required due to the same kind of errors encountered during subsequent DKMS-aware RPM usage? Will see and fix on demand then!

Andreas, I had to re-work my patch due to a necessary rebase; the new version is under build+testing locally and I will push it as soon as it is ok. With Prakash's addition the DKMS issue should now be bypassed, and I will just add a comment that further issues may need to require autogen.sh and auto-tools.

Comment by Christopher Morrone [ 08/Feb/13 ]

I do think it would be fairly terrible to require autotools and a run of autogen.sh. I think that is a clear sign that there is a bug in Lustre's build system, and that we are failing to package some set of the necessary products from the run of autogen.sh. If the build system is working correctly, there should never be a reason to go back and rerun autogen.sh.

In other words, I am skeptical that this is really a dkms bug. I suspect that the dkms patch works around a lustre bug.

Comment by Brian Murrell (Inactive) [ 09/Feb/13 ]

Christopher,

Do you understand the nature of the actual DKMS bug? Did you follow any of the links to bug reports that Guy posted above?

His findings and patch seem entirely reasonable, and do fix the problem. What's a pity is that the DKMS maintainers have completely ignored both the bug filing itself and the repeated requests for even a comment on this bug.

So, before you ask "why does this only affect Lustre and no other DKMS modules?" (i.e. indicating that it's a Lustre build issue and not a DKMS bug): I would posit that no other kernel module packages are using autotools – because really, why does a Linux kernel module actually need them? Typically there is nothing to detect/configure – the Linux kernel is a known entity.

So this is an artefact of our multi-purpose (i.e. userspace and kernelspace make rules in the same tree) build tree.

All of this said, I wonder if there is opportunity in a DKMS module for us to fix up the timestamps manually with a few touch commands.
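Something along those lines, purely as a sketch (the file list is illustrative):

# After the dkms copy, re-touch the generated files so they are newer than
# their prerequisites again and the autotools regeneration rules stay quiet.
touch aclocal.m4 configure
find . -name 'Makefile.in' -o -name 'autoMakefile.in' | xargs touch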

Comment by Bruno Faccini (Inactive) [ 11/Feb/13 ]

Just pushed patch #2, with new changes (thanks Chris for the RPM naming hint) and all required files (i.e., autogen.sh but also new ones since Guy's earlier tests) to allow successful DKMS builds even with the current known restriction. Thus it is also declared to Require auto-tools. All of this may be deleted with upcoming changes to our build process and/or once DKMS keeps timestamps.

Prakash, even with your patch to allow successful rebuild of lustre-version.ac, I got new DKMS build errors of the same kind ...

Andreas, I am sorry, but what do you expect from me exactly when you say "submit to RHEL and Jeff Mahoney @ SLES"??

Comment by Brian Murrell (Inactive) [ 11/Feb/13 ]

Bruno,

I suspect Andreas means that if the patch won't get pushed down from the DKMS maintainers (as they seem to be ignoring it and us) then we need to push it up to the distros (RHEL and SLES) for them to patch into their releases.

I'm not sure of the situation with DKMS and SUSE, but I do know that DKMS is not actually included in standard RHEL; it is available from the EPEL repository. Given that DKMS is in FC 18, there is hope that it will become a first-class package in RHEL7.

I'm not sure what the process of getting patches into EPEL is but that's where you need to send the patch for EL6. In parallel, you probably ought to push the patch to FC18 and we can hope that that filters into RHEL7.

Comment by Bruno Faccini (Inactive) [ 11/Feb/13 ]

Humm, but builds for SLES-11[SP2] still fail, and according to their build log it seems to come from the "BuildArch: noarch" I used (as requested!) for the DKMS sub-package. I presume it is wrongly detected and used in the SLES builds, but I am still working to find how/where this happens ...
Thus I may need to revert to not specifying a BuildArch for SLES (or all) builds as a quick fix; who can help/comment on this??

Comment by Peter Jones [ 11/Feb/13 ]

I have added Jeff Mahoney from Suse as a watcher to this ticket in the hope that he can provide some insight on this matter

Comment by Bruno Faccini (Inactive) [ 11/Feb/13 ]

Thanks, Brian and Peter. I will investigate at EPEL and wait for an update from Jeff.

Also, with the new DKMS rpms there is also the need to test them... This means, at least, their installation+build (over a choice of kernels), and also the correct loading of the rebuilt Lustre modules and then correct run-time behavior (auto-test exposure?).
What is the best way to get this started, an associated TT JIRA ticket?

Comment by Jeff Mahoney [ 11/Feb/13 ]

I don't do anything with dkms, so I had to dig a little. It looks like we don't actually ship dkms for SLES 11 so I'm not sure what insight I can offer here.

FWIW, the spec file as shipped in the Lustre source doesn't really work for us either. I ended up hacking the stock spec file quite a bit. Our kernel module packages aren't tied to a specific kernel version – they're tied to the kABI offered by the kernel and our weak-updates implementation should handle this automatically. If a module breaks between updates in a service pack, we'd consider that a bug in our kernel.

The SLES kernel consists of multiple "flavors." On x86_64, there's -default, -xen, and -trace. (The utility of the -trace kernel has been somewhat diminished starting with SP2 since tracing became much more lightweight between 2.6.32 and 3.0. It really just means that function tracing is available.)

Our KMP infrastructure defines the packages for each flavor automatically based on a template package in the spec file. The source is built once for each flavor and then packaged separately for each flavor with the proper dependencies automatically computed against the symbols exported by the kernel package. If the KMP infrastructure isn't used, this doesn't happen AFAIK. I believe the dkms system does make use of the KMP infrastructure, though.

Based on what I've seen in the spec file, it seems silly that the lustre and lustre-client packages are built separately at all. It seems like the only difference is the client modules don't depend on ldiskfs. It's trivial to export multiple packages based on a single build. Even if the end goal is to split the packages into things like lustre-modules-common, lustre-modules-client, and lustre-modules-server, they still could all be built at the same time AFAICT. That's what I did with my spec file (without the -common, sticking with the naming scheme already in place).

Comment by Brian Murrell (Inactive) [ 11/Feb/13 ]

Jeff,

The lustre-client packages are somewhat of an historical artefact, dating back to before there was any notion of a separate client modules build.

But even in the current incarnation, there is a difference between the lustre-modules and the lustre-client-modules. The latter are client modules that work without kernel patches, and the former (which, granted, is typically only installed on servers) (still) contains the patched client modules.

The real reason for the separate build for the client is that the lustre tree is actually "configure"d with different flags (--enable/disable-server) depending on whether you want to build the lustre server modules against a patched kernel, or the patchless client modules against an unpatched kernel. So not only are the --enable/disable-server flags different, so are the flags pointing to the kernel-devel (in RHEL parlance) tree you want to build against.
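Roughly, the two configure invocations look like this (an illustrative sketch; the kernel paths are just examples):

# Server build (lustre-modules), against a patched kernel tree:
./configure --enable-server --with-linux=/usr/src/linux-2.6.32-lustre-patched
# Patchless client build (lustre-client-modules), against a stock kernel-devel tree:
./configure --disable-server --with-linux=/usr/src/kernels/2.6.32-279.el6.x86_64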

Comment by Andreas Dilger [ 11/Feb/13 ]

Jeff, could you please attach your .spec file and/or submit a patch for how our .spec file should be changed, if you think it is suitable for inclusion to our tree?

I agree we are carrying a lot of historical baggage in our build system, and if you have already done the work to make a separate .spec file to clean this up then it can at least form the basis of an improved .spec in our tree.

I agree with Brian that the separation of lustre-modules and lustre-client-modules is not totally cosmetic yet for everyone (the server and client kernels are still slightly different for everyone but you). However, we are pretty close to eliminating those remaining differences.

Comment by Jeff Mahoney [ 11/Feb/13 ]

Sure, I'll attach it. It's 100% not ready for inclusion into the tree, especially after digging a little deeper into the client/server differences. Is there any benefit to having separate client modules if both client and server are supported on the host OS?

Comment by Jeff Mahoney [ 11/Feb/13 ]

Spec file I'm using for SLES builds. 100% not ready for inclusion.

Comment by Andreas Dilger [ 11/Feb/13 ]

For a "server" build (i.e. lustre-modules), it has both the Lustre client and server functionality. The "client" build (i.e. lustre-client-modules) it has only the client functionality.

Once the dependency on server kernel patches is removed, the main difference between the two would be the dependence on ldiskfs/zfs and the patched e2fsprogs. I guess it would be possible to remove the client modules from the server package, and allow both -client and -server RPMs to be installed at the same time on the same node. I suspect that James, Chris, and Prakash have stronger opinions about the best way to package the modules.

Comment by Prakash Surya (Inactive) [ 11/Feb/13 ]

I think having an end goal of 3 packages: lustre-common, lustre-server, and lustre-client (as Jeff mentioned above) would be a nice spot to be in. That would also align well with the NFS tools, IIRC, which could potentially ease the transition for somebody new to Lustre.

Comment by Christopher Morrone [ 11/Feb/13 ]

> Do you understand the nature of the actual DKMS bug? Did you follow any of the links to bug reports that Guy posted above?

That is not particularly constructive, Brian.

According to Guy, the "make clean" is what breaks the build and requires falling back to autotools. So my first instinct is to figure out what is wrong with lustre's "clean" target, not to go working around the problem by changing DKMS.

Bruno's investigation provides a more detailed view into the problem. But even he, at the end, admits that it is only a problem with Lustre's build system:

> BTW, this problem could only impact us due to our build system ... And this may be why DKMS developers do not seem to take care of it.

I know that doing things "right" is harder, but it is ten years of shortcuts like this that got us the mess that we currently have in the build system. It is time to start moving us in the right direction.

Comment by Brian Murrell (Inactive) [ 12/Feb/13 ]

Chris,

You are right. On re-reading, my comment did not come across as genuine as I intended. Please accept my apologies. I really was asking if you read the DKMS bug because it seemed very clear to me what the bug was based on the bug description.

As you know, the autotools system does a good (sometimes too good) job of maintaining its own dependencies, and if a prerequisite file (e.g. Makefile.am) is newer than a generated file (e.g. Makefile.in, which is in turn a prerequisite of Makefile) autotools directs "make" to re-generate the dependent files (i.e. Makefile) before using them. This is the crux of the problem. Because DKMS copies the tree without regard to preserving timestamps, if files are not copied in dependency order (which they will not be) autotools wants to regenerate the dependent files based on those new timestamps.

And of course, regenerating all of the Makefile (and possibly configure) dependencies is what causes autotools to be needed.

Short of fixing DKMS, the only other alternative I can see is to extract the kernel module sources out of the existing autotools-driven tree and put them into their own tree that had static Makefiles without autotools.

I wonder how much value we get out of the kernel modules being in the general autotools-driven tree any more. I think historically it was done this way to allow both kernel and userspace lustre (i.e. liblustre – which would need the assistance of autotools to be portable) to be built.

Comment by Prakash Surya (Inactive) [ 12/Feb/13 ]

> Short of fixing DKMS, the only other alternative I can see is to extract the kernel module sources out of the existing autotools-driven tree and put them into their own tree that had static Makefiles without autotools.

Again, I'd like to reiterate that I have been able to build and install Bruno's DKMS patch with some modifications to the Lustre build process of my own (i.e. http://review.whamcloud.com/5301).

Also, I have built the necessary DKMS infrastructure into the ZFS on Linux build process (https://github.com/zfsonlinux/zfs/commit/26e08952e6ad113b91ae7d31263b6a4fd3a5a09f) which does not get hung up on the DKMS bug. So there is no reason we cannot do this for Lustre, although we'll probably have to do some much needed restructuring of our autotools infrastructure to get there (some of which will probably happen in LU-1199, eventually).

Comment by Bruno Faccini (Inactive) [ 13/Feb/13 ]

Pushed a new version of the patch to work around the Maloo SLES build issues seen with earlier versions. Prakash, if successful this may fix the SLES build issues you got with your patch too ...

Chris, did you read my logs and analysis about the reason DKMS thinks that some files need to be rebuilt and thus requires autotools?? It is clear that un-preserved timestamps are the problem. And while I agree that our build system has its specifics that trigger the DKMS problem, it is clear to me that DKMS is doing things wrong by not preserving timestamps; for other products/packages this at least leads to extra, unneeded work.

Jeff, since SUSE distros include DKMS, is there a way you/SUSE can relay/support our request to fix this bug with the DKMS folks??

For EPEL/FC/RH reporting, I am a bit lost with who does what and where to ask; if somebody can help me save time here, they are more than welcome!

Comment by Brian Murrell (Inactive) [ 13/Feb/13 ]

> For EPEL/FC/RH reporting, I am a bit lost with who does what and where to ask; if somebody can help me save time here, they are more than welcome!

I couldn't find an avenue for RHEL specifically but here is where you can report EPEL bugs for FC:

https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora%20EPEL

But we also don't have any kind of "support" for RHEL so maybe that's why the lack of avenues. FC eventually filters into RHEL though, so filing it against FC is a good start.

Perhaps talk to PJones about whether we have any RHEL support/partner options we can lean on.

Comment by Christopher Morrone [ 13/Feb/13 ]

> It is clear that un-preserved timestamps are the problem.

Yes, I understand. But we seem to disagree about which project is to blame for the problem. You guys think DKMS should change to suit us; I think that Lustre should be able to handle a copy of the source tree that results from "make dist".

So, OK, I'll dig into the problem.

Autotools already has a way to deal with the situation where both the timestamps are incorrect and the autotools are not available. The script that handles this is called "missing", and it is included in the "make dist" package. You can disable maintainer mode, at which point autotools' "missing" script fixes up the timestamps and Bob's your uncle (everything is fine).
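On the autotools side this is essentially a one-line macro in configure.ac (a sketch; the actual change may differ):

# Adds an --enable/--disable-maintainer-mode switch to configure; with
# [enable] the regeneration rules stay on by default, and passing
# --disable-maintainer-mode turns them off, so stale timestamps in a
# copied tree no longer trigger aclocal/automake reruns.
AM_MAINTAINER_MODE([enable])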

See change 5423. With that patch, I did the following testing:

/tmp$ tar zxf ~/src/lustre/lustre-2.3.61.tar.gz
/tmp$ cp -r lustre-2.3.61 lustre-2.3.61-copy
/tmp$ cd lustre-2.3.61-copy
/tmp/lustre-2.3.61-copy$ # next let's show that I can reproduce the problem
/tmp/lustre-2.3.61-copy$ ./configure > /dev/null 2>&1 && make
CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /tmp/lustre-2.3.61-copy/missing --run aclocal-1.11
 cd . && /bin/sh /tmp/lustre-2.3.61-copy/missing --run automake-1.11 --foreign
automake-1.11: no `Makefile.am' found for any configure output
make: *** [autoMakefile.in] Error 1
/tmp/lustre-2.3.61-copy$ # next try it with maintainer mode disabled
/tmp/lustre-2.3.61-copy$ ./configure --disable-maintainer-mode > /dev/null 2>&1 && make
make  all-recursive
make[1]: Entering directory `/tmp/lustre-2.3.61-copy'
Making all in ldiskfs
make[2]: Entering directory `/tmp/lustre-2.3.61-copy/ldiskfs'
make  all-recursive
make[3]: Entering directory `/tmp/lustre-2.3.61-copy/ldiskfs'
Making all in ldiskfs
make[4]: Entering directory `/tmp/lustre-2.3.61-copy/ldiskfs/ldiskfs'
make -C /lib/modules/2.6.32-279.9.1.1chaos.ch5.1.x86_64/build M=/tmp/lustre-2.3.61-copy/ldiskfs/ldiskfs modules
make[5]: Entering directory `/usr/src/kernels/2.6.32-279.9.1.1chaos.ch5.1.x86_64'
  CC [M]  /tmp/lustre-2.3.61-copy/ldiskfs/ldiskfs/acl.o
  CC [M]  /tmp/lustre-2.3.61-copy/ldiskfs/ldiskfs/balloc.o
[cut]

It works.

So, I would suggest that you rebase your DKMS patch onto my change 5423, and then back out the part that calls autogen.sh. Finally, try adding "--disable-maintainer-mode" to the configure line in your DKMS script.
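That is, roughly (a sketch, assuming the dkms.conf drives configure through a PRE_BUILD hook):

# Illustrative: pass the flag wherever dkms.conf invokes configure.
PRE_BUILD="./configure --disable-maintainer-mode"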

Let me know if that is sufficient to deal with the problem.

Comment by Bruno Faccini (Inactive) [ 14/Feb/13 ]

Brian, thanks for the link.
Chris, sounds good! I started to work with this "missing" stub-script but gave up ... I will add a dependency on your patch and add the "--disable-maintainer-mode" option.
Prakash, your patch has a dependency on mine, but it would be better if we reversed this, no?

Comment by Prakash Surya (Inactive) [ 14/Feb/13 ]

I'll probably abandon my patch if you can get by using Chris's fix. It's good cleanup, IMO, but I'll wait for LU-1199 before doing any general cleanup of the build system stuff.

Comment by Bruno Faccini (Inactive) [ 18/Feb/13 ]

Chris, "-disable-maintainer-mode" works perfect !! So in patch #4 (now depending on your patch), I have removed any autogen.sh and auto-tools requirement, and added "-disable-maintainer-mode" option in dkms.conf configure command.

Prakash, successful SLES builds with path #3 seem to confirm that "BuildArch: noarch" is involved. I will try to better qualify this now.

Brian, even if we found a work-around I opened a bug-report (#912300) vs DKMS at FC/EPEL.

Comment by Bruno Faccini (Inactive) [ 18/Feb/13 ]

With the new DKMS-aware packages, specific support (DKMS-aware platform setup, installation, exposure to different kernels, ...) has to be developed in our testing tools. So TT-1112 has been created to address the need.

Comment by Andreas Dilger [ 06/Mar/13 ]

Dropping this as a 2.4.0 release blocker. With the landing of LU-2391, the osd-zfs module is packaged into a separate RPM, so there is no dependency on the rest of the modules to have ZFS installed.

Anticipated 2.4 ZFS users will be able to compile Lustre from sources if this functionality is needed.

Comment by Bruno Faccini (Inactive) [ 07/Mar/13 ]

Ok, I worked more on the remaining problem with SLES builds and it is definitely a limitation in the per-subpackage architecture handling of the SLES-11 RPM tool. It is documented in some threads, like at http://lists.opensuse.org/opensuse-buildservice/2012-10/msg00048.html where the answer is:

rpm of SLE 11 simply does not support the BuildArch setting per sub package.
It is a newer feature.

And this is definitely confirmed with the RPM tool announcement found at http://lists.rpm.org/pipermail/rpm-announce/2009-February/000015.html :

[Rpm-announce] Feature Highlight: Noarch Subpackages

Florian Festi ffesti at redhat.com 
Wed Feb 11 09:54:07 UTC 2009
With version 4.6.0 RPM supports noarch subpackages. This means that
adding "BuildArch: noarch" to a subpackage section makes this
subpackage noarch even if the main package is arch dependent. This
allows to make documentation, language or data subpackages noarch
without packaging the same sources twice or other workarounds.

Compatibility:

Spec files using this feature cannot be built on previous RPM
versions. The resulting binary rpms are fully backward compatible as
they are just normal noarch packages. Though packagers that need to
build their spec files on different and older distributions should
avoid using noarch subpackages . Distributions that build their spec
files with a defined version of RPM or software vendors that target
different distributions but use only on version of RPM for building
should consider noarch subpackages.

RPM based distributions should check their build systems whether they
can cope with builds on different architectures returning the same
noarch subpackages.

Benefits:

Noarch subpackages allow to increase the percentage of content
packaged in noarch packages with very little changes to spec
files. For distributions that build for several different arches this
can reduce the overall size of their repositories as long as noarch
packages are shared between the different arches. Distributions that
want to make massive use of this feature should also consider how they
can make sure that this benefit does actually apply. Possible
solutions are using hard links or to introduce a noarch repository
that is used by all arches.

For multilib installations noarch subpackages can also reduce the
amount of data that has to be stored, downloaded and installed twice.

Outlook:

All - or at least most - arch independent content should be in noarch
packages. Expect more features to make that possible in future
releases of RPM.

BTW, SLES11[-SP2] ships with RPM version 4.4.2.3.

So what next: should I revert and build the DKMS RPM as a fully separate one, like lustre-iokit, or do we start to build them per-arch while waiting for future SLES versions to upgrade their RPM tool version??

Comment by Bruno Faccini (Inactive) [ 08/Mar/13 ]

I forgot to comment that the side effect of this single-BuildArch handling on SLES11 is that the "on-target" builds set their %configure macro with the "--target=noarch-suse-linux" param from the lustre.spec file, and then we hit the following error:

.....

Wrote: /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/SRPMS/lustre-iokit-1.4.0-1.src.rpm
+ popd
+ local is_patchless=
+ true
+ is_patchless=yes
+ local lustre_tests=
+ true
+ /usr/bin/rpmbuild --target x86_64 -tb /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/lustre-2.3.61.tar.gz --define 'lustre_name lustre-client' --define '__find_requires /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/find-requires' --define 'configure_args  --disable-server  --enable-liblustre --enable-liblustre-tests' --define 'kdir /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5' --define 'kobjdir /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5-obj/x86_64/default' --define '_tmppath /var/tmp' --define '_topdir /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD'
Building target platforms: x86_64
Building for target x86_64
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.58587
+ umask 022
+ cd /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/BUILD
+ cd /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/BUILD
+ rm -rf lustre-2.3.61
+ /usr/bin/gzip -dc /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/lustre-2.3.61.tar.gz
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd lustre-2.3.61
++ /usr/bin/id -u
+ '[' 1000 = 0 ']'
++ /usr/bin/id -u
+ '[' 1000 = 0 ']'
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ ln lustre/ChangeLog ChangeLog-lustre
+ ln lnet/ChangeLog ChangeLog-lnet
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.22702
+ umask 022
+ cd /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/BUILD
+ /bin/rm -rf /var/tmp/lustre-2.3.61-root
++ dirname /var/tmp/lustre-2.3.61-root
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/lustre-2.3.61-root
+ cd lustre-2.3.61
+ '[' -z '' ']'
++ egrep -c '^cpu[0-9]+' /proc/stat
+ RPM_BUILD_NCPUS=2
+ '[' 2 -eq 0 ']'
+ '[' 2 -gt 8 ']'
+ rm -rf /var/tmp/lustre-2.3.61-root
+ cd /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.61
+ CONFIGURE_ARGS='--disable-server  --enable-liblustre --enable-liblustre-tests --with-release=1'
+ CONFIGURE_ARGS='--disable-server  --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests'
+ '[' -n /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5 ']'
++ echo --disable-server --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests
++ sed -e 's/"\?--with-linux=[^ ][^ ]* \?//'
+ CONFIGURE_ARGS='--disable-server --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests'
+ '[' -n /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5-obj/x86_64/default ']'
++ echo --disable-server --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests
++ sed -e 's/"\?--with-linux-obj=[^ ][^ ]* \?//'
+ CONFIGURE_ARGS='--disable-server --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests'
+ CFLAGS='-g -O2 -Werror'
+ export CFLAGS
+ CXXFLAGS='-g -O2 -Werror'
+ export CXXFLAGS
+ FFLAGS='-g -O2 -Werror'
+ export FFLAGS
+ eval ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --target=noarch-suse-linux --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib64 --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-linux=/var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5 --with-linux-obj=/var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5-obj/x86_64/default --disable-server --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests
++ ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --target=noarch-suse-linux --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib64 --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-linux=/var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5 --with-linux-obj=/var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11/ib_stack/inkernel/BUILD/reused/usr/src/linux-2.6.32.36-0.5-obj/x86_64/default --disable-server --enable-liblustre --enable-liblustre-tests --with-release=1 --enable-tests --enable-liblustre-tests
checking build system type... x86_64-suse-linux-gnu
checking host system type... x86_64-suse-linux-gnu
checking target system type... Invalid configuration `noarch-suse-linux': machine `noarch-suse' not recognized
configure: error: /bin/sh ./config.sub noarch-suse-linux failed
error: Bad exit status from /var/tmp/rpm-tmp.22702 (%build)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.22702 (%build)
+ fatal 1 'Error building rpms for  x86_64.'
+ cleanup
+ true
+ error 'Error building rpms for  x86_64.'
+ local 'msg=Error building rpms for  x86_64.'
+ '[' -n 'Error building rpms for  x86_64.' ']'
+ echo -e '\nlbuild: Error building rpms for  x86_64.'

lbuild: Error building rpms for  x86_64.
+ exit 1
+ run_exit_traps
+ local i num_items=2
++ seq 1 -1 0
+ for i in '$(seq $((num_items-1)) -1 0)'
+ '[' -z '[ -n "$CCACHE" ] && ccache -s' ']'
+ eval '[' -n '"$CCACHE"' ']' '&&' ccache -s
++ '[' -n ccache ']'
++ ccache -s
cache directory                     /var/lib/jenkins/.ccache
cache hit                              0
cache miss                             0
files in cache                      8033
cache size                         881.9 Mbytes
max cache size                     976.6 Mbytes
+ for i in '$(seq $((num_items-1)) -1 0)'
+ '[' -z 'kill -INT -1114 || true' ']'
+ eval kill -INT -1114 '||' true
++ kill -INT -1114
../build/exit_traps.sh: line 59: kill: (-1114) - No such process
++ true
+ rc=1
+ '[' 1 '!=' 0 ']'
+ echo 'Build failed'
Build failed
+ '[' sles11 == fc18 ']'
+ exit 1
Build step 'Execute shell' marked build as failure
Archiving artifacts
Finished: FAILURE

A possible "dirty" fix could be to test the build platform in spec file and only set "BuilArch: noarch" when not SLES ...

Comment by Bruno Faccini (Inactive) [ 08/Mar/13 ]

Also, maybe Jeff has some good news for us concerning a possible RPM tool upgrade in upcoming SLES versions/SPs??

Comment by Bruno Faccini (Inactive) [ 19/Mar/13 ]

Submitted a new patch version (#7) adding a work-around to allow SLES builds to succeed, with the only side effect that their DKMS RPM will not be built as "noarch". This restriction can be removed when future SLES versions use RPM 4.6.0+.

Comment by Jeff Mahoney [ 19/Mar/13 ]

Thanks, Bruno. Apologies for not responding sooner. Yeah, an RPM package update in SLE11 SP3 isn't planned. There's no feature request for it, so it won't get changed otherwise. Not that it helps the official support matrix, but openSUSE releases after SLE11 do have the updated RPM version.

Comment by Bruno Faccini (Inactive) [ 31/Mar/13 ]

Just submitted http://review.whamcloud.com/5897 to address the need to rebuild the osd_zfs module within the DKMS framework. I made a few arbitrary choices (dkms.conf location, ...) regarding recent changes (new directories/locations of config/build files/scripts) introduced with the LU-1199 patches, which may result in further comments/modifications.

As per my local testing it builds (one has to follow the ZFS/SPL build needs), installs + rebuilds under DKMS.

But then depmod finds 6 unsatisfied references/symbols (qsd_[op_begin,start,fini,prepare,init,op_end]()) coming directly from the quota module. They will need to be avoided if we want to get a usable/loadable stand-alone osd_zfs module now.

Comment by Christopher Morrone [ 08/Apr/13 ]

Unfortunately, I am beginning to believe that the approach that is being taken to add a dkms package for the osd-zfs only is fundamentally flawed.

The current approach basically packages up the full (or most of the) lustre source code into a dkms package. It then builds most of lustre, just to get the single osd-zfs module which will be installed.

The problem is that this is fundamentally wrong. We should not be building the osd-zfs module against its own internally packaged version of lustre. It needs to be installed against the installed version of lustre.

Think about what will happen when a user installs a new binary rpm of lustre. The DKMS package of the osd-zfs could then quite possibly no longer be binary compatible with the new lustre. Installing a new kernel might cause a full rebuild of the osd-zfs package, and that build might even succeed. But it will always be rebuilt against its internal copy of lustre, not the installed lustre.

Unfortunately, we made the decision long ago that supporting a fully externally built osd module was not something that we needed to address. So we haven't fixed up our packaging and apis to be clean enough to support such a thing. To do so would probably require a great deal of work. Certainly too much to be done by 2.4.

Perhaps the best way to get around this problem is to make all of lustre, including all server modules, DKMS-enabled.

Comment by Christopher Morrone [ 08/Apr/13 ]

Another issue that we are going to have is that DKMS doesn't support dependencies, and there are dependencies here. If a person installs a new kernel, the DKMS build is extremely likely to break. The dkms packages must be rebuilt in a specific order: spl, zfs, osd-zfs. Brian Behlendorf has modified DKMS to support dependencies. Without that modified DKMS, this solution isn't going to work.

Comment by Brian Behlendorf [ 08/Apr/13 ]

Actually Darik Horn, the Ubuntu ZFS maintainer, did the hard work and wrote the patches for proper dependency support. They can be found here; with luck upstream will accept them.

https://github.com/zfsonlinux/dkms/commits/patch-queue/master/ubuntu/raring

We're already hosting source and noarch packages which get installed from the zfsonlinux repository until these improvements are picked up by the distributions. Here's a link to the el6 versions.

http://archive.zfsonlinux.org/epel/6/SRPMS/dkms-2.2.0.3-2.zfs1.el6.src.rpm
http://archive.zfsonlinux.org/epel/6/x86_64/dkms-2.2.0.3-2.zfs1.el6.noarch.rpm

Comment by Bruno Faccini (Inactive) [ 09/Apr/13 ]

I was a bit late for the osd_zfs DKMS stuff, and since I was expecting a lot of feed-back from you guys, I decided to push a very basic patch version.
I am not disappointed with the results!
Thanks already to all for your ideas and contributions, which are a great help. I can say that at least some of the questions/potential problems Chris listed in his last 2 updates went through my mind too at the time I was writing the patch.
But again, I decided to do it first according to what I think is possible with the current Lustre build capabilities.
I also think having an autonomous/stand-alone build of the osd_zfs module, or a full/Server Lustre DKMS, would be the mandatory/principal next step.
Last, I am not sure I fully understand what the DKMS package inter-dependencies support will provide; is it capable, for example, of re-starting an osd_zfs rebuild when a new zfs is installed?

Comment by Christopher Morrone [ 09/Apr/13 ]

I understand, and I think you did a good job considering the existing state of the Lustre build system!

> Last, I am not sure I fully understand what the DKMS package inter-dependencies support will provide; is it capable, for example, of re-starting an osd_zfs rebuild when a new zfs is installed?

I don't know if it handles that or not. What it should handle is when a new kernel is installed. It should have DKMS compile the modules in this specific order: spl, zfs, osd-zfs. If they are built in any other order, the compilation will fail and DKMS will abort.
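With the dependency-aware DKMS, this ordering is declared in each module's dkms.conf; the zfs package carries a declaration of roughly this shape, and an osd-zfs package would presumably add both (the osd-zfs lines are a hypothetical illustration):

# In the zfs dkms.conf: spl must be built first.
BUILD_DEPENDS[0]="spl"
# In a hypothetical osd-zfs dkms.conf: spl, then zfs, must be built first.
BUILD_DEPENDS[0]="spl"
BUILD_DEPENDS[1]="zfs"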

Comment by Brian Behlendorf [ 10/Apr/13 ]

I don't want to step on anybody's toes, but I took a crack at this since I have some experience with dkms packaging. With the following three patches applied I'm able to create a lustre-dkms package and a matching user space lustre package. The lustre-dkms package includes support for the client and support for ZFS servers. I was forced to disable the ldiskfs functionality because we don't have reliable dkms packaging for it yet. The patches extend the existing functionality in the Lustre build system in the following ways:

http://review.whamcloud.com/#change,5960 - zfs-0.6.1 kmod+dkms compatibility
http://review.whamcloud.com/#change,6019 - Add Lustre DKMS spec file
http://review.whamcloud.com/#change,6020 - Honor --disable-modules option in spec file

To verify that the packages work as expected I applied them to a branch off the Lustre v2.3.63 tag. For reference I've pushed that branch to github, https://github.com/chaos/lustre/tree/v2_3_63-dkms. Using that branch I built the required packages and added them to ZFS on Linux, EPEL 6 repository, see http://zfsonlinux.org/epel.html. Next I created a pristine CentOS 6 image in a VM, added the ZoL repository and did a 'yum install lustre'. As expected everything gets installed and you have either a Lustre client or Lustre ZFS based server ready for use. Here's a transcript of the install process.

https://gist.github.com/behlendorf/5358099#file-lustre-dkms-install

If you want to try it yourself just run the following commands in either a RHEL or CentOS 6 environment.

sudo yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release-1-2.el6.noarch.rpm
sudo yum install lustre

As I mentioned earlier the packages are based off the 2.3.63 tag. It would be great if we could get these changes merged before the 2.3.64 tag is made. There are a few fixes in master which resolve some of the warnings which are issued during the build. My intention is to refresh the lustre packages in the ZFS on Linux repositories only when new upstream tags are made. When Lustre 2.4.0 is tagged I'll just start tracking the Lustre maintenance releases.

Comment by Brian Behlendorf [ 03/Sep/13 ]

The zfsonlinux repository was updated to include dkms packages for spl/zfs-0.6.2 and lustre-2.4.0. The updated branch can be found at https://github.com/chaos/lustre/tree/v2_4_0-dkms; it includes refreshed versions of the following two patches, which have been submitted before but not yet included. I can push refreshed copies if upstream is interested in adding dkms support.

https://github.com/chaos/lustre/commit/f7aeb9dd20c5c7cd044d06757f0782d8da8d2f92
https://github.com/chaos/lustre/commit/4afc95703208fcb50eb35ef72466e1d8af97b46a

Comment by Bruno Faccini (Inactive) [ 23/Dec/13 ]

For IEEL (INTL-26), I have re-worked my original patch http://review.whamcloud.com/5284 that allows the Lustre-Client modules to be rebuilt under DKMS control. It is patch-set #9.

It will need further add-ons to handle the llite_lloop and ptlrpc_gss special cases, due to their respective configure-time and gss/krb5 dependencies on target platforms.

I also think that maybe more Requires need to be set for this DKMS RPM.

Also, a future DKMS rebuild of the Lustre-Server modules has been somewhat planned.

Comment by Bruno Faccini (Inactive) [ 07/Jan/14 ]

TEI-1359 has been created, as a follow-on to TT-1112 (which was mistakenly mixed/duped with TT-1244/TEI-74), in order to have DKMS RPM testing become part of autotests.

Comment by Bruno Faccini (Inactive) [ 20/Jan/14 ]

Currently working on the best way to package (Requires/Conflicts/...) the DKMS lustre-client-modules RPM.
BTW, I just added a "Conflicts:" on lustre-modules itself, since we need to ensure that no dual module install (lustre-modules RPM vs DKMS rebuild) can occur, which might lead to unpredictable behavior.
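i.e., something like this in the DKMS package section of the spec (sketch):

# Ensure the prebuilt kernel modules and the DKMS rebuild cannot coexist.
Conflicts: lustre-modules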

Comment by Bruno Faccini (Inactive) [ 14/Feb/14 ]

Finally it has been decided to use Brian's set of patches instead to get this DKMS RPM generated. After a little re-work+rebase, both changes #5960 and #6019 have already landed, and #6020 will soon.

I have also updated TEI-1359 to detail the build procedure and also the basic testing requirements to be included in our build/test tools.

Possible next steps will be, first, to add the ability to create a Client-only DKMS RPM, and later also the ldiskfs modules, once no kernel patching is necessary.

Comment by Peter Jones [ 19/Feb/14 ]

Patches landed for 2.6

Comment by Brian Murrell (Inactive) [ 08/Oct/14 ]

bfaccini: there was mention at one point that we needed a patch in DKMS and that while we wait for upstream to ship that we'd build our own DKMS RPM for EL6 and ship that. Did anything ever come of that?

Comment by Bruno Faccini (Inactive) [ 09/Oct/14 ]

Hello Brian,
No, I am not aware that anything has been done to ship a DKMS tool RPM including my patch.
But on the other hand, it seems that upstream integration of my fix is done, and we now need to use the dkms-2.2.0.3-28 RPM/version from the Fedora FC[19-21]/EPEL[5-7] distros.

Comment by Brian Murrell (Inactive) [ 09/Oct/14 ]

Hi bfaccini,

But that's roughly the same thing. The point though is that we are not actively building a working DKMS for our users to use, right?

Comment by Bruno Faccini (Inactive) [ 09/Oct/14 ]

For me the (big?) difference is that we don't need to maintain+build it on our own now!
And we can now make the specific dkms-2.2.0.3-28 version a ">=" Requires for our Lustre-DKMS RPM.
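i.e., roughly (sketch):

# Require a DKMS version containing the timestamp and dependency fixes.
Requires: dkms >= 2.2.0.3-28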

Comment by Brian Murrell (Inactive) [ 09/Oct/14 ]

@bruno: Yes, of course having upstream maintain those packages is the ultimate goal, but where are our customers, running EL6 systems, going to get dkms-2.2.0.3-28 binaries, today? That is the gap that I am proposing we need to fill.

Comment by Brian Behlendorf [ 09/Oct/14 ]

As of dkms-2.2.0.3-20 for EPEL/Fedora we got support merged to allow reliable building of the SPL and ZFS code. The updated version has been in their repository for a while now and we've been able to drop our custom build. See https://bugzilla.redhat.com/show_bug.cgi?id=1023598.

Comment by Bruno Faccini (Inactive) [ 09/Oct/14 ]

Brian,
Your update reminds me that I forgot (sorry!) to make you aware of a problem I found in the "DKMS inter-modules dependencies feature" implementation (i.e., BZ #1023598) during our testing of the Lustre DKMS RPM install/build. I reported it as part of FC/RHEL BZs #1140812 / #1143051, which ended up in dkms-2.2.0.3-28.

Comment by Brian Behlendorf [ 09/Oct/14 ]

Interesting. OK, thanks for letting me know and for getting it fixed. Unfortunately, it seems to have another problem as of today. There was recently an upstream patch merged which broke things by removing the builds directory. Here's the new ticket https://bugzilla.redhat.com/show_bug.cgi?id=1151123

Comment by Brian Murrell (Inactive) [ 09/Oct/14 ]

Ugh. So dkms-2.2.0.3-28.git.7c3e7c5.fc20.noarch doesn't work either now? Is there a version that is known to work? Do we know of a patch to dkms-2.2.0.3-28.git.7c3e7c5.fc20.noarch that will make it work?

I suppose we could just apply the reversion of http://linux.dell.com/cgi-bin/cgit.cgi/dkms.git/commit/dkms?id=2ea43f6108558849125cc1d66902d6992ee3fe39 and take advantage of the bug they fixed until this is more properly fixed.

Comment by Brian Behlendorf [ 09/Oct/14 ]

We should be able to sort this out on our side. It would be best for everyone if we relied on the version of DKMS shipped by the distributions. We're sorting out a reasonable fix now; there's a proposed patch in https://github.com/zfsonlinux/zfs/pull/2776. You'll likely need to make a similar tweak for Lustre.

Comment by Brian Behlendorf [ 10/Oct/14 ]

We've sorted the DKMS issues on the ZFS side and updated the ZFS stable repositories before dkms-2.2.0.3-2 made it into EPEL. Only Fedora was impacted, for a few hours, but Lustre may need a fix similar to that in 2776 above.

Comment by Bruno Faccini (Inactive) [ 20/Oct/14 ]

I am adding this comment to this closed/fixed JIRA since it has been the main ticket to track issues around the creation of the Lustre modules DKMS RPMs. The patch to implement the necessary stuff to allow for a Lustre Client (only) modules DKMS RPM is at http://review.whamcloud.com/12347.

Comment by Peter Jones [ 20/Oct/14 ]

Bruno, I think that it would be better to create a new ticket that is linked to this one.

Comment by Gerrit Updater [ 29/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12347/
Subject: LU-1032 build: DKMS RPM for Lustre Client modules
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 549c57e77b7f3a4cae5a7381d612a499c2ca3dcc

Comment by Gerrit Updater [ 04/Apr/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26356
Subject: LU-1032 build: fix typo in lustre-dkms.spec changelog
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c7b3fc6ea51887183efbbfda106fb6f804a29c89

Comment by Gerrit Updater [ 05/Apr/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26358
Subject: LU-1032 build: fix typo in lustre-dkms.spec changelog
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 166a49ce2a1d7b5a4491a33e3ebac0aef72006b0

Comment by Gerrit Updater [ 26/Apr/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26358/
Subject: LU-1032 build: fix typo in lustre-dkms.spec changelog
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 74fb19ec4b12e9fb416c2d8fbe98825bfdd05846
