[LU-2391] warning messages for missing symbols when lustre-modules::osd_zfs.ko installed on a system without zfs-modules installed Created: 27/Nov/12 Updated: 23/Apr/13 Resolved: 06/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andreas Dilger | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 5667 |
| Description |
|
On Nov 8, 2012, at 5:54 AM, Prem Sethna Kancherla wrote on lustre-discuss:
I think three options exist:
|
| Comments |
| Comment by Brian Murrell (Inactive) [ 27/Nov/12 ] |
|
If this is happening because the user is installing the Lustre RPMs on a system that does not have the ZFS modules built and installed, then yes, the correct solution is to package osd_zfs.ko in its own RPM and then have that "lustre-osd-zfs" RPM depend on whichever RPM provides the modules with the missing symbols. We actually have this same problem if/when we build Lustre with OFED instead of the kernel-supplied I/B stack, since the o2ib LND module ends up depending on the OFED modules, which are (historically) supplied by kernel-ib. |
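As an illustration of the packaging approach Brian describes, a sub-package stanza along these lines could express the dependency. This is a rough sketch only; the package names, version macro, and module path below are assumptions, not the contents of the actual lustre.spec.in:

```
# Hypothetical sub-package -- names, macros, and paths are illustrative only
%package osd-zfs
Summary: ZFS OSD kernel module for Lustre servers
Requires: lustre-modules = %{version}
# depend on whichever package provides the symbols osd_zfs.ko links against
Requires: zfs-modules

%description osd-zfs
Kernel module implementing the ZFS object storage backend for Lustre.

%files osd-zfs
%defattr(-,root,root)
/lib/modules/%{kversion}/updates/kernel/fs/lustre/osd_zfs.ko
```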
| Comment by Christopher Morrone [ 27/Nov/12 ] |
|
Moving osd_zfs to its own rpm sounds reasonable. I have been considering something like that independently. Although I think if we go that route, doing the same for osd_ldiskfs would be good as well. Then not only will users be able to use ldiskfs without zfs packages, they will be able to use zfs without installing ldiskfs packages. I was also considering breaking the "lustre" binary rpm into three parts: "lustre-core", "lustre-server", and "lustre-client". lustre-core would be Required by lustre-server and lustre-client and contain the modules they have in common. Then I start wondering if the rpm count is getting out of control. But it would be really nice to have one build that can go into a distro, and not have to install server components and init scripts and such on client nodes. More than anything though, the core bug here is that the lustre rpm has a dependency that the rpm spec file fails to declare. The dependency problem is one that I try to address in my spec file overhaul in |
| Comment by Peter Jones [ 30/Nov/12 ] |
|
Bruno Could you please look into this one? Thanks Peter |
| Comment by Bruno Faccini (Inactive) [ 04/Dec/12 ] |
|
So let's try to extract/move "osd_zfs.ko" to a separate RPM and see how it works? Also, I remember that missing OFED/IB RPMs can lead to the same messages ... |
| Comment by Andreas Dilger [ 12/Dec/12 ] |
|
It makes sense to implement the osd-zfs code in a separate DKMS RPM package that can depend on the zfs-modules and zfs-modules-devel package, and be built after the ZFS DKMS packages are installed. This will avoid any symbol issues, and will also allow users to install and use Lustre + ZFS in a relatively painless manner. Chris is also working to clean up the ldiskfs packaging in |
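A rough sketch of what a dkms.conf for such an osd-zfs DKMS package might contain follows; the package name, version, paths, and configure flags are assumptions for illustration only:

```
# Hypothetical dkms.conf for a lustre-osd-zfs DKMS package (illustrative only)
PACKAGE_NAME="lustre-osd-zfs"
PACKAGE_VERSION="2.4.0"
# rebuild against the spl/zfs DKMS sources already installed on the node
BUILD_DEPENDS[0]="spl"
BUILD_DEPENDS[1]="zfs"
MAKE[0]="./configure --with-zfs && make"
CLEAN="make clean"
BUILT_MODULE_NAME[0]="osd_zfs"
BUILT_MODULE_LOCATION[0]="lustre/osd-zfs/"
DEST_MODULE_LOCATION[0]="/extra/lustre/"
AUTOINSTALL="yes"
```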
| Comment by Prakash Surya (Inactive) [ 12/Dec/12 ] |
|
If we separate osd-zfs, shouldn't we also separate osd-ldiskfs? Let's not be discriminatory. |
| Comment by Bruno Faccini (Inactive) [ 14/Dec/12 ] |
|
Sure, if I can make it for osd-zfs, I bet I'll be able to do it for osd-ldiskfs too ... |
| Comment by Bruno Faccini (Inactive) [ 19/Dec/12 ] |
|
My first try, as a patch against master, is at http://review.whamcloud.com/4869. Let's see if it builds separate RPMs carrying the osd_[ldiskfs,zfs].ko modules, which can then be installed as needed for ldiskfs/zfs back-end support. My local tests were successful, but it needs ZFS-builds exposure ... |
| Comment by Bruno Faccini (Inactive) [ 20/Dec/12 ] |
|
Too bad, the first (el5, and Server) builds with my 1st patch version fail with the following message during debuginfo processing: "Processing files: lustre-debuginfo-2.3.56-2.6.32_279.5.1.el6_lustre.g8e452a9.x86_64_g697bcea.x86_64", so it seems we may be hitting some RPM tool limitation with sub-packages during debuginfo package creation ... I will push a new version of my patch which will remove the osd_[ldiskfs,zfs].ko modules during the %install step, instead of excluding them, and see how it works. Another way could be to disable the check by setting "%define _unpackaged_files_terminate_build 0" ... |
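For reference, the two workarounds mentioned above look roughly like this in spec-file terms (a sketch only; the conditional macro and module path are assumed, not taken from the actual patch):

```
# (a) remove the modules that will not be packaged at the end of %install
%if ! %{with zfs}
rm -f $RPM_BUILD_ROOT/lib/modules/*/updates/kernel/fs/lustre/osd_zfs.ko
%endif

# (b) or relax the rpmbuild "unpackaged files" sanity check altogether
%define _unpackaged_files_terminate_build 0
```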
| Comment by Brian Murrell (Inactive) [ 20/Dec/12 ] |
|
Without having looked at any part of the rest of this, I would say that...
setting "%define _unpackaged_files_terminate_build 0" seems like a bad idea. That's a safety/sanity check. I doubt RH themselves use it on any of the thousands of packages they build/maintain, which means there must be some other way. Or maybe the way we are generating the debuginfo packages is wrong. That code only just landed in the last week or four, IIRC. |
| Comment by Bruno Faccini (Inactive) [ 24/Dec/12 ] |
|
Brian, |
| Comment by Brian Murrell (Inactive) [ 24/Dec/12 ] |
|
Bruno, TBH, I've never really looked that deeply into RH's flavour of debuginfo package building. |
| Comment by Bruno Faccini (Inactive) [ 28/Dec/12 ] |
|
Hmm, I finally went through the analysis of the very long build log showing the failure with my patch, and I found that the reason why osd_[ldiskfs,zfs].ko were unpackaged is that they were not processed by "rpmbuild". This could be a bug/difference (compared to the one I used for my local builds/tests, i.e. ) in the tool version used on Hudson/Jenkins, or a problem with the triggers/tests I use to generate the osd-[ldiskfs,zfs] sub-packages' %files lists ... Will provide a new patch soon based on this finding. |
| Comment by Bruno Faccini (Inactive) [ 02/Jan/13 ] |
|
Ok, as I wanted to test and verify: Patch #2 did not report errors, now that I also wrap the module excludes in the same test/trigger as the %files list, but it did not generate the osd RPMs!! This means I am wrong in the way I test whether the osd RPMs need to be built or not, and this seems confirmed (for other/additional reasons!!) by the patch comments from C. Morrone too (thanks!!). I will again review the new build report to see how I can change this test method, and also review the other comments. |
| Comment by Bruno Faccini (Inactive) [ 03/Jan/13 ] |
|
Patch #3 now allows the osd-[ldiskfs,zfs] RPMs to build both locally and on Jenkins but, as can be expected with new RPMs to take care of, we now face a problem during the early autotest step (lustre-initialization) because the "osd-ldiskfs" module has not been installed and cannot be loaded!! How can I customize autotest to have it handle my 2 new RPMs?? I can't find anything like "lustre-initialization" in the tests part of the source tree ... |
| Comment by Bruno Faccini (Inactive) [ 03/Jan/13 ] |
|
BTW, I just tried to provision the latest/11636 build from Jenkins and it works fine after I download and install lustre-osd-ldiskfs by hand. So can somebody help/tell me how I can request or configure my 2 additional RPMs to be installed during autotest initialization?? |
| Comment by Andreas Dilger [ 03/Jan/13 ] |
|
I've filed TT-1009 for fixing the node provisioning during test runs. Bruno, are these new RPMs just the binary RPMs with osd-zfs.ko and osd-ldiskfs.ko? For Lustre 2.4 we need osd-zfs to be packaged as a DKMS module, so that should be the next step after the basic binary osd-zfs and osd-ldiskfs module installation is tested and working. It may be that for DKMS modules we need GCC and autoconf to be installed on the test server nodes, or alternatively add an install step on the control node that builds the binary package from the DKMS package and then installs the binary RPM on the server nodes. |
| Comment by Christopher Morrone [ 03/Jan/13 ] |
|
Not that I'm really opposed, but why do we need DKMS support in 2.4? Wouldn't a buildable srpm be sufficient for most? I think we'll probably want a non-DKMS osd-zfs package too. |
| Comment by Andreas Dilger [ 03/Jan/13 ] |
|
The DKMS packaging is a requirement on our side for distributing Lustre ZFS packages. I suppose an SRPM would be OK in the short term, but it isn't as convenient for the end users. We aren't allowed to distribute a binary zfs-modules or osd-zfs RPM together with Lustre, but we can distribute the DKMS ZFS and osd-zfs RPMs together with Lustre, and end-users can build binary RPM packages from the DKMS packages and use those to install on multiple nodes, if there is a benefit to doing so. The DKMS requirement for osd-zfs was in part based on zfsonlinux.org distributing only DKMS ZFS packages instead of binary modules. In terms of missing module symbols, I don't think that there is a requirement for any OSD symbol exports. OSDs are accessed via method tables that are dynamically loaded based on the OSD type, so I believe that once osd-zfs is out of the Lustre RPM there will no longer be missing symbols. |
| Comment by Christopher Morrone [ 03/Jan/13 ] |
|
Maybe Brian is only putting DKMS builds of ZFS on zfsonlinux.org, but be aware that at LLNL we use the binary build, not the DKMS packages. So if you add support to make osd-zfs DKMS-buildable, you also need to retain the capability to build normal binary rpms as well. I'm not sure that you really want that level of complexity. I know that I would really prefer for that work to happen after my
I'm not sure that I agree that you couldn't distribute a binary osd-zfs either. We would just need normal binary rpms of spl/zfs available somewhere as well. Are you afraid of having a zfs binary rpm on your servers? I'm not sure what you mean about generating binary rpms from DKMS. I think that the choices are either generate a binary rpm from a source rpm, or install the DKMS rpm. I don't believe that one generates a binary rpm from a DKMS rpm. |
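The two installation paths Chris contrasts would look roughly like this; the package file names and versions below are illustrative, not actual release artifacts:

```
# 1) rebuild binary rpms from the source rpm, then install the results
rpmbuild --rebuild zfs-0.6.0-*.src.rpm
rpm -ivh ~/rpmbuild/RPMS/x86_64/zfs-*.x86_64.rpm

# 2) or install the DKMS rpm and let dkms build the modules on the node
rpm -ivh zfs-dkms-0.6.0-*.noarch.rpm
```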
| Comment by Bruno Faccini (Inactive) [ 04/Jan/13 ] |
|
Wow, I will try to answer you all ... But my first impressions when I started working on this JIRA appear more than true; it is what we can call an "exercise in style"!!
Andreas:
Andreas, Brian, Prakash, Christopher:
Prakash: " trigger ... |
| Comment by Prakash Surya (Inactive) [ 04/Jan/13 ] |
Using either Requires, Suggests, or Recommends is more than just a style thing. RPM and YUM use these keywords to build their dependency trees, and do things like automatically pull in packages or emit warnings/failures when the dependencies are not met. Since one cannot run a Lustre server without an OSD package, it's ideal to have the tool fail the lustre package install when a lustre-osd package isn't detected. It's more about correctness than elegance, IMO.
OK, just want to make sure the final patch that lands works correctly for all configurations. Temporary/Incremental versions are perfectly fine. |
| Comment by Bruno Faccini (Inactive) [ 05/Jan/13 ] |
|
Prakash,
_ we can't reliably use a pseudo-package name in 2 packages
On the other hand, can we imagine that any people/team willing to install Lustre would ignore the back-end + OSD requirement? And anyway, the dependency will appear at run-time instead of install-time and could easily be resolved if we systematically built and provided all OSD rpms as part of the Lustre-Server suite. But maybe I am completely missing some more packaging issues/needs here, because I react as an ex (and recent) on-site guy who sometimes had to deal with much more complex scenarios; just tell me. |
| Comment by Andreas Dilger [ 05/Jan/13 ] |
|
Is there an RPM manual you can look at? I think we need something like Recommends or Suggests, or whatever actually works. Otherwise the users will have more complexity than is necessary, and it could be avoided. |
| Comment by Bruno Faccini (Inactive) [ 07/Jan/13 ] |
|
I use the quite old "Maximum RPM" from Red Hat and others from the LDP, Fedora, and various sources, but none of them mentions Recommends or Suggests directives, and BTW this is confirmed by the latest rpm source code I browsed. But the good news is that installing 2 RPMs providing the same pseudo-package works, at least locally on my build/testing platform ... So I will submit a new (#4) patch version and see how it goes on the Maloo/Jenkins side. |
| Comment by Christopher Morrone [ 07/Jan/13 ] |
Ok, great, I retract my argument for using Suggests/Recommends/whatever in that case. Using the same Provides in both the osd-ldiskfs and osd-zfs packages will be fine. Later we can worry about trying to split the core lustre package into lustre-client and lustre-server packages, but we don't need to do that in this ticket, and this won't conflict with that effort. |
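Sketched out, the virtual-capability arrangement agreed on here would look something like the following; the sub-package and capability names are illustrative, not necessarily what the landed spec uses:

```
# the core kernel-modules package requires the virtual capability
%package modules
Requires: lustre-osd

# each backend sub-package advertises that same capability
%package osd-ldiskfs
Provides: lustre-osd

%package osd-zfs
Provides: lustre-osd
```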
| Comment by Bruno Faccini (Inactive) [ 08/Jan/13 ] |
|
Hmm, patch #4 failed again due to : this occurs because with Maloo/Jenkins builds, according to a scan of the full/raw console output, [LDISKFS,ZFS]_ENABLED_TRUE are not set even though the modules are built!!... Thus it seems the SPEC file must again handle the differences between manual and Jenkins/Maloo (lbuild?) builds... So I will try to also fill modules_excludes on a per-[LDISKFS,ZFS]_ENABLED_TRUE basis, or remove the unwanted modules, and see if that fixes it. |
| Comment by Bruno Faccini (Inactive) [ 09/Jan/13 ] |
|
Chris,
It's sure that the inheritance is something I have to deal with, but I don't agree with you that using autoconf conditionals breaks rebuilds from the src.rpm. It simply, and still (I mean as before, since what is failing is the auto-remove of the dist tarball during "make rpms"!!), can only re-do/build the same build+options ("rpmbuild -bb/-bs/-ba" works) it comes from. But do we have to expect more from a src.rpm, and is this why you feel it is broken?? Thus, in addition to the inheritance, as you suggest, it seems I also have to deal with and anticipate
So what should we decide here, to hold on until
Thanks already and in advance for your help and comments. |
| Comment by Bruno Faccini (Inactive) [ 09/Jan/13 ] |
|
Guys, I am just reviewing all that has been done/said; let me have a new try/patch #5 soon ... |
| Comment by Prakash Surya (Inactive) [ 09/Jan/13 ] |
I don't think the autoconf variables will prevent the SRPM from being re-buildable in any way. What it will do, at least in the current form of revision 4, is prevent the user from enabling/disabling different OSDs at rebuild time (i.e. you must specify at configure time and are then locked into that configuration). As Bruno brought up, this can also be done using RPM SPEC macros combined with rpmbuild --define WITH_VAR. If we use the SPEC variables, I think that'll address Chris's concerns, but there will still be conflicting work between this and |
| Comment by Christopher Morrone [ 09/Jan/13 ] |
Yes, exactly. That is why I feel it is broken. I absolutely expect more from a src.rpm. We should be able to have a single set of src.rpm files. If you need to build several versions of the same src.rpm file to cover all of the possible options, you are doing it wrong. Granted, Lustre has been doing it wrong for ten years, but I'm desperately trying to address that in
Take a look at the direction I am going in patch 3421. Look at how we handled the conditional build of lustre_tests and zfs in the lustre.spec.in. That is where I believe we should be going with the build system. Note that "is_client" is gone. We would just need to add an "ldiskfs" conditional that works like the zfs one and you would be set with what you need to do the osd-zfs and osd-ldiskfs packages. That patch is not yet ready for everyone to use, but it is already working in LLNL's build farm. That (or something very similar) is how we currently rewrite the broken upstream rpm that won't build for us at all. |
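A minimal sketch of the single-SRPM conditional style being discussed (the %bcond names and defaults here are assumptions, not what patch 3421 actually uses):

```
# default to building the ldiskfs OSD; make zfs an opt-in rebuild option
%bcond_without ldiskfs
%bcond_with zfs

%if %{with zfs}
%package osd-zfs
Summary: ZFS OSD kernel module for Lustre
%endif

# a user rebuilding from the one src.rpm can then flip the options, e.g.:
#   rpmbuild --rebuild --with zfs --without ldiskfs lustre-<version>.src.rpm
```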
| Comment by Christopher Morrone [ 09/Jan/13 ] |
I would prefer to put a hold on this bug until
The obvious short-term fix for Intel's users is to just disable zfs support in the "official" binary builds that Intel puts on its distribution page. Folks that are advanced enough to try Lustre over ZFS this early in the process (before full support appears in a major release) are likely advanced enough to check out the source and build Lustre themselves. |
| Comment by Bruno Faccini (Inactive) [ 10/Jan/13 ] |
|
Submitted patch #5 with more of the "flavor" requested by Chris, if I fully understood !!... |
| Comment by Bruno Faccini (Inactive) [ 10/Jan/13 ] |
|
Hmm, patch #5 causes Lustre-Client build failures because the osd-ldiskfs.ko module has not been built but there is an attempt to construct the lustre-osd-ldiskfs sub-package anyway, so I think I definitely need to avoid building the OSD sub-packages during a Client build ... Also, the lustre-osd-zfs sub-package is not attempted even within the successful Lustre-Server builds that include the ZFS/SPL packages, so I think I need to revert and construct the lustre-osd-zfs sub-package by default during Lustre-Server builds ... It seems that actually the only way to detect and fix this (for both Manual/Make and lbuild kinds of builds) is the ugly is_client way, sorry ... When patches from
Let's see what happens with patch #6. |
| Comment by Bruno Faccini (Inactive) [ 22/Jan/13 ] |
|
Oops, just found that I forgot to comment on patch #6 which seems to work according to my previous comments and expectations. |
| Comment by Bruno Faccini (Inactive) [ 18/Feb/13 ] |
|
Due to a required rebase (there is no longer a "build/autoMakefile.am.toplevel" file; it is now inlined directly within "autoMakefile.am"), which was causing Maloo/Hudson build failures, patch #9 has been submitted including the latest changes. |
| Comment by Bruno Faccini (Inactive) [ 18/Feb/13 ] |
|
These new OSD RPMs require specific attention from our testing tools: separate installation, external symbol resolution, ... Andreas has already created TT-1009 to address the need. |
| Comment by Andreas Dilger [ 20/Feb/13 ] |
|
Maybe I'm missing something, but how did http://review.whamcloud.com/4869 pass testing if extra work is still needed for handling lustre-osd-ldiskfs and lustre-osd-zfs RPM dependencies? I took the fact that the patch testing had passed as a sign that everything is well? |
| Comment by Bruno Faccini (Inactive) [ 20/Feb/13 ] |
|
Andreas, this is the TT-1009 task!! Thus, I can only add/confirm that, after I manually transferred the full Maloo build RPM set, I verified "locally+manually" that the lustre-modules RPM installed without any messages about external/unsatisfied references, then each OSD RPM installation together with the corresponding LDISKFS or ZFS required RPM was successful, again without any message, and finally I was able to bring up Lustre and run the test-suite. But I will double-check and verify each RPM's content ... and the full test logs !! |
| Comment by Bruno Faccini (Inactive) [ 20/Feb/13 ] |
|
I am puzzled by the results of the auto-tests for the last patch/build. It seems that the initialization test is (now/recently?) run separately from the others. And as you expected, it failed while the others were successful in a separate session! This caused me to first think that some of the TT-1009 related changes may already have been applied, and to miss the initialization failures. How can this happen? With the help of a magic "rpm
What I can only confirm again is that during a manual install of the build:
_ the lustre-modules RPM complains about a missing "lustre-osd" dependency |
| Comment by Yang Sheng [ 27/Feb/13 ] |
|
Hi Bruno, it looks like the commit http://review.whamcloud.com/4869 has a defect: osd-zfs & osd-ldiskfs both require lustre-modules, but lustre-modules also requires lustre-osd. Please fix it. |
| Comment by Bruno Faccini (Inactive) [ 27/Feb/13 ] |
|
Hello Yang, |
| Comment by Peter Jones [ 04/Mar/13 ] |
|
So, can this ticket be marked as resolved if any remaining work is being tracked under other tickets? |
| Comment by Prakash Surya (Inactive) [ 04/Mar/13 ] |
|
I agree with Bruno: the circular dependency between lustre-modules and lustre-osd is not a defect. It just means they need to be installed simultaneously to satisfy the dependencies, i.e. rpm -i lustre-modules osd-zfs |
| Comment by Peter Jones [ 06/Mar/13 ] |
|
Landed for 2.4 |