[LU-9731] kmods need to be limited to EL minor release kernel Created: 03/Jul/17 Updated: 29/Nov/18 Resolved: 09/Aug/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.10.1, Lustre 2.11.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Brian Murrell (Inactive) | Assignee: | Brian Murrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Now that kmods are being produced, they need to be limited to the kernel of the RHEL minor release they were built for. This is because RHEL kernels have a kABI "whitelist": only a subset of kernel interfaces is guaranteed to be stable by RHEL's kABI, and only those interfaces are put into a kernel's list of "kernel(...) = ..." Provides: and a kmod's Requires:. As a result, a kmod produced for RHEL 7.3 will look compatible with a RHEL 7.4 kernel (because the whitelisted kABI will not have changed across those releases) even though it is not, because the Lustre kmods use interfaces that are not on the whitelist and can change from one minor release to another. While Red Hat guarantees that these non-whitelisted interfaces will not change within a minor release (e.g. across the kernel updates for 7.3), there is no such guarantee across minor releases (e.g. 7.3 to 7.4), and in practice they probably almost always change, so a kmod using non-whitelisted interfaces needs to limit itself to the kernels provided in one RHEL minor release. For a kmod produced on a RHEL 7.3 kernel that means adding a Requires: kernel >= 3.10.0-514, kernel <= 3.10.0-514 to the kmod RPM. If this is not done, the kmod will install onto a RHEL 7.4 machine, which has an incompatible kernel by default, and a compatible kernel (kernel-3.10.0-514*) will not be installed, even if it is available in a Yum repo, even though it should be. |
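The mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the whitelist contents and all hash values are made up, and the two helper functions are hypothetical, not part of RPM or of any Red Hat tool. It shows why a dependency check that only sees whitelisted symbols can pass while the module is still incompatible.

```python
# Illustrative data only: whitelist membership and hash values are invented.
WHITELIST = {"printk", "kmalloc"}
KERNEL_7_3 = {"printk": 0x1111, "kmalloc": 0x2222, "seq_read": 0x3333}
KERNEL_7_4 = {"printk": 0x1111, "kmalloc": 0x2222, "seq_read": 0x4444}


def rpm_deps_satisfied(kmod_requires, kernel_provides):
    """True if every 'kernel(symbol) = hash' the kmod Requires is Provided."""
    return all(kernel_provides.get(sym) == crc
               for sym, crc in kmod_requires.items())


def modules_really_compatible(symbols_used, build_kernel, target_kernel):
    """True only if every symbol the module uses has the same hash in both kernels."""
    return all(sym in target_kernel and build_kernel.get(sym) == target_kernel[sym]
               for sym in symbols_used)


# The module uses one whitelisted and one non-whitelisted symbol.
used = {"printk", "seq_read"}

# Only whitelisted symbols end up in the RPM Provides:/Requires: lists.
requires = {s: KERNEL_7_3[s] for s in used if s in WHITELIST}
provides_7_4 = {s: c for s, c in KERNEL_7_4.items() if s in WHITELIST}

print(rpm_deps_satisfied(requires, provides_7_4))               # RPM-level view
print(modules_really_compatible(used, KERNEL_7_3, KERNEL_7_4))  # actual view
```

The first check looks only at what RPM can see, the second at every symbol the module really links against; the gap between the two is what the extra Requires: is meant to close.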
| Comments |
| Comment by Peter Jones [ 03/Jul/17 ] |
|
Minh Can you please advise on this one? Thanks Peter |
| Comment by Andreas Dilger [ 04/Jul/17 ] |
|
Brian, is there a ticket which lists the non-white listed symbols that we are using in Lustre? It would be useful to know this and see if we can stop using those symbols, or ask RH to add them to the whitelist? |
| Comment by Brian Murrell (Inactive) [ 04/Jul/17 ] |
|
If you install kernel-abi-whitelists before you build, it will tell you which interfaces you are using that are not on the whitelist such as:
********************************************************************************
*********************** KERNEL ABI COMPATIBILITY WARNING ***********************
********************************************************************************
The following kernel symbols are not guaranteed to remain compatible with future kernel updates to this RHEL release:
__fentry__
__stack_chk_fail
Red Hat recommends that you consider using only official kernel ABI symbols where possible. Requests for additions to the kernel ABI can be filed with your partner or customer representative (component: driver-update-program).
So surprisingly only a couple. It would probably be useful for us if once we have eliminated the use of those interfaces, that we fail the build if a new use of non-whitelisted interfaces is discovered. I'm not sure if RH's kernel RPM build has such an option though. I suppose this would be useful outside of RPM building too though, so that developers that introduce non-whitelisted interface use see the error when they run make. I guess somebody would have to figure out how to utilise kernel-abi-whitelists for that. |
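A build-time check of the kind suggested above could look roughly like the following Python sketch. It assumes the set of symbols a module uses and the whitelist have already been extracted by some other means; `KabiError`, `non_whitelisted` and `check_kabi` are hypothetical names, and nothing here reads the real kernel-abi-whitelists files.

```python
class KabiError(Exception):
    """Raised when a module uses symbols outside the whitelist (hypothetical)."""


def non_whitelisted(symbols_used, whitelist):
    """Return the sorted list of used symbols that are not on the whitelist."""
    return sorted(set(symbols_used) - set(whitelist))


def check_kabi(symbols_used, whitelist):
    """Fail (raise) if any non-whitelisted symbol is in use."""
    offenders = non_whitelisted(symbols_used, whitelist)
    if offenders:
        raise KabiError("non-whitelisted kernel symbols in use: "
                        + ", ".join(offenders))


if __name__ == "__main__":
    # Example input only; the whitelist here is not a real one.
    try:
        check_kabi(["printk", "__fentry__", "__stack_chk_fail"], {"printk"})
    except KabiError as err:
        print(err)
```

Raising an exception rather than printing a warning is the point of the sketch: a wrapper around make or rpmbuild could turn it into a non-zero exit status so that new uses of non-whitelisted interfaces break the build.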
| Comment by Brian Murrell (Inactive) [ 04/Jul/17 ] |
|
I have tried an experiment where I have added: Requires: kernel < 3.10.0-515, kernel >= 3.10.0-514 to the kmp-lustre.preamble in the lustre-client build and it has the desired effect. Of course, not using non-whitelisted interfaces is the more desired approach because it would mean that our kmod would work across even the minor (i.e. 7.2->7.3->7.4->7.5, etc.) upgrades, but until that can happen, the above addition to kmp-lustre.preamble seems a suitable stop-gap. |
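For illustration, the Requires: line used in the experiment above can be derived mechanically from the kernel release a kmod was built against. The following is a minimal Python sketch, assuming release strings of the form `<version>-<build>[.more]` as in 3.10.0-514.el7; the function name `minor_release_requires` is hypothetical and this is not how kmp-lustre.preamble is actually generated.

```python
import re


def minor_release_requires(kernel_release):
    """Build a Requires: value pinning a kmod to one kernel build series.

    The first number after the dash (514 in 3.10.0-514.el7) is taken as the
    boundary of the minor release, as described in this ticket.
    """
    m = re.match(r"^(\d+(?:\.\d+)*)-(\d+)", kernel_release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % kernel_release)
    version, build = m.group(1), int(m.group(2))
    return "kernel < %s-%d, kernel >= %s-%d" % (version, build + 1,
                                                version, build)


if __name__ == "__main__":
    print("Requires: " + minor_release_requires("3.10.0-514.el7.x86_64"))
```

Using an exclusive upper bound on the next build number, rather than `<=` on the current one, lets later updates within the same series (for example a 3.10.0-514.x.y release) still satisfy the dependency.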
| Comment by Andreas Dilger [ 04/Jul/17 ] |
|
I don't see how/where these functions are being used? I searched for _ftrace and _stack_chk_fail and there are no direct users in our code. I looked at the kernel sources and couldn't even find anywhere that we indirectly used those functions. Is it possible that you can find out how we are using these functions? |
| Comment by Brian Murrell (Inactive) [ 04/Jul/17 ] |
|
Hrm. I don't think I'm knowledgeable enough in this area to make quick work of this. I doubt I'd find any use in our sources that you didn't find. The only thing grep found me was a bunch of references such as:
static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
...
{ 0xf0fdf6cb, "__stack_chk_fail" },
...
};
but I'm sure you found those also and have a much better understanding of what they are and what they do than I would have. Perhaps mdiep can open a ticket with Red Hat to understand why their tool is declaring our module to be using interfaces that it's not. Now, after having said all of this, I have a recollection of running the weak-updates tool for the -514 kernel modules on the -681 (RHEL 7.4) kernel and seeing a lot more incompatibilities than just those two interfaces. So perhaps more investigation is needed here one way (removing all use of non-whitelisted interfaces) or the other (adding a Requires: for the minor kernel version as I described above). But without a solution one way or the other, users are left with a kmod RPM on their machine that has no kernel version information associated with it and so no clue as to which kernel to even install (never mind boot) to get the kmod working. |
| Comment by Brian Murrell (Inactive) [ 06/Jul/17 ] |
Because without it, when one installs a/the kmod(s) for lustre on a system that has an incompatible kernel (i.e. RHEL 7.4 currently, but it could be 7.2 or 7.1, etc.), a compatible kernel is not installed and there is no indication one way or another which kernel the user needs to install to get the kmod(s) functional. So the user ends up installing kernels by trial and error until he finds one that the modules will load into. For the human use-case, this is frustrating at best, but for the managed-installer (i.e. IML, but any configuration management system such as chef or puppet, or even just scripting that a user might create) case, it becomes quite impossible other than brute-force searching which is very time consuming.
kernel < 3.10.0-515, kernel >= 3.10.0-514 to be exact.
If I understand what Red Hat is saying about their kABI, yes. As I understand it, their kABI guarantee says that whitelisted interfaces are stable across even (RHEL) minor upgrades so that if a kernel module only uses whitelisted interfaces, it will work on RHEL 7.1, 7.2, 7.3, 7.4, etc. This is why adilger's comment is so important and (IMHO) is very worthy of investigation as it would free us up even more from having to chase kernels even across RHEL minor version upgrades. However, even in cases where non-whitelisted interfaces are being used, which is the case currently with the lustre-client (even though it does not appear to be, which is what needs investigating to achieve portability across RHEL minor upgrades), Red Hat guarantees that within a minor release even non-whitelisted interfaces will not change. In kernel version nomenclature, the first number after the kernel version (the -514 for the RHEL 7.3 kernel) is the demarcation of the non-whitelisted kABI interface guarantee and so any kernel module built on a 3.10.0-514.* kernel should work on any other 3.10.0-514.* kernel. |
| Comment by Gerrit Updater [ 06/Jul/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/27958 |
| Comment by Brian Murrell (Inactive) [ 10/Jul/17 ] |
|
adilger: You might be interested in following the Red Hat bug I opened about this. In that ticket they have identified at least two of the interfaces the client is using that are (I am presuming, pending confirmation) not on the kABI whitelist:
They do confirm that the hash for those two interfaces did change from RHEL 7.3 to RHEL 7.4 which, AFAIU, means they cannot be whitelisted interfaces. |
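A comparison like the one Red Hat describes can be approximated by diffing the symbol hashes of two kernels. The sketch below is in Python and assumes input in the usual Module.symvers layout (CRC first, then symbol name, then further columns); the sample data and symbol names are invented and do not represent the interfaces named in the Red Hat ticket.

```python
def parse_symvers(text):
    """Map symbol name -> CRC from Module.symvers-style lines.

    Each line is expected to start with '<crc> <symbol> ...'; shorter lines
    are skipped.
    """
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def changed_symbols(old, new, symbols):
    """Return the sorted symbols whose CRC differs (or is missing) between kernels."""
    return sorted(s for s in symbols if old.get(s) != new.get(s))


# Made-up sample data standing in for two kernels' Module.symvers files.
OLD = ("0x0000aaaa\texample_symbol_a\tvmlinux\tEXPORT_SYMBOL\n"
       "0x0000bbbb\texample_symbol_b\tvmlinux\tEXPORT_SYMBOL\n")
NEW = ("0x0000aaaa\texample_symbol_a\tvmlinux\tEXPORT_SYMBOL\n"
       "0x0000cccc\texample_symbol_b\tvmlinux\tEXPORT_SYMBOL\n")

if __name__ == "__main__":
    used = ["example_symbol_a", "example_symbol_b"]
    print(changed_symbols(parse_symvers(OLD), parse_symvers(NEW), used))
```

Running this over the full list of symbols a module imports would show which of them actually prevent the module from loading on the newer kernel.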
| Comment by Brian Murrell (Inactive) [ 10/Jul/17 ] |
|
mdiep: more from the Red Hat ticket referred to previously:
If I'm understanding that correctly, that solution would be preferred to the solution that I proposed of adding RHEL minor release kernel versions as a Requires:. |
| Comment by Minh Diep [ 12/Jul/17 ] |
|
Isn't this what we are already doing?
Processing files: kmod-lustre-client-2.10.50-1.el7.centos.x86_64
Executing(%doc): /bin/sh -e /tmp/rpmbuild-lustre-root-WQmEMIgm/TMP/rpm-tmp.OuFNGP
+ umask 022
+ cd /tmp/rpmbuild-lustre-root-WQmEMIgm/BUILD
+ cd lustre-2.10.50
+ DOCDIR=/tmp/rpmbuild-lustre-root-WQmEMIgm/BUILDROOT/lustre-2.10.50-1.x86_64/usr/share/doc/kmod-lustre-client-2.10.50
+ export DOCDIR
+ /usr/bin/mkdir -p /tmp/rpmbuild-lustre-root-WQmEMIgm/BUILDROOT/lustre-2.10.50-1.x86_64/usr/share/doc/kmod-lustre-client-2.10.50
+ cp -pr COPYING /tmp/rpmbuild-lustre-root-WQmEMIgm/BUILDROOT/lustre-2.10.50-1.x86_64/usr/share/doc/kmod-lustre-client-2.10.50
+ cp -pr ChangeLog-lustre /tmp/rpmbuild-lustre-root-WQmEMIgm/BUILDROOT/lustre-2.10.50-1.x86_64/usr/share/doc/kmod-lustre-client-2.10.50
+ cp -pr ChangeLog-lnet /tmp/rpmbuild-lustre-root-WQmEMIgm/BUILDROOT/lustre-2.10.50-1.x86_64/usr/share/doc/kmod-lustre-client-2.10.50
+ exit 0
Finding Provides: /usr/lib/rpm/redhat/find-provides
Finding Requires(interp):
Finding Requires(rpmlib):
Finding Requires(verify):
Finding Requires(pre):
Finding Requires(post):
Finding Requires(preun):
Finding Requires(postun):
Finding Requires(pretrans):
Finding Requires(posttrans):
Finding Requires: /usr/lib/rpm/redhat/find-requires
********************************************************************************
*********************** KERNEL ABI COMPATIBILITY WARNING ***********************
********************************************************************************
The following kernel symbols are not guaranteed to remain compatible with future kernel updates to this RHEL release:
PDE_DATA
__fentry__
__stack_chk_fail
remove_wait_queue
seq_lseek
seq_read
Red Hat recommends that you consider using only official kernel ABI symbols where possible. Requests for additions to the kernel ABI can be filed with your partner or customer representative (component: driver-update-program).
/usr/lib/rpm/redhat/find-requires calls /usr/lib/rpm/redhat/find-requires.ksyms:
[root@onyx-28vm1 redhat]# tail -10 /usr/lib/rpm/redhat/find-requires
then
unset is_kmod;
break;
fi
done
[ -x /usr/lib/rpm/redhat/find-requires.ksyms ] && [ "$is_kmod" ] &&
printf "%s\n" "${filelist[@]}" | /usr/lib/rpm/redhat/find-requires.ksyms
|
| Comment by Brian Murrell (Inactive) [ 13/Jul/17 ] |
|
Are you sure all of the conditions for running /usr/lib/rpm/redhat/find-requires.ksyms at the end of /usr/lib/rpm/redhat/find-requires are being met? I.e. does /usr/lib/rpm/redhat/find-requires.ksyms exist and is "$is_kmod" evaluating to true? |
| Comment by Minh Diep [ 13/Jul/17 ] |
|
yes, I have injected an echo to print out $is_kmod; and verified is_kmod=1 |
| Comment by Brian Murrell (Inactive) [ 13/Jul/17 ] |
|
And /usr/lib/rpm/redhat/find-requires.ksyms does exist of course, yes? Just want to make double sure before I go back to Red Hat claiming it's not working as they advertise. |
| Comment by Minh Diep [ 13/Jul/17 ] |
|
yes it does |
| Comment by Dmitry Eremin (Inactive) [ 14/Jul/17 ] |
|
I don't think we need this restriction at all. It is redundant because the modules are installed into a directory for the specific kernel version they were built for. So, if another version of the kernel is booted, those modules will not be used. When this package is installed on a system, the script "/sbin/weak-modules --add-modules" is used to propagate those modules to the other kernel versions installed on the system. So, this script is responsible for checking compatibility with other kernel versions (it creates symlinks in each compatible kernel's weak-updates/ directory). It does not use the "Requires:" field from the package. Therefore I see no reason to add this field to our package. |
| Comment by Brian Murrell (Inactive) [ 14/Jul/17 ] |
|
dmiter: The reason this is needed is because if you install the kmod-lustre-client RPM on a system where there is no (i.e. weak-updates) compatible kernel at all (i.e. on a RHEL 7.4 system) then you end up with a set of modules that have no matching kernel, so no way to even boot to a kernel that will use them. So, installing the kmod should result in a compatible kernel being present. If that's the kernel that is already installed, then all is fine, but if there is no matching kernel already installed, the Yum transaction doing the kmod installation should install an appropriate kernel also. The problem is that currently, the kmod-lustre-client RPM doesn't have enough information in it to make sure a compatible kernel is installed. This may be a bug in the Red Hat kmod building tools, as is being explored in a Red Hat ticket, or it may not be. That is still to be determined. Even if it is a bug, it will likely be some time before we get a fix and we need to work around this issue in the meantime. Of course, the other (probably not at all short-term) resolution, as has also been discussed in this ticket, is to get the client kABI compatible. But that is also not likely going to happen in the time-frame that this issue needs to be either resolved or worked around in. |
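The behaviour argued for here, where installing the kmod pulls in a matching kernel when none is present, amounts to filtering the available kernels by build series. The Python sketch below illustrates only that selection step; it is not what Yum does internally, the helper names are hypothetical, and the kernel releases listed are example inputs.

```python
import re


def kernel_series(kernel_release):
    """Return (version, build) for a release such as '3.10.0-514.26.2.el7'."""
    m = re.match(r"^(\d+(?:\.\d+)*)-(\d+)", kernel_release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % kernel_release)
    return m.group(1), int(m.group(2))


def compatible_kernels(built_for, available):
    """Return the available kernels in the same build series as built_for."""
    wanted = kernel_series(built_for)
    return [k for k in available if kernel_series(k) == wanted]


if __name__ == "__main__":
    # Example repository contents (illustrative only).
    repo = ["3.10.0-327.el7", "3.10.0-514.26.2.el7", "3.10.0-693.el7"]
    print(compatible_kernels("3.10.0-514.el7", repo))
```

With the extra Requires: in the kmod package the dependency resolver can make this choice itself; without it, a user or a configuration management tool has to do the equivalent search by trial and error.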
| Comment by Dmitry Eremin (Inactive) [ 14/Jul/17 ] |
|
I don't see the issue with being able to install the package on any system, even without an appropriate kernel installed. The package will be installed but not used. Why is this an issue? As I mentioned before, the script /sbin/weak-modules is responsible for propagating those modules into compatible kernels only. So, how will this package affect an incompatible kernel if those modules will not be loaded into that kernel at all? |
| Comment by Brian Murrell (Inactive) [ 14/Jul/17 ] |
|
Why would I install a kernel modules package that I didn't want to use? If I install kmod-lustre-client surely I want to use the Lustre client on that node and thus I need a kernel that can use the modules. |
| Comment by Dmitry Eremin (Inactive) [ 14/Jul/17 ] |
|
The customer can install our kernel modules assuming they are compatible with the latest kernel currently installed. We should not restrict the user to the particular version of the kernel we built for. This was the main reason for introducing weak symbols support.
P.S. What account can I use to see the Red Hat ticket you mentioned?
|
| Comment by Brian Murrell (Inactive) [ 17/Jul/17 ] |
|
dmiter: You need to use an @intel.com account on RH's Bugzilla in order to see bugs filed by Intel folks. If you look at comment #23 in the ticket I have opened in their Bugzilla, you can see that the behaviour that I've described in this ticket is the actual expected behaviour they intend for kmods and that it's a bug in their kmod packaging scripts that not enough "Requires:" are added to a kmod so that a matching kernel can be found to be installed with the kmod. |
| Comment by Gerrit Updater [ 17/Jul/17 ] |
|
Brian J. Murrell (brian.murrell@intel.com) uploaded a new patch: https://review.whamcloud.com/28066 |
| Comment by Gerrit Updater [ 24/Jul/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28066/ |
| Comment by Bob Glossman (Inactive) [ 24/Jul/17 ] |
|
I think this change mangles the Requires strings on SLES. For example, in a build of current master for sles12sp2, where the kernel version is 4.4.59-92.17, the Requires in the built lustre-client-kmp-default package has: while the kernel-default for the pristine unpatched upstream kernel has a Provides of: kernel = 4.4.59-92.17 I don't see how these can properly match for the purposes of install dependencies. |
| Comment by Brian Murrell (Inactive) [ 25/Jul/17 ] |
|
Interesting that the sles12sp2 test run didn't fail to install the client RPMs because of this. Given that this patch is specifically to work around a bug in RHEL's kmod building macro, it should probably be limited to RHEL kmod building only. I'll push a patch for that. |
| Comment by Gerrit Updater [ 25/Jul/17 ] |
|
Brian J. Murrell (brian.murrell@intel.com) uploaded a new patch: https://review.whamcloud.com/28202 |
| Comment by Bob Glossman (Inactive) [ 25/Jul/17 ] |
Unless called for in Test-Parameters, I don't think any SLES test runs are done routinely. For sure not in review tests. While I think this additional mod will fix the problem, I wonder if it might be better structured to push the lines putting in the extra Requires into a function in lbuild-rhel and then in lbuild call that function if it exists. That way the extra Requires would be in all RHEL builds, not just RHEL7, and would still be left out of non-RHEL builds. |
| Comment by Brian Murrell (Inactive) [ 25/Jul/17 ] |
This problem has only been confirmed on EL 7 (so far) so applying it to EL 6 might be inappropriate. Fencing the fix off into the distro-specific lbuild file might not be horrible though. |
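The "call the distro-specific function if it exists" structure discussed above can be sketched as follows. lbuild itself is a shell script, so this Python version only illustrates the dispatch pattern; the class names and the `kmod_extra_requires` hook are hypothetical and do not correspond to real lbuild functions.

```python
class RhelHooks:
    """Stand-in for a distro-specific file such as lbuild-rhel (hypothetical)."""

    def kmod_extra_requires(self, kernel_series):
        version, build = kernel_series
        return ["kernel >= %s-%d" % (version, build),
                "kernel < %s-%d" % (version, build + 1)]


class SlesHooks:
    """Stand-in for a distro that defines no such hook."""


def extra_requires(distro_hooks, kernel_series):
    """Call the distro hook if it is defined; otherwise add nothing."""
    hook = getattr(distro_hooks, "kmod_extra_requires", None)
    return hook(kernel_series) if callable(hook) else []


if __name__ == "__main__":
    print(extra_requires(RhelHooks(), ("3.10.0", 514)))
    print(extra_requires(SlesHooks(), ("4.4.59", 92)))
```

Keeping the extra Requires: behind an optional hook means the common build path never needs to know which distributions want it, which is the separation being proposed; whether EL 6 should define the hook is left open in the thread.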
| Comment by Gerrit Updater [ 09/Aug/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28202/ |
| Comment by Peter Jones [ 09/Aug/17 ] |
|
Second patch landed for 2.11 |
| Comment by Gerrit Updater [ 09/Aug/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28438 |
| Comment by Gerrit Updater [ 16/Aug/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28438/ |