Details
Technical task
Resolution: Fixed
Critical
Description
Right now when RPM packages are built, we insert into Lustre's release field the version string from the kernel against which Lustre was built. For instance:
$ rpm -qpi lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64.rpm
Name        : lustre
Version     : 2.7.0
Release     : 2.6.32_504.8.1.el6_lustre.x86_64
Side note: Sysadmins are going to think (and have in the past thought) that we messed up because of the ".x86_64.x86_64" in the file name. The reason for it is that the first ".x86_64" is part of the Linux kernel version string, as we can see in the Release field above. The second ".x86_64" is Lustre's own architecture tag.
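To make the side note concrete, here is a minimal Python sketch that splits the file name above into its parts. The helper name split_rpm_filename is hypothetical, and the sketch assumes the conventional name-version-release.arch.rpm layout with no epoch in the file name; it is not how rpm itself parses packages.

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four parts.

    Assumes the conventional RPM file name layout; this is an
    illustrative helper, not part of rpm or Lustre.
    """
    if not filename.endswith(".rpm"):
        raise ValueError("not an RPM file name: %r" % filename)
    base = filename[: -len(".rpm")]
    # The package architecture is whatever follows the last dot.
    nvr, arch = base.rsplit(".", 1)
    # Release follows the last dash, version the dash before that.
    nv, release = nvr.rsplit("-", 1)
    name, version = nv.rsplit("-", 1)
    return name, version, release, arch


parts = split_rpm_filename(
    "lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64.rpm"
)
print(parts)
```

The release part comes back still ending in ".x86_64" (inherited from the kernel version string), while the final ".x86_64" is split off separately as the package architecture.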
The reason for including the kernel's version string in Lustre's Release field is that Lustre has traditionally been packaged to work with one, and only one, specific version of a kernel. If you have two very slightly different kernel versions, "2.6.32_504.8.1.el6" and "2.6.32_504.8.2.el6" for instance, then you currently need to compile Lustre against each kernel individually. While the output of "rpm -q --requires" should also list the specific required kernel version, there are so many very closely compatible kernels for which we need to juggle Lustre builds that it was simpler for sysadmins and developers alike to add the kernel's version string into Lustre's Release field.
But fortunately, this need to build lustre for every specific kernel is a self-imposed restriction, and work is under way to lift that restriction in LU-5614.
For many years, it has been possible to compile kernel modules once and then use them with any kernel that is ABI compatible. The Linux distro mechanism that allows this is often called "weak modules". LU-5614 should bring Lustre into the year 2006 and get it working with weak modules.
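As a rough sketch of the idea behind weak modules, the compatibility question boils down to whether every symbol a module imports is exported by the target kernel with the same symbol version (CRC). The Python below is only an illustration under that assumption: the function names, symbol names, and CRC values are all hypothetical, and the real distro tooling works on actual module and kernel symbol version data rather than hand-written dictionaries.

```python
def unsatisfied_symbols(module_imports, kernel_exports):
    """Return the imported symbols whose CRC is missing or different
    in the given kernel. Both arguments map symbol name -> CRC."""
    return {
        sym: crc
        for sym, crc in module_imports.items()
        if kernel_exports.get(sym) != crc
    }


def is_abi_compatible(module_imports, kernel_exports):
    """A module is usable with a kernel if nothing is unsatisfied."""
    return not unsatisfied_symbols(module_imports, kernel_exports)


# Hypothetical data: symbol names and CRCs are made up for illustration.
module_imports = {"example_sym_a": 0x1111, "example_sym_b": 0x2222}
kernel_built_against = {"example_sym_a": 0x1111, "example_sym_b": 0x2222}
kernel_minor_update = {"example_sym_a": 0x1111, "example_sym_b": 0x2222,
                       "example_sym_c": 0x3333}
kernel_abi_changed = {"example_sym_a": 0x1111, "example_sym_b": 0x9999}

print(is_abi_compatible(module_imports, kernel_minor_update))
print(unsatisfied_symbols(module_imports, kernel_abi_changed))
```

In this toy model, the module built once against the first kernel is also accepted by the minor update, and is rejected only by the kernel where one of its imported symbols changed version.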
Once that is done, we can finally drop the kernel version string.
This is especially fortuitous for anyone using Koji as a build system, because Koji makes this sort of abuse of standard packaging practice pretty close to impossible. Koji is used by Fedora and its cousins, and it has also been adopted by LLNL for its RHEL-based TOSS distribution.
Now you are definitely arguing that one should never use weak modules. I don't even see why it would be acceptable to use them on clients if you can't trust the semantics to stay constant with the same symbol version. If you can't trust them, you can't trust them.
As to APIs that are not part of the ABI... in the kernel I don't believe that there is any such distinction. A Linux distro vendor may choose to advertise a subset of the kernel ABI that it considers stable and safe. Red Hat has a kernel symbol whitelist. SUSE, in contrast, does not.
Yes, the osd-ldiskfs package has a long list of RHEL whitelist violations. Otherwise, the usage is pretty small. In a recent random build for master on CentOS 7.2, I see only the following non-osd-ldiskfs related off-whitelist symbols (in other words, osd-ldiskfs uses many off-whitelist symbols that I am not listing here):
Red Hat might be amenable to adding some of those to the whitelist. For some, maybe we can choose a different symbol. For others, we might not care and decide the level of risk is completely acceptable.
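For anyone wanting to repeat that kind of audit, the off-whitelist check is essentially a set difference between the symbols a module imports and the vendor's whitelist. A minimal Python sketch, assuming both lists have already been extracted into plain sets; the function name and all symbol names here are hypothetical placeholders, not the actual symbols from the build mentioned above:

```python
def off_whitelist(used_symbols, whitelist):
    """Return, sorted, the symbols a module uses that the vendor
    whitelist does not cover."""
    return sorted(set(used_symbols) - set(whitelist))


# Hypothetical placeholder data, not real kernel symbols.
whitelist = {"example_sym_a", "example_sym_b", "example_sym_c"}
used_by_module = {"example_sym_a", "example_sym_x", "example_sym_y"}

violations = off_whitelist(used_by_module, whitelist)
print(violations)
```

Each entry in the result would then fall into one of the three buckets above: ask for it to be whitelisted, switch to a different symbol, or accept the risk.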
I don't see an issue with weak-modules use with zfs. Sure, maybe the way ldiskfs is currently produced makes it more vulnerable. If you want to add extra restrictions and high barriers to usage to ldiskfs then, speaking as an all-zfs house, I don't have too much of a concern about that.
Some concern though...we do have some labs that might still be using ldiskfs from TOSS's lustre.
Anyhow, I think James is right about this getting off topic for this ticket.