Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.9.0
    • None
    • 9223372036854775807

    Description

      Right now, when RPM packages are built, we insert into Lustre's Release field the version string of the kernel against which Lustre was built. For instance:

      $ rpm -qpi lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64.rpm 
      Name        : lustre
      Version     : 2.7.0
      Release     : 2.6.32_504.8.1.el6_lustre.x86_64
      

      Side note: sysadmins are going to think (and in the past have thought) that we messed up because of the ".x86_64.x86_64" in the file name. The reason for it is that the first ".x86_64" is part of the Linux kernel version string, as the Release field above shows; the second ".x86_64" is Lustre's own architecture.
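      To see why the name looks doubled, here is a minimal Python sketch that splits an RPM file name using the usual NAME-VERSION-RELEASE.ARCH.rpm convention. It relies on the fact that RPM does not allow hyphens in the Version or Release tags, so the name can be split from the right. The helper parse_rpm_filename is hypothetical and is not part of any RPM library.

```python
from typing import NamedTuple


class RpmFileName(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_rpm_filename(filename: str) -> RpmFileName:
    """Split a file name of the form NAME-VERSION-RELEASE.ARCH.rpm (hypothetical helper)."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an .rpm file name: {filename!r}")
    stem = filename[: -len(".rpm")]
    # The package architecture is everything after the final dot.
    nvr, arch = stem.rsplit(".", 1)
    # VERSION and RELEASE cannot contain '-', but NAME can
    # (e.g. "osd-ldiskfs"), so split from the right at most twice.
    name, version, release = nvr.rsplit("-", 2)
    return RpmFileName(name, version, release, arch)


if __name__ == "__main__":
    pkg = parse_rpm_filename(
        "lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64.rpm")
    print(pkg.release)  # kernel version string, including the kernel's own ".x86_64"
    print(pkg.arch)     # Lustre's package architecture
```

      Splitting from the right is what leaves the kernel's ".x86_64" inside the Release field while the final ".x86_64" is read as Lustre's architecture.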

      The kernel's version string is included in Lustre's Release field because Lustre has traditionally been packaged to work with one, and only one, specific kernel version. If you have two very slightly different kernel versions, for instance "2.6.32_504.8.1.el6" and "2.6.32_504.8.2.el6", then you currently need to compile Lustre against each kernel individually. The package's RPM "Requires" should also list the specific required kernel version, but because there are so many closely compatible kernels for which we need to juggle Lustre builds, it was simpler for sysadmins and developers alike to add the kernel's version string to Lustre's Release field.

      Fortunately, this need to build Lustre for every specific kernel is a self-imposed restriction, and work is under way in LU-5614 to lift it.

      For many years, it has been possible to compile kernel modules once and then use them with any kernel that is ABI compatible. The Linux distro mechanism that allows this is often called "weak modules". LU-5614 should bring Lustre into the year 2006 and get it working with weak modules.
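      The compatibility rule behind weak modules can be sketched in Python. This is a simplified model, not the distributions' actual weak-modules script: it assumes each installed kernel is described by a map of exported symbol names to their modversions CRCs, and a module by the symbols and CRCs it was built against. A module is treated as usable on a kernel only if every symbol it needs is exported there with the same CRC. The function names, kernel labels, and CRC values are all hypothetical.

```python
from typing import Dict, List

# Symbol name -> modversions CRC. All values used below are made up.
SymbolVersions = Dict[str, int]


def mismatched_symbols(module_needs: SymbolVersions,
                       kernel_exports: SymbolVersions) -> List[str]:
    """Symbols the module needs that the kernel lacks or exports with a different CRC."""
    return sorted(sym for sym, crc in module_needs.items()
                  if kernel_exports.get(sym) != crc)


def compatible_kernels(module_needs: SymbolVersions,
                       installed: Dict[str, SymbolVersions]) -> List[str]:
    """Installed kernels whose exports satisfy every symbol the module needs."""
    return sorted(name for name, exports in installed.items()
                  if not mismatched_symbols(module_needs, exports))


if __name__ == "__main__":
    # Module built against the first kernel (hypothetical CRCs).
    lustre_module = {"seq_read": 0x1A2B, "kstrtoull": 0x3C4D}
    installed = {
        "2.6.32_504.8.1.el6": {"seq_read": 0x1A2B, "kstrtoull": 0x3C4D},
        "2.6.32_504.8.2.el6": {"seq_read": 0x1A2B, "kstrtoull": 0x3C4D},
        # Hypothetical later kernel in which kstrtoull's CRC differs.
        "hypothetical-newer": {"seq_read": 0x1A2B, "kstrtoull": 0x9999},
    }
    print(compatible_kernels(lustre_module, installed))
    print(mismatched_symbols(lustre_module, installed["hypothetical-newer"]))
```

      In this model one build serves both 504.8.x kernels, which is the point of weak modules. It also shows why, as discussed later in the thread, requiring one particular kernel does not stop the modules from being linked into every other installed kernel that passes the same check.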

      Once that is done, we can finally drop the kernel version string.

      This is especially fortuitous for anyone using Koji as a build system, because Koji makes this sort of abuse of standard packaging practice pretty close to impossible. Koji is used by Fedora and its cousins, and it has also been adopted by LLNL for its RHEL-based TOSS distribution.

          Activity

            [LU-7643] Remove kernel version string from Lustre release field

            jgmitter Joseph Gmitter (Inactive) added a comment:

            Reopening to fix the fields so that this ticket shows up in the 2.9.0 changelog.

            gerrit Gerrit Updater added a comment:

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19954/
            Subject: LU-7643 build: Remove Linux version string from RPM release field
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 28c17d40e5a597a3d2f10f1f43039ef92425954e

            morrone Christopher Morrone (Inactive) added a comment:

            I'm not really on board with adding a dependency that won't function correctly.

            bogl Bob Glossman (Inactive) added a comment:

            Even if a proper Requires still permits (incorrect) weak-updates links of osd-ldiskfs into other, not really matching kernels that happen to be installed, having the Requires at least gives better tracking of exactly which kernel version a particular osd-ldiskfs build was derived from and is intended for.

            While I would really prefer enforcement, I want at least to have visibility into the dependency.

            morrone Christopher Morrone (Inactive) added a comment:

            Yeah, but as we got to later in the discussion, that isn't very useful. I think I have to retract that offer.

            Just having that one kernel installed doesn't stop the osd-ldiskfs package's modules from being used in any of the other kernels that are also installed. The weak-updates system will still symlink the modules into all kernels with compatible symbols.

            bogl Bob Glossman (Inactive) added a comment:

            Christopher,
            You are probably correct. I may have been thinking of some other change.

            If you go ahead and add a kernel version Requires for osd-ldiskfs as discussed in earlier comments, I will be satisfied with that.

            morrone Christopher Morrone (Inactive) added a comment:

            Bob, I think you may be thinking of another change. Change 19954 does not delete any Requires or Provides.

            bogl Bob Glossman (Inactive) added a comment:

            That's just it. These changes also delete Requires and Provides from packages. You can't figure out dependencies using rpm queries either.

            simmonsja James A Simmons added a comment (edited):

            So the question for the patch on this ticket is: does having the kernel version string in the RPM provide any gain? I would say no. Not landing this patch will not change the kernel version dependency issues. And even once we resolve the dependency issues, does having the kernel version string in the Release field improve anything? Again, I would say no. You can figure out the kernel dependency using rpm queries instead.

            morrone Christopher Morrone (Inactive) added a comment (edited):

            Now you are definitely arguing that one should never use weak modules. I don't even see why it would be acceptable to use them on clients if you can't trust the semantics to stay constant with the same symbol version. If you can't trust them, you can't trust them.

            As to APIs that are not part of the ABI: in the kernel, I don't believe there is any such distinction. A Linux distribution vendor may choose to advertise a subset of the kernel ABI that it considers stable and safe. Red Hat has a kernel symbol whitelist; SUSE, in contrast, does not.

            Yes, the osd-ldiskfs package has a long list of RHEL whitelist violations. Otherwise, the usage is pretty small. In a recent random build of master on CentOS 7.2, I see only the following off-whitelist symbols outside osd-ldiskfs (osd-ldiskfs itself uses many more off-whitelist symbols that I am not listing here):

            PDE_DATA
            __fentry__
            __free_pages
            __stack_chk_fail
            kernel_stack
            kstrtoull
            seq_lseek
            seq_open
            seq_read
            remove_wait_queue

            Red Hat might be amenable to adding some of those to the whitelist. For some, maybe we can choose a different symbol. For others, we might not care and decide that the level of risk is completely acceptable.

            I don't see an issue with weak-modules use with ZFS. Sure, maybe the way ldiskfs is currently produced makes it more vulnerable. If you want to add extra restrictions and high barriers to ldiskfs usage, then, speaking as an all-ZFS house, I don't have too much of a concern about that. Some concern, though: we do have some labs that might still be using ldiskfs from TOSS's Lustre.

            Anyhow, I think James is right about this getting off topic for this ticket.
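
            The whitelist comparison described in the comment above amounts to a set difference between the symbols a module imports and the vendor's list of symbols it promises to keep stable. The sketch below assumes both lists are already available as plain strings (for example, the imported symbols could be collected by running `nm -u` on the .ko file). The helper name off_whitelist and the whitelist contents are hypothetical and do not reflect Red Hat's actual list.

```python
from typing import Iterable, List


def off_whitelist(used_symbols: Iterable[str],
                  whitelist: Iterable[str]) -> List[str]:
    """Kernel symbols a module uses that the vendor's whitelist does not cover, sorted."""
    return sorted(set(used_symbols) - set(whitelist))


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    used = ["printk", "kfree", "seq_read", "seq_lseek", "kstrtoull"]
    whitelist = ["printk", "kfree"]
    print(off_whitelist(used, whitelist))
```

            Each symbol this reports is one that, per the discussion, could be proposed for the whitelist, replaced with a different symbol, or accepted as a known risk.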

            bogl Bob Glossman (Inactive) added a comment:

            James,
            I think what you are saying is correct, but it isn't just the VFS API that must stay stable. All the internal kernel APIs that aren't part of the well-defined ABI must stay stable too. Lustre kernel modules, both ldiskfs and non-ldiskfs, make lots of calls to exported symbols that may or may not be part of the ABI, and those symbols can and occasionally do change within the same major Linux version from an upstream vendor.

            People

              mdiep Minh Diep
              morrone Christopher Morrone (Inactive)
              Votes: 0
              Watchers: 14

              Dates

                Created:
                Updated:
                Resolved: