Details

    • Technical task
    • Resolution: Fixed
    • Critical
    • Lustre 2.9.0

    Description

      Right now when RPM packages are built, we insert into Lustre's release field the version string from the kernel against which Lustre was built. For instance:

      $ rpm -qpi lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64.rpm 
      Name        : lustre
      Version     : 2.7.0
      Release     : 2.6.32_504.8.1.el6_lustre.x86_64
      

      Side note: A sysadmin is going to think (and some have in the past thought) that we messed up because of the ".x86_64.x86_64" in the file name. The reason for it is that the first one is part of the Linux kernel version string, as we can see in the Release field above. The second .x86_64 is Lustre's own.
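      The double-arch tail can be picked apart mechanically. A small shell sketch (using the file name from the example above) showing that the first .x86_64 belongs to the kernel release string embedded in Lustre's Release field, and the second is the rpm's own arch tag:

      ```shell
      # Old-style package file name from the example above
      f="lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64.rpm"

      base=${f%.rpm}          # drop the .rpm suffix
      arch=${base##*.}        # last dotted field: the rpm's own arch
      nvr=${base%.*}          # name-version-release
      release=${nvr#*-}       # strip the "lustre-" name
      release=${release#*-}   # strip the "2.7.0-" version: the kernel-derived Release

      echo "$arch"     # Lustre's arch tag
      echo "$release"  # the kernel's version string, including its own arch
      ```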

      The reason for including the kernel's version string in Lustre's Release field is that Lustre has traditionally been packaged to work with one, and only one, specific version of a kernel. If you have two very slightly different kernel versions, "2.6.32_504.8.1.el6" and "2.6.32_504.8.2.el6" for instance, then you currently need to compile lustre against each kernel individually. While "rpm --requires" also lists the specific required version number, because there are so many very closely compatible kernels for which we need to juggle lustre builds, it was simpler for sysadmins and developers alike to put the kernel's version string into Lustre's release field.

      But fortunately, this need to build lustre for every specific kernel is a self-imposed restriction, and work is under way to lift that restriction in LU-5614.

      For many years, it has been possible to compile kernel modules once and then use them with any kernel that is ABI compatible. The Linux distro mechanism that allows this is often called "weak modules". LU-5614 should bring Lustre into the year 2006 and get it working with weak modules.
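      For reference, weak modules work by symlinking a module built against one kernel into the weak-updates directory of other ABI-compatible kernels under /lib/modules; the symlinks are maintained by the distro's weak-modules script at rpm install time. A rough simulation of the resulting layout (all paths here are stand-ins run in a scratch directory, not the real script):

      ```shell
      # Simulate the weak-updates layout (illustrative only)
      root=$(mktemp -d)
      mkdir -p "$root/2.6.32-504.8.1.el6/extra/lustre" \
               "$root/2.6.32-504.8.2.el6/weak-updates/lustre"
      touch "$root/2.6.32-504.8.1.el6/extra/lustre/lustre.ko"

      # An ABI-compatible kernel gets a symlink instead of a rebuilt module
      ln -s "$root/2.6.32-504.8.1.el6/extra/lustre/lustre.ko" \
            "$root/2.6.32-504.8.2.el6/weak-updates/lustre/lustre.ko"

      readlink "$root/2.6.32-504.8.2.el6/weak-updates/lustre/lustre.ko"
      ```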

      Once that is done, we can finally drop the kernel version string.

      This is especially fortunate for anyone using Koji as a build system, because Koji makes this sort of abuse of standard packaging practice pretty close to impossible. Koji is used by Fedora and its cousins, and it has also been adopted by LLNL for its RHEL-based TOSS distribution.

          Activity

            [LU-7643] Remove kernel version string from Lustre release field
            bogl Bob Glossman (Inactive) added a comment - - edited

            It isn't independent. Without the changes proposed here it couldn't happen. With the changes proposed here it can happen. That makes it dependent on exactly this topic.

            I may agree that it could be covered as a separate ticket from this one in spite of that. Let me think about it a bit.


            morrone Christopher Morrone (Inactive) added a comment -

            With osd-ldiskfs exporting only a generic "lustre-osd" Provides and other lustre packages having only that as Requires, there is now the possibility of taking an osd-ldiskfs from one build and installing it with lustre rpms from a different build.

            If that issue exists it is independent of the topic in this ticket. You can open a new ticket for that if you like.

            bogl Bob Glossman (Inactive) added a comment -

            Your outline makes sense to me, but I'm a little worried about something unexpected creeping in if people start trying to mix and match pieces from different builds. Up until now, with all lustre rpms tied to a specific kernel, they all had to come from the same build. I'm not at all arguing that it was correct or even good that they were all tied that way, but it did have the beneficial side effect of blocking mix & match.

            With osd-ldiskfs exporting only a generic "lustre-osd" Provides and other lustre packages having only that as Requires, there is now the possibility of taking an osd-ldiskfs from one build and installing it with lustre rpms from a different build. While I don't know that this would cause problems, I'm very much worried that it might.

            morrone Christopher Morrone (Inactive) added a comment - - edited

            Not sure how that is properly carried thru with the Provides exported by osd-ldiskfs or the Requires in other lustre rpms.

            I'm not entirely sure what you mean, but I'll take a stab at explaining. If the various other parts are working now, they will continue to work. The "lustre-osd" requirement is currently supplied by either the osd-zfs or osd-ldiskfs packages. One or both of them must be installed in order to install the main "lustre" package. With the proposed new specific kernel requirement in the osd-ldiskfs package, if the osd-zfs package is selected, everything will install fine with any kernel that supplies the required versions of various symbols. If osd-ldiskfs is selected, it can only be installed if the correct kernel is installed.

            Of course, multiple kernels can be installed at the same time, so there is no reason that the admin needs to boot the required kernel; it only needs to be installed. But that can already happen now with the packages that contain the kernel version string.

            If you think that, too, is too much of a problem, you are basically arguing that weak modules can't ever be used with lustre.
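            The generic OSD dependency described above can be sketched in spec-file terms roughly as follows. The RPM directives are real syntax, but treat the fragment as an illustration of the arrangement being discussed, not the exact contents of lustre.spec:

            ```spec
            # The main lustre package requires the generic capability,
            # not a specific backend:
            Requires: lustre-osd

            # Either OSD backend subpackage advertises that same capability,
            # so installing one of them satisfies the main package:
            %package osd-zfs
            Provides: lustre-osd

            %package osd-ldiskfs
            Provides: lustre-osd
            ```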

            bogl Bob Glossman (Inactive) added a comment -

            I think it would be an acceptable solution if only the osd-ldiskfs rpm has a Requires for the particular and specific kernel version it was built on. That would directly enforce and tie it to the upstream ext4 source version it was built from. This is only my opinion; I think we need buy-in from all concerned. Would really like to see comment from Minh, Andreas, Dmitry, or other experts.

            Not sure how that is properly carried thru with the Provides exported by osd-ldiskfs or the Requires in other lustre rpms.

            morrone Christopher Morrone (Inactive) added a comment - - edited

            Compatibility problems that don't change the ABI won't necessarily be addressed by freshly applying patches and recompiling either. We have had problems in the past where ext4 internal semantics changed without changing the API and without breaking ldiskfs patch application. If you care that much, and since those problems have actually hit, you should probably stop using the ldiskfs-as-patches approach altogether.

            At least with the ldiskfs module fully compiled in the past, we eliminate the problem of overlooking ext4 internal semantic changes for a single version of the packages. The ldiskfs module is going to be in a known good frozen state. It will only be when larger semantic changes happen between the larger OS and filesystems that a recompile will be needed. And hopefully those types of changes are as rare as the issues inherent to the ldiskfs-as-patches approach within a stable OS kernel release series.

            But if folks still insist on ldiskfs being tied to a single kernel, we can certainly do that while also removing the kernel string from the lustre packages. The kernel string does not belong in the lustre package name. That is incorrect packaging and needs to stop.

            The proper way would be to add a Requires to only the osd-ldiskfs subpackage. Is that going to be required to land this patch? It will probably cause us some trouble, but I'm willing to compromise and try adding that.
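            The compromise proposed here would look roughly like the following spec fragment. The %{kver} macro name is an assumption standing in for whatever macro holds the build kernel's version; only the ldiskfs subpackage is pinned, while Name and Release stay kernel-free:

            ```spec
            # Pin only the ldiskfs OSD subpackage to the kernel it was
            # built against; no other lustre package carries the kernel
            # version in its Name or Release.
            %package osd-ldiskfs
            Requires: kernel = %{kver}
            Provides: lustre-osd
            ```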
            bogl Bob Glossman (Inactive) added a comment - - edited

            I strongly disagree. LU-684 doesn't eliminate the need for a lustre build with ldiskfs to be tightly tied to a specific Linux kernel version. As long as we build ldiskfs by patching a particular upstream ext4 on the fly during a lustre server build, we are subject to variations in the upstream ext4 that aren't captured by a compatible advertised kernel ABI that stays constant. Upstream ext4 changes occur unpredictably and in an unscheduled way in RHEL and SLES kernel updates.


            morrone Christopher Morrone (Inactive) added a comment -

            Lustre server rpms do not require a patched kernel.

            The only thing that needs a patched kernel at this point is ldiskfs testing. That should be fixed soon in LU-684. If for some reason that fails, we can take the approach suggested by James Simmons in LU-20.
            mdiep Minh Diep added a comment -

            Chris,

            While it's fine for the client rpm, where we can move to a different kernel version, it's difficult for the lustre server rpm to move to a different kernel version since it requires a patched kernel. Without the kernel version string in the name, it's not easy to find out which kernel it was built against.

            Thanks
            -Minh


            morrone Christopher Morrone (Inactive) added a comment -

            LU-5614 is done. Patch 19954 for this ticket was already based on that ticket's patch, so no rebase should be necessary. We just need to get the review process under way on this one.

            morrone Christopher Morrone (Inactive) added a comment -

            With LU-5614 close to landing, I rebased this issue's patch. It is ready to go through the normal review process.

            People

              mdiep Minh Diep
              morrone Christopher Morrone (Inactive)