Lustre / LU-5614

use %kernel_module_package for weak-updates

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.9.0
    • None
    • 3
    • 15704

    Description

      The correct way to support weak-updates in rpm packages is the vendor-defined %kernel_module_package macro. It does the right thing on all distributions.

      We have used this feature in SGI Lustre for several years, and I plan to work this feature back into the master branch.
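      The macro-based approach the description refers to can be sketched as a minimal, hypothetical spec fragment (the package name, flavor, and build/install lines are illustrative, not the actual Lustre spec):

```spec
# Hypothetical sketch of %kernel_module_package usage; names illustrative.
%define module lustre

# Expands to one kmod-%{module} subpackage per kernel flavor, with the
# kABI/ksym dependencies and weak-updates handling supplied by the
# distribution's own macro definition (RHEL and SLES each ship one).
%kernel_module_package -n %{module} default

%build
make %{?_smp_mflags} modules

%install
make INSTALL_MOD_PATH=%{buildroot} modules_install
```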

      Attachments

        1. lustre-ldiskfs.files
          0.1 kB
        2. lustre-modules.files
          0.1 kB
        3. sgi242-simple.spec
          9 kB

          Activity

            [LU-5614] use %kernel_module_package for weak-updates
            simmonsja James A Simmons added a comment - - edited

            I found the source of the problem. This change now requires the kabi-whitelist package, at least for RHEL6. Do you have this package for RHEL7 and SLES as well? Once I installed kabi-whitelist it appears to work. ZFS still gives trouble, but it might be a simple case of rebuilding it.

            mdiep Minh Diep added a comment -

            "Are you ssh'ing into the destination node and then installing the rpm on that node?"
            Yes.

            I am not familiar with chroot. For diskless nodes, I know that LLNL Chaos builds the image with the set of rpms, then refreshes/boots the node. I'll have to look into chroot more, but I am pretty sure that's the difference and what is causing the issue.

            Can you attach or upload your kmod-lustre-client rpm so I can check its content?
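            A kmod rpm's kABI dependencies can also be inspected without installing it. A sketch, with an illustrative filename and a guard so the commands are only attempted where such a file exists:

```shell
kmod=kmod-lustre-client-2.8.54*.rpm   # hypothetical filename; adjust as needed
if ls $kmod >/dev/null 2>&1; then
    # RHEL encodes kABI deps as "kernel(symbol) = crc"; SLES uses "ksym(...)".
    rpm -qp --requires $kmod | grep -E '^(kernel|ksym)\(' || true
    rpm -qp --provides $kmod
    result="inspected"
else
    result="no kmod rpm present; commands shown for reference"
fi
echo "$result"
```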


            simmonsja James A Simmons added a comment -

            Are you ssh'ing into the destination node and then installing the rpm on that node? For our systems we create the rpms and chroot into a directory that serves as the root of the image for the diskless test nodes:

            [ management server ] -> /export-image/root -> diskless-node:/root
                                  -> /export-image/usr  -> diskless-node:/usr

            chroot /export-image
            rpm -ivh /tmp/lustre.rpm

            Could the chroot environment be causing the ksym issues? In our diskless setup it's not really possible to install rpms directly on the test nodes, since they are essentially read-only. Can you try that setup please, Minh?
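            The image-install pattern above can also be expressed with rpm's --root option, which resolves dependencies against the image's rpm database rather than the host's. A minimal sketch with illustrative paths:

```shell
# Prepare an alternate root for the image (path illustrative).
image=$(mktemp -d)
mkdir -p "$image/var/lib/rpm"
# Dependency resolution (including any ksym/kernel() provides) would then
# be done against the database under $image, not the build host's:
#   rpm --root "$image" -ivh /tmp/lustre.rpm
echo "image root prepared at $image"
```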
            mdiep Minh Diep added a comment -

            I don't see any issue on el6. I built on el6.7 pointing at the el6.8 kernel and installed the kmod lustre client on an el6.8 system; no issue.

            [root@onyx-21vm2 lustre-release]# uname -a
            Linux onyx-21vm2.onyx.hpdd.intel.com 2.6.32-573.26.1.el6.x86_64 #1 SMP Wed May 4 00:57:44 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            sh autogen.sh
            ./configure --disable-server --with-linux=/usr/src/kernels/2.6.32-642.1.1.el6.x86_64/
            make rpms
            scp kmod-lustre-client-2.8.54_61_gcc7a8c9-2.6.32_642.1.1.el6.x86_64.x86_64.rpm onyx-21vm3:/root
            scp lustre-client-2.8.54_61_gcc7a8c9-2.6.32_642.1.1.el6.x86_64.x86_64.rpm onyx-21vm3:/root
            history


            rpm -hiv ./kmod-lustre-client-2.8.54_61_gcc7a8c9-2.6.32_642.1.1.el6.x86_64.x86_64.rpm
            [root@onyx-21vm3 ~]# modprobe lustre
            LNet: HW CPU cores: 1, npartitions: 1
            alg: No test for adler32 (adler32-zlib)
            alg: No test for crc32 (crc32-table)
            Lustre: Lustre: Build Version: 2.8.54_61_gcc7a8c9
            LNet: Added LNI 10.2.4.14@tcp [8/256/0/180]
            LNet: Accept secure, port 988
            [root@onyx-21vm3 ~]# modinfo lustre
            filename: /lib/modules/2.6.32-642.1.1.el6.x86_64/extra/lustre-client/fs/lustre.ko
            license: GPL
            version: 2.8.54_61_gcc7a8c9
            description: Lustre Client File System
            author: OpenSFS, Inc. <http://www.lustre.org/>
            srcversion: F6CD34A134CF6E45A10B0DB
            depends: obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
            vermagic: 2.6.32-642.1.1.el6.x86_64 SMP mod_unload modversions
            [root@onyx-21vm3 ~]# uname -a
            Linux onyx-21vm3.onyx.hpdd.intel.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Tue May 31 21:57:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            [root@onyx-21vm3 ~]#

            I am not sure what's missing. I set up the builder without anything special; I just installed git, libtool, and rpm-build.

            How did you put the kernel-devel on the builder? cpio or rpm install?
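            The cpio-vs-rpm distinction matters because extracting kernel-devel with rpm2cpio, unlike installing it, records nothing in the rpm database. A sketch of the extraction route (filename illustrative, guarded so it runs even where the rpm is absent):

```shell
devrpm=kernel-devel-2.6.32-642.1.1.el6.x86_64.rpm   # illustrative filename
if [ -f "$devrpm" ]; then
    mkdir -p /tmp/kdev
    # Unpack the payload without touching the rpm database:
    ( cd /tmp/kdev && rpm2cpio "$OLDPWD/$devrpm" | cpio -idm )
    status="extracted"
else
    status="no kernel-devel rpm present; commands shown for reference"
fi
echo "$status"
```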

            simmonsja James A Simmons added a comment - - edited

            Okay. Let's see if it's a RHEL6.x issue, especially since I don't have access to RHEL7 systems. If you manage to build it, then the build process must depend on something that is missing on our build machine. If I see this problem, other people will too once it's released into the wild.

            mdiep Minh Diep added a comment -

            It worked on el7. I'll verify on el6, as you did, next.

            [root@onyx-21vm5 lustre-release]# uname -a
            Linux onyx-21vm5.onyx.hpdd.intel.com 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            sh ./autogen.sh
            ./configure --disable-server --with-linux=/usr/src/kernels/3.10.0-327.18.2.el7.x86_64/
            make rpms

            scp kmod-lustre-client-2.8.54_61_gcc7a8c9-3.10.0_327.18.2.el7.x86_64.x86_64.rpm root@onyx-24:/root
            scp lustre-client-2.8.54_61_gcc7a8c9-3.10.0_327.18.2.el7.x86_64.x86_64.rpm root@onyx-24:/root

            [root@onyx-24 ~]# uname -a
            Linux onyx-24.onyx.hpdd.intel.com 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            [root@onyx-24 ~]# rpm -hiv ./kmod-lustre-client-2.8.54_61_gcc7a8c9-3.10.0_327.18.2.el7.x86_64.x86_64.rpm
            Preparing... ################################# [100%]
            Updating / installing...
            1:kmod-lustre-client-2.8.54_61_gcc7################################# [100%]
            [root@onyx-24 ~]# modprobe lustre
            [root@onyx-24 ~]# modinfo lustre
            filename: /lib/modules/3.10.0-327.18.2.el7.x86_64/extra/lustre-client/fs/lustre.ko
            license: GPL
            version: 2.8.54_61_gcc7a8c9
            description: Lustre Client File System
            author: OpenSFS, Inc. <http://www.lustre.org/>
            rhelversion: 7.2
            srcversion: F6CD34A134CF6E45A10B0DB
            depends: obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
            vermagic: 3.10.0-327.18.2.el7.x86_64 SMP mod_unload modversions
            [root@onyx-24 ~]#

            simmonsja James A Simmons added a comment - - edited

            So I created the most basic test to show the problem. This is with building just the patchless client. Currently our build machine is at RHEL6.7 and my test node is running RHEL6.8, so we have quite different kernels running on the two. On the build machine, as non-root, I installed the RHEL6.8 development tree in /tmp, so I have /tmp/usr/src/kernels/2.6.32-642.1.1.el6.x86_64. Then I went into my lustre tree containing this patch and did:

            cd lustre_release
            ./autogen.sh
            ./configure --disable-server --with-linux=/tmp/usr/src/kernels/2.6.32-642.1.1.el6.x86_64
            make rpm

            Then I went to install the rpms into the image and still got the ksym errors. This points to the current patch for LU-5614 breaking --with-linux for configure. Note this is a very common process our admins use to prepare our client rpms. You could say "just build on the local node"; in that case the patch for LU-5614 needs to be updated to remove --with-linux support. Also, if you recommend using mock to work around this problem, then please update this patch so that configure fails unless it runs in a mock environment, and give the source rpm a hard dependency on mock. In any case, I have a reproducer of this problem.
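            The ksym errors are consistent with the module dependencies being generated from a different kernel's Module.symvers than the target's. A toy illustration of such a CRC mismatch (symbol name and CRCs invented):

```shell
# Toy Module.symvers fragments; the symbol and CRCs are invented.
cat > symvers.build <<'EOF'
0x12345678	printk_sym	vmlinux	EXPORT_SYMBOL
EOF
cat > symvers.target <<'EOF'
0xdeadbeef	printk_sym	vmlinux	EXPORT_SYMBOL
EOF
# A CRC mismatch like this is what surfaces as an unsatisfied
# ksym/kernel() dependency when the kmod is installed on the target:
if ! diff -q symvers.build symvers.target >/dev/null; then
    status="CRC mismatch for printk_sym"
fi
echo "$status"
```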


            simmonsja James A Simmons added a comment -

            No, the target kernel rpm is not installed on the build system. That would be a big no-no, since the machine is used for other purposes. The target kernel source tree is installed in a special directory, which is not /usr/src, to build against. I will be looking into why the dependency generation is done against the wrong kernel. I wonder if we have to do an OFED-style build process in which we tell it the location of the Module.symvers file.
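            One hypothetical OFED-style approach, as mentioned above, is to point the out-of-tree build at an explicit symbol table. KBUILD_EXTRA_SYMBOLS is a standard kbuild variable for supplying extra Module.symvers files; the paths here are illustrative and the command is guarded so it is only attempted where the tree exists:

```shell
kdir=/tmp/usr/src/kernels/2.6.32-642.1.1.el6.x86_64   # illustrative path
if [ -d "$kdir" ]; then
    make -C "$kdir" M="$PWD" \
         KBUILD_EXTRA_SYMBOLS=/tmp/extra/Module.symvers modules
    status="built"
else
    status="kernel tree not present; invocation shown for reference"
fi
echo "$status"
```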

            schamp Stephen Champion added a comment -

            Is the target kernel installed on the build system (or environment)?
            Can you verify that dependency generation is done with the right kernel?

            simmonsja James A Simmons added a comment -

            I removed --without kabichk and it still gives me the ksym issues.

            simmonsja James A Simmons added a comment -

            Nope, but looking at how I build our kernel, I might know why. When building the kernel rpm I used --without kabichk. Would that be the reason it doesn't work? This is all done using RHEL6.7.

            People

              mdiep Minh Diep
              schamp Stephen Champion