
Dealing with kernels that have lustre enabled already

Details

    • Task
    • Resolution: Duplicate
    • Major
    • None
    • None
    • 15743

    Description

      The number of kernels that include lustre (from the staging tree at the moment) is growing, as is the number of distributions that ship it, so we need to do something about the problems this creates for us.

      Currently we cannot build our external lustre against such a kernel because of a clash in config defines, e.g.:

      make[1]: Entering directory `/home/green/bk/x86'
        CC [M]  /home/green/git/lustre-current/libcfs/libcfs/linux/linux-tracefile.o
      In file included from <command-line>:0:0:
      /home/green/git/lustre-current/config.h:26:0: error: "CONFIG_LNET_MAX_PAYLOAD" redefined [-Werror]
       #define CONFIG_LNET_MAX_PAYLOAD LNET_MTU
       ^
      In file included from /home/green/bk/linux/include/linux/kconfig.h:4:0,
                       from <command-line>:0:
      include/generated/autoconf.h:1571:0: note: this is the location of the previous definition
       #define CONFIG_LNET_MAX_PAYLOAD 1048576
       ^
      cc1: all warnings being treated as errors
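
One way to spot this situation before building is to look for the symbol in the kernel's generated autoconf.h, the file the log above points to. The following is only a minimal sketch: the function name kernel_defines is hypothetical and is not part of the lustre build system.

```shell
# Return success if the given autoconf.h already defines the given symbol.
# Usage: kernel_defines <path/to/autoconf.h> <SYMBOL>
# (kernel_defines is a hypothetical helper name used for illustration.)
kernel_defines() {
    # Match "#define SYMBOL" followed by whitespace or end of line, so that
    # a shorter symbol does not match a longer one sharing its prefix.
    grep -Eq "^#define[[:space:]]+$2([[:space:]]|\$)" "$1"
}
```

For example, `kernel_defines include/generated/autoconf.h CONFIG_LNET_MAX_PAYLOAD && echo "already defined by the kernel"`, run from the kernel build directory, would report the clash shown in the log.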
      

      Once lustre is moved out of the staging tree, another problem will be added: clashes with symbols from the lustre includes in the kernel tree (these are currently hidden in the secluded staging location, so this is not an immediate problem).

      Once the config symbol clash is resolved, the next problem is the clash in module names between in-kernel lustre and out-of-kernel lustre. Because the in-kernel implementation is mostly geared towards clients and also lacks our debugging aids and such, these modules are not really interchangeable, and we need to do something about that too. Possibly consider renaming our out-of-tree modules? This will become a problem once distributions start to enable lustre by default in their kernels (so it is not a big problem yet either).
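
To see whether a given system is already affected by the module name clash, one could list module file names that occur more than once under the installed module tree. This is a rough sketch under assumptions: duplicate_modules is a hypothetical name, and it only considers uncompressed .ko files.

```shell
# Print module file names (for example lnet.ko) that appear more than once
# anywhere under a module tree such as /lib/modules/<version>.
# (duplicate_modules is a hypothetical helper name used for illustration.)
duplicate_modules() {
    # Strip the directory part of each path, then keep only repeated names.
    find "$1" -type f -name '*.ko' | sed 's#.*/##' | sort | uniq -d
}
```

Running `duplicate_modules "/lib/modules/$(uname -r)"` on a system with both an in-kernel and an out-of-tree lustre installed would be expected to print the module names the two have in common.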

      Finally, there are bound to be symbol clashes between in-kernel and out-of-kernel lustre modules, so I suspect we need to do something about that too, but I am not sure what so far. A wrapper to change the names a bit?


          Activity

            simmonsja James A Simmons added a comment - - edited

            For LNet this is the case. External kernel modules that use LNet, such as DVS from Cray, exist in the wild.

            Have you tried Intel Lustre on a distro with upstream Lustre enabled? I have newer Ubuntu versions on IBM PowerPC, but for some mysterious reason Lustre is disabled there, unlike on other Ubuntu systems.


            dmiter Dmitry Eremin (Inactive) added a comment -

            Why do we need this? Are we assuming somebody will link against our modules? What is the reason to provide our symbol versions to others?

            simmonsja James A Simmons added a comment -

            Patch http://review.whamcloud.com/16418 will resolve the config.h issues with the upstream kernel, but for OpenSFS/Intel lustre to run instead of the upstream client we need to modify Module.symvers to replace the symbols from the upstream client with those from the master branch, much like we do for the OFED external stacks.
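
The Module.symvers idea can be illustrated with a small merge in which entries from an externally built tree override same-named symbols from the kernel's file. This is only a sketch under stated assumptions: it assumes tab-separated lines with the symbol name in the second field, merge_symvers is a hypothetical name, and it does not reproduce what OFED or the patch under review actually does.

```shell
# Merge two Module.symvers files. Entries from the external file replace
# kernel entries that export the same symbol name.
# Usage: merge_symvers <kernel Module.symvers> <external Module.symvers>
# Assumes tab-separated fields with the symbol name in field 2.
# (merge_symvers is a hypothetical helper name used for illustration.)
merge_symvers() {
    awk -F '\t' '
        # First file on the awk command line is the external one:
        # remember each of its lines, keyed by symbol name.
        FILENAME == ARGV[1] { ext[$2] = $0; next }
        # Kernel lines survive only if the external tree does not
        # export the same symbol.
        !($2 in ext)        { print }
        # Finally emit every external entry.
        END { for (sym in ext) print ext[sym] }
    ' "$2" "$1" | sort -t "$(printf '\t')" -k 2,2
}
```

The output is sorted by symbol name only to make the result reproducible.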
            dmiter Dmitry Eremin (Inactive) added a comment - - edited

            The patch http://review.whamcloud.com/16418 will also resolve this. The issue is common with LU-7042.


            utopiabound Nathaniel Clark added a comment -

            WORKAROUND:
            Ubuntu 14.04 LTS
            Linux kernel 3.13

            ./configure --disable-server --enable-quota --with-max-payload-mb=1

            Then edit config.h to replace ((1)<<20) with 1048576.
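
The manual config.h edit in the workaround could be scripted along these lines. The edit presumably helps because C allows a macro to be redefined when the replacement text is identical, and the kernel's autoconf.h in the log above defines the value as 1048576. This is a sketch only: fix_max_payload is a hypothetical name, and it assumes the definition sits on a single line of config.h.

```shell
# Rewrite the CONFIG_LNET_MAX_PAYLOAD definition in config.h so that its
# replacement text is 1048576 instead of ((1)<<20).
# Usage: fix_max_payload <path/to/config.h>
# (fix_max_payload is a hypothetical helper name used for illustration.)
fix_max_payload() {
    # Only touch the line that defines CONFIG_LNET_MAX_PAYLOAD; a backup of
    # the original file is kept next to it with a .bak suffix.
    sed -i.bak \
        '/^#define[[:space:]]\{1,\}CONFIG_LNET_MAX_PAYLOAD[[:space:]]/ s/((1)<<20)/1048576/' \
        "$1"
}
```

It would be run as `fix_max_payload config.h` after the ./configure step and before make.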

            simmonsja James A Simmons added a comment -

            Looking at the module-assistant man pages, it appears that KPKG_DEST_DIR can be used to place the lustre modules into the updates directory. Perhaps that is not the best solution, though, since I am not a Debian packaging expert by any means. Any Debian packaging gurus here?
            james beal James Beal added a comment -

            Thanks for that. In our use case we run a Red Hat kernel on our servers with an Ubuntu user space, but for our clients we want to use the real client and the default kernel. We use DKMS for our client modules, so that works for us. I could email you a link to the system we use to build things if that would help at all (it works in Vagrant and AWS).


            simmonsja James A Simmons added a comment -

            I'm also working on Ubuntu 14.04 and just pushed some patches to make the Intel branch of lustre functional. Making it work with the upstream client that is included will require a bit of work, which I haven't had the time to do. Basically we have to do something along the lines of OFED. Besides handling CONFIG_LNET_MAX_PAYLOAD, we have to modify Module.symvers so that the correct lustre modules are picked up. Currently make debs places the lustre modules in kernel/fs instead of updates. That needs to be fixed first.
            james beal James Beal added a comment -

            Any news? I am seeing this with lustre 2.7 on Ubuntu 14.04.

            mdiep Minh Diep added a comment -

            I am testing on Ubuntu 14.04

            simmonsja James A Simmons added a comment - - edited

            Those errors are due to procfs API changes upstream, which should be resolved by the patches from LU-5275. Which Debian/Ubuntu release are you testing on?


            People

              dmiter Dmitry Eremin (Inactive)
              green Oleg Drokin
              Votes: 1
              Watchers: 14
