Need lustre client for SLES15 SP2 and Mellanox OFED 5.1

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.12.5
    • Lustre 2.12.5
    • Lustre client will run in SLES15 SP2 system with Mellanox OFED 5.1

    Description

      We are getting new systems that require SLES15 SP2 and Mellanox OFED.

      The uploaded log-rpms file shows a compilation failure of the Lustre 2.12.5 client against the SLES15 SP2 kernel and Mellanox OFED 5.1. The only MOFED versions that support SLES15 SP2 are 5.x.

      The failure does not look trivial. It is probably due to rule changes in the newer versions of gcc, make, or rpmbuild in SLES15 SP2.
      ...
      /usr/src/linux-5.3.18-24.15/scripts/Makefile.build:57: '/tmp/rpmbuild-lustre-jlan-c7QMO3MB/BUILD/lustre-2.12.5/libcfs/libcfs/libcfs.ko' will not be built even though obj-m is specified.
      /usr/src/linux-5.3.18-24.15/scripts/Makefile.build:58: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
      /usr/src/linux-5.3.18-24.15/scripts/Makefile.build:57: '/tmp/rpmbuild-lustre-jlan-c7QMO3MB/BUILD/lustre-2.12.5/lnet/selftest/lnet_selftest.ko' will not be built even though obj-m is specified.
      /usr/src/linux-5.3.18-24.15/scripts/Makefile.build:58: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
      ...

      I did not see this type of failure when compiling MOFED or the other external kernel modules that I built. It only happens with the Lustre build.
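
When triaging a long build log like this one, it can help to list every module that kbuild refuses to build. The following is a minimal Python sketch under stated assumptions: the regular expression is derived only from the two message lines quoted above, and the names `KBUILD_SKIP` and `skipped_modules` are illustrative, not part of any Lustre or kernel tooling.

```python
import os
import re

# Pattern based on the kbuild message quoted in the description.
KBUILD_SKIP = re.compile(
    r"'([^']+\.ko)' will not be built even though obj-m is specified"
)


def skipped_modules(log_text):
    """Return the file names of modules that kbuild reported as skipped."""
    return [os.path.basename(path) for path in KBUILD_SKIP.findall(log_text)]


# Sample input: the message lines from the description, unchanged.
SAMPLE_LOG = """\
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:57: '/tmp/rpmbuild-lustre-jlan-c7QMO3MB/BUILD/lustre-2.12.5/libcfs/libcfs/libcfs.ko' will not be built even though obj-m is specified.
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:58: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:57: '/tmp/rpmbuild-lustre-jlan-c7QMO3MB/BUILD/lustre-2.12.5/lnet/selftest/lnet_selftest.ko' will not be built even though obj-m is specified.
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:58: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
"""

for name in skipped_modules(SAMPLE_LOG):
    print(name)
```

The patch titled "LU-12634 build: kbuild changes in 5.3 drop subdir-m", listed in the comments, appears to target this class of message.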

      Attachments

        Activity

          [LU-14025] Need lustre client for SLES15 SP2 and Mellanox OFED 5.1

          jaylan Jay Lan (Inactive) added a comment -

          Ah, no, I did not have that patch in our nas-2.12.5 branch. Thank you.

          yujian Jian Yu added a comment -

          You're welcome, Jay.
          The rdma_reject() issue was fixed in https://review.whamcloud.com/39781 on the Lustre b2_12 branch. Could you please check whether your b2_12 code contains that patch?

          jaylan Jay Lan (Inactive) added a comment -

          Hi Jian,

          I needed to create a temporary workaround for a signature change of rdma_reject() in MOFED 5.1. The affected code is in lnet/klnds/o2iblnd/o2iblnd_cb.c.

          Otherwise, everything worked well, and the RPMs were created. Thank you for your help!

          yujian Jian Yu added a comment -

          No, Jay.
          Here is the info on my SLES15 SP2 build node:

          # uname -r
          5.3.18-24.24-default
          # gcc --version | head -1
          gcc (SUSE Linux) 7.5.0
          

          The commit messages for the above four LU-12904 patches showed that two of the changes are needed for kernel 5.2 and 5.3. The other two patches are back-ported to resolve patch conflicts.

          jaylan Jay Lan (Inactive) added a comment -

          Hi Jian,

          Do any of the patches you listed require gcc 8 or kernel 5.4? LU-12904 "Support for linux kernel version 5.4" seems to suggest that the patches are for kernel 5.4.

          SLES15 SP2 is running kernel 5.3 and gcc 7.

          Thanks,
          Jay

          yujian Jian Yu added a comment -

          Hi Jay,
          The patch series has 28 patches:
          LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]
          LU-13344 lnet: stop using struct timeval
          LU-13210 lnet: gcc8 add implicit-fallthrough decorator
          LU-12355 llite: MS_* flags and SB_* flags split
          LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK
          LU-12400 osd-ldiskfs: get rid of legacy 'get_ds()' function
          LU-12355 llite: totalram_pages changed to atomic_long_t
          LU-13476 llite: Fix lock ordering in pagevec_dirty
          LU-13209 build: SUSE 15 SP2 fix for KBUILD_SRC removed
          LU-13209 build: Fix vvp_account_page_dirtied
          LU-13288 llite: Find account_page_dirtied on module init
          LU-12904 utils: zfs properly detect spa_multihost
          LU-12904 build: account_page_dirtied is not exported
          LU-12634 llite: Use __xa_set_mark if it is available
          LU-9920 vvp: dirty pages with pagevec
          LU-12904 build: Support for gcc -Wimplicit-fallthrough
          LU-12904 build: External module decorator removed
          LU-12634 libcfs: force_sig() removed task parameter
          LU-12634 build: Recognize ELRepo -ml mainline kernel
          LU-12634 llite: lm_compare_owner removed
          LU-12634 osd-ldiskfs: bi_phys_segments removed from struct bio
          LU-12634 build: kbuild changes in 5.3 drop subdir-m
          LU-12635 lnet: Fix style issues for module.c conctl.c
          LU-12635 lnet: Fix deceptive indenting on for_each
          LU-12635 lnet: Fix style issues for selftest/rpc.c
          LU-12635 build: Support for gcc -Wimplicit-fallthrough
          LU-9859 libcfs: remove wi_data from cfs_workitem
          LU-9859 libcfs: use a workqueue for rehash work.
          Among the above patches, https://review.whamcloud.com/40339 (LU-9859 libcfs: use a workqueue for rehash work.) is the first one that needs to be applied, and https://review.whamcloud.com/40266 (LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]) is the last one (the tip of the patch series).

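
Comparing this series with the 11 patches picked up in the earlier comment further down suggests why that first attempt failed. For example, "LU-12355 llite: totalram_pages changed to atomic_long_t" is in the series but not in the pick list, which is consistent with the totalram_pages compile error reported there. The following is a minimal Python sketch of that comparison. The patch subjects are copied from the two comments, the helper name `compare` is illustrative, and nothing here queries Gerrit or git.

```python
# Subjects of the 28-patch series, as listed in the comment above.
SERIES = """\
LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]
LU-13344 lnet: stop using struct timeval
LU-13210 lnet: gcc8 add implicit-fallthrough decorator
LU-12355 llite: MS_* flags and SB_* flags split
LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK
LU-12400 osd-ldiskfs: get rid of legacy 'get_ds()' function
LU-12355 llite: totalram_pages changed to atomic_long_t
LU-13476 llite: Fix lock ordering in pagevec_dirty
LU-13209 build: SUSE 15 SP2 fix for KBUILD_SRC removed
LU-13209 build: Fix vvp_account_page_dirtied
LU-13288 llite: Find account_page_dirtied on module init
LU-12904 utils: zfs properly detect spa_multihost
LU-12904 build: account_page_dirtied is not exported
LU-12634 llite: Use __xa_set_mark if it is available
LU-9920 vvp: dirty pages with pagevec
LU-12904 build: Support for gcc -Wimplicit-fallthrough
LU-12904 build: External module decorator removed
LU-12634 libcfs: force_sig() removed task parameter
LU-12634 build: Recognize ELRepo -ml mainline kernel
LU-12634 llite: lm_compare_owner removed
LU-12634 osd-ldiskfs: bi_phys_segments removed from struct bio
LU-12634 build: kbuild changes in 5.3 drop subdir-m
LU-12635 lnet: Fix style issues for module.c conctl.c
LU-12635 lnet: Fix deceptive indenting on for_each
LU-12635 lnet: Fix style issues for selftest/rpc.c
LU-12635 build: Support for gcc -Wimplicit-fallthrough
LU-9859 libcfs: remove wi_data from cfs_workitem
LU-9859 libcfs: use a workqueue for rehash work.
""".splitlines()

# Subjects of the 11 patches picked up in the earlier attempt.
PICKED = """\
LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]
LU-13288 llite: Find account_page_dirtied on module init
LU-12634 gss: uid_keyring and session_keyring moved
LU-12634 libcfs: force_sig() removed task parameter
LU-12634 build: Recognize ELRepo -ml mainline kernel
LU-12634 llite: lm_compare_owner removed
LU-12634 osd-ldiskfs: bi_phys_segments removed from struct bio
LU-12634 build: kbuild changes in 5.3 drop subdir-m
LU-13209 build: SUSE 15 SP2 fix for KBUILD_SRC removed
LU-13209 build: Fix vvp_account_page_dirtied
LU-13820 kernel: new kernel [SLES15 SP2 5.3.18-22.2]
""".splitlines()


def compare(series, picked):
    """Return (missing, extra): subjects only in series, subjects only in picked."""
    series_set, picked_set = set(series), set(picked)
    missing = [s for s in series if s not in picked_set]
    extra = [p for p in picked if p not in series_set]
    return missing, extra


missing, extra = compare(SERIES, PICKED)
print(f"{len(missing)} of {len(SERIES)} series patches were not picked up:")
for subject in missing:
    print("  " + subject)
print("Picked up but not in the series:")
for subject in extra:
    print("  " + subject)
```

Matching on the exact subject line is a simplification; a back-ported patch could carry a slightly different subject than the one listed.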

          jaylan Jay Lan (Inactive) added a comment -

          In SLES15 SP2 (Linux 5.3), the gcc version is 7.x. The '-Wno-stringop-truncation' option seems to require gcc 8.
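
For context, -Wstringop-truncation was introduced in GCC 8, so gcc 7.x does not recognize it. GCC usually stays silent about an unknown -Wno-... option unless it emits some other diagnostic in the same compilation, so this message probably surfaces only alongside a real error. The following is a minimal Python sketch of a version gate, assuming the first line of `gcc --version` is available as a string; the function names are hypothetical and not part of the Lustre build system.

```python
import re

# -Wstringop-truncation first appeared in GCC 8.
STRINGOP_TRUNCATION_MIN_MAJOR = 8


def gcc_major(version_line):
    """Extract the major version from the first line of `gcc --version`."""
    # Drop parenthesised vendor strings such as "(SUSE Linux)" first,
    # because they may contain version-like numbers of their own.
    without_parens = re.sub(r"\([^)]*\)", " ", version_line)
    match = re.search(r"\b(\d+)\.\d+", without_parens)
    if match is None:
        raise ValueError(f"no version number found in {version_line!r}")
    return int(match.group(1))


def knows_stringop_truncation(version_line):
    """True if this gcc is new enough to recognize -Wstringop-truncation."""
    return gcc_major(version_line) >= STRINGOP_TRUNCATION_MIN_MAJOR


# Version string reported for the SLES15 SP2 build node in this issue.
print(knows_stringop_truncation("gcc (SUSE Linux) 7.5.0"))  # False
```

A configure-time probe that compiles a test file with the positive form of the flag would be more robust than parsing version strings; the version gate is shown only because it needs no compiler to run.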

          jaylan Jay Lan (Inactive) added a comment -

          Hi Jian,

          I picked up these patches:
          LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]
          LU-13288 llite: Find account_page_dirtied on module init
          LU-12634 gss: uid_keyring and session_keyring moved
          LU-12634 libcfs: force_sig() removed task parameter
          LU-12634 build: Recognize ELRepo -ml mainline kernel
          LU-12634 llite: lm_compare_owner removed
          LU-12634 osd-ldiskfs: bi_phys_segments removed from struct bio
          LU-12634 build: kbuild changes in 5.3 drop subdir-m
          LU-13209 build: SUSE 15 SP2 fix for KBUILD_SRC removed
          LU-13209 build: Fix vvp_account_page_dirtied
          LU-13820 kernel: new kernel [SLES15 SP2 5.3.18-22.2]

          Some errors:

          1) /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/libcfs/libcfs/linux/linux-tracefile.c: In function 'cfs_trace_max_debug_mb':
             /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/libcfs/libcfs/linux/linux-tracefile.c:270:34: error: invalid operands to binary >> (have 'long unsigned int (void)' and 'int')
               int total_mb = (totalram_pages >> (20 - PAGE_SHIFT));
                               ~~~~~~~~~~~~~~ ^~

          2) /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/libcfs/libcfs/linux/linux-tracefile.c: At top level:
             cc1: error: unrecognized command line option '-Wno-stringop-truncation' [-Werror]

          3) /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/lustre/include/lustre_compat.h:554:20: error: redefinition of 'inode_has_no_xattr'
             It was defined at
             lustre/include/lustre_compat.h:554:20 and
             [linux source]/include/linux/fs.h:3504:20

          I will upload log-rpms.20201027.

          yujian Jian Yu added a comment -

          Hi Jay,
          It turns out more patches are needed. Here is the tip of the back-ported patch series: https://review.whamcloud.com/40266. With those patches applied to Lustre b2_12 branch, I can successfully build SLES15 SP2 client (kernel 5.3.18-24.24.1) with MLNX_OFED 5.1-2.3.7.1. The build needs to be tested.

          yujian Jian Yu added a comment - - edited

          Patches for LU-13209 are needed on Lustre b2_12 branch.
          Patches for LU-12634, LU-12904 and LU-13288 are also needed.

          People

            yujian Jian Yu
            jaylan Jay Lan (Inactive)