[LU-14025] Need lustre client for SLES15 SP2 and Mellanox OFED 5.1 Created: 13/Oct/20  Updated: 23/Jan/24  Resolved: 23/Jan/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5
Fix Version/s: Lustre 2.12.5

Type: Bug Priority: Critical
Reporter: Jay Lan (Inactive) Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: build
Environment:

Lustre client will run in SLES15 SP2 system with Mellanox OFED 5.1


Attachments: File log-rpms.2.12.5-sles15sp2-mofed5.1     File log-rpms.20201027     File log-rpms.master-sles15sp2-mofed512    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We are getting new systems that require SLES15 SP2 and Mellanox OFED.

Uploaded is a log-rpms file that shows a compilation failure of Lustre client 2.12.5 with the SLES15 SP2 kernel and Mellanox OFED 5.1. The only MOFED versions that support SLES15 SP2 are 5.x.

The failure does not look trivial. It is probably due to rule changes in the newer versions of gcc, make, or rpmbuild in SLES15 SP2.
...
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:57: '/tmp/rpmbuild-lustre-jlan-c7QMO3MB/BUILD/lustre-2.12.5/libcfs/libcfs/libcfs.ko' will not be built even though obj-m is specified.
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:58: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:57: '/tmp/rpmbuild-lustre-jlan-c7QMO3MB/BUILD/lustre-2.12.5/lnet/selftest/lnet_selftest.ko' will not be built even though obj-m is specified.
/usr/src/linux-5.3.18-24.15/scripts/Makefile.build:58: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
...

I did not see this type of failure when compiling MOFED or the other external kernel modules that I built. It only happened with the Lustre build.
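The kbuild messages above come in pairs: one line names the module that was skipped and the next explains that subdir-y/m can no longer be used to reach a module Makefile. As a minimal sketch for triaging a long build log, the helper below lists the skipped modules. The function name `list_skipped_modules` is hypothetical, and the pattern assumes the exact message wording shown in the excerpt.

```shell
# List the kernel modules that kbuild refused to build, given a build log.
# Usage: list_skipped_modules LOGFILE
# Assumes messages of the form:
#   ...Makefile.build:57: '<path>/<name>.ko' will not be built even though obj-m is specified.
list_skipped_modules() {
    # Extract the quoted .ko path, strip the directory part, de-duplicate.
    sed -n "s/.*'\(.*\.ko\)' will not be built even though obj-m is specified.*/\1/p" "$1" |
        sed 's|.*/||' | sort -u
}
```

Run against the excerpt above, this would list libcfs.ko and lnet_selftest.ko.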



 Comments   
Comment by Peter Jones [ 13/Oct/20 ]

Jay

Severity 1 is reserved for site-down issues. It seems that this was not the intention of this ticket, but please confirm.

Peter

Comment by Jay Lan (Inactive) [ 13/Oct/20 ]

No, not site down. We will need it when the new systems are installed. Please adjust the severity and priority as you see fit.

BTW, since I build the Lustre rpms myself, I only need a working patch for 2.12.5 to be 'submitted'. You can handle the release at your own pace.

Comment by Peter Jones [ 14/Oct/20 ]

Jay

Are you able to build master clients OK? If I understand correctly, SLES15 SP2 requires a 5.3 kernel, which I believe has been tested successfully with Ubuntu 18.04 HWE.

Peter

Comment by Jay Lan (Inactive) [ 14/Oct/20 ]

I tried to compile master in the sles15sp2 environment. It also failed. There were WARNINGs before the errors.

log-rpms.master-sles15sp2-mofed512 is attached.

Comment by Peter Jones [ 14/Oct/20 ]

Jian

Could you please investigate?

Thanks

Peter

Comment by Jian Yu [ 16/Oct/20 ]

Hi Jay,

"I tried to compile master in the sles15sp2 environment. It also failed. There were WARNINGs before the errors."

The warnings did not cause the build to fail. Have you tried to install the following packages and proceed with the build?

error: Failed build dependencies:
        binutils-devel is needed by lustre-client-2.13.56-1.x86_64
        openmpi2-devel is needed by lustre-client-2.13.56-1.x86_64
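As a rough sketch, the missing packages can be pulled out of a saved build log automatically. The helper name `missing_build_deps` is hypothetical, and the pattern only handles the simple "NAME is needed by PACKAGE" form shown above, not version-constrained dependencies.

```shell
# Print the package names reported as failed build dependencies in a log.
# Usage: missing_build_deps LOGFILE
# Only matches lines of the form "<name> is needed by <package>".
missing_build_deps() {
    sed -n 's/^[[:space:]]*\([^[:space:]][^[:space:]]*\) is needed by .*/\1/p' "$1" |
        sort -u
}
```

On SLES the result could then be handed to the package manager, for example `zypper install $(missing_build_deps log-rpms)`, assuming the packages are available in the configured repositories.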

FYI, with SLES15 SP2 client support patch https://review.whamcloud.com/40265 applied to master branch, I can successfully build it with MLNX_OFED 5.1-2.3.7.1:

# uname -r
5.3.18-24.24-default
# rpm -qf /usr/src/ofa_kernel/default/
mlnx-ofa_kernel-devel-5.1-OFED.5.1.2.3.7.1.sles15sp2.x86_64

# cd lustre-release/
# sh ./autogen.sh
# ./configure --disable-server --without-zfs --with-linux=/usr/src/linux-5.3.18-24.24 --with-linux-obj=/usr/src/linux-5.3.18-24.24-obj/x86_64/default --with-o2ib=/usr/src/ofa_kernel/default/
# make rpms
<~snip~>
Wrote: /tmp/rpmbuild-lustre-root-Pz1U9Cf0/RPMS/x86_64/lustre-client-2.13.56_23_g9e2f8a4-1.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-root-Pz1U9Cf0/RPMS/x86_64/lustre-client-kmp-default-2.13.56_23_g9e2f8a4_k5.3.18_24.24-1.x86_64.rpm
Wrote: /tmp/rpmbuild-lustre-root-Pz1U9Cf0/RPMS/x86_64/lustre-client-tests-2.13.56_23_g9e2f8a4-1.x86_64.rpm
<~snip~>

I will try Lustre b2_12 branch and figure out what patches are needed.
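In the configure line above, the --with-linux and --with-linux-obj paths follow from the running kernel release with the flavor suffix removed. The sketch below derives both options from a release string. The function name `suse_kernel_dirs` is hypothetical, and the /usr/src layout is an assumption taken from the paths in that configure line.

```shell
# Print the --with-linux / --with-linux-obj options for a SUSE kernel
# release string such as "5.3.18-24.24-default" (the output of `uname -r`).
# Usage: suse_kernel_dirs RELEASE [ARCH]
suse_kernel_dirs() {
    krel=$1                  # e.g. 5.3.18-24.24-default
    arch=${2:-$(uname -m)}   # e.g. x86_64
    kver=${krel%-*}          # strip the flavor: 5.3.18-24.24
    flavor=${krel##*-}       # keep the flavor:  default
    echo "--with-linux=/usr/src/linux-$kver"
    echo "--with-linux-obj=/usr/src/linux-$kver-obj/$arch/$flavor"
}
```

For example, `./configure --disable-server $(suse_kernel_dirs "$(uname -r)")` would fill in the two kernel paths for the running kernel, provided the matching kernel source and devel packages are installed.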

Comment by Jian Yu [ 16/Oct/20 ]

Patches for LU-13209 are needed on Lustre b2_12 branch.
Patches for LU-12634, LU-12904 and LU-13288 are also needed.

Comment by Jian Yu [ 23/Oct/20 ]

Hi Jay,
It turns out more patches are needed. Here is the tip of the back-ported patch series: https://review.whamcloud.com/40266. With those patches applied to Lustre b2_12 branch, I can successfully build SLES15 SP2 client (kernel 5.3.18-24.24.1) with MLNX_OFED 5.1-2.3.7.1. The build needs to be tested.

Comment by Jay Lan (Inactive) [ 27/Oct/20 ]

Hi Jian,

I picked up these patches:
LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]
LU-13288 llite: Find account_page_dirtied on module init
LU-12634 gss: uid_keyring and session_keyring moved
LU-12634 libcfs: force_sig() removed task parameter
LU-12634 build: Recognize ELRepo -ml mainline kernel
LU-12634 llite: lm_compare_owner removed
LU-12634 osd-ldiskfs: bi_phys_segments removed from struct bio
LU-12634 build: kbuild changes in 5.3 drop subdir-m
LU-13209 build: SUSE 15 SP2 fix for KBUILD_SRC removed
LU-13209 build: Fix vvp_account_page_dirtied
LU-13820 kernel: new kernel [SLES15 SP2 5.3.18-22.2]

Some errors:
1) /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/libcfs/libcfs/linux/linux-tracefile.c: In function 'cfs_trace_max_debug_mb':
/tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/libcfs/libcfs/linux/linux-tracefile.c:270:34: error: invalid operands to binary >> (have 'long unsigned int (void)' and 'int')
int total_mb = (totalram_pages >> (20 - PAGE_SHIFT));
                ~~~~~~~~~~~~~~ ^~
2) /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/libcfs/libcfs/linux/linux-tracefile.c: At top level:
cc1: error: unrecognized command line option '-Wno-stringop-truncation' [-Werror]
3) /tmp/rpmbuild-lustre-jlan-bYemuTCT/BUILD/lustre-2.12.5/lustre/include/lustre_compat.h:554:20: error: redefinition of 'inode_has_no_xattr'
It was defined at
lustre/include/lustre_compat.h:554:20 and
[linux source]/include/linux/fs.h:3504:20

The log-rpms.20201027 file will be uploaded.
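In error 1 the compiler reports the left operand as 'long unsigned int (void)', which means totalram_pages is a function in this kernel and not a plain variable. A quick way to confirm this for a given kernel tree is sketched below. The function name `totalram_pages_kind` is hypothetical, and the sketch assumes the declaration lives in include/linux/mm.h.

```shell
# Report whether a kernel source tree declares totalram_pages as a function.
# Usage: totalram_pages_kind /usr/src/linux-<version>
# Assumption: the declaration, if present, is in include/linux/mm.h.
totalram_pages_kind() {
    hdr="$1/include/linux/mm.h"
    if [ ! -r "$hdr" ]; then
        echo unknown
        return 1
    fi
    if grep -q 'totalram_pages(void)' "$hdr"; then
        echo function    # callers must write totalram_pages()
    else
        echo variable    # callers can use totalram_pages directly
    fi
}
```

If the result is "function", code that shifts totalram_pages directly, as at linux-tracefile.c:270, has to call it instead.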

Comment by Jay Lan (Inactive) [ 27/Oct/20 ]

On sles15sp2 (Linux 5.3), the gcc version is 7.x. The '-Wno-stringop-truncation' option seems to require gcc 8.
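The -Wstringop-truncation warning was introduced in gcc 8, so gcc 7 does not know the option. GCC also documents that an unrecognized -Wno- option is only reported when other diagnostics are being produced, so error 2 may be a side effect of error 1 and not a separate problem. A minimal version check is sketched below; the function name `knows_stringop_truncation` is hypothetical.

```shell
# Succeed if the given gcc version string (e.g. from `gcc -dumpversion`)
# is new enough to know -Wstringop-truncation (gcc 8 or later).
# Usage: knows_stringop_truncation VERSION
knows_stringop_truncation() {
    major=${1%%.*}    # "7.5.0" -> "7"
    [ "$major" -ge 8 ]
}
```

For example, `knows_stringop_truncation "$(gcc -dumpversion)" || echo "gcc too old for -Wstringop-truncation"` prints the message on a gcc 7 system.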

Comment by Jian Yu [ 27/Oct/20 ]

Hi Jay,
The patch series has 28 patches:
LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]
LU-13344 lnet: stop using struct timeval
LU-13210 lnet: gcc8 add implicit-fallthrough decorator
LU-12355 llite: MS_* flags and SB_* flags split
LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK
LU-12400 osd-ldiskfs: get rid of legacy 'get_ds()' function
LU-12355 llite: totalram_pages changed to atomic_long_t
LU-13476 llite: Fix lock ordering in pagevec_dirty
LU-13209 build: SUSE 15 SP2 fix for KBUILD_SRC removed
LU-13209 build: Fix vvp_account_page_dirtied
LU-13288 llite: Find account_page_dirtied on module init
LU-12904 utils: zfs properly detect spa_multihost
LU-12904 build: account_page_dirtied is not exported
LU-12634 llite: Use __xa_set_mark if it is available
LU-9920 vvp: dirty pages with pagevec
LU-12904 build: Support for gcc -Wimplicit-fallthrough
LU-12904 build: External module decorator removed
LU-12634 libcfs: force_sig() removed task parameter
LU-12634 build: Recognize ELRepo -ml mainline kernel
LU-12634 llite: lm_compare_owner removed
LU-12634 osd-ldiskfs: bi_phys_segments removed from struct bio
LU-12634 build: kbuild changes in 5.3 drop subdir-m
LU-12635 lnet: Fix style issues for module.c conctl.c
LU-12635 lnet: Fix deceptive indenting on for_each
LU-12635 lnet: Fix style issues for selftest/rpc.c
LU-12635 build: Support for gcc -Wimplicit-fallthrough
LU-9859 libcfs: remove wi_data from cfs_workitem
LU-9859 libcfs: use a workqueue for rehash work.
Among the above patches, https://review.whamcloud.com/40339 (LU-9859 libcfs: use a workqueue for rehash work.) is the first one that needs to be applied, and https://review.whamcloud.com/40266 (LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]) is the last one (the tip of the patch series).
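The list above is ordered tip first, so the patches apply from the bottom of the list upward. Gerrit publishes each patch set under a ref of the form refs/changes/<last two digits of the change>/<change>/<patch set>, and fetching the ref of the tip change brings in the commits it depends on. The sketch below builds such a ref name. The function name `gerrit_ref` is hypothetical, and the patch set number is not given in this ticket, so it has to be looked up on the review page.

```shell
# Print the Gerrit ref name for a change number and patch set number.
# Usage: gerrit_ref CHANGE PATCHSET
gerrit_ref() {
    # The first path component is the change number modulo 100, zero padded.
    printf 'refs/changes/%02d/%s/%s\n' "$(( $1 % 100 ))" "$1" "$2"
}
```

A fetch would then look like `git fetch <gerrit-repo-url> "$(gerrit_ref 40266 <patchset>)"`, where both the repository URL and the patch set number are placeholders that must be taken from the Gerrit page of the change.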

Comment by Jay Lan (Inactive) [ 27/Oct/20 ]

Hi Jian,

Do any of the patches you listed require gcc8 or kernel5.4? LU-12904 "Support for linux kernel version 5.4" seems to suggest that these are patches for kernel 5.4.

sles15sp2 is running kernel5.3 and gcc7.

Thanks,
Jay

Comment by Jian Yu [ 27/Oct/20 ]

No, Jay.
Here is the info on my SLES15 SP2 build node:

# uname -r
5.3.18-24.24-default
# gcc --version | head -1
gcc (SUSE Linux) 7.5.0

The commit messages of the four LU-12904 patches above show that two of the changes are needed for kernels 5.2 and 5.3. The other two patches were back-ported to resolve patch conflicts.

Comment by Jay Lan (Inactive) [ 29/Oct/20 ]

Hi Jian,

I needed to create a temporary workaround for a signature change of rdma_reject() in mofed-5.1. The affected code is in lnet/klnds/o2iblnd/o2iblnd_cb.c.

Otherwise, everything worked well! I had the rpms created. Thank you for your help!

Comment by Jian Yu [ 29/Oct/20 ]

You're welcome, Jay.
The rdma_reject() issue was fixed in https://review.whamcloud.com/39781 on the Lustre b2_12 branch. Could you please check whether your b2_12 code contains that patch?
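One way to check a local branch for a Gerrit change is to search the commit messages for its review URL. The sketch below assumes that landed commits carry a "Reviewed-on:" trailer ending in the change number; the function name `has_gerrit_change` is hypothetical.

```shell
# Succeed if the commit log text on stdin contains a Reviewed-on trailer
# for the given Gerrit change number.
# Usage: git log <branch> | has_gerrit_change CHANGE
has_gerrit_change() {
    # Require a non-digit before the number so 39781 does not match 139781.
    grep -Eq "Reviewed-on: .*[^0-9]$1[[:space:]]*\$"
}
```

For example, `git log nas-2.12.5 | has_gerrit_change 39781 && echo present` would report whether the branch mentioned in this thread already contains the patch.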

Comment by Jay Lan (Inactive) [ 30/Oct/20 ]

Ah, no, I did not have that patch in our nas-2.12.5 branch. Thank you.

Generated at Sat Feb 10 03:06:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.