[LU-10858] lustre-initialization-1 lustre-initialization fails for SLES12 SP2 and SP3 Created: 27/Mar/18  Updated: 29/Mar/18  Resolved: 29/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10556 lustre client rebuild not building ln... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for James Nunez <james.a.nunez@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/199f17d0-3149-11e8-b74b-52540065bddc

lustre-initialization failed with the following error:

'lustre-initialization failed'

Looking at the autotest log, we see that the Lustre tests are not installed

2018-03-26T21:56:49 trevis-18vm1: /usr/lib64/lustre/tests/cfg/: No such file or directory
2018-03-26T21:56:49 pdsh@trevis-18vm1: trevis-18vm1: ssh exited with exit code 1
2018-03-26T21:56:49 trevis-18vm3: /usr/lib64/lustre/tests/cfg/: No such file or directory
2018-03-26T21:56:49 pdsh@trevis-18vm1: trevis-18vm3: ssh exited with exit code 1
2018-03-26T21:56:49 trevis-18vm4: /usr/lib64/lustre/tests/cfg/: No such file or directory
2018-03-26T21:56:49 pdsh@trevis-18vm1: trevis-18vm4: ssh exited with exit code 1
2018-03-26T21:56:49 trevis-18vm2: /usr/lib64/lustre/tests/cfg/: No such file or directory
2018-03-26T21:56:49 pdsh@trevis-18vm1: trevis-18vm2: ssh exited with exit code 1
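
A quick way to confirm this on a node, assuming the usual packaging (the missing cfg directory is shipped by the lustre-tests RPM):

rpm -q lustre-tests                      # reports "not installed" if the package never landed
rpm -ql lustre-tests | grep tests/cfg    # lists the cfg files the test harness expects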

Yet, looking at the node console logs, I don't see any failures related to loading RPMs. The console logs for all the nodes end with

Welcome to SUSE Linux Enterprise Server 12 SP3  (x86_64) - Kernel 4.4.114-94.11-default (ttyS0).


trevis-18vm2 login: [   80.209414] random: nonblocking pool is initialized

<ConMan> Console [trevis-18vm2] disconnected from <trevis-18:6001> at 03-26 22:56.

This failure started with master build #3731.

Another test session that failed in this way is at
https://testing.hpdd.intel.com/test_sessions/fb84aaa9-888e-4d17-9a76-1cfd67d415aa

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
lustre-initialization-1 lustre-initialization - 'lustre-initialization failed'



 Comments   
Comment by James Nunez (Inactive) [ 27/Mar/18 ]

Testing to see whether reverting the LU-10556 patch https://review.whamcloud.com/#/c/31710/ will fix this issue.

Revert patch at https://review.whamcloud.com/#/c/31800/

Comment by Peter Jones [ 27/Mar/18 ]

Bob

Can you please investigate?

Peter

Comment by James A Simmons [ 27/Mar/18 ]

Really? That patch just adds build requirements when building from the Lustre source RPMs. Are those RPMs named differently on SLES12?

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

Yes, this is a problem with naming conventions.
In RHEL the user-level libyaml RPM is named "libyaml" and Provides the symbol "libyaml".
In SLES the user-level libyaml RPM is named "libyaml-0-2" and has no Provides of the name "libyaml"; it only Provides the symbol "libyaml-0-2".

I would suggest removing the dependency this mod just added.
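
The difference is visible with rpm's standard Provides query (the commands are real; the output summarized in the comments is an assumption):

# RHEL:
rpm -q --provides libyaml        # includes the capability "libyaml"
# SLES 12:
rpm -q --provides libyaml-0-2    # includes "libyaml-0-2", but no plain "libyaml"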

Comment by Jian Yu [ 27/Mar/18 ]

Yes, James.
I just found that the 'zlib' package is named 'libz1' on SLES 12.

Comment by James A Simmons [ 27/Mar/18 ]

Oh crap. At least it's an easy fix. Is it still zlib-devel, though?

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

The zlib dependency isn't a problem. The SLES RPM libz1 has a Provides for the name "zlib".
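
(This is easy to confirm on a SLES 12 node; the command is standard rpm, the exact output is assumed:)

rpm -q --provides libz1 | grep zlib    # shows a "zlib = <version>" capability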

Comment by Jian Yu [ 27/Mar/18 ]

Is it still zlib-devel, though?

Yes. It's still zlib-devel.

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

The -devel RPMs are fine; both libyaml-devel and zlib-devel exist in both RHEL and SLES.
I question the strategy of enforcing this in the .spec file, though.
I thought there were already autoconf tests to check for and enforce the right build environment in these cases.
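
(Both -devel names can be checked directly; the commands are standard, the results are as described above:)

# SLES 12:
zypper info libyaml-devel zlib-devel
# RHEL:
yum info libyaml-devel zlib-devel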

Comment by James A Simmons [ 27/Mar/18 ]

It's about pulling in the right RPMs when you install a Lustre binary RPM. Currently you can install a prepackaged Lustre RPM on a system that lacks libyaml and/or zlib. In that case it looks like the install succeeded, but when you go to run Lustre you get a nice crash. With this patch, yum pulls in libyaml and zlib when installing the Lustre binary RPM.
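
(A sketch of the install-time effect; the RPM filename here is an assumption:)

yum localinstall lustre-2.11.0-1.el7.x86_64.rpm
# yum reads the package's Requires (libyaml and zlib, with this patch) and
# pulls both libraries from the distro repos before installing Lustre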

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

I wasn't questioning the additional Requires, although if you want those they will need to be conditional because of the different package names in different distros.
I was questioning the need for the added BuildRequires.

Comment by Jian Yu [ 27/Mar/18 ]

I was questioning the need for the added BuildRequires.

Since the package names libyaml-devel and zlib-devel are both correct on RHEL and SLES, I wonder whether the following line in lustre.spec.in can cause any issues:

BuildRequires: libtool libyaml-devel zlib-devel

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

Since the names are the same, I don't see any harm.
I just question the need for it at all.
I'm of the "if it's not broke, don't fix it" school of thought.

Comment by James A Simmons [ 27/Mar/18 ]

The BuildRequires were added to make users of mock and other build systems like it happy. For something like mock, you drop in the source RPM and it uses the BuildRequires to pull down the development RPMs needed to build the packages.

libtool appears to be the same on both distros. The issue is that the logs from this failure are pretty useless. Do you have something better?
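
(A minimal sketch of that mock workflow; the chroot config name and SRPM version are assumptions:)

# mock reads the SRPM's BuildRequires and installs them into a clean chroot before building:
mock -r epel-7-x86_64 --rebuild lustre-2.11.0-1.src.rpm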

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

If having the extra BuildRequires makes mock behave better, I have no major objection.
For manual builds, I have found that examining the config.log of a failed build usually gives enough clues to figure it out.
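
(For example; the search pattern is only illustrative:)

grep -n -B2 -A5 'yaml' config.log    # shows the failed configure probe and the surrounding compiler errors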

If you insist on it, I suggest something like the following to adapt to the different names:

--- a/lustre.spec.in
+++ b/lustre.spec.in
@@ -81,12 +81,14 @@
 %global modules_fs_path /lib/modules/%{kversion}/%{kmoddir}
 
 %if %{_vendor}=="redhat" || %{_vendor}=="fedora"
+	%global requires_yaml_name libyaml
 	%global requires_kmod_name kmod-%{lustre_name}
 	%if %{with lustre_tests}
 		%global requires_kmod_tests_name kmod-%{lustre_name}-tests
 	%endif
 	%global requires_kmod_version %{version}
 %else	#for Suse
+	%global requires_yaml_name libyaml-0-2
 	%global requires_kmod_name %{lustre_name}-kmp
 	%if %{with lustre_tests}
 		%global requires_kmod_tests_name %{lustre_name}-tests-kmp
@@ -132,7 +134,8 @@ Source6: kmp-lustre-osd-zfs.files
 Source7: kmp-lustre-tests.files
 URL: https://wiki.hpdd.intel.com/
 BuildRoot: %{_tmppath}/lustre-%{version}-root
-Requires: %{requires_kmod_name} = %{requires_kmod_version} libyaml zlib
+Requires: %{requires_kmod_name} = %{requires_kmod_version} zlib
+Requires: %{requires_yaml_name}
 BuildRequires: libtool libyaml-devel zlib-devel
 %if %{with servers}
 Requires: lustre-osd

I don't claim this is the best fix, but I think it will work.

Comment by Jian Yu [ 27/Mar/18 ]

Hi Bob,
For SLES 11 the package name is 'zlib', and for SLES 12 it is 'libz1'.

Comment by Bob Glossman (Inactive) [ 27/Mar/18 ]

I don't think the exact package name matters. As long as the RPM has a Provides of "zlib" in it, it will be found and installed when required as a dependency.

In any case, SLES 11 support on master has stopped or is going away soon.
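
(The reverse lookup, from capability to package, is also a standard query; the resolved name is as Jian reports:)

rpm -q --whatprovides zlib    # on SLES 12 this resolves to libz1-<version>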

Comment by James A Simmons [ 27/Mar/18 ]

Actually, that fix looks good, Bob. I'm going to try it.

Comment by Gerrit Updater [ 28/Mar/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/31815
Subject: LU-10858 build: handle yaml library packaging on SLES systems
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fc4a9793c5ef2a3abd260474fc9f9dc2e9102673

Comment by Gerrit Updater [ 29/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31815/
Subject: LU-10858 build: handle yaml library packaging on SLES systems
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 20ad3ed15c321c7740988728c49a97105c59a3c4

Comment by Peter Jones [ 29/Mar/18 ]

Landed for 2.11
