[LU-107] Lustre init scripts with heartbeat v1 integration Created: 02/Mar/11  Updated: 25/Oct/12  Resolved: 27/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.3.0

Type: Improvement Priority: Minor
Reporter: Ned Bass Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Bugzilla ID: 20,165
Rank (Obsolete): 4555

 Description   

This issue is to request that Lustre initialization scripts developed at LLNL be reviewed for inclusion in Lustre. A Gerrit submission for review is on its way.



 Comments   
Comment by Ned Bass [ 02/Mar/11 ]

See http://review.whamcloud.com/290

Comment by Ned Bass [ 04/Mar/11 ]

Updated Gerrit with a couple of bug fixes in the ldev script:

  • the -r flag to query the raidtab field was not implemented
  • block device names containing ':' were parsed incorrectly

Comment by Build Master (Inactive) [ 04/Mar/11 ]

Integrated in reviews-centos5 #385
LU-107 Add scripts for implementing heartbeat v1 failover

Ned Bass : 33e3a53d63cfaa85da01c0c8b1032704f5c745d9
Files :

  • lustre/scripts/haconfig
  • lustre/scripts/lhbadm
  • lustre/scripts/lustre.in
  • lustre/autoconf/lustre-core.m4
  • lustre/doc/lhbadm.8
  • lustre/conf/lustre
  • lustre/doc/nids.5
  • lustre/scripts/Makefile.am
  • build/autoconf/lustre-build.m4
  • lustre/doc/ldev.8
  • lustre.spec.in
  • lustre/scripts/lustre
  • lustre/conf/ldev.conf
  • lustre/scripts/Lustre
  • lustre/doc/ldev.conf.5
  • lustre/scripts/ldev
  • lustre/conf/Makefile.am
  • lustre/scripts/lnet
  • lustre/doc/Makefile.am

Comment by Build Master (Inactive) [ 04/Mar/11 ]

Integrated in reviews-centos5 #389
LU-107 Add scripts for implementing heartbeat v1 failover

Ned Bass : c0c1013e14d5bf99df874cd52fa747962b4441d3
Files :

  • lustre/scripts/lnet
  • lustre/scripts/lustre.in
  • build/autoconf/lustre-build.m4
  • lustre/doc/ldev.8
  • lustre/doc/nids.5
  • lustre/conf/ldev.conf
  • lustre/conf/lustre
  • lustre/doc/lhbadm.8
  • lustre/scripts/haconfig
  • lustre.spec.in
  • lustre/scripts/Lustre
  • lustre/conf/Makefile.am
  • lustre/scripts/lustre
  • lustre/doc/ldev.conf.5
  • lustre/scripts/ldev
  • lustre/autoconf/lustre-core.m4
  • lustre/scripts/Makefile.am
  • lustre/doc/Makefile.am
  • lustre/scripts/lhbadm

Comment by Peter Jones [ 04/Mar/11 ]

Oleg

Could you please assess whether this is safe to include in 2.1?

Thanks

Peter

Comment by Robert Read (Inactive) [ 04/Mar/11 ]

This is currently breaking the build so it needs some updating. I've also asked Brian to inspect this.

Comment by Ned Bass [ 05/Mar/11 ]

Hi Robert,

Thanks for your comments. I replaced lustre with lustre.in in EXTRA_DIST in lustre/scripts/Makefile.am and got good results on my end (i.e. 'make rpms' still works). Removing Lustre from EXTRA_DIST breaks 'make rpms' so I left it in.

However, I don't think this is what is breaking the ubuntu build. To confirm, I submitted an unmodified master to Hudson and it fails in the same way:

http://build.whamcloud.com/job/reviews-ubuntu/229

reverting patch 0032-LU-111-Fix-force-options-parsing from ./ ... failed.
make[1]: *** [clean] Error 1

Thanks,
Ned

Comment by Brian Murrell (Inactive) [ 05/Mar/11 ]

Ned Bass said ...

However, I don't think this is what is breaking the ubuntu build. To confirm, I submitted an unmodified master to Hudson and it fails in the same way:

http://build.whamcloud.com/job/reviews-ubuntu/229

I've had a look at this build attempt. It's based on quite an old revision of master from back at the end of December 2010. I have landed at least one fix to the debian (and therefore ubuntu) build code since then.

I'd be willing to bet that if you rebase your changes to the most recent master, this issue will go away.

Comment by Robert Read (Inactive) [ 06/Mar/11 ]

Ned, your master branch is very old. It looks like you are still based on the Oracle tree, and the ubuntu build is broken in that version.

http://git.whamcloud.com/?p=fs/lustre-release.git;a=log;h=f537233800d39a456d318815578aaafecc974fde

Please rebase your branch on the current master in the fs/lustre-release repository and push your request again.

Comment by Build Master (Inactive) [ 07/Mar/11 ]

Integrated in reviews-centos5 #405
LU-107 Add scripts for implementing heartbeat v1 failover

Ned Bass : cdd6bbe6152647db5bb8d388313ec03f35fcb080
Files :

  • lustre/autoconf/lustre-core.m4
  • lustre/conf/lustre
  • lustre/doc/ldev.8
  • lustre/scripts/lhbadm
  • lustre/scripts/Lustre
  • lustre/conf/ldev.conf
  • lustre/scripts/lnet
  • lustre/scripts/ldev
  • lustre/scripts/lustre.in
  • lustre.spec.in
  • lustre/scripts/haconfig
  • lustre/conf/Makefile.am
  • build/autoconf/lustre-build.m4
  • lustre/doc/nids.5
  • lustre/scripts/Makefile.am
  • lustre/scripts/lustre
  • lustre/doc/Makefile.am
  • lustre/doc/ldev.conf.5
  • lustre/doc/lhbadm.8

Comment by Ned Bass [ 07/Mar/11 ]

My apologies--I was accidentally using review/lustre instead of review/fs/lustre-release.

I rebased and resubmitted and the ubuntu build still fails, but I think I understand why now. My patch removes lustre/scripts/lustre and replaces it with lustre/scripts/lustre.in (due to the name of the tune2fs executable being determined at configure time). So lustre/scripts/lustre gets auto-generated when configure is run. But following configure the build runs

fakeroot debian/rules clean

which reverts all the patches. Reverting my patch tries to recreate lustre/scripts/lustre, but this fails because it already exists (it was created by configure). I suppose one way to fix this is to separate out the removal of lustre/scripts/lustre as a separate patch. Thoughts?
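
To illustrate the configure-time generation described above, here is a minimal sketch of how a .in template becomes the final script. It mimics the substitution step with sed. The placeholder name @TUNE2FS@ and the template contents are assumptions made for illustration, not the actual contents of lustre/scripts/lustre.in.

```shell
#!/bin/sh
# Sketch: emulate an autoconf-style @VAR@ substitution for a script template.
# The placeholder @TUNE2FS@ and the template body are hypothetical.

# generate_from_template TEMPLATE OUTPUT TUNE2FS_NAME
generate_from_template() {
    sed "s|@TUNE2FS@|$3|g" "$1" > "$2"
}

workdir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'TUNE2FS=@TUNE2FS@' > "$workdir/lustre.in"

# "configure" step: the executable name is only known at configure time.
generate_from_template "$workdir/lustre.in" "$workdir/lustre" tune2fs

# The generated file now exists in the tree, so a later step that tries to
# recreate the same path (such as reverting a patch that deleted it)
# finds a file already in place.
generated=$(cat "$workdir/lustre")
```

After this runs, both the template and the generated script sit side by side in the work directory, which is the state the clean step encounters.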

Thanks,
Ned

Comment by Brian Murrell (Inactive) [ 08/Mar/11 ]

My apologies--I was accidentally using review/lustre instead of review/fs/lustre-release.

No worries. Glad you figured out what it was.

I rebased and resubmitted and the ubuntu build still fails, but I think I understand why now. My patch removes lustre/scripts/lustre and replaces it with lustre/scripts/lustre.in (due to the name of the tune2fs executable being determined at configure time). So lustre/scripts/lustre gets auto-generated when configure is run. But following configure the build runs

fakeroot debian/rules clean

which reverts all the patches. Reverting my patch tries to recreate lustre/scripts/lustre, but this fails because it already exists (it was created by configure).

Nice catch. I also discovered the same yesterday since I was looking at why this build was failing also. It is a "perfect storm" of conditions that causes this.

I suppose one way to fix this is to separate out the removal of lustre/scripts/lustre as a separate patch. Thoughts?

I don't think that will fix it, but my changeset for LU-120 does. Perhaps you can cherry-pick that change and put it in front of yours and see if it fixes it. It does for me, locally.

Comment by Ned Bass [ 08/Mar/11 ]

That worked! Thanks Brian.

http://build.whamcloud.com/job/reviews-ubuntu/256/

Comment by Build Master (Inactive) [ 08/Mar/11 ]

Integrated in reviews-centos5 #416
LU-107 Add scripts for implementing heartbeat v1 failover

Ned Bass : a7de9d4b241454fa54e8e4638594240fec3bc82d
Files :

  • lustre/scripts/lustre.in
  • lustre/conf/ldev.conf
  • lustre/scripts/lnet
  • lustre/doc/ldev.conf.5
  • lustre/autoconf/lustre-core.m4
  • lustre/doc/nids.5
  • lustre/scripts/lhbadm
  • lustre/scripts/haconfig
  • lustre/doc/lhbadm.8
  • lustre/conf/Makefile.am
  • lustre/conf/lustre
  • lustre.spec.in
  • build/autoconf/lustre-build.m4
  • lustre/scripts/lustre
  • lustre/doc/Makefile.am
  • lustre/doc/ldev.8
  • lustre/scripts/Lustre
  • lustre/scripts/ldev
  • lustre/scripts/Makefile.am

Comment by Build Master (Inactive) [ 08/Mar/11 ]

Integrated in reviews-centos5 #418
LU-107 Add scripts for implementing heartbeat v1 failover

Ned Bass : ffdb6f1830fead020e79aab7c32b4633e6bdb179
Files :

  • lustre/scripts/haconfig
  • lustre/autoconf/lustre-core.m4
  • lustre/doc/ldev.conf.5
  • build/autoconf/lustre-build.m4
  • lustre/doc/lhbadm.8
  • lustre/scripts/Lustre
  • lustre/doc/Makefile.am
  • lustre/scripts/lustre.in
  • lustre/doc/ldev.8
  • lustre/conf/ldev.conf
  • lustre/scripts/lhbadm
  • lustre/doc/nids.5
  • lustre/scripts/Makefile.am
  • lustre/conf/lustre
  • lustre/conf/Makefile.am
  • lustre/scripts/ldev
  • lustre/scripts/lustre
  • lustre/scripts/lnet
  • lustre.spec.in

Comment by Andreas Dilger [ 08/May/12 ]

Ned, Brian, are there any implications if /etc/init.d/lustre and /etc/init.d/lnet are suddenly added to an existing system?

What we don't want is that someone upgrades to Lustre 2.4 from 2.1 and suddenly their system is unusable until they generate an /etc/ldev.conf or something. My assumption is that nobody ever reads the manual or release notes when upgrading, so if it doesn't work correctly "out of the box" then something was done incorrectly by the code(r).

Comment by Ned Bass [ 08/May/12 ]

Hi Andreas,

If a site hasn't configured /etc/ldev.conf then adding the /etc/init.d/lustre script should have no effect. That is, it won't start any services or interfere with whatever method was used to start lustre before updating. Also, it is not run by the init system by default; rather, it is intended to be started by an HA/failover mechanism such as heartbeat.

/etc/init.d/lnet could start lnet sooner than was the case before updating, but I wouldn't expect that to break anything.
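
The "no effect unless configured" behaviour can be sketched as a guard at the top of an init script. This is only an illustration of the idea, not the actual /etc/init.d/lustre; the function names has_config and start_services are hypothetical.

```shell
#!/bin/sh
# Sketch of a "do nothing unless configured" guard for an init script.
# Not the real /etc/init.d/lustre; function names are hypothetical.

# has_config FILE: succeed only if FILE is readable and contains at least
# one line that is neither blank nor a comment.
has_config() {
    [ -r "$1" ] && grep -Eqv '^[[:space:]]*(#|$)' "$1"
}

# start_services FILE: report what would be started, or return quietly
# when the site has not configured anything.
start_services() {
    if ! has_config "$1"; then
        return 0    # unconfigured site: no services touched, no error
    fi
    echo "starting services listed in $1"
}
```

Calling start_services /etc/ldev.conf on a host with a missing or comment-only file returns success without output, so an upgrade that merely adds the script changes nothing.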

Comment by Andreas Dilger [ 08/May/12 ]

Doug,
could you please have a look at how the patch in http://review.whamcloud.com/290 complements or conflicts with your proposed changes to LNET configuration? I believe you were already planning to start with an /etc/init.d/lnet startup file. Hopefully by landing this now, it will give you a starting point for your configuration changes, and users will already be aware of this script, so your changes will be transparent to them.

Comment by Ned Bass [ 08/May/12 ]

If a site hasn't configured /etc/ldev.conf then adding the /etc/init.d/lustre script should have no effect.

That is unless there was already a site-specific /etc/init.d/lustre script in place and we overwrite it. The safest thing to do may be to mark it %config(noreplace) in the spec file.
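
In a spec file this marking is a %files entry that prefixes the path with %config(noreplace); the exact entry needed in lustre.spec.in is not shown in this ticket. As a rough sketch of the behaviour it aims for, the shell function below keeps an existing, different file and saves the packaged copy next to it with an .rpmnew suffix. It is a simplification (rpm decides using its own database and checksums), and install_noreplace is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: emulate the intended effect of %config(noreplace) for one file.
# install_noreplace is a hypothetical helper, not part of rpm or Lustre.

# install_noreplace NEW DEST
#   DEST missing:               install NEW as DEST.
#   DEST present and identical: nothing to do.
#   DEST present and different: keep DEST, save NEW as DEST.rpmnew.
install_noreplace() {
    if [ ! -e "$2" ]; then
        cp "$1" "$2"
    elif cmp -s "$1" "$2"; then
        :
    else
        cp "$1" "$2.rpmnew"
    fi
}
```

With this policy a site-specific /etc/init.d/lustre would survive the upgrade, and the administrator could compare it against the .rpmnew copy afterwards.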

Comment by Wally Wang (Inactive) [ 20/Sep/12 ]

When we do our packaging on SLES11 SP1/2, we run into the following problems:

1. no LSB header information:

E: File `lnet' without LSB header found in /var/tmp/cray-lustre-cray_gem_c-2.3_3.0.34_0.7.9_1.0000.6718.11.1-root/etc/init.d/
E: File `lustre' without LSB header found in /var/tmp/cray-lustre-cray_gem_c-2.3_3.0.34_0.7.9_1.0000.6718.11.1-root/etc/init.d/

2. need sysconfig.lustre in /var/adm/fillup-templates:

cray-lustre-cray_gem_c: "/etc/sysconfig/lustre" is not allowed anymore in SuSE Linux.

3. if failover is only for servers, it should probably be excluded from client build
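
For the first problem, the packaging check is looking for the LSB comment block that init scripts carry near the top. The sketch below writes an example header and tests for its delimiters. The field values (Provides, Required-Start and so on) are assumptions chosen for illustration, not the values that were eventually committed, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: an LSB comment header plus a simple presence check.
# All field values below are hypothetical examples.

# write_example_header FILE: write a script that starts with an LSB header.
write_example_header() {
    cat > "$1" <<'EOF'
#!/bin/sh
### BEGIN INIT INFO
# Provides:          lnet
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:
# Default-Stop:      0 1 6
# Short-Description: Example LNET init script
### END INIT INFO
EOF
}

# has_lsb_header FILE: succeed if both header delimiters are present.
has_lsb_header() {
    grep -q '^### BEGIN INIT INFO' "$1" && grep -q '^### END INIT INFO' "$1"
}
```

Running has_lsb_header against the installed lnet and lustre scripts would be a quick local check before attempting a SLES package build.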

Comment by Ned Bass [ 20/Sep/12 ]

Hi Wally,

Thanks for reporting these problems. We knew the init scripts would probably need work to properly support non-Red Hat distros. I don't currently have a SLES system to test on, but when I get a chance I'll try to bring up a VM to look into this.

Are you just using 'make rpm'?

Comment by Wally Wang (Inactive) [ 21/Sep/12 ]

We have our own make/spec to build for our environment but I think you should run into the same problem using 'make rpm' in SLES11.

Comment by Jodi Levi (Inactive) [ 27/Sep/12 ]

Please reopen this ticket if there is outstanding work to do.

Comment by Cory Spitz [ 24/Oct/12 ]

Ned Bass was right: people who have a preexisting /etc/init.d/lustre will be in trouble.

That is unless there was already a site-specific /etc/init.d/lustre script in place and we overwrite it. The safest thing to do may be to mark it %config(noreplace) in the spec file.

That wasn't done when this landed for 2.3, so it has broken Cray's environment (in addition to the other issues that Wally raised).
We can probably live with it, but maybe the Ops Manual should be updated to warn the installer.

Comment by Andreas Dilger [ 25/Oct/12 ]

Cory and/or Ned, could you please submit a patch to resolve this issue?
