[LU-107] Lustre init scripts with heartbeat v1 integration Created: 02/Mar/11 Updated: 25/Oct/12 Resolved: 27/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.3.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Ned Bass | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Bugzilla ID: | 20,165 |
| Rank (Obsolete): | 4555 |
| Description |
|
This issue is to request that Lustre initialization scripts developed at LLNL be reviewed for inclusion in Lustre. A Gerrit submission for review is on its way. |
| Comments |
| Comment by Ned Bass [ 02/Mar/11 ] |
| Comment by Ned Bass [ 04/Mar/11 ] |
|
Updated gerrit with a couple of bug fixes in ldev script.
|
| Comment by Build Master (Inactive) [ 04/Mar/11 ] |
|
Integrated in Ned Bass : 33e3a53d63cfaa85da01c0c8b1032704f5c745d9
|
| Comment by Build Master (Inactive) [ 04/Mar/11 ] |
|
Integrated in Ned Bass : c0c1013e14d5bf99df874cd52fa747962b4441d3
|
| Comment by Peter Jones [ 04/Mar/11 ] |
|
Oleg, could you please assess whether this is safe to include in 2.1? Thanks, Peter |
| Comment by Robert Read (Inactive) [ 04/Mar/11 ] |
|
This is currently breaking the build so it needs some updating. I've also asked Brian to inspect this. |
| Comment by Ned Bass [ 05/Mar/11 ] |
|
Hi Robert, Thanks for your comments. I replaced lustre with lustre.in in EXTRA_DIST in lustre/scripts/Makefile.am and got good results on my end (i.e. 'make rpms' still works). Removing lustre from EXTRA_DIST breaks 'make rpms' so I left it in. However, I don't think this is what is breaking the ubuntu build. To confirm, I submitted an unmodified master to Hudson and it fails in the same way: http://build.whamcloud.com/job/reviews-ubuntu/229 reverting patch 0032- Thanks, |
| Comment by Brian Murrell (Inactive) [ 05/Mar/11 ] |
|
Ned Bass said ...
I've had a look at this build attempt. It's based on quite an old revision of master from back at the end of December 2010. I have landed at least one fix to the debian (and therefore ubuntu) build code since then. I'd be willing to bet that if you rebase your changes to the most recent master, this issue will go away. |
| Comment by Robert Read (Inactive) [ 06/Mar/11 ] |
|
Ned, your master branch is very old. It looks like you are still based on the Oracle tree, and the ubuntu build is broken in that version. http://git.whamcloud.com/?p=fs/lustre-release.git;a=log;h=f537233800d39a456d318815578aaafecc974fde Please rebase your branch onto the current master in fs/lustre-release and push your request again. |
| Comment by Build Master (Inactive) [ 07/Mar/11 ] |
|
Integrated in Ned Bass : cdd6bbe6152647db5bb8d388313ec03f35fcb080
|
| Comment by Ned Bass [ 07/Mar/11 ] |
|
My apologies--I was accidentally using review/lustre instead of review/fs/lustre-release. I rebased and resubmitted and the ubuntu build still fails, but I think I understand why now. My patch removes lustre/scripts/lustre and replaces it with lustre/scripts/lustre.in (due to the name of the tune2fs executable being determined at configure time). So lustre/scripts/lustre gets auto-generated when configure is run. But following configure the build runs fakeroot debian/rules clean which reverts all the patches. Reverting my patch tries to recreate lustre/scripts/lustre, but this fails because it already exists (it was created by configure). I suppose one way to fix this is to separate out the removal of lustre/scripts/lustre as a separate patch. Thoughts? Thanks, |
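The conflict described above can be mimicked in a few lines of shell. This is only a sketch under stated assumptions: the placeholder name `@TUNE2FS@` and both function names are hypothetical, and the real revert is performed by the Debian build's patch handling, which this merely imitates by refusing to recreate a file that already exists.

```shell
#!/bin/sh
# Hypothetical sketch; @TUNE2FS@ and the function names are assumptions,
# not taken from the actual patch or from debian/rules.

# generate_script TEMPLATE OUTPUT TUNE2FS_NAME
# Stands in for configure turning lustre.in into lustre.
generate_script() {
    template=$1
    output=$2
    tune2fs_name=$3
    # Substitute the placeholder with the tool name chosen at configure time.
    sed "s|@TUNE2FS@|${tune2fs_name}|g" "$template" > "$output"
}

# restore_deleted_file PATH
# Stands in for reverting a patch that deleted PATH: the revert has to
# recreate the file, so it fails when the file is already present.
restore_deleted_file() {
    if [ -e "$1" ]; then
        echo "cannot recreate $1: file already exists" >&2
        return 1
    fi
    : > "$1"
}
```

Calling `generate_script` and then `restore_deleted_file` on the same output path fails, which is the ordering problem between configure and `debian/rules clean` described in the comment.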
| Comment by Brian Murrell (Inactive) [ 08/Mar/11 ] |
No worries. Glad you figured out what it was.
Nice catch. I also discovered the same yesterday since I was looking at why this build was failing also. It is a "perfect storm" of conditions that causes this.
I don't think that will fix it, but my changeset for |
| Comment by Ned Bass [ 08/Mar/11 ] |
|
That worked! Thanks Brian. |
| Comment by Build Master (Inactive) [ 08/Mar/11 ] |
|
Integrated in Ned Bass : a7de9d4b241454fa54e8e4638594240fec3bc82d
|
| Comment by Build Master (Inactive) [ 08/Mar/11 ] |
|
Integrated in Ned Bass : ffdb6f1830fead020e79aab7c32b4633e6bdb179
|
| Comment by Andreas Dilger [ 08/May/12 ] |
|
Ned, Brian, are there any implications to an existing system if /etc/init.d/lustre and /etc/init.d/lnet are suddenly added to an existing system? What we don't want is that someone upgrades to Lustre 2.4 from 2.1 and suddenly their system is unusable until they generate an /etc/ldev.conf or something. My assumption is that nobody ever reads the manual or release notes when upgrading, so if it doesn't work correctly "out of the box" then something was done incorrectly by the code(r). |
| Comment by Ned Bass [ 08/May/12 ] |
|
Hi Andreas, If a site hasn't configured /etc/ldev.conf then adding the /etc/init.d/lustre script should have no effect. That is, it won't start any services or interfere with whatever method was used to start lustre before updating. Also, it is not run by the init system by default; rather, it is intended to be started by an HA/failover mechanism such as heartbeat. /etc/init.d/lnet could start lnet sooner than was the case before updating, but I wouldn't expect that to break anything. |
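The "no effect without /etc/ldev.conf" behavior could be guarded roughly as follows. This is a minimal sketch, not the logic of the submitted script: the function names are hypothetical, and it assumes that blank lines and lines starting with `#` in the configuration file carry no service entries.

```shell
#!/bin/sh
# Hypothetical sketch; function names and the comment syntax are assumptions.

# has_ldev_entries FILE
# Succeeds only if FILE is readable and has at least one line that is
# neither blank nor a '#' comment.
has_ldev_entries() {
    [ -r "$1" ] && grep -Eqv '^[[:space:]]*(#|$)' "$1"
}

# start_lustre_services [FILE]
# Does nothing (and reports success) when no services are configured.
start_lustre_services() {
    conf=${1:-/etc/ldev.conf}
    if ! has_ldev_entries "$conf"; then
        echo "no services configured"
        return 0
    fi
    echo "would start services listed in $conf"
}
```

Returning success in the unconfigured case keeps an upgrade from turning into a boot-time failure on systems that start Lustre some other way.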
| Comment by Andreas Dilger [ 08/May/12 ] |
|
Doug, |
| Comment by Ned Bass [ 08/May/12 ] |
That is unless there was already a site-specific /etc/init.d/lustre script in place and we overwrite it. The safest thing to do may be to mark it %config(noreplace) in the spec file. |
| Comment by Wally Wang (Inactive) [ 20/Sep/12 ] |
|
When we do our packaging on SLES11 SP1/2, we run into the following problems:
1. No LSB header information: E: File `lnet' without LSB header found in /var/tmp/cray-lustre-cray_gem_c-2.3_3.0.34_0.7.9_1.0000.6718.11.1-root/etc/init.d/
2. Need sysconfig.lustre in /var/adm/fillup-templates: cray-lustre-cray_gem_c: "/etc/sysconfig/lustre" is not allowed anymore in SuSE Linux.
3. If failover is only for servers, it should probably be excluded from client build |
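For the missing LSB header reported in the first item, the comment block that LSB-aware packaging checks look for has the general shape below. This is an illustrative sketch only: the field values (dependencies, runlevels, description) are assumptions rather than what was eventually committed, and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; all header field values below are assumptions.

# write_lnet_lsb_header
# Prints an example LSB comment block for an lnet init script.
write_lnet_lsb_header() {
    cat <<'EOF'
### BEGIN INIT INFO
# Provides:          lnet
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     3 5
# Default-Stop:      0 1 6
# Short-Description: Lustre networking (LNET)
### END INIT INFO
EOF
}

# has_lsb_header FILE
# Succeeds if FILE contains both LSB header delimiter lines.
has_lsb_header() {
    grep -q '^### BEGIN INIT INFO' "$1" && grep -q '^### END INIT INFO' "$1"
}
```

A check like `has_lsb_header` could be run over each installed init script before packaging to catch this class of error early.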
| Comment by Ned Bass [ 20/Sep/12 ] |
|
Hi Wally, Thanks for reporting these problems. We knew the init scripts would probably need work to properly support non-redhat distros. I don't currently have a SLES system to test on, but when I get a chance I'll try to bring up a VM to look into this. Are you just using 'make rpm'? |
| Comment by Wally Wang (Inactive) [ 21/Sep/12 ] |
|
We have our own make/spec to build for our environment but I think you should run into the same problem using 'make rpm' in SLES11. |
| Comment by Jodi Levi (Inactive) [ 27/Sep/12 ] |
|
Please reopen this ticket if there is outstanding work to do. |
| Comment by Cory Spitz [ 24/Oct/12 ] |
|
Ned Bass was right: people who have a preexisting /etc/init.d/lustre will be in trouble.
The %config(noreplace) marking he suggested wasn't done when this landed for 2.3, so it has broken Cray's environment (in addition to the other issues that Wally raised). |
| Comment by Andreas Dilger [ 25/Oct/12 ] |
|
Cory and/or Ned, could you please submit a patch to resolve this issue? |