[LUDOC-69] Lustre Manual needs updated Failover section
Created: 16/Jul/12  Updated: 01/Nov/13  Resolved: 01/Nov/13

Status: Resolved
Project: Lustre Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Cliff White (Inactive)
Assignee: Linda Bebernes (Inactive)
Resolution: Fixed
Votes: 0
Labels: QContent

Business Value: 7
Severity: 3
Rank (Obsolete): 7145

 Description   

The "configuring failover" section in the Whamcloud release of the
Lustre manual seems rather out of date:

http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.html#configuringfailover

The Oracle release says much the same thing:
http://wiki.lustre.org/manual/LustreManual20_HTML/ConfiguringFailover.html#50540588_50628

In section 11.1.1 "Power management software", it says:

"For more information about PowerMan, go to:
https://computing.llnl.gov/linux/powerman.html"

That page no longer exists. The link should probably point to
http://code.google.com/p/powerman/

Then in section 11.2. "Setting up High-Availability (HA) Software with
Lustre" it mentions "Red Hat Cluster Manager" and "Pacemaker".

"Red Hat Cluster Manager" points to
http://wiki.lustre.org/index.php/Using_Red_Hat_Cluster_Manager_with_Lustre

which says "In comparison with other HA solutions, RedHat Cluster as in
RHEL 5.5 is an old HA solution. We recommend using other HA solutions
like Pacemaker, if possible."

The Pacemaker link:
http://wiki.lustre.org/index.php/Using_Pacemaker_with_Lustre

Although the page is titled "Using Pacemaker with Lustre", it opens
by saying "In modern clusters, OpenAIS, or more specifically, its
communication stack corosync, is used for this task".

In summary:

1) The manual could do with some updating here.

2) I suspect I should be using corosync.



 Comments   
Comment by Jodi Levi (Inactive) [ 18/Mar/13 ]

Cliff,
This is part of the quality improvement project for the Lustre Manual. Please feel free to work with Linda on this or reach out to her with questions.

Comment by Linda Bebernes (Inactive) [ 23/Oct/13 ]

Changes pushed to gerrit and ready for review at http://review.whamcloud.com/8058

Ch 3 Intro to Failover - edits for clarity, fixed missing figure
Ch 11 Configuring Lustre Failover - major rewrite to update and
clarify content
Ch 13 Lustre Operations - edited failover-related entries for clarity,
updated example from Elan to Ethernet, added cross-reference to Ch 11
Ch 14 Lustre Maintenance - edited failover-related entries for clarity,
added cross-reference to Ch 11
Ch 20 MMP - changed chapter name from "Managing Failover" to
"Lustre Failover and Multi-Mount Protection", minor edits, added
cross-reference to Ch 11
Ch 36 - updated --servicenode and --failnode descriptions for mkfs.lustre
and tunefs.lustre

Comment by Linda Bebernes (Inactive) [ 01/Nov/13 ]

Changes reviewed and merged. Resolved.
