Details
- New Feature
- Resolution: Fixed
- Minor
Description
A new resource agent (RA) script for Pacemaker that monitors LNet. It is compatible with both ZFS-based and LDISKFS-based Lustre server installations.
The RA monitors a single LNet device and runs on every node as a Pacemaker clone resource.
pcs resource create [Resource Name] ocf:pacemaker:healthLNET \
    dampen=[seconds 5s] \
    multiplier=[number 1000] \
    lctl=[true | false] \
    device=[device name ib0] \
    host_list=[list of NIDs, space separated, if lctl is true; otherwise list of IPs] \
    --clone
where:
- dampen The time to wait (dampening) for further changes to occur
- multiplier The number by which to multiply the number of connected ping nodes
- attempts Number of ping attempts, per host, before declaring it dead
- timeout How long, in seconds, to wait before declaring a ping lost
- lctl Option to enable lctl ping instead of the normal ping. The default is true
- device Device used for the LNet network. The same device name is assumed across the cluster
- host_list Space-separated list of NIDs if lctl is true, otherwise a list of IPs
Install this script in /usr/lib/ocf/resource.d/heartbeat/ on both Lustre servers, with permissions 755.
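The placement and permissions described above can be checked with a small helper such as the one below. This is only a sketch: the function name check_ra and the file name healthLNET inside that directory are assumptions, and stat -c is the GNU coreutils form available on RHEL/CentOS.

```shell
#!/bin/sh
# check_ra is a hypothetical helper, not part of the RA itself.
# It succeeds only if the file exists and its permission bits are exactly 755.
check_ra() {
    ra_path=$1
    [ -f "$ra_path" ] || { echo "missing: $ra_path" >&2; return 1; }
    mode=$(stat -c '%a' "$ra_path") || return 1
    [ "$mode" = "755" ] || { echo "mode is $mode, expected 755: $ra_path" >&2; return 1; }
    return 0
}

# Example (the file name is assumed); run on each Lustre server:
# check_ra /usr/lib/ocf/resource.d/heartbeat/healthLNET
```

Running the check on both servers before creating the resource avoids a clone that starts on one node and fails on the other.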
Default values:
- dampen 5s
- multiplier 1
- attempts 3
- timeout 5s
- lctl true
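To make the roles of multiplier, attempts, timeout and lctl concrete, the sketch below shows one way a monitor pass could turn a host list into a score. It is not the healthLNET source, which this ticket does not show. The function names probe_host and compute_score and the PROBE override are assumptions made for illustration, and timeouts are taken as plain integer seconds rather than values with an "s" suffix.

```shell
#!/bin/sh
# Sketch only: the real healthLNET internals are not shown in this ticket.

# Probe one host up to $3 times; succeed on the first reply.
# $1 = host (NID or IP), $2 = "true" to use lctl ping, $3 = attempts, $4 = timeout in seconds
probe_host() {
    p_host=$1; p_lctl=$2; p_attempts=$3; p_timeout=$4
    p_try=0
    while [ "$p_try" -lt "$p_attempts" ]; do
        if [ "$p_lctl" = "true" ]; then
            # host is an LNet NID such as 10.10.130.1@tcp1
            lctl ping "$p_host" "$p_timeout" >/dev/null 2>&1 && return 0
        else
            # host is a plain IP address
            ping -c 1 -W "$p_timeout" "$p_host" >/dev/null 2>&1 && return 0
        fi
        p_try=$((p_try + 1))
    done
    return 1
}

# Print (number of reachable hosts) * multiplier for a space-separated host list.
# PROBE may name another probe function (hypothetical hook, useful for dry runs).
compute_score() {
    host_list=$1; multiplier=$2
    use_lctl=${3:-true}; attempts=${4:-3}; timeout=${5:-5}
    active=0
    for h in $host_list; do
        if "${PROBE:-probe_host}" "$h" "$use_lctl" "$attempts" "$timeout"; then
            active=$((active + 1))
        fi
    done
    echo $((active * multiplier))
}

# Example: compute_score "10.10.130.1@tcp1 10.10.130.2@tcp1" 1000
```

Under these assumptions, two reachable NIDs with multiplier=1000 give a score of 2000, and a node that reaches none gets 0. A real agent would then publish the value as a node attribute with the configured dampen delay.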
Default timeouts:
- start timeout 60s
- stop timeout 20s
- monitor timeout 60s, interval 10s
Compatible and tested:
- pacemaker 1.1.13
- corosync 2.3.4
- pcs 0.9.143
- RHEL/CentOS 7.2
Example of procedure to configure:
pcs resource create healthLNET ocf:pacemaker:healthLNET dampen=5s multiplier=1000 lctl=true device=eth1 host_list="10.10.130.1@tcp1 10.10.130.2@tcp1" --clone
targets=`crm_mon -1|grep 'OST'| awk '{print $1}'`
for i in $targets; do pcs constraint location $i rule score=-INFINITY pingd lt 1 or not_defined pingd; done
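Before applying the location constraints, it can help to preview what the loop would run. The helper below is a sketch under stated assumptions: gen_constraints is a hypothetical name, it reads `crm_mon -1` output on standard input, and it only prints the pcs commands instead of executing them. It uses the same grep and awk selection as the procedure above, so it relies on the resource name being the first field of each matching line.

```shell
#!/bin/sh
# gen_constraints is a hypothetical dry-run helper, not part of the RA.
# It reads `crm_mon -1` output on stdin and prints, without running them,
# the pcs commands the constraint loop would issue.
# $1 = pattern that selects the resources to constrain (default: OST)
gen_constraints() {
    pattern=${1:-OST}
    grep "$pattern" | awk '{print $1}' | while read -r res; do
        [ -n "$res" ] || continue
        echo "pcs constraint location $res rule score=-INFINITY pingd lt 1 or not_defined pingd"
    done
}

# Usage: crm_mon -1 | gen_constraints OST
```

Each printed rule bans the resource from any node whose pingd attribute is below 1 or not defined, which is the case for a node that cannot reach any host in host_list.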
Issue Links
- is blocking: LU-9168 Add pacemaker resources to lustre rpms (Resolved)