Lustre / LU-8457

Pacemaker script to monitor LNet

Details

    Description

      A new resource agent (RA) script for Pacemaker that monitors LNet. It is compatible with both ZFS-based and LDISKFS-based Lustre server installations.

      The RA monitors a single LNet device and runs on every node as a Pacemaker clone resource.

      pcs resource create [Resource Name] ocf:pacemaker:healthLNET \
          dampen=[seconds, e.g. 5s] \
          multiplier=[number, e.g. 1000] \
          attempts=[number, e.g. 3] \
          timeout=[seconds, e.g. 5s] \
          lctl=[true | false] \
          device=[device name, e.g. ib0] \
          host_list=[space-separated list of NIDs if lctl=true, otherwise a list of IP addresses] \
          --clone
      

      where:

      • dampen: the time to wait (dampening) for further changes to occur
      • multiplier: the number by which to multiply the number of connected ping nodes
      • attempts: the number of ping attempts, per host, before declaring it dead
      • timeout: how long, in seconds, to wait before declaring a ping lost
      • lctl: if true, use lctl ping (LNet-level ping of NIDs) instead of the standard ICMP ping. The default is true
      • device: the network device used by LNet (for example ib0 or eth1). The same device name is assumed across the whole cluster
      • host_list: the peers to ping, given as NIDs when lctl is true, otherwise as IP addresses

      This script should be installed, with permissions 755, as /usr/lib/ocf/resource.d/pacemaker/healthLNET on both Lustre servers; the provider directory must match the ocf:pacemaker: prefix used in the pcs commands.
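
      A minimal sketch of the installation step, to be run on each server. It assumes the agent file is named healthLNET and sits in the current directory, and that the pacemaker provider directory already exists (the pacemaker package ships its own agents there). The target path follows the ocf:pacemaker:healthLNET name used by pcs.

```bash
# Install the agent with mode 755 under the "pacemaker" OCF provider.
install -m 755 healthLNET /usr/lib/ocf/resource.d/pacemaker/healthLNET

# Ask pcs to print the agent's parameters; this fails if Pacemaker
# cannot find the agent or parse its metadata.
pcs resource describe ocf:pacemaker:healthLNET
```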

      Default values:

      • dampen 5s
      • multiplier 1
      • attempts 3
      • timeout 5s
      • lctl true
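
      The multiplier, attempts and timeout parameters describe the same scoring scheme as the stock ocf:pacemaker:ping agent: each host in host_list is probed up to attempts times, and the node's score is multiplier times the number of hosts that answered. The bash sketch below illustrates only that calculation, under this assumption. It is not the healthLNET source; the functions probe_host and compute_score and the variables LCTL, ATTEMPTS and PROBE_TIMEOUT are hypothetical. Publishing the score as a node attribute (the stock ping agent does this with attrd_updater, honouring dampen) is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a ping-style LNet health score (not the real agent).

# Defaults mirror the "Default values" list above.
LCTL=${LCTL:-true}
ATTEMPTS=${ATTEMPTS:-3}
PROBE_TIMEOUT=${PROBE_TIMEOUT:-5s}

# probe_host HOST: succeed as soon as HOST answers, trying at most
# ATTEMPTS times. With LCTL=true, HOST is an LNet NID probed with
# "lctl ping"; otherwise it is an IP address probed with ICMP ping.
# coreutils "timeout" bounds each probe to PROBE_TIMEOUT.
probe_host() {
    local host=$1 i
    for ((i = 0; i < ATTEMPTS; i++)); do
        if [ "$LCTL" = true ]; then
            timeout "$PROBE_TIMEOUT" lctl ping "$host" >/dev/null 2>&1 && return 0
        else
            timeout "$PROBE_TIMEOUT" ping -c 1 "$host" >/dev/null 2>&1 && return 0
        fi
    done
    return 1
}

# compute_score MULTIPLIER HOST...: print MULTIPLIER times the number
# of reachable hosts. 0 means that no host in the list answered.
compute_score() {
    local multiplier=$1 alive=0 host
    shift
    for host in "$@"; do
        if probe_host "$host"; then
            alive=$((alive + 1))
        fi
    done
    echo $((multiplier * alive))
}
```

      With the example configuration below (multiplier=1000 and two NIDs), the score is 0, 1000 or 2000, so the "pingd lt 1" rule bans OST resources from a node only when neither peer answers.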

      Default operation timeouts:

      • start: timeout 60s
      • stop: timeout 20s
      • monitor: timeout 60s, interval 10s
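
      If these operation timeouts need to be set explicitly or changed, pcs resource create accepts them after the op keyword. A sketch that reuses the values from the example procedure below, with the defaults listed above written out:

```bash
# Same resource as in the example procedure, with explicit operation
# timeouts (one "op" keyword followed by several actions).
pcs resource create healthLNET ocf:pacemaker:healthLNET \
    dampen=5s multiplier=1000 lctl=true device=eth1 \
    host_list="10.10.130.1@tcp1 10.10.130.2@tcp1" \
    op start timeout=60s \
       stop timeout=20s \
       monitor interval=10s timeout=60s \
    --clone
```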

      Tested and compatible with:

      • pacemaker 1.1.13
      • corosync 2.3.4
      • pcs 0.9.143
      • RHEL/CentOS 7.2

      Example configuration procedure:

      pcs resource create healthLNET ocf:pacemaker:healthLNET dampen=5s multiplier=1000 lctl=true device=eth1 host_list="10.10.130.1@tcp1 10.10.130.2@tcp1" --clone 
      
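      Then add a location constraint to every OST resource so that it cannot run on a node where the pingd attribute is lower than 1 or not defined, that is, where none of the peers in host_list answers:

      # Names of all resources whose crm_mon line mentions OST
      targets=$(crm_mon -1 | grep 'OST' | awk '{print $1}')

      # Ban each OST resource from nodes with no working LNet peer
      for i in $targets; do
          pcs constraint location "$i" rule score=-INFINITY pingd lt 1 or not_defined pingd
      done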
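
      A possible check after running the loop, using standard Pacemaker and pcs commands: crm_mon with -A shows node attributes, where each node should report a pingd value once the clone has run its first monitor, and pcs constraint location show lists the new rules.

```bash
# One-shot cluster status including node attributes such as pingd.
crm_mon -1 -A

# List location constraints, including the pingd rules added above.
pcs constraint location show
```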
      

            People

              Gabriele Paciucci (Inactive)