
    Description

      A new resource agent (RA) script for Pacemaker that monitors LNet. It is compatible with both ZFS-based and LDISKFS-based Lustre server installations.

      This RA monitors a single LNet device per node and is deployed across the cluster using Pacemaker's clone resources.

      pcs resource create [Resource Name] ocf:pacemaker:healthLNET \
      dampen=[seconds 5s] \
      multiplier=[number 1000] \
      lctl=[true | false] \
      device=[device name ib0] \
      host_list=[list of NIDs, space separated, if lctl is true; otherwise list of IPs] \
      --clone


      where:

      • dampen The time to wait (dampening) for further changes to occur
      • multiplier The number by which to multiply the number of connected ping nodes
      • attempts Number of ping attempts, per host, before declaring it dead
      • timeout How long, in seconds, to wait before declaring a ping lost
      • lctl Option to enable lctl ping instead of the normal ping. The default is true
      • device Device used for the LNet network. We assume the same device across the cluster
      • host_list Space-separated list of NIDs if lctl is true, otherwise a list of IPs
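      The interaction between host_list and multiplier can be sketched as follows. This is a minimal illustration, assuming healthLNET scores nodes the same way as the stock ocf:pacemaker:ping agent (reachable hosts times multiplier); the function names count_reachable and lnet_score are hypothetical and are not taken from the script itself.

```shell
# Count how many hosts answer a probe command.
# $1 is the probe command (for example "lctl ping" or "ping -c 1"),
# the remaining arguments are the NIDs or IPs from host_list.
count_reachable() {
    probe=$1
    shift
    active=0
    for host in "$@"; do
        # $probe is left unquoted on purpose so "lctl ping" splits into two words
        if $probe "$host" >/dev/null 2>&1; then
            active=$((active + 1))
        fi
    done
    echo "$active"
}

# Assumed scoring: number of reachable hosts multiplied by the multiplier.
lnet_score() {
    echo $(( $1 * $2 ))
}
```

      For instance, lnet_score "$(count_reachable "lctl ping" 10.10.130.1@tcp1 10.10.130.2@tcp1)" 1000 would print the value such an agent would publish when both NIDs are probed.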

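      This script should be installed in /usr/lib/ocf/resource.d/heartbeat/ on both Lustre servers with permissions 755. Note that Pacemaker looks up an agent named ocf:pacemaker:healthLNET under resource.d/pacemaker/, so the provider used in the pcs commands must match the directory where the script is installed.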
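      A small helper for the installation step might look like this. It is only a sketch: install_ra is a hypothetical name, and the default destination is the directory stated above, which can be overridden with a second argument.

```shell
# Copy a resource agent script into an OCF provider directory with mode 755.
# $1: path to the script, $2: destination directory (optional).
install_ra() {
    src=$1
    dest_dir=${2:-/usr/lib/ocf/resource.d/heartbeat}
    mkdir -p "$dest_dir" &&
    install -m 755 "$src" "$dest_dir/" &&
    # Show the result so the permissions can be checked by eye
    ls -l "$dest_dir/$(basename "$src")"
}
```

      Run it as root on each Lustre server, for example install_ra ./healthLNET.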

      Default values:

      • dampen 5s
      • multiplier 1
      • attempts 3
      • timeout 5s
      • lctl true

      Default operation timeouts:

      • start timeout 60s
      • stop timeout 20s
      • monitor timeout 60s interval 10s
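      If these operation defaults need to be set explicitly, pcs accepts op clauses on resource creation. The sketch below simply restates the default values listed above; the run wrapper, the DRY_RUN variable and the create_health_lnet function are hypothetical helpers added so the command can be previewed without touching the cluster.

```shell
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create the cloned resource with explicit start/stop/monitor settings.
create_health_lnet() {
    run pcs resource create healthLNET ocf:pacemaker:healthLNET \
        lctl=true device=eth1 \
        host_list="10.10.130.1@tcp1 10.10.130.2@tcp1" \
        op start timeout=60s \
        op stop timeout=20s \
        op monitor interval=10s timeout=60s \
        --clone
}
```

      Calling create_health_lnet prints the full pcs command; running it with DRY_RUN=0 executes it.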

      Compatible and tested:

      • pacemaker 1.1.13
      • corosync 2.3.4
      • pcs 0.9.143
      • RHEL/CentOS 7.2

      Example of procedure to configure:

      # Create the cloned LNet health resource
      pcs resource create healthLNET ocf:pacemaker:healthLNET dampen=5s multiplier=1000 lctl=true device=eth1 host_list="10.10.130.1@tcp1 10.10.130.2@tcp1" --clone

      # Collect the names of the OST resources
      targets=$(crm_mon -1 | grep 'OST' | awk '{print $1}')

      # Keep each OST off any node where pingd is below 1 or not defined
      for i in $targets; do pcs constraint location "$i" rule score=-INFINITY pingd lt 1 or not_defined pingd; done

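      The two moving parts of this procedure, selecting the OST resources and the meaning of the location rule, can be tried out in isolation. The following is a sketch only: ost_targets and node_is_banned are hypothetical helper names, and node_is_banned merely mimics the rule "pingd lt 1 or not_defined pingd" in shell; Pacemaker itself evaluates the real rule.

```shell
# Read `crm_mon -1` output on stdin and print the first field of every
# line that mentions OST (the same filter as grep 'OST' | awk '{print $1}').
ost_targets() {
    awk '/OST/ {print $1}'
}

# Return success when a node would be excluded by the rule.
# $1 is the node's pingd value; an empty string models "not defined".
node_is_banned() {
    pingd=$1
    [ -z "$pingd" ] || [ "$pingd" -lt 1 ]
}
```

      In practice, crm_mon -1 | ost_targets lists the resources the loop iterates over. With multiplier=1000 and two reachable NIDs, a pingd value of 2000 leaves the node eligible, while 0 or an undefined attribute excludes it.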

          Activity

            [LU-8457] Pacemaker script to monitor LNet

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25297/
            Subject: LU-8457 pacemaker: Update healthLNET to 0.99.4
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f5530a0faa24ad836a44bdd8d0ce86bf806fde87


            Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: https://review.whamcloud.com/25297
            Subject: LU-8457 pacemaker: Update healthLNET to 0.99.4
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8594bde54d52a92452535c226ae9da21affc8f8d

            pjones Peter Jones added a comment -

            Landed for 2.10


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/22266/
            Subject: LU-8457 pacemaker: Pacemaker script to monitor LNet
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9018f11cd5a1ab82353e79271163ef51db081e95


            Gabriele Paciucci (gabriele.paciucci@intel.com) uploaded a new patch: http://review.whamcloud.com/22266
            Subject: LU-8457 subject: Pacemaker script to monitor LNet
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8b1ee4646a31818f73dfc18c46f5d38cd48156b7


            People

              gabriele.paciucci Gabriele Paciucci (Inactive)