Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.10.0
    • None
    • None
    • 9223372036854775807

    Description

      A new resource agent (RA) script for Pacemaker that manages ZFS pools and Lustre targets.

      The RA imports and exports ZFS pools, and mounts and unmounts Lustre targets.

      pcs resource create <Resource Name> ocf:heartbeat:LustreZFS \
          pool="<ZFS Pool Name>" \
          volume="<ZFS Volume Name>" \
          mountpoint="<Mount Point>" \
          OCF_CHECK_LEVEL=10

      where:

      • pool is the name of the ZFS pool, which must be created in advance
      • volume is the name of the dataset created on the ZFS pool when the Lustre target is formatted (mkfs.lustre)
      • mountpoint is the mount point directory, which must be created in advance on both Lustre servers
      • OCF_CHECK_LEVEL is optional; setting it enables an extra monitor check on the status of the pool

      Install this script in /usr/lib/ocf/resource.d/heartbeat/ on both Lustre servers, with permissions 755.
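The installation step can be scripted roughly as below. This is a minimal sketch, not part of the RA package: install_ra is a hypothetical helper, and the source file name ./LustreZFS in the usage line is an assumption. It relies only on install(1) to create the provider directory if needed and copy the script with mode 755; run it as root on each Lustre server.

```shell
#!/bin/sh
# Hypothetical helper: copy a resource-agent script into an OCF provider
# directory with mode 755. Run it on each Lustre server.
install_ra() {
    src=$1
    # Default target directory taken from the description above.
    dest=${2:-/usr/lib/ocf/resource.d/heartbeat}
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Create the provider directory if needed, then install the agent.
    install -d -m 755 "$dest" || return 1
    install -m 755 "$src" "$dest/$(basename "$src")"
}
```

Usage, as root on each server: `install_ra ./LustreZFS`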

      The script protects against double imports of a pool (the same pool imported on both servers at once). To enable this protection, configure ZFS hostid protection by running the genhostid command on each server.
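Because this protection relies on the servers having different hostids, a quick check after running genhostid is to compare the output of hostid from each node. The sketch below assumes that approach: check_hostids is a hypothetical helper, and mds00/mds01 in the usage line are example node names. Diskless nodes that boot from a shared image are worth checking in particular, since they could end up reading the same /etc/hostid.

```shell
#!/bin/sh
# Hypothetical helper: read hostids (one per line on stdin) and fail if
# any value appears more than once. Intended input: `hostid` from each node.
check_hostids() {
    ids=$(cat)
    if [ -z "$ids" ]; then
        echo "no hostids given" >&2
        return 1
    fi
    # uniq -d prints each repeated line once; its input must be sorted.
    dups=$(printf '%s\n' "$ids" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate hostid(s): $dups" >&2
        return 1
    fi
    echo "all hostids are distinct"
}
```

Usage: `for node in mds00 mds01; do ssh "$node" hostid; done | check_hostids`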

      Default values:

      • no defaults

      Default timeout:

      • start timeout 300s
      • stop timeout 300s
      • monitor timeout 300s interval 20s

      Compatible and tested:

      • pacemaker 1.1.13
      • corosync 2.3.4
      • pcs 0.9.143
      • RHEL/CentOS 7.2

          Activity

            [LU-8455] Pacemaker script for Lustre and ZFS

            Thanks for that!

            mhaakddn Malcolm Haak - NCI (Inactive) added a comment

            mhaakddn: Take a look here:

            http://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services

            Nathaniel Clark has upstreamed the ZFS RA into the resource agents project on GitHub, but it will take some time to filter into OS distributions. The page above shows how to download the RA and add it to a Pacemaker cluster.

            malkolm Malcolm Cowe (Inactive) added a comment

            My apologies. I see LUSTREhealth has been merged in LU-8458. Is the current state of affairs that the ZFS side is left as an exercise for the reader with the regular ZFS agents? Or have these other agents been upstreamed elsewhere?

            Or do we use the RPM attached here?

            I don't mind what the answer is; it's just hard to tell from the current state of the ticket and the git repo.

            mhaakddn Malcolm Haak - NCI (Inactive) added a comment

            I think this can be closed. The ZFS RA was merged upstream, and the Lustre resource agents are available.

            utopiabound Nathaniel Clark added a comment

            Is there more to be done here, or should this ticket be closed? I believe the ZFS RA scripts were landed upstream?

            adilger Andreas Dilger added a comment

            I checked which ZFS-related resource agents pcs could find, and here's what I got:

            pcs resource list | grep -i zfs
            ocf:heartbeat:Lustre-MDS-ZFS - Lustre and ZFS management when the MDT and MGT
            ocf:heartbeat:LustreZFS - Lustre and ZFS management
            ocf:llnl:lustre - Lustre ZFS OSD resource agent
            ocf:llnl:zpool - ZFS zpool resource agent
            ocf:pacemaker:Lustre-MDS-ZFS - Lustre and ZFS management when the MDT and MGT
            ocf:pacemaker:LustreZFS - Lustre and ZFS management

            ls /usr/lib/ocf/resource.d/pacemaker
            ClusterMon Dummy healthLNET HealthSMART LustreZFS pingd Stateful SystemHealth
            controld HealthCPU healthLUSTRE Lustre-MDS-ZFS ping remote SysInfo

            ls /usr/lib/ocf/resource.d/heartbeat
            apache Delay exportfs healthLUSTRE iSCSILogicalUnit LVM nfsnotify oralsnr redis Squid
            clvm dhcpd Filesystem iface-vlan iSCSITarget MailTo nfsserver pgsql Route symlink
            conntrackd docker galera IPaddr Lustre-MDS-ZFS mysql nginx portblock rsyncd tomcat
            CTDB Dummy garbd IPaddr2 LustreZFS nagios ocf-rarun postfix SendArp VirtualDomain
            db2 ethmonitor healthLNET IPsrcaddr named oracle rabbitmq-cluster slapd Xinetd

             

            ls /usr/lib/ocf/resource.d/llnl/
            lustre zpool

            The LLNL agents were installed yesterday by another staff member, and we were able to create the resources using the LLNL RA scripts but not the Intel ones:

            Online: [ mds00 mds01 ]

            Full list of resources:

            hammer_io6 (stonith:fence_powerman): Started mds00
            hammer_io5 (stonith:fence_powerman): Started mds01
            lustreMDSPool (ocf::llnl:zpool): Started mds00
            lustreMGT (ocf::llnl:lustre): Started mds00
            lustreMDT (ocf::llnl:lustre): Started mds00

            Anyway, if you have any other suggestions, I'd welcome them, because I'd prefer to use a vendor RA but will settle for the LLNL one for the moment.

            Thanks again for the support with this.

            Cheers,

            veclinton Vaughn E. Clinton (Inactive) added a comment

            From the output, it looks as though pcs can find almost none of the resource agents. It is probably worth checking that the packages are installed correctly.

            For reference, the packages on my server are:

            [root@ct66-mds2 ~]# rpm -qa resource-agents
            resource-agents-3.9.5-82.el7_3.6.x86_64
            [root@ct66-mds2 ~]# rpm -qa Lustre-ZFS-RA
            Lustre-ZFS-RA-0.99.5-1.noarch
            

            The RAs are installed in /usr/lib/ocf/resource.d, in subdirectories for each class. For example, the pacemaker directory on one of my servers looks like this:

            [root@ct66-mds2 ~]# ls /usr/lib/ocf/resource.d/pacemaker
            ClusterMon  Dummy      healthLNET    HealthSMART     LustreZFS  pingd   Stateful  SystemHealth
            controld    HealthCPU  healthLUSTRE  Lustre-MDS-ZFS  ping       remote  SysInfo
            

            The pcs resource list command scans these directories to assemble the list of available RAs. Running pcs resource list with no further arguments should return a large list of available resource agents.
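The directory scan described above can be approximated by hand, which helps separate a packaging problem from a pcs problem. This sketch (list_ocf_agents is a hypothetical name) prints ocf:<provider>:<type> for each executable file under $OCF_ROOT/resource.d, with OCF_ROOT defaulting to /usr/lib/ocf. It is not how pcs is implemented; pcs also reads each agent's metadata to get the description it prints after the dash, so an agent can appear here and still be missing from pcs output.

```shell
#!/bin/sh
# Hypothetical helper: print ocf:<provider>:<type> for every executable
# regular file under $OCF_ROOT/resource.d (default /usr/lib/ocf).
list_ocf_agents() {
    root=${OCF_ROOT:-/usr/lib/ocf}/resource.d
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        provider=$(basename "$dir")
        for agent in "$dir"*; do
            # Skip anything that is not a file with the executable bit set.
            [ -f "$agent" ] && [ -x "$agent" ] || continue
            echo "ocf:$provider:$(basename "$agent")"
        done
    done
}
```

Usage: `list_ocf_agents | grep -i -e lustre -e zfs`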

            If none of the RAs are showing up, but there are files listed in /usr/lib/ocf/resource.d/{heartbeat,pacemaker}, then there may be a permissions problem. All the RAs need to have the executable bit set, and on a default install all files and directories will have mode 755 and be owned by root. If the permissions are correct, then perhaps something like SELinux is interfering, although I would hope that is unlikely.
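The permission check can be done in one pass with find. This is a minimal sketch assuming GNU find (for -printf) and the default /usr/lib/ocf/resource.d layout; check_ra_perms is a hypothetical helper. It flags agent files whose mode is not exactly 755 and prints their owner, but it does not test ownership on its own. If it reports nothing, ls -lZ on the agent files shows their SELinux context as a next step.

```shell
#!/bin/sh
# Hypothetical helper: report agent files under an OCF resource.d tree
# whose mode is not exactly 755. Uses GNU find's -printf.
check_ra_perms() {
    ra_root=${1:-/usr/lib/ocf/resource.d}
    # Agents live one level down: resource.d/<provider>/<agent>.
    bad=$(find "$ra_root" -mindepth 2 -maxdepth 2 -type f ! -perm 755 \
        -printf '%m %u %p\n')
    if [ -n "$bad" ]; then
        printf 'not mode 755 (mode owner path):\n%s\n' "$bad"
        return 1
    fi
    echo "all agents under $ra_root are mode 755"
}
```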

            malkolm Malcolm Cowe (Inactive) added a comment

            Malcolm,

            Thanks for the response!  I really appreciate the help with this since I'm very new at PCS/Pacemaker/Corosync setups.

            I ran the following command with the syntax you suggested. Here's the output:

            pcs resource list ocf:pacemaker | awk 'tolower($0) ~ /lustre|lnet/'
            Error: No resource agents matching the filter.

            I also tried it with heartbeat, and here's the output from that attempt:

            pcs resource list ocf:heartbeat | awk 'tolower($0) ~ /lustre|lnet/'
            Error: No resource agents matching the filter.

            I tried to create the resources anyway, and it failed as in the previous attempts:

            pcs resource create hail-mgt ocf:pacemaker:LustreZFS pool="ha-mds" volume="mgt" mountpoint="/lustre/hail/mgmt"


            Error: Unable to create resource 'ocf:pacemaker:LustreZFS', it is not installed on this system (use --force to override)

            I forgot to add the version of the resource-agents RPM that is installed in this environment:

            resource-agents-3.9.5-82.el7_3.3.x86_64

            Again, thanks for the assistance.

            veclinton Vaughn E. Clinton (Inactive) added a comment

            Hi Vaughn,

            Try using the path ocf:pacemaker:Lustre-MDS-ZFS, instead of ocf:heartbeat:Lustre-MDS-ZFS. You can also verify the list of available RAs using the command pcs resource list. For example:

            [root@ct66-mds2 ~]# pcs resource list ocf:pacemaker | awk 'tolower($0) ~ /lustre|lnet/'
            ocf:pacemaker:Lustre-MDS-ZFS - Lustre and ZFS management when the MDT and MGT
            ocf:pacemaker:LustreZFS - Lustre and ZFS management
            ocf:pacemaker:healthLNET - LNet connectivity
            ocf:pacemaker:healthLUSTRE - lustre servers healthy
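One way to narrow down a "not installed on this system" error is to resolve the ocf:<provider>:<type> name to its file by hand and run the agent's meta-data action, which OCF agents are required to implement. The sketch below does that under stated assumptions: check_ocf_agent is a hypothetical helper, OCF_ROOT defaults to /usr/lib/ocf, and a pass only shows that the agent prints XML containing <resource-agent>. That is a rough approximation of what pcs needs, not the exact check pcs performs.

```shell
#!/bin/sh
# Hypothetical troubleshooting helper: resolve ocf:<provider>:<type> to
# $OCF_ROOT/resource.d/<provider>/<type> (OCF_ROOT defaults to /usr/lib/ocf)
# and check that the agent exists, is executable, and prints metadata.
check_ocf_agent() {
    spec=$1
    root=${OCF_ROOT:-/usr/lib/ocf}
    class=$(printf '%s\n' "$spec" | cut -d: -f1)
    provider=$(printf '%s\n' "$spec" | cut -d: -f2)
    agent_type=$(printf '%s\n' "$spec" | cut -d: -f3)
    if [ "$class" != ocf ] || [ -z "$provider" ] || [ -z "$agent_type" ]; then
        echo "expected ocf:<provider>:<type>, got '$spec'" >&2
        return 2
    fi
    agent=$root/resource.d/$provider/$agent_type
    if [ ! -e "$agent" ]; then
        echo "missing: $agent" >&2
        return 1
    fi
    if [ ! -x "$agent" ]; then
        echo "not executable: $agent" >&2
        return 1
    fi
    # OCF agents print an XML <resource-agent> description for "meta-data".
    if ! OCF_ROOT=$root "$agent" meta-data 2>/dev/null | grep -q '<resource-agent'; then
        echo "meta-data failed or printed no <resource-agent> XML: $agent" >&2
        return 1
    fi
    echo "ok: $agent"
}
```

Usage: compare `check_ocf_agent ocf:pacemaker:Lustre-MDS-ZFS` with `check_ocf_agent ocf:heartbeat:Lustre-MDS-ZFS`; the message names the exact path that was tried.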
            

             

            malkolm Malcolm Cowe (Inactive) added a comment

            I've been trying to use the script to create the HA volume/dataset resources with the following syntax:

            pcs resource create hail-mgt ocf:heartbeat:Lustre-MDS-ZFS pool="ha.mds" volume="mgt" mountpoint="/lustre/hail/mgt"

            Each attempt returns the following error:

            Error: Unable to create resource 'ocf:heartbeat:Lustre-MDS-ZFS', it is not installed on this system (use --force to override)

            When I run this command with the debug option enabled, I can see that the agent script, Lustre-MDS-ZFS, is in the correct location, and that the script runs and returns 0. I'm not sure what the problem is. Could it be missing some binary that I'm not seeing in the debug output? I would greatly appreciate some guidance on solving this.

            Here are the details about my configuration:

            Red Hat Enterprise Linux Server release 7.3 (Maipo)
            pcs-0.9.152-10.el7.x86_64
            pacemaker-1.1.15-11.el7_3.2.x86_64
            corosync-2.4.0-4.el7.x86_64

            fence-agents-common-4.0.11-47.el7_3.2.x86_64
            fence-agents-powerman-4.0.11-7.ch6.x86_64
            libxshmfence-1.2-1.el7.x86_64

            This is being deployed in a diskless, two-node HA Lustre environment. Please let me know if you need me to open a ticket about this issue.

            veclinton Vaughn E. Clinton (Inactive) added a comment (edited)

            People

              gabriele.paciucci Gabriele Paciucci (Inactive)
              gabriele.paciucci Gabriele Paciucci (Inactive)
              Votes: 0
              Watchers: 15
