hacluster #198

Supports: bionic focal groovy hirsute impish




The hacluster charm provides high availability for OpenStack applications that lack native (built-in) HA functionality. The clustering solution is based on Corosync and Pacemaker.

It is a subordinate charm that works in conjunction with a principal charm that supports the 'hacluster' interface. The current list of such charms can be obtained from the Charm Store (the charms officially supported by the OpenStack Charms project are published by 'openstack-charmers').

See OpenStack high availability in the OpenStack Charms Deployment Guide for a comprehensive treatment of HA with charmed OpenStack.

Note: The hacluster charm is generally intended to be used with MAAS-based clouds.


High availability can be configured in two mutually exclusive ways:

  • virtual IP(s)
  • DNS

The virtual IP method of implementing HA requires that all units of the clustered OpenStack application are on the same subnet.
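The same-subnet requirement can be checked before deploying. The following is a minimal sketch, not part of the charm: the helper names ip_to_int and same_subnet are hypothetical, the addresses come from the documentation-only range 192.0.2.0/24, and the prefix length is an assumption you would replace with your own.

```shell
#!/usr/bin/env bash
# Sketch: check that a VIP and a set of unit addresses share one IPv4 subnet.
# Helper names and sample addresses are illustrative, not part of the charm.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Usage: same_subnet PREFIX_LEN ADDR [ADDR...]
# Returns 0 if every address has the same network part, 1 otherwise.
same_subnet() {
    local prefix=$1; shift
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local first=$(( $(ip_to_int "$1") & mask ))
    local addr
    for addr in "$@"; do
        if (( ($(ip_to_int "$addr") & mask) != first )); then
            return 1
        fi
    done
    return 0
}

# Example: a candidate VIP followed by three unit addresses, all in a /24.
if same_subnet 24 192.0.2.10 192.0.2.21 192.0.2.22 192.0.2.23; then
    echo "VIP and units share a subnet"
else
    echo "VIP and units are on different subnets"
fi
```

The check only compares network prefixes; it does not confirm that the VIP is unused or routable.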

The DNS method of implementing HA requires that MAAS is used as the backing cloud. The clustered nodes must have static or "reserved" IP addresses registered in MAAS. If using a version of MAAS earlier than 2.3 the DNS hostname(s) should be pre-registered in MAAS before use with DNS HA.


This section covers common configuration options. See file config.yaml for the full list of options, along with their descriptions and default values.


The cluster_count option sets the number of hacluster units required to form the principal application cluster (the default is 3). It is best practice to provide a value explicitly, as doing so ensures that the hacluster charm will wait until all relations are made to the principal application before building the Corosync/Pacemaker cluster, thereby avoiding a race condition.


At deploy time an application name should be set, based on the principal charm name (for organisational purposes):

juju deploy hacluster <principal-charm-name>-hacluster

A relation is then added between the hacluster application and the principal application.

The example below takes the VIP approach. These commands deploy a three-node Keystone HA cluster with a VIP supplied through the vip option. Each Keystone unit will reside in a container on existing machines 0, 1, and 2:

juju deploy -n 3 --to lxd:0,lxd:1,lxd:2 --config vip= keystone
juju deploy --config cluster_count=3 hacluster keystone-hacluster
juju add-relation keystone-hacluster:ha keystone:ha
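The same three steps can be generalised to other principal applications. The sketch below is a dry run only: print_ha_deploy is a hypothetical helper that echoes the commands rather than running them, it omits the --to placement used above, and the VIP is passed in by the caller. It sets cluster_count to the number of principal units, following the best practice described earlier.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the deployment commands for a principal application.
# print_ha_deploy is a hypothetical helper; it only echoes commands.

# Usage: print_ha_deploy PRINCIPAL UNITS VIP
print_ha_deploy() {
    local principal=$1 units=$2 vip=$3
    # Deploy the principal application with the requested VIP.
    echo "juju deploy -n ${units} --config vip=${vip} ${principal}"
    # Deploy hacluster under a name derived from the principal application.
    echo "juju deploy --config cluster_count=${units} hacluster ${principal}-hacluster"
    # Relate the two applications over the 'ha' endpoint.
    echo "juju add-relation ${principal}-hacluster:ha ${principal}:ha"
}

# Placeholder VIP; substitute a real address on the units' subnet.
print_ha_deploy keystone 3 "<vip-address>"
```

Review the printed commands before running them against a real model.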


This section lists Juju actions supported by the charm. Actions allow specific operations to be performed on a per-unit basis.

  • pause
  • resume
  • status
  • cleanup
  • update-ring

To display action descriptions run juju actions hacluster. If the charm is not deployed then see file actions.yaml.
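As a rough illustration of running one of the listed actions on a single unit, the sketch below builds the command in the same juju run-action --wait form used elsewhere in this document. print_action is a hypothetical helper that only prints the command, and it assumes the action takes no parameters; check the action descriptions for any that are required.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build a 'juju run-action' command for a listed action.
# print_action is a hypothetical helper; it only echoes the command.

# Usage: print_action UNIT ACTION
print_action() {
    local unit=$1 action=$2
    case "$action" in
        pause|resume|status|cleanup|update-ring)
            echo "juju run-action --wait ${unit} ${action}"
            ;;
        *)
            # Reject names that are not in the list above.
            echo "unknown action: ${action}" >&2
            return 1
            ;;
    esac
}

# Example: the command that would pause unit hacluster/0.
print_action hacluster/0 pause
```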

Presenting status information

Here are a few examples of how to present useful information with the status action and the jq utility.

  • Querying for online and standby parameter values:

    juju run-action --wait hacluster/leader status --format json \
      | jq '.[] | {(.UnitId): .results.result | fromjson | .nodes | .[]
                   | {unit_name: .name, online: .online, standby: .standby}}'

    Example output:

    {
      "hacluster/0": {
        "unit_name": "juju-a37bc0-3",
        "online": "true",
        "standby": "false"
      }
    }
    {
      "hacluster/0": {
        "unit_name": "juju-a37bc0-4",
        "online": "true",
        "standby": "false"
      }
    }
    {
      "hacluster/0": {
        "unit_name": "juju-a37bc0-5",
        "online": "true",
        "standby": "false"
      }
    }

  • Displaying cluster resource information:

    juju run-action --wait hacluster/leader status --format json \
      | jq '.[] | {(.UnitId): .results.result | fromjson | .resources.groups}'


Please report bugs on Launchpad.

For general charm questions refer to the OpenStack Charm Guide.


(int) Number of peer units required to bootstrap cluster services. If fewer than 3 is specified, the cluster will be configured to ignore any quorum problems; with 3 or more units, quorum will be enforced and services will be stopped in the event of a loss of quorum. It is best practice to set this value to the expected number of units to avoid potential race conditions.
(int) Sets the pacemaker default resource meta-attribute value for 'cluster-recheck-interval'. This value represents the polling interval at which the cluster checks for changes in the resource parameters, constraints or other cluster options. Setting this to 0 disables the feature.
(string) Default network interface to which the HA cluster will bind for communication with the other members of the HA cluster. Defaults to the network interface hosting the unit's private-address. Only used when corosync_transport = multicast.
(string) This value will become the Corosync authentication key. To generate a suitable value use:
    sudo corosync-keygen
    sudo cat /etc/corosync/authkey | base64 -w 0
  This configuration element is mandatory and the service will fail on install if it is not provided. The value must be base64 encoded.
(string) Multicast IP address to use for exchanging messages over the network. If multiple clusters are on the same bindnetaddr network, this value can be changed. Only used when corosync_transport = multicast.
(int) Default multicast port number that will be used to communicate between HA Cluster nodes. Only used when corosync_transport = multicast.
(string) The two supported modes are multicast (udp) and unicast (udpu).
(boolean) Enable debug logging
(string) DEPRECATED: will be removed in a future release. If the CRM status has recorded failed actions in any of the registered resource agents, check_crm can optionally generate an alert. Valid options: ignore/warning/critical.
(int) DEPRECATED: will be removed in a future release. Alias for res_failcount_warn. Takes precedence over res_failcount_warn if set to a non-zero value.
(int) Sets the pacemaker default resource meta-attribute value for failure_timeout. This value represents the duration in seconds to wait before resetting failcount to 0. In practice, this is measured as the time elapsed since the most recent failure. Setting this to 0 disables the feature.
(string) MAAS credentials (required for STONITH).
(string) PPA for python3-maas-client:
    - ppa:maas/stable
    - ppa:maas/next
  The last option should be used in conjunction with the key configuration option. Used when service_dns is set on the primary charm for DNS HA.
(string) PPA key for python3-maas-client. Used when nodes are offline to specify the PPA public key.
(string) MAAS API endpoint (required for STONITH).
(boolean) When enabled, Pacemaker will be put in maintenance mode. This allows administrators to manipulate cluster resources (e.g. stop daemons, reboot machines, etc.). Pacemaker will not monitor the resources while maintenance mode is enabled, and node removals won't be processed.
(string) One or more IPs, separated by spaces, that will be used as a safety check for avoiding split-brain situations. Nodes in the cluster will ping these IPs periodically. A node that cannot ping monitor_host will not run shared resources (VIP, shared disk, etc.).
(string) Time period between checks of resource health. It consists of a number and a time factor, e.g. 5s = 5 seconds. 2m = 2 minutes.
(string) Used by the nrpe-external-master subordinate charm. A string that will be prepended to the instance name to set the host name in Nagios. For instance, the hostname would be something like:
    juju-postgresql-0
  If you're running multiple environments with the same services in them, this allows you to differentiate between them.
(string) A comma-separated list of nagios servicegroups. If left empty, the nagios_context will be used as the servicegroup.
(int) Specifies the corosync.conf network mtu. If unset, the default corosync.conf value is used (currently 1500). See 'man corosync.conf' for detailed information on this config option.
(string) What to do when the cluster does not have quorum. Allowed values:
    ignore: continue all resource management
    freeze: continue resource management, but don’t recover resources from nodes not in the affected partition
    stop: stop all resources in the affected cluster partition
    suicide: fence all nodes in the affected cluster partition
(string) This value will become the Pacemaker authentication key. To generate a suitable value use:
    dd if=/dev/urandom of=/tmp/authkey bs=2048 count=1
    cat /tmp/authkey | base64 -w 0
  If this configuration element is not set then the Corosync key will be reused as the Pacemaker key.
(boolean) If True, enables IPv6 support. The charm will expect network interfaces to be configured with an IPv6 address. If set to False (default), IPv4 is expected.
  NOTE: these charms do not currently support the IPv6 privacy extension. In order for this charm to function correctly, the privacy extension must be disabled and a non-temporary address must be configured/available on your network interface.
(int) check_crm will generate a critical alert if the failcount of a resource has crossed this threshold. Set to 0 or '' to disable.
(int) check_crm will generate a warning if the failcount of a resource has crossed this threshold. Set to 0 or '' to disable.
(int) Systemd override value for the corosync and pacemaker service start timeout, in seconds. Set the value to -1 to turn off the timeout for the services.
(int) Systemd override value for the corosync and pacemaker service stop timeout, in seconds. The default value will cause systemd to time out a service stop after 10 minutes. In most cases this should provide sufficient time for resources to migrate away from the current node as part of the stop sequence. Set the value to -1 to turn off the timeout for the services.
(string) DEPRECATED: is now ignored and will be removed in a future release. Resource fencing (aka STONITH) is now always enabled for every node in the cluster. This requires MAAS credentials be provided and each node's power parameters are properly configured in its inventory.
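
The Pacemaker authentication key described in the option list above is the base64 encoding of 2048 random bytes. The sketch below produces such a value without writing a temporary file. It is an illustration under stated assumptions: generate_authkey is a hypothetical helper, head -c is swapped in for the dd invocation shown in the option description, and base64 -w 0 assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: emit a base64-encoded random key on stdout, with no temporary file.
# generate_authkey is a hypothetical helper, not part of the charm.

# Usage: generate_authkey [NBYTES]   (NBYTES defaults to 2048)
generate_authkey() {
    local nbytes=${1:-2048}
    # Read NBYTES random bytes and encode them on a single line.
    head -c "$nbytes" /dev/urandom | base64 -w 0
}

key=$(generate_authkey)
echo "generated a key of ${#key} base64 characters"
```

The resulting string can then be supplied as the value of the corresponding key option. Treat it as a secret and avoid leaving it in shell history or world-readable files.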