percona cluster #279

Supports: xenial bionic cosmic disco

Description

Percona XtraDB Cluster provides an active/active MySQL
compatible alternative implemented using the Galera
synchronous replication extensions.


Overview

Percona XtraDB Cluster is a high availability and high scalability solution for
MySQL clustering. Percona XtraDB Cluster integrates Percona Server with the
Galera library of MySQL high availability solutions in a single product package
which enables you to create a cost-effective MySQL cluster.

This charm deploys Percona XtraDB Cluster onto Ubuntu.

Usage

Deployment

To deploy this charm:

juju deploy percona-cluster

Passwords required for the correct operation of the deployment are automatically
generated and stored by the lead unit (typically the first unit).

To expand the deployment:

juju add-unit -n 2 percona-cluster

See notes in the 'HA/Clustering' section on safely deploying a PXC cluster
in a single action.

The root password for mysql can be retrieved using the following command:

juju run --unit percona-cluster/0 leader-get root-password

This is only usable from within one of the units within the deployment
(access to root is restricted to localhost only).

Memory Configuration

Percona Cluster is extremely memory sensitive. Setting memory values too low
will give poor performance. Setting them too high will create problems that are
very difficult to diagnose. Please take time to evaluate these settings for
each deployment environment rather than copying and pasting bundle
configurations.

The Percona Cluster charm must be deployable in small, low-memory development
environments as well as high-performance production environments. The charm's
opinionated configuration defaults favor the developer environment in order to
ease initial testing. Production environments need to consider carefully the
memory requirements for the hardware or cloud in use. Consult a MySQL memory
calculator [2] to understand the implications of the values.

Between the 5.5 and 5.6 releases a significant default was changed.
The performance schema [1] defaulted to on for 5.6 and later. This allocates
all the memory that would be required to handle max-connections plus several
other memory settings. With 5.5 memory was allocated during runtime as needed.

The charm now makes performance schema configurable and defaults to off (False).
With the performance schema turned off memory is allocated when needed during
run time. It is important to understand this can lead to run time memory
exhaustion if the configuration values are set too high. Consult a MySQL memory
calculator [2] to understand the implications of the values.

Pay particular attention to the max-connections setting: this value is a balance
between connection exhaustion and memory exhaustion. Occasionally connection
exhaustion occurs in large production HA clouds with max-connections less than
2000. The common practice became to set max-connections unrealistically high
near 10k or 20k. In the move to 5.6 on Xenial this became a problem as Percona
would fail to start up or behave erratically as memory exhaustion occurred on
the host due to performance schema being turned on. Even with the default now
turned off this value should be carefully considered against the production
requirements and resources available.

[1] http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-6.html#mysqld-5-6-6-performance-schema
[2] http://www.mysqlcalculator.com/
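
As a rough illustration of the trade-off described above, the sketch below
estimates worst-case memory as a fixed global allocation plus a per-connection
allocation multiplied by max-connections. The function name and the figures
passed to it are illustrative assumptions, not values taken from the charm; use
a MySQL memory calculator [2] for real sizing.

```shell
# Hypothetical helper: rough worst-case memory estimate in MB.
#   $1 = global buffers in MB (e.g. the InnoDB buffer pool)
#   $2 = assumed per-connection buffers in KB
#   $3 = max-connections
estimate_mysql_mem_mb() {
    est_global_mb=$1
    est_per_conn_kb=$2
    est_max_conn=$3
    echo $(( est_global_mb + (est_per_conn_kb * est_max_conn) / 1024 ))
}
```

With an assumed 512 MB of global buffers and 1024 KB per connection,
`estimate_mysql_mem_mb 512 1024 600` gives roughly 1.1 GB, while raising
max-connections to 20000 gives roughly 20 GB.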

HA/Clustering

When more than one unit of the charm is deployed with the hacluster charm
the percona charm will bring up an Active/Active cluster. The process of
clustering the units together takes some time. Due to the nature of
asynchronous hook execution it is possible client relationship hooks may
be executed before the cluster is complete. In some cases, this can lead
to client charm errors.

To guarantee client relation hooks will not be executed until clustering is
completed use the min-cluster-size configuration setting:

juju deploy -n 3 percona-cluster
juju config percona-cluster min-cluster-size=3

When min-cluster-size is not set the charm will still cluster, however,
there are no guarantees client relation hooks will not execute before it is
complete.
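
The two commands above can likely be combined so that min-cluster-size is set
at deploy time. The sketch below assumes a Juju 2.x client whose deploy command
accepts --config key=value; the function name and the JUJU override variable
(used only for dry runs) are hypothetical.

```shell
# Hypothetical wrapper: deploy N units and set min-cluster-size to N in one step.
# Set JUJU=echo to print the arguments instead of running juju.
deploy_pxc_cluster() {
    pxc_units=${1:-3}
    ${JUJU:-juju} deploy -n "$pxc_units" percona-cluster \
        --config min-cluster-size="$pxc_units"
}
```

Keeping the unit count and min-cluster-size in a single argument avoids the two
values drifting apart.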

Single unit deployments behave as expected.

There are two mutually exclusive high availability options: using virtual
IP(s) or DNS. In both cases, a relationship to hacluster is required which
provides the corosync back end HA functionality.

To use virtual IP(s) the clustered nodes must be on the same subnet such that
the VIP is a valid IP on the subnet for one of the node's interfaces and each
node has an interface in said subnet. The VIP becomes a highly-available API
endpoint.

At a minimum, the config option 'vip' must be set in order to use virtual IP
HA. If multiple networks are being used, a VIP should be provided for each
network, separated by spaces. Optionally, vip_iface or vip_cidr may be
specified.

To use DNS high availability there are several prerequisites. However, DNS HA
does not require the clustered nodes to be on the same subnet.
Currently the DNS HA feature is only available for MAAS 2.0 or greater
environments. MAAS 2.0 requires Juju 2.0 or greater. The clustered nodes must
have static or "reserved" IP addresses registered in MAAS. The DNS hostname(s)
must be pre-registered in MAAS before use with DNS HA.

At a minimum, the config option 'dns-ha' must be set to true, the
'os-access-hostname' must be set, and the 'access' binding must be
defined in order to use DNS HA.

The charm will throw an exception in the following circumstances:

  • If neither 'vip' nor 'dns-ha' is set and the charm is related to hacluster
  • If both 'vip' and 'dns-ha' are set, as they are mutually exclusive
  • If 'dns-ha' is set and 'os-access-hostname' is not set
  • If the 'access' binding is not set and 'dns-ha' is set, consumers of the db may not be allowed to connect

MySQL asynchronous replication

This charm supports the MySQL asynchronous replication feature, which can be used
to replicate databases between multiple Percona XtraDB Clusters. To set up
master-slave replication of the "database1" and "database2" databases between the
"pxc1" and "pxc2" applications, first configure the mandatory options:

juju config pxc1 databases-to-replicate="database1:table1,table2;database2"
juju config pxc2 databases-to-replicate="database1:table1,table2;database2"
juju config pxc1 cluster-id=1
juju config pxc2 cluster-id=2

and then relate them:

juju relate pxc1:master pxc2:slave

To set up master-master replication, add another relation:

juju relate pxc2:master pxc1:slave

Circular replication can be set up between multiple clusters in the same way.
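
The databases-to-replicate value separates databases with semicolons and puts an
optional comma-separated table list after a colon. The sketch below, with a
hypothetical function name, prints how such a value would be interpreted; it is
an illustration of the format, not the charm's parser.

```shell
# Hypothetical helper: show what a databases-to-replicate value selects.
describe_replication() {
    repl_old_ifs=$IFS
    IFS=';'
    # Unquoted on purpose: split the value on ';' into database entries.
    for repl_entry in $1; do
        case $repl_entry in
            *:*) echo "${repl_entry%%:*}: ${repl_entry#*:}" ;;
            *)   echo "$repl_entry: all tables" ;;
        esac
    done
    IFS=$repl_old_ifs
}
```

Running it on "database1:table1,table2;database2" lists table1 and table2 for
database1 and all tables for database2.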

Network Space support

This charm supports the use of Juju Network Spaces, allowing the charm to be bound
to network space configurations managed directly by Juju. This is only supported
with Juju 2.0 and above.

You can ensure that database connections and cluster peer communication are bound to
specific network spaces by binding the appropriate interfaces:

juju deploy percona-cluster --bind "shared-db=internal-space,cluster=internal-space"

Alternatively, these can be provided as part of a Juju native bundle configuration:

percona-cluster:
  charm: cs:xenial/percona-cluster
  num_units: 1
  bindings:
    shared-db: internal-space
    cluster: internal-space

The 'cluster' endpoint binding is used to determine which network space units
within the percona-cluster deployment should use for communication with each
other; the 'shared-db' endpoint binding is used to determine which network
space should be used for access to MySQL database services from other charms.

NOTE: Spaces must be configured in the underlying provider prior to
attempting to use them.

NOTE: Existing deployments using the access-network configuration option
will continue to function; this option is preferred over any network space
binding provided for the 'shared-db' relation if set.

Limitations

Note that Percona XtraDB Cluster is not a 'scale-out' MySQL solution; reads
and writes are channelled through a single service unit and synchronously
replicated to other nodes in the cluster; reads/writes are as slow as the
slowest node you have in your deployment.

Series Upgrade

Procedure

  1. Take a backup of all the databases:

    juju run-action mysql/N backup

    Get that backup off the mysql/N unit and somewhere safe:

    juju scp -- -r mysql/N:/opt/backups/mysql /path/to/local/backup/dir

  2. Pause all non-leader units and corresponding hacluster units.
    The leader node will remain up for the time being. This is to ensure the leader
    has the latest sequence number and will be considered the most up to date by
    the cluster.

    juju run-action hacluster/N pause
    juju run-action percona-cluster/N pause

  3. Prepare the leader node:

    juju upgrade-series prepare $MACHINE_NUMBER $SERIES

  4. Administratively perform the upgrade: do-release-upgrade plus any further
    steps administratively required for an upgrade.

  5. Reboot.

  6. Complete the series upgrade on the leader:

    juju upgrade-series complete $MACHINE_NUMBER

  7. Administratively validate that the leader node database is up and running:

    • Connect to the database and check for expected data
    • Review "SHOW GLOBAL STATUS;"

  8. Upgrade the non-leader nodes one at a time following the same pattern, summarized below:

    • juju upgrade-series prepare $MACHINE_NUMBER $SERIES
    • Administratively upgrade
    • Reboot
    • juju upgrade-series complete $MACHINE_NUMBER
    • Validate

  9. Finalize the upgrade.
    Run this action on the leader node.
    This action informs each node of the cluster that the upgrade process is complete cluster wide.
    It also updates the mysql configuration with all peers in the cluster.

    juju run-action mysql/N complete-cluster-series-upgrade

  10. Set future instances to the new series and set the source origin:

    juju set-series percona-cluster xenial
    juju config mysql source=distro

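
For the validation steps in the procedure above, one possible check is to
inspect the Galera status variables in the SHOW GLOBAL STATUS output. The sketch
below assumes tab-separated name/value pairs on standard input, and assumes that
a healthy node reports wsrep_ready ON, wsrep_cluster_status Primary and
wsrep_local_state_comment Synced. The function name is hypothetical.

```shell
# Hypothetical check: exit 0 only if the node looks ready, primary and synced.
# Reads "name<TAB>value" lines on stdin.
check_wsrep_status() {
    awk '
        $1 == "wsrep_ready"               { ready = $2 }
        $1 == "wsrep_cluster_status"      { status = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(ready == "ON" && status == "Primary" && state == "Synced") }
    '
}
```

On a unit, the input could be produced with
`mysql -u root -p -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | check_wsrep_status`.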

Documentation

Cold Boot

In the event of an unexpected power outage and cold boot, the cluster will be
unable to reestablish itself without manual intervention.

The cluster will be in scenario 3 or 6 from the upstream Percona Cluster
documentation.

Please read the upstream documentation as it provides context to the steps
outlined here. In either scenario, it is necessary to choose a unit to become
the bootstrap node.

Determine the node with the highest sequence number

This information can be found in the
/var/lib/percona-xtradb-cluster/grastate.dat file. The charm will also display
this information in the juju status.
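
To read the values directly on a unit, the file can be parsed as simple
"key: value" lines. This is a minimal sketch assuming that layout (with keys
such as seqno and safe_to_bootstrap); the function name is hypothetical.

```shell
# Hypothetical helper: print one field from a grastate.dat file.
#   $1 = field name (e.g. seqno), $2 = path to the file
grastate_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "$2"
}
```

For example,
`grastate_field seqno /var/lib/percona-xtradb-cluster/grastate.dat` prints the
recorded sequence number. Note that the file can record a seqno of -1 after an
unclean shutdown, in which case the value shown in juju status is the one to
rely on.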

Example juju status after a cold boot of percona-cluster

Unit                Workload  Agent  Machine  Public address  Ports     Message
keystone/0*         active    idle   0        10.5.0.32       5000/tcp  Unit is ready
percona-cluster/0   blocked   idle   1        10.5.0.20       3306/tcp  MySQL is down. Sequence Number: 355. Safe To Bootstrap: 0
  hacluster/0       active    idle            10.5.0.20                 Unit is ready and clustered
percona-cluster/1   blocked   idle   2        10.5.0.17       3306/tcp  MySQL is down. Sequence Number: 355. Safe To Bootstrap: 0
  hacluster/1       active    idle            10.5.0.17                 Unit is ready and clustered
percona-cluster/2*  blocked   idle   3        10.5.0.27       3306/tcp  MySQL is down. Sequence Number: 355. Safe To Bootstrap: 0
  hacluster/2*      active    idle            10.5.0.27                 Unit is ready and clustered

Note: An application leader is denoted by an asterisk in the Unit column.

In the above example all the sequence numbers match. This means we can
bootstrap from any unit we choose.

In the next example the percona-cluster/2 node has the highest sequence number
so we must choose that node to avoid data loss.

Unit                Workload  Agent  Machine  Public address  Ports     Message
keystone/0*         active    idle   0        10.5.0.32       5000/tcp  Unit is ready
percona-cluster/0*  blocked   idle   1        10.5.0.20       3306/tcp  MySQL is down. Sequence Number: 1318. Safe To Bootstrap: 0
  hacluster/0*      active    idle            10.5.0.20                 Unit is ready and clustered
percona-cluster/1   blocked   idle   2        10.5.0.17       3306/tcp  MySQL is down. Sequence Number: 1318. Safe To Bootstrap: 0
  hacluster/1       active    idle            10.5.0.17                 Unit is ready and clustered
percona-cluster/2   blocked   idle   3        10.5.0.27       3306/tcp  MySQL is down. Sequence Number: 1325. Safe To Bootstrap: 0
  hacluster/2       active    idle            10.5.0.27                 Unit is ready and clustered
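
Picking the unit by eye is error-prone in larger clusters. The sketch below
scans juju status output in the format shown above and prints the unit with the
highest sequence number. It assumes the status message wording
"Sequence Number: N." shown in these examples; the function name is
hypothetical.

```shell
# Hypothetical helper: read juju status lines on stdin and print
# "<unit> <sequence number>" for the highest sequence number found.
# On a tie, the first unit listed is kept.
highest_seqno_unit() {
    awk '
        /Sequence Number:/ {
            unit = $1
            sub(/\*$/, "", unit)          # drop the leader marker
            for (i = 1; i < NF - 1; i++) {
                if ($i == "Sequence" && $(i + 1) == "Number:") {
                    seq = $(i + 2)
                    sub(/\.$/, "", seq)   # drop the trailing period
                }
            }
            if (best == "" || seq + 0 > max) {
                max = seq + 0
                best = unit
            }
        }
        END { if (best != "") print best, max }
    '
}
```

It could be used as `juju status percona-cluster | highest_seqno_unit`.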

Bootstrap the node with the highest sequence number

Run the bootstrap-pxc action on the node with the highest sequence number. In
this example, it is unit percona-cluster/2, which happens to be a non-leader.

juju run-action --wait percona-cluster/2 bootstrap-pxc

Notify the cluster of the new bootstrap UUID

In the vast majority of cases, once the bootstrap-pxc action has been run and
the model has settled, the output of the juju status command will look
like this:

Unit                Workload  Agent  Machine  Public address  Ports     Message
keystone/0*         active    idle   0        10.5.0.32       5000/tcp  Unit is ready
percona-cluster/0*  waiting   idle   1        10.5.0.20       3306/tcp  Unit waiting for cluster bootstrap
  hacluster/0*      active    idle            10.5.0.20                 Unit is ready and clustered
percona-cluster/1   waiting   idle   2        10.5.0.17       3306/tcp  Unit waiting for cluster bootstrap
  hacluster/1       active    idle            10.5.0.17                 Unit is ready and clustered
percona-cluster/2   waiting   idle   3        10.5.0.27       3306/tcp  Unit waiting for cluster bootstrap
  hacluster/2       active    idle            10.5.0.27                 Unit is ready and clustered

If you observe the above output ("Unit waiting for cluster bootstrap") then the
notify-bootstrapped action needs to be run on a unit. There are two
possibilities:

  1. If the bootstrap-pxc action was run on a leader then run
    notify-bootstrapped on a non-leader.
  2. If the bootstrap-pxc action was run on a non-leader then run
    notify-bootstrapped on the leader.

In the current example, the first action was run on a non-leader so we'll run
the second action on the leader, percona-cluster/0:

juju run-action percona-cluster/0 notify-bootstrapped --wait
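
The choice between the two possibilities can be expressed as a small helper.
This is a sketch of the rule only; the function name and argument order are
hypothetical.

```shell
# Hypothetical helper: print the unit on which to run notify-bootstrapped.
#   $1 = unit where bootstrap-pxc was run
#   $2 = leader unit
#   remaining arguments = all units of the application
notify_target() {
    nt_bootstrap=$1
    nt_leader=$2
    shift 2
    # Bootstrapped on a non-leader: notify from the leader.
    if [ "$nt_bootstrap" != "$nt_leader" ]; then
        echo "$nt_leader"
        return 0
    fi
    # Bootstrapped on the leader: notify from the first non-leader.
    for nt_unit in "$@"; do
        if [ "$nt_unit" != "$nt_leader" ]; then
            echo "$nt_unit"
            return 0
        fi
    done
    return 1
}
```

In the current example,
`notify_target percona-cluster/2 percona-cluster/0 percona-cluster/0 percona-cluster/1 percona-cluster/2`
prints percona-cluster/0.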

After the model settles, the output should show all nodes in active and ready
state:

Unit                Workload  Agent  Machine  Public address  Ports     Message
keystone/0*         active    idle   0        10.5.0.32       5000/tcp  Unit is ready
percona-cluster/0*  active    idle   1        10.5.0.20       3306/tcp  Unit is ready
  hacluster/0*      active    idle            10.5.0.20                 Unit is ready and clustered
percona-cluster/1   active    idle   2        10.5.0.17       3306/tcp  Unit is ready
  hacluster/1       active    idle            10.5.0.17                 Unit is ready and clustered
percona-cluster/2   active    idle   3        10.5.0.27       3306/tcp  Unit is ready
  hacluster/2       active    idle            10.5.0.27                 Unit is ready and clustered

The percona-cluster is now back to a clustered and healthy state.


Configuration

access-network
(string) The IP address and netmask of the 'access' network (e.g. 192.168.0.0/24). This network will be used for access to database services.
binlogs-expire-days
(int) Sets the expire_logs_days mysql configuration option, which will make mysql server automatically remove logs older than configured number of days.
10
binlogs-max-size
(string) Sets the max_binlog_size mysql configuration option, which will limit the size of the binary log files. The server will automatically rotate binlogs after they grow to be bigger than this value. Keep in mind that transactions are never split between binary logs, so therefore binary logs might get larger than configured value.
100M
binlogs-path
(string) Location on the filesystem where binlogs are going to be placed. Default mimics what mysql-common package would do for mysql. Make sure you do not put binlogs inside mysql datadir (/var/lib/mysql/)!
/var/log/mysql/mysql-bin.log
cluster-id
(int) Cluster ID to be used when using MySQL asynchronous replication. NOTE: This value must be different for each cluster.
cluster-network
(string) The IP address and netmask of the cluster (replication) network (e.g. 192.168.0.0/24). This network will be used for wsrep_cluster replication.
databases-to-replicate
(string) Databases and tables to replicate using MySQL asynchronous replication. The databases should be separated with a semicolon while the tables should be separated with a comma. If no tables are listed, the whole database will be replicated. For example "database1:table1,table2;database2" will replicate the "table1" and "table2" tables from the "database1" database and all tables from the "database2" database. NOTE: This option should be used only when relating one cluster to the other. It does not affect Galera synchronous replication.
dataset-size
(string) [DEPRECATED] - use innodb-buffer-pool-size. How much data should be kept in memory in the DB. This will be used to tune settings in the database server appropriately. Supported suffixes include K/M/G/T. If suffixed with %, one will get that percentage of RAM allocated to the dataset.
dns-ha
(boolean) Use DNS HA with MAAS 2.0. Note if this is set do not set vip settings below.
enable-binlogs
(boolean) Turns on MySQL binary logs. The placement of the logs is controlled with the binlogs-path config option.
gcs-fc-limit
(int) This setting controls when flow control engages. Simply speaking, if the wsrep_local_recv_queue exceeds this size on a given node, a pausing flow control message will be sent. The fc_limit defaults to 16 transactions. This effectively means that this is as far as a given node can be behind committing transactions from the cluster.
ha-bindiface
(string) Default network interface that the HA cluster will bind to for communication with the other members of the HA cluster.
eth0
ha-mcastport
(int) Default multicast port number that will be used to communicate between HA Cluster nodes.
5490
harden
(string) Apply system hardening. Supports a space-delimited list of modules to run. Supported modules currently include os, ssh, apache and mysql.
innodb-buffer-pool-size
(string) By default this value will be set to 50% of system total memory or 512MB (whichever is lower), but it can also be set to any specific value for the system. Supported suffixes include K/M/G/T. If suffixed with %, one will get that percentage of system total memory allocated.
innodb-change-buffering
(string) Configure whether InnoDB performs change buffering, an optimization that delays write operations to secondary indexes so that the I/O operations can be performed sequentially. Permitted values include:
  • none: Do not buffer any operations.
  • inserts: Buffer insert operations.
  • deletes: Buffer delete marking operations; strictly speaking, the writes that mark index records for later deletion during a purge operation.
  • changes: Buffer inserts and delete-marking operations.
  • purges: Buffer the physical deletion operations that happen in the background.
  • all: The default. Buffer inserts, delete-marking operations, and purges.
For more details see https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_change_bufferring
innodb-file-per-table
(boolean) Turns on the innodb_file_per_table option, which will make MySQL put each InnoDB table into a separate .ibd file. Existing InnoDB tables will remain in the ibdata1 file; a full dump/import is needed to get rid of a large ibdata1 file.
True
innodb-io-capacity
(int) Configure the InnoDB IO capacity which sets an upper limit on I/O activity performed by InnoDB background tasks, such as flushing pages from the buffer pool and merging data from the change buffer. This value typically defaults to 200 but can be increased on systems with fast bus-attached SSD based storage to help the server handle the background maintenance work associated with a high rate of row changes. Alternatively it can be decreased to a minimum of 100 on systems with low speed 5400 or 7200 rpm spindles, to reduce the proportion of IO operations being used for background maintenance work. For more details see https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_io_capacity
key
(string) Key ID to import to the apt keyring to support use with arbitrary source configuration from outside of Launchpad archives or PPA's.
known-wait
(int) Known wait along with modulo nodes is used to help avoid restart collisions. Known wait is the amount of time between one node executing an operation and another. On slower hardware this value may need to be larger than the default of 30 seconds.
30
max-connections
(int) Maximum connections to allow. A value of -1 means use the server's compiled-in default. This is not typically that useful so the charm will configure PXC with a default max-connections value of 600. Note: Connections take up memory resources, either at startup time with performance-schema=True or during run time with performance-schema=False. This value is a balance between connection exhaustion and memory exhaustion. Consult a MySQL memory calculator like http://www.mysqlcalculator.com/ to understand memory resources consumed by connections. See also performance-schema.
600
min-cluster-size
(int) Minimum number of units expected to exist before charm will attempt to bootstrap percona cluster. If no value is provided this setting is ignored.
modulo-nodes
(int) This config option is rarely required but is provided for fine tuning; it is safe to leave unset. Modulo nodes is used to help avoid restart collisions as well as distribute load on the cloud at larger scale. During restarts and cluster joins percona needs to execute these operations serially. By setting modulo-nodes to the size of the cluster and known-wait to a reasonable value, the charm will distribute the operations serially. If this value is unset, the charm will check min-cluster-size or else finally default to the size of the cluster based on peer relations. Setting this value to 0 will execute operations with no wait time. Setting this value to less than the cluster size will distribute load but may lead to restart collisions.
nagios_context
(string) Used by the nrpe-external-master subordinate charm. A string that will be prepended to instance name to set the host name in nagios. So for instance the hostname would be something like 'juju-myservice-0'. If you are running multiple environments with the same services in them this allows you to differentiate between them.
juju
nagios_servicegroups
(string) A comma-separated list of nagios service groups. If left empty, the nagios_context will be used as the servicegroup
os-access-hostname
(string) The hostname or address of the access endpoint for percona-cluster.
peer-timeout
(string) This setting sets the gmcast.peer_timeout value. Possible values are documented on the galera cluster site: http://galeracluster.com/documentation-webpages/galeraparameters.html. For very busy clouds or in resource restricted environments this value can be changed. WARNING: Please read all documentation before changing the default value, as doing so may have unintended consequences. It may be necessary to set this value higher during deploy time (PT15S) and subsequently change it back to the default (PT3S) after deployment.
performance-schema
(boolean) The performance schema attempts to automatically size the values of several of its parameters at server startup if they are not set explicitly. When set to on (True) memory is allocated at startup time. The implication of this is that any memory related charm config options such as max-connections and innodb-buffer-pool-size must be explicitly set for the environment percona is running in, or percona may fail to start. Defaults to off (False), giving 5.5-like behavior. The implication of this is that one can set configuration values that could lead to memory exhaustion during run time, as memory is not allocated at startup time.
prefer-ipv6
(boolean) If True enables IPv6 support. The charm will expect network interfaces to be configured with an IPv6 address. If set to False (default) IPv4 is expected. NOTE: these charms do not currently support IPv6 privacy extension. In order for this charm to function correctly, the privacy extension must be disabled and a non-temporary address must be configured/available on your network interface.
pxc-strict-mode
(string) Configures pxc_strict_mode (https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html). Valid values are 'disabled', 'permissive', 'enforcing' and 'master'. Defaults to 'enforcing', as this is what PXC 5.7 on bionic (and above) does. This option is ignored on PXC < 5.7 (xenial defaults to 5.6, trusty defaults to 5.5).
enforcing
root-password
(string) Root account password for new cluster nodes. Overrides the automatic generation of a password for the root user, but must be set prior to deployment time to have any effect.
source
(string) Repository from which to install. May be one of the following: distro (default), ppa:somecustom/ppa, a deb url sources entry, or a supported Ubuntu Cloud Archive, e.g.:
  • cloud:<series>-<openstack-release>
  • cloud:<series>-<openstack-release>/updates
  • cloud:<series>-<openstack-release>/staging
  • cloud:<series>-<openstack-release>/proposed
See https://wiki.ubuntu.com/OpenStack/CloudArchive for info on which cloud archives are available and supported.
sst-method
(string) Percona method for taking the State Snapshot Transfer (SST), can be: 'rsync', 'xtrabackup', 'xtrabackup-v2', 'mysqldump', 'skip' - see https://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html#wsrep_sst_method
xtrabackup-v2
sst-password
(string) SST account password for new cluster nodes. Overrides the automatic generation of a password for the sst user, but must be set prior to deployment time to have any effect.
table-open-cache
(int) Sets table_open_cache (formerly known as table_cache) to mysql.
2048
tuning-level
(string) Valid values are 'safest', 'fast', and 'unsafe'. If set to 'safest', all settings are tuned to have maximum safety at the cost of performance. 'fast' will turn off most controls, but may lose data on crashes. 'unsafe' will turn off all protections but this may be OK in clustered deployments.
safest
vip
(string) Virtual IP to use to front Percona XtraDB Cluster in active/active HA configuration
vip_cidr
(int) Netmask that will be used for the Virtual IP.
24
vip_iface
(string) Network interface on which to place the Virtual IP.
eth0
wait-timeout
(int) The number of seconds the server waits for activity on a noninteractive connection before closing it. -1 means use the server's compiled in default.
-1
wsrep-slave-threads
(int) Specifies the number of threads that can apply replication transactions in parallel. Galera supports true parallel replication that applies transactions in parallel only when it is safe to do so. When unset defaults to 48 for >= Bionic or 1 for <= Xenial.
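
As a worked example of the innodb-buffer-pool-size default described above (50%
of system total memory or 512MB, whichever is lower), the sketch below computes
that default from a total memory figure in MB. The function name is
hypothetical and the code only restates the documented rule; it is not taken
from the charm.

```shell
# Hypothetical helper: default InnoDB buffer pool size in MB
# for a host with $1 MB of total memory.
default_buffer_pool_mb() {
    bp_half=$(( $1 / 2 ))
    if [ "$bp_half" -lt 512 ]; then
        echo "$bp_half"
    else
        echo 512
    fi
}
```

Under this rule any host with 1 GB of memory or more gets the 512 MB cap, which
is why production deployments usually need to set the option explicitly.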