openstack service checks #29

Supports: bionic xenial trusty
Description

OpenStack Services NRPE Checks


Overview

This charm provides OpenStack service checks for Nagios.

Usage

juju deploy cs:~canonical-bootstack/openstack-service-checks
juju add-relation openstack-service-checks nrpe

This charm supports relating to keystone via the keystone-credentials
interface. If you do not wish to use this relation, you can supply your own
OpenStack credentials through the 'os-credentials' setting (see the setting
description for the expected format):

juju config openstack-service-checks os-credentials=" ... "
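
As a minimal sketch of how the value could be assembled, the snippet below joins the fields named in the 'os-credentials' description into the documented comma-separated format. The variable names are hypothetical and the values are the placeholders from that description, not working credentials. The command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical variable names; values are the placeholders from the
# 'os-credentials' option description and must be replaced.
cred_user='foo'
cred_password='bar'
cred_project='baz'
cred_region='Region1'
cred_auth_url='http://127.0.0.1:35357'

# Join the key=value pairs with ", " as in the documented format.
creds="username=${cred_user}, password=${cred_password}, credentials_project=${cred_project}, region_name=${cred_region}, auth_url=${cred_auth_url}"

# Print the command instead of running it.
printf 'juju config openstack-service-checks os-credentials="%s"\n' "$creds"
```

Keeping the fields in separate variables makes it easier to spot a missing key before the setting is applied.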

With Keystone

juju add-relation openstack-service-checks:identity-credentials keystone:identity-credentials

API endpoints monitoring

If your OpenStack API endpoints have a common URL for the Admin, Public and
Internal addresses, you should consider disabling some endpoints which would be
duplicated otherwise, e.g.

juju config openstack-service-checks check_internal_urls=False check_admin_urls=False

If these API endpoints use TLS, additional checks monitor the certificate expiration time. The warning and critical thresholds, in days, can be adjusted:

juju config openstack-service-checks tls_warn_days=30 tls_crit_days=14
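
To illustrate how the two thresholds relate, here is a rough sketch that maps the number of days left on a certificate to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). This is not the charm's actual check. The function name is hypothetical, and the assumption that a value at or below a threshold triggers that level is ours.

```shell
#!/bin/sh
# Hypothetical helper, not part of the charm.
# Assumption: days_left at or below a threshold triggers that alert level.
tls_status() {
    days_left=$1
    warn_days=${2:-30}   # default of tls_warn_days
    crit_days=${3:-14}   # default of tls_crit_days
    if [ "$days_left" -le "$crit_days" ]; then
        echo "CRITICAL"; return 2
    elif [ "$days_left" -le "$warn_days" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

tls_status 45   # prints OK
tls_status 20   # prints WARNING
tls_status 7    # prints CRITICAL
```

With the defaults, tls_warn_days should stay larger than tls_crit_days, otherwise the warning branch can never be reached.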

Compute services monitoring

Compute services are monitored via the 'os-services' interface. Several settings
tune the alerting: the warning and critical thresholds for the number of available
nodes per host aggregate, a list of host aggregates to ignore (by default, no
aggregates are skipped), and whether to ignore nodes in 'disabled' state.

juju config openstack-service-checks nova_warn=2 nova_crit=1
juju config openstack-service-checks skipped_host_aggregates='hostaggr1,hostaggr2'
juju config openstack-service-checks skip-disabled=true

Rally checks

An NRPE check supports a limited set of rally/tempest tests, which run via cron
(every 15 minutes by default). The check is disabled by default. Once it is enabled,
the schedule can be changed and individual components can be skipped (available
components are cinder, glance, nova and neutron):

juju config openstack-service-checks check-rally=true
juju config openstack-service-checks rally-cron-schedule='*/20 * * * *'
juju config openstack-service-checks skip-rally='nova,neutron'
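
Before applying a 'skip-rally' value, it may help to verify that every entry is one of the four listed components. The sketch below does that. The function name is hypothetical, and it assumes lowercase names with no spaces around the commas, as in the example above.

```shell
#!/bin/sh
# Components the text lists as available for skip-rally.
SUPPORTED="cinder glance nova neutron"

# Hypothetical helper: return 0 if every comma-separated item is supported.
# An empty string is accepted, since it means nothing is skipped.
valid_skip_rally() {
    old_ifs=$IFS
    IFS=','
    set -- $1          # split the argument on commas
    IFS=$old_ifs
    for item in "$@"; do
        case " $SUPPORTED " in
            *" $item "*) ;;
            *) echo "unsupported component: $item" >&2; return 1 ;;
        esac
    done
    return 0
}

# Print the command only when the list is valid.
if valid_skip_rally 'nova,neutron'; then
    echo "juju config openstack-service-checks skip-rally='nova,neutron'"
fi
```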

Contact information

Please contact Canonical's BootStack team via the "Submit a bug" link.

Configuration

check-dns
(string) A space-separated list of DNS names to check. If any of the names are not resolvable, alert as CRITICAL.
check-neutron-agents
(boolean) Switch to turn neutron agents checks on or off. By default, the neutron_agents nrpe check is enabled. If a different SDN (e.g. Contrail) is in use, you may want to disable this check.
True
check-rally
(boolean) Switch to turn rally checks via the fcbtest snap on or off. By default, the rally nrpe check is disabled.
check_admin_urls
(boolean) If true, create NRPE checks matching all 'admin' URLs in the Keystone catalog.
True
check_internal_urls
(boolean) If true, create NRPE checks matching all 'internal' URLs in the Keystone catalog.
True
check_public_urls
(boolean) If true, create NRPE checks matching all 'public' URLs in the Keystone catalog.
True
contrail_analytics_vip
(string) The VIP used for Contrail Analytics. Leave blank to disable Contrail monitoring.
nagios_context
(string) Used by the nrpe subordinate charms. A string that will be prepended to the instance name to set the host name in nagios. For instance, the hostname would be something like: juju-myservice-0. If you're running multiple environments with the same services in them, this allows you to differentiate between them.
juju
nagios_servicegroups
(string) A comma-separated list of nagios servicegroups. If left empty, the nagios_context will be used as the servicegroup.
nova_crit
(int) Critical level for nova aggregate unit count check - setting this to -1 will effectively disable host aggregate checks.
1
nova_warn
(int) Warning level for nova aggregate unit count check - setting this to -1 will effectively disable host aggregate checks.
2
os-credentials
(string) Comma separated OpenStack credentials to be used by nagios. It is strongly recommended this be a user with a dedicated role, and not a full admin. Takes the format of username=foo, password=bar, credentials_project=baz, region_name=Region1, auth_url=http://127.0.0.1:35357
rally-cron-schedule
(string) Cron schedule used to run the rally tests. Default value is every 15 minutes. Furthermore, the cronjob is scheduled to time out after 13 minutes (SIGTERM) or 14 minutes (SIGKILL).
*/15 * * * *
skip-disabled
(boolean) An option to specify whether you want Warning alerts in nagios for disabled nova-compute hosts.
skip-rally
(string) Comma separated list of OpenStack components to not monitor. An empty string means all components will be monitored (up to the number of currently supported components: Cinder, Glance, Nova, Neutron). Sample: skip-rally=cinder
skipped_host_aggregates
(string) Comma separated list of host aggregates that need to be skipped from checks. Example "Agg1,AGg2" or 'Aggregate3'. This is a case-insensitive option.
snap_proxy
(string) DEPRECATED. Use snap-http-proxy and snap-https-proxy model configuration settings. HTTP/HTTPS web proxy for Snappy to use when accessing the snap store.
snap_proxy_url
(string) DEPRECATED. Use snap-store-proxy model configuration setting. The address of a Snap Store Proxy to use for snaps e.g. http://snap-proxy.example.com
snapd_refresh
(string) How often snapd handles updates for installed snaps. The default (an empty string) is 4x per day. Set to "max" to check once per month based on the charm deployment date. You may also set a custom string as described in the 'refresh.timer' section here: https://forum.snapcraft.io/t/system-options/87
swift_check_params
(string) URL to use with check_http if there is a Swift endpoint. Default is '/', but it's possible to add extra params, e.g. '/v3 -e Unauthorized -d x-openstack-request-id' or a different url, e.g. '/healthcheck'. Mitaka Swift typically needs '/healthcheck'.
/
tls_crit_days
(int) Number of days left for the TLS certificate to expire before alerting Critical.
14
tls_warn_days
(int) Number of days left for the TLS certificate to expire before warning.
30
trusted_ssl_ca
(string) Base64-encoded SSL CA certificate to use for OpenStack API client connections.
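
One possible way to produce the value for trusted_ssl_ca is sketched below. The file path and function name are hypothetical, and the snippet assumes GNU coreutils base64 (as shipped on Ubuntu) and that a single unwrapped base64 line is acceptable to the charm. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical path to a PEM-encoded CA certificate; replace with your own.
CA_FILE=${CA_FILE:-./ca.pem}

# Encode a file as one base64 line ("-w 0" disables line wrapping in GNU base64).
encode_ca() {
    base64 -w 0 "$1"
}

if [ -r "$CA_FILE" ]; then
    # Print the command instead of running it, so it can be reviewed first.
    printf 'juju config openstack-service-checks trusted_ssl_ca="%s"\n' "$(encode_ca "$CA_FILE")"
else
    echo "CA file not found: $CA_FILE" >&2
fi
```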