OpenStack Services NRPE Checks
This charm provides OpenStack service checks for Nagios
juju deploy cs:~canonical-bootstack/openstack-service-checks
juju add-relation openstack-service-checks nrpe
This charm supports relating to keystone via the keystone-credentials interface. If you do not wish to use this, you can supply your own credential set for OpenStack by adding the 'os-credentials' setting (see the setting description for hints):
juju config openstack-service-checks os-credentials=" ... "
To use the keystone relation instead:
juju add-relation openstack-service-checks:identity-credentials keystone:identity-credentials
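As a rough sketch, the os-credentials value could be assembled from shell variables before it is passed to juju config. The key names follow the format given in the os-credentials option description; the helper name build_os_credentials and the values are placeholders made up for this example.

```shell
# Hypothetical helper: joins the credential fields into the
# "key=value, key=value" form described for the os-credentials option.
build_os_credentials() {
    printf 'username=%s, password=%s, credentials_project=%s, region_name=%s, auth_url=%s' \
        "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values copied from the option description; replace with real ones.
creds="$(build_os_credentials foo bar baz Region1 http://127.0.0.1:35357)"

# Print the command instead of running it, so it can be reviewed first.
echo "juju config openstack-service-checks os-credentials=\"$creds\""
```

Printing the command rather than executing it keeps the password out of an accidental run; whether that is appropriate depends on how your shell history and logs are handled.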
API endpoints monitoring
If your OpenStack API endpoints have a common URL for the Admin, Public and Internal addresses, you should consider disabling some endpoints which would be duplicated otherwise, e.g.
juju config openstack-service-checks check_internal_urls=False check_admin_urls=False
If these API endpoints use TLS, additional checks monitor the certificate expiration time:
juju config openstack-service-checks tls_warn_days=30 tls_crit_days=14
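To illustrate how the two thresholds relate, here is a minimal sketch that maps the number of days left on a certificate to a Nagios-style state. It is not the charm's actual check: the function name tls_state is hypothetical, and treating a threshold as reached when the days left are less than or equal to it is an assumption.

```shell
# Hypothetical helper, not the charm's actual check: maps the number of
# days until certificate expiry to a Nagios-style state.
# Assumption: a threshold is triggered when days_left <= threshold.
tls_state() {
    days_left="$1"; warn_days="$2"; crit_days="$3"
    if [ "$days_left" -le "$crit_days" ]; then
        echo "CRITICAL"
    elif [ "$days_left" -le "$warn_days" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# With tls_warn_days=30 and tls_crit_days=14 as configured above:
tls_state 45 30 14   # prints OK
tls_state 20 30 14   # prints WARNING
tls_state 7 30 14    # prints CRITICAL
```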
Note: in order to have endpoint checks updated on endpoint changes you should also relate identity-notifications:
juju add-relation keystone:identity-notifications openstack-service-checks:identity-notifications
Alternatively, instead of the above relation, there is also an action "refresh-endpoint-checks" available. Running this action will update the service checks with the current endpoints.
Knowing when an OpenStack load balancer is having an issue is an important operational situation which this charm helps manage. There is both coarse-grained control over octavia checks and more fine-grained control through the following config items.
A true/false switch can enable or disable the checks as a whole; the '-ignored' config items shown in the examples below provide the fine-grained control.
Each of these config items adds an ignore-list of keywords. Each keyword in the ignore list will be blocked when it appears in the output of the check.
Ignoring a test or non-production loadbalancer with the ID=deadbeef-1234-56789012-dead-beef which is INACTIVE or DEGRADED.
juju config my-openstack-service-checks octavia-loadbalancer-ignored='deadbeef-1234-56789012-dead-beef,'
Ignoring all loadbalancers which happen to be DEGRADED.
juju config my-openstack-service-checks octavia-loadbalancer-ignored='DEGRADED,'
Ignoring amphorae that are stuck in BOOTING state
juju config my-openstack-service-checks octavia-amphorae-ignored='BOOTING,'
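The keyword matching described above can be pictured with a small filter. This is only an illustration of the idea, not the charm's implementation: the function name filter_ignored and the sample output lines are invented for the example.

```shell
# Hypothetical filter, for illustration only: drops every line of check
# output that contains one of the comma-separated keywords.
filter_ignored() {
    ignore_list="$1"
    # Turn "A,B," into one keyword per line and discard empty entries.
    patterns="$(printf '%s' "$ignore_list" | tr ',' '\n' | grep -v '^$' || true)"
    if [ -z "$patterns" ]; then
        cat    # nothing to ignore, pass everything through
    else
        # -F: fixed strings, -v: keep only lines matching none of the keywords
        grep -v -F -e "$patterns" || true
    fi
}

# Made-up check output; only the line without the keyword survives.
printf '%s\n' \
    'loadbalancer aaaa is DEGRADED' \
    'loadbalancer bbbb is INACTIVE' | filter_ignored 'DEGRADED,'
# prints: loadbalancer bbbb is INACTIVE
```

Because the match is a plain substring test, a keyword can be either a state such as DEGRADED or a full ID, which is why both kinds of value appear in the examples above.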
Alert when the OpenStack compute nodes protected by Masakari are in maintenance during a failure. Follow-up must be done to re-enable the service for nodes marked by Masakari as being in maintenance state.
juju config openstack-service-checks check-masakari=true
Compute services monitoring
Compute services are monitored via the 'os-services' interface. Several thresholds can be adjusted to tweak the alerting system: number of available nodes per host (warning and critical thresholds), ignore certain host aggregates (by default, no aggregates are skipped), ignore nodes in 'disabled' state.
juju config openstack-service-checks nova_warn=2 nova_crit=1
juju config openstack-service-checks skipped_host_aggregates='hostaggr1,hostaggr2'
juju config openstack-service-checks skip-disabled=true
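The skipped host aggregates option is described as case-insensitive. A minimal sketch of such a lookup is shown below; the function name is_skipped_aggregate is hypothetical, the code is not taken from the charm, and it assumes the list contains no spaces around the commas.

```shell
# Hypothetical helper, not the charm's code: succeeds when the aggregate
# name appears in the comma-separated skip list, ignoring case.
is_skipped_aggregate() {
    name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    list="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    # Wrapping both sides in commas makes the match whole-name only.
    case ",$list," in
        *",$name,"*) return 0 ;;
        *) return 1 ;;
    esac
}

is_skipped_aggregate 'HostAggr1' 'hostaggr1,hostaggr2' && echo "skipped"
is_skipped_aggregate 'hostaggr3' 'hostaggr1,hostaggr2' || echo "checked"
```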
A new nrpe check supports a limited list of rally/tempest tests, which can be scheduled to run via cron (default cronjob schedule is every 15 minutes). Tests can also be skipped as follows (available components are cinder, glance, nova and neutron):
juju config openstack-service-checks check-rally=true
juju config openstack-service-checks rally-cron-schedule='*/20 * * * *'
juju config openstack-service-checks skip-rally='nova,neutron'
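The cron schedule option description states that the cronjob times out after 13 minutes (SIGTERM) or 14 minutes (SIGKILL). One way to get that behaviour, sketched here with GNU coreutils timeout, is shown below. This is an assumption about how such limits can be expressed, not the charm's own cron entry, and run_with_limits is a hypothetical name.

```shell
# Hypothetical wrapper, assuming GNU coreutils `timeout`; not taken from the charm.
# 13m   : send SIGTERM once the command has run for 13 minutes
# -k 1m : send SIGKILL if it is still running one minute later (minute 14)
run_with_limits() {
    timeout -k 1m 13m "$@"
}

# Stand-in command for demonstration; cron would start the real test runner.
run_with_limits echo "rally run finished"
```

Keeping both limits below the default 15-minute schedule avoids overlapping runs; if you shorten the schedule, a run that hits these limits could still overlap the next one.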
The rally/tempest tests are installed via the fcbtest snap. The charm supports installing it from a juju resource, which can be handy in offline deployments. In this case you'll also have to install the snaps upon which fcbtest depends: core18 and snapd. Prefetch the snaps:
snap download fcbtest
snap download core18
snap download snapd
Provide the snap files as resources to the application:
For example (change the snap file versions accordingly):
FCBTEST_SNAP_FILE="fcbtest_14.snap"
CORE18_SNAP_FILE="core18_2074.snap"
SNAPD_SNAP_FILE="snapd_12398.snap"
juju deploy cs:~canonical-bootstack/openstack-service-checks \
    --resource fcbtest=$FCBTEST_SNAP_FILE \
    --resource core18=$CORE18_SNAP_FILE \
    --resource snapd=$SNAPD_SNAP_FILE
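In an offline deployment it may help to confirm that all three snap files are present before running juju deploy. The following pre-flight check is a sketch only; check_snap_files is a hypothetical helper and the file names reuse the example versions above.

```shell
# Hypothetical pre-flight check: reports every listed file that is missing
# and returns non-zero if at least one is absent.
check_snap_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing snap file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Adjust the file names to match what `snap download` actually produced.
if check_snap_files fcbtest_14.snap core18_2074.snap snapd_12398.snap; then
    echo "all snap files present, ready to deploy"
else
    echo "download the missing snaps before deploying"
fi
```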
Please contact Canonical's BootStack team via the "Submit a bug" link.
- (string) A space-separated list of DNS names to check. If any of the names are not resolvable, alert as CRITICAL.
- (boolean) Switch to turn on or off check for masakari segment hosts
- (boolean) Switch to turn on or off neutron agents checks. By default, the neutron_agents nrpe check is enabled. If a different SDN (e.g. Contrail) is in use, you may want to disable this check.
- (boolean) Switch to turn on or off check for octavia services.
- (boolean) Switch to turn on or off check for port security. If hardware offloading is used on a port, port security must be disabled.
- (boolean) Switch to turn on or off rally checks via the fcbtest snap. By default, rally nrpe check is disabled.
- (boolean) If true, create NRPE checks matching all 'admin' URLs in the Keystone catalog.
- (boolean) If true, create NRPE checks matching all 'internal' URLs in the Keystone catalog.
- (boolean) If true, create NRPE checks matching all 'public' URLs in the Keystone catalog.
- (string) The VIP used for Contrail Analytics. Leave blank to disable Contrail monitoring.
- (string) Comma separated list of contrail alerts to ignore
- (string) From address when sending email notifications.
- (string) Comma separated list of email recipients to send notifications on demand.
- (string) Used by the nrpe subordinate charms. A string that will be prepended to the instance name to set the host name in Nagios. For instance, the hostname would be something like: juju-myservice-0. If you're running multiple environments with the same services in them, this allows you to differentiate between them.
- (string) A comma-separated list of nagios servicegroups. If left empty, the nagios_context will be used as the servicegroup
- (int) Critical level for nova aggregate unit count check - setting this to -1 will effectively disable host aggregate checks.
- (int) Warning level for nova aggregate unit count check - setting this to -1 will effectively disable host aggregate checks.
- (int) If the latest glance image tagged with the octavia-amp-image-tag value was last updated more than this many days ago, a Nagios warning will be raised. The version of the octavia agent built into the amphora image must match the version of the octavia controller; otherwise octavia will fail to communicate with new amphorae and failover will also fail.
- (string) The glance image tag octavia will use to create amphora.
- (string) Comma separated list of octavia amphorae alerts to ignore
- (string) Comma separated list of octavia image alerts to ignore
- (string) Comma separated list of octavia load balancer alerts to ignore
- (string) Comma separated list of octavia pool alerts to ignore
- (string) Comma separated OpenStack credentials to be used by nagios. It is strongly recommended this be a user with a dedicated role, and not a full admin. Takes the format of username=foo, password=bar, credentials_project=baz, region_name=Region1, auth_url=http://127.0.0.1:35357
- (string) Cron schedule used to run the rally tests. The default value, */15 * * * *, runs them every 15 minutes. Furthermore, the cronjob is scheduled to time out after 13 minutes (SIGTERM) or 14 minutes (SIGKILL).
- (string) URL to use with check_http if there is an S3 endpoint. Default is '/healthcheck', but it's possible to add extra params, e.g. '/v3 -e Unauthorized -d x-openstack-request-id' or a different url, e.g. '/' (when the endpoint is used with ceph-radosgw for example).
- (boolean) An option to specify whether you want Warning alerts in nagios for disabled nova-compute hosts.
- (string) Comma separated list of OpenStack components to not monitor. An empty string means all components will be monitored (up to the number of currently supported components: Cinder, Glance, Nova, Neutron). Sample: skip-rally=cinder
- (string) Comma separated list of host aggregates that need to be skipped from checks. Example "Agg1,AGg2" or 'Aggregate3'. This is a case-insensitive option.
- (string) How often snapd handles updates for installed snaps. The default (an empty string) is 4x per day. Set to "max" to check once per month based on the charm deployment date. You may also set a custom string as described in the 'refresh.timer' section here: https://forum.snapcraft.io/t/system-options/87
- (string) URL to use with check_http if there is a Swift endpoint. Default is '/', but it's possible to add extra params, e.g. '/v3 -e Unauthorized -d x-openstack-request-id' or a different url, e.g. '/healthcheck'. Mitaka Swift typically needs '/healthcheck'.
- (int) Number of days left for the TLS certificate to expire before alerting Critical.
- (int) Number of days left for the TLS certificate to expire before warning.
- (string) Base64-encoded SSL CA certificate to use for OpenStack API client connections.