Nagios is a host/service/network monitoring and management system. The purpose of this addon is to allow you to execute Nagios plugins on a remote host in as transparent a manner as possible. This program runs as a background process on the remote host and processes command execution requests from the check_nrpe plugin on the Nagios host.
This subordinate charm is used to configure nrpe (Nagios Remote Plugin Executor). It can be related to the nagios charm via the monitors relation and will pass a monitors yaml to nagios informing it of what checks to monitor.
This charm can be attached to any principal charm (via the juju-info relation) regardless of whether it has implemented the local-monitors or nrpe-external relations. For example:
    juju deploy ubuntu
    juju deploy nrpe
    juju deploy nagios
    juju add-relation ubuntu nrpe
    juju add-relation nrpe:monitors nagios:monitors
If joined via the juju-info relation the default checks are configured and additional checks can be added via the monitors config option (see below).
The local-monitors relation allows the principal to request checks to be set up by passing a monitors yaml and listing them in the 'local' section. It can also list checks that it has configured itself by listing them in the remote nrpe section, and finally it can request that external monitors be set up by using one of the other remote types. See "Monitors yaml" below.
Other Subordinate Charms
If another subordinate charm deployed to the same principal has a local-monitors or nrpe-external relation then it can also be related to the local nrpe charm. For example:
    echo -e "glance:\n vip: 10.5.106.1" > glance.yaml
    juju deploy -n3 --config glance.yaml glance
    juju deploy hacluster glance-hacluster
    juju deploy nrpe glance-nrpe
    juju deploy nagios
    juju add-relation glance glance-hacluster
    juju add-relation glance-nrpe:monitors nagios:monitors
    juju add-relation glance glance-nrpe
    juju add-relation glance-hacluster glance-nrpe
The glance-hacluster charm will pass monitoring information to glance-nrpe which will amalgamate all monitor definitions before passing them to nagios.
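The amalgamation step can be pictured as a recursive merge of several monitors dictionaries. The sketch below is illustrative only: `merge_monitors` is a hypothetical helper rather than the charm's actual code, the check names are made up, and it assumes that a later source wins when two checks share a name.

```python
import copy


def merge_monitors(*sources):
    """Merge several monitors dicts into one (hypothetical helper).

    Assumption: nested dicts are merged key by key and, on a name clash,
    the later source wins. The real charm may resolve clashes differently.
    """
    merged = {}
    for source in sources:
        # Work on a copy so the caller's dictionaries are never modified.
        _merge_into(merged, copy.deepcopy(source))
    return merged


def _merge_into(target, extra):
    """Recursively fold the keys of `extra` into `target`."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)
        else:
            target[key] = value


# Hypothetical check names, standing in for what two charms might send.
glance = {"monitors": {"remote": {"nrpe": {"check_glance": {"command": "check_glance"}}}}}
hacluster = {"monitors": {"remote": {"nrpe": {"check_crm": {"command": "check_crm"}}}}}

combined = merge_monitors(glance, hacluster)
print(sorted(combined["monitors"]["remote"]["nrpe"]))
```

Both sets of checks end up under the same `remote: nrpe:` section, which is the shape nagios receives over the monitors relation.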
Check definitions can come from three places:
This charm creates a base set of checks in /etc/nagios/nrpe.d, including check_load, check_users and check_disk_root. All of the options for these are configurable, but sensible defaults have been set in config.yaml. For example, to raise the load alert thresholds:
juju config nrpe load="-w 10,10,10 -c 25,25,25"
Default checks may be disabled by setting them to the empty string.
Principal Requested Checks
These are monitors passed to this charm by the principal charm via the local-monitors or nrpe-external relation. The principal charm can write its own check definition into /etc/nagios/nrpe.d and then inform this charm via the monitors setting. It can also request a direct external check of a service without using nrpe. See "Monitors yaml" below for examples.
User Requested Checks
This works in the same way as principal requested checks, except that the monitors yaml is set by the user via the monitors config option. For example, to add a monitor for the rsyslog process:
    juju config nrpe monitors="
    monitors:
        local:
            procrunning:
                rsyslogd:
                    min: 1
                    max: 1
                    executable: rsyslogd
    "
If the nagios server is not deployed in the juju environment, the charm can be configured, via the export_nagios_definitions option, to write nagios config fragments to /var/lib/nagios/export. An rsync stanza (a target called "external-nagios") is then created so that the Nagios server can collect the fragments from that directory. Connections are allowed from the hostname or IP address given in the nagios_master option.
It is up to you to configure the Nagios master to pull the configs needed, which will then cause it to connect back to the instances in question to run the nrpe checks you have defined.
The list of monitors passed down the monitors relation is an amalgamation of the lists provided via the principal, the user and the default checks.
The monitors yaml is of the following form:
    # Version of the spec, mostly ignored but 0.3 is the current one
    version: '0.3'
    # Dict with just 'local' and 'remote' as parts
    monitors:
        # local monitors need an agent to be handled. See nrpe charm for
        # some example implementations
        local:
            # procrunning checks for a running process named X (no path)
            procrunning:
                # Multiple procrunning can be defined, this is the "name" of it
                nagios3:
                    min: 1
                    max: 1
                    executable: nagios3
        # Remote monitors can be polled directly by a remote system
        remote:
            # do a request on the HTTP protocol
            http:
                nagios:
                    port: 80
                    path: /nagios3/
                    # expected status response (otherwise just look for 200)
                    status: 'HTTP/1.1 401'
                    # Use as the Host: header (the server address will still be used to connect() to)
                    host: www.fewbar.com
            mysql:
                # Named basic check
                basic:
                    username: monitors
                    password: abcdefg123456
            nrpe:
                apache2:
                    command: check_apache2
Before a monitor is added, it is checked to see if it is in the 'local' section. If it is, this charm needs to convert it into an nrpe check. Only a small number of check types are currently supported (see below). These checks can then be called by the nagios charm via the nrpe service. So for each check listed in the local section:
- The definition is read and a check definition is written to /etc/nagios/nrpe.d
- The check is defined as a remote nrpe check in the yaml passed to nagios
In the example above a check_proc_nagios3_user.cfg file would be written out which contains:
    # Check process nagios3 is running (user)
    command[check_proc_nagios3_user]=/usr/lib/nagios/plugins/check_procs -w 1 -c 1 -C nagios3
And the monitors yaml passed to nagios would include:
    monitors:
        nrpe:
            check_proc_nagios3_user:
                command: check_proc_nagios3_user
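The conversion of a local procrunning entry can be sketched roughly as follows. This is a minimal illustration, not the charm's implementation: `convert_procrunning` and its `source` parameter are hypothetical names, and the mapping of `min` to `-w` and `max` to `-c` is an assumption inferred from the nagios3 example above.

```python
PLUGIN_DIR = "/usr/lib/nagios/plugins"


def convert_procrunning(name, details, source="user"):
    """Turn one local 'procrunning' entry into an nrpe check definition.

    Returns (filename, file_contents, nrpe_monitor_entry).
    Assumption: 'min' maps to -w and 'max' to -c, which matches the
    README example but is not confirmed against the charm source.
    """
    check_name = "check_proc_{}_{}".format(name, source)
    command = "{}/check_procs -w {} -c {} -C {}".format(
        PLUGIN_DIR, details["min"], details["max"], details["executable"]
    )
    contents = "# Check process {} is running ({})\ncommand[{}]={}\n".format(
        name, source, check_name, command
    )
    # The entry that would be listed as a remote nrpe check for nagios.
    entry = {check_name: {"command": check_name}}
    return check_name + ".cfg", contents, entry


filename, contents, entry = convert_procrunning(
    "nagios3", {"min": 1, "max": 1, "executable": "nagios3"}
)
print(filename)
print(contents)
print(entry)
```

The first two return values correspond to the file written under /etc/nagios/nrpe.d, and the third to the entry passed on to nagios.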
The principal charm, or the user via the monitors config option, can request an external check by adding it to the remote section of the monitors yaml. In the example above direct checks of a webserver and of mysql are being requested. This charm passes those on to nagios unaltered.
Local check types
Supported nrpe checks are:
    procrunning:
        min: Minimum number of 'executable' processes
        max: Maximum number of 'executable' processes
        executable: Name of executable to look for in process list
    processcount:
        min: Minimum total number processes
        max: Maximum total number processes
        executable: Name of executable to look for in process list
    disk:
        path: Directory to monitor space usage of
    custom:
        check: the name of the check to execute
        plugin_path: (optional) Absolute path to the directory containing the custom plugin. Default value is /var/lib/nagios/plugins
        description: (optional) Description of the check
        params: (optional) Parameters to pass to the check on invocation
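As an illustration of how the optional keys of a custom check might combine, the sketch below builds a command line from a custom entry. `custom_check_command` and the plugin name `check_foo.py` are hypothetical, and joining the plugin path, check name and params in this order is an assumption rather than the charm's documented behaviour.

```python
import posixpath

# Default taken from the plugin_path description above.
DEFAULT_PLUGIN_PATH = "/var/lib/nagios/plugins"


def custom_check_command(details):
    """Build the command line for a 'custom' local check (illustrative).

    Only 'check' is required; 'plugin_path' falls back to the documented
    default and 'params' is appended when present.
    """
    if "check" not in details:
        raise ValueError("custom checks need a 'check' key")
    plugin_path = details.get("plugin_path", DEFAULT_PLUGIN_PATH)
    command = posixpath.join(plugin_path, details["check"])
    params = details.get("params")
    return "{} {}".format(command, params) if params else command


# check_foo.py is a made-up plugin name used only for this example.
print(custom_check_command({"check": "check_foo.py", "params": "-w 5 -c 10"}))
```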
Remote check types
Supported remote types: http, mysql, nrpe, tcp, rpc, pgsql (See Nagios charm for up-to-date list and options)
By defining a 'monitors' binding, you can influence which of the nrpe unit's IP addresses is reported back to Nagios. This is handy when nrpe is placed on machines with multiple IPs or networks.
The charm defines two actions. 'list-nrpe-checks' lists all the nrpe checks defined for this unit and the commands they use. 'run-nrpe-check' runs a specified nrpe check and returns its output, which is useful for confirming that an alert is actually resolved.
- (string) Check conntrack (net.netfilter.nf_conntrack_count) against thresholds. Set to '' in order to disable this check. Default: -w 80 -c 90
- (string) CPU governor check. The string value here will be checked against all CPUs in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor. The supported values are 'ondemand', 'performance', 'powersave'. Unset value means the check will be disabled. There is a relation key called requested_cpu_governor='string', but the charm config value will take precedence over the relation data.
- (boolean) Setting debug to True enables debug=1 in nrpe.cfg
- (string) Root disk check. This can be made to also check non-root disk systems as follows: -u GB -w 20% -c 15% -r '/srv/juju/vol-' -C -u GB -w 25% -c 20% The string '-p /' will be appended to this check, so you must finish the string taking that into account. See the nagios check_disk plugin help for further details. Set to '' in order to disable this check. Default: -u GB -w 25% -c 20% -K 5%
- (boolean) Setting dont_blame_nrpe to True sets dont_blame_nrpe=1 in nrpe.cfg. This option allows arguments to be passed to nrpe scripts. This can be a security risk, so it is disabled by default. Nrpe is compiled with the --enable-command-args option by default; this option turns that capability on.
- (boolean) If True, nagios check definitions are written to '/var/lib/nagios/export' and rsync is configured to allow nagios_master to collect them. Useful when Nagios is outside of the juju environment.
- (string) Hostcheck to inherit
- (string) Comma separated list of hostgroups to add for these hosts
- (string) LACP bond interfaces, space-delimited (e.g. 'bond0 bond1')
- (string) Load check arguments (e.g. "-w 8,8,8 -c 15,15,15"); if 'auto' is set, thresholds will be set to multipliers of processor count for 1m, 5m and 15m thresholds, with warning as "(4, 2, 1)", and critical set to "(8, 4, 2)". So if you have two processors, you'd get thresholds of "-w 8,4,2 -c 16,8,4". Set to '' in order to disable this check.
- (string) Check memory % used. By default, thresholds are applied to the non-hugepages portion of the memory. Set to '' in order to disable this check. Default: -C -h -u -w 85 -c 90
- (string) Additional monitors defined in the monitors yaml format (see README)
- (string) Determines whether the nagios host check should use the private or public IP address of an instance. Can be "private" or "public".
- (string) A string which will be prepended to the instance name to set the host name in nagios. So for instance the hostname would be something like: juju-postgresql-0. If you're running multiple environments with the same services in them, this allows you to differentiate between them.
- (string) Determines whether a server is identified by its unit name or host name. If you're in a virtual environment, "unit" is probably best. If you're using MaaS, you may prefer "host". Use "auto" to have nrpe automatically distinguish between metal and non-metal hosts.
- (string) IP address of the nagios master from which to allow rsync access
- (string) Network interfaces to monitor for correct link state, MTU size and speed negotiated. The first argument is either an interface name or a CIDR expression. Parsed keywords are "mtu", "speed", and "op". Other keywords are ignored. Note that CIDR expressions can match multiple devices. For example (multi-line starts with pipe):
      - 10.1.2.0/24 mtu:9000 speed:25000
      - eth0 mtu:9000 speed:25000
      - lo mtu:65536 op:unknown
      - br0-mgmt mtu:9000
      - br0-sta mtu:9000
      - br0-stc mtu:9000
      - br0-api mtu:1500
      - bond0 mtu:9000 speed:50000
      - bond0.25 mtu:1500 speed:50000
      - ens3 mtu:1500 speed:-1 desc:openstack_iface
      - ...
- (boolean) Add --skip-unfound-ifaces to check_netlinks.py.
- (string) Set thresholds for number of running processes. Defaults to disabled; to enable, specify 'auto' for the charm to generate thresholds based on processor count, or manually provide arguments for check_procs, for example: "-k -w 250 -c 300" to set warning and critical levels manually and exclude kernel threads.
- (string) Comma separated list of mount points to exclude from checks for readonly filesystem. Can be a substring rather than the entire mount point, e.g. /sys will match all filesystems beginning with the string /sys. The check is disabled on all LXD units, and also for non-container units if this parameter is set to ''.
- (int) Port on which nagios-nrpe-server will listen
- (string) A string to be appended onto all the nrpe checks created by this charm to avoid potential clashes with existing checks
- (string) Check swap utilisation. See the nagios check_swap plugin help for further details. The format looks like "-w 40% -c 25%". Set to '' in order to disable this check.
- (string) Swapout activity check. Thresholds are expressed in kB, interval in seconds. Set to '' in order to disable this check. Default: -i 5 -w 10240 -c 40960
- (string) Set thresholds for number of logged-in users. Defaults to disabled; to enable, manually provide arguments for check_users, for example: "-w 20 -c 25"
- (string) dmesg history length to check for xfs errors, in minutes. Defaults to disabled; set the time to enable.
- (string) Zombie processes check; defaults to disabled. To enable, set the desired check_procs arguments pertaining to zombies, for example: "-w 3 -c 6 -s Z"
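The 'auto' setting of the load check option described above can be worked through in a few lines. This is a sketch of the arithmetic as the option text states it, not the charm's own code; `auto_load_thresholds` is a hypothetical name.

```python
# Multipliers for the 1m, 5m and 15m load averages, as given in the
# load option description.
WARN_MULTIPLIERS = (4, 2, 1)
CRIT_MULTIPLIERS = (8, 4, 2)


def auto_load_thresholds(processor_count):
    """Return check_load arguments scaled by the number of processors."""
    warn = ",".join(str(m * processor_count) for m in WARN_MULTIPLIERS)
    crit = ",".join(str(m * processor_count) for m in CRIT_MULTIPLIERS)
    return "-w {} -c {}".format(warn, crit)


print(auto_load_thresholds(2))  # -w 8,4,2 -c 16,8,4
```

With two processors this reproduces the "-w 8,4,2 -c 16,8,4" example from the option description.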
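The entry format of the network interfaces option above (a target followed by keyword:value pairs) could be parsed along these lines. The sketch is an assumption about the parsing, not the logic of check_netlinks.py; `parse_netlink_line` is a hypothetical helper and values are kept as strings.

```python
# Keywords the option description says are parsed; anything else is ignored.
PARSED_KEYWORDS = ("mtu", "speed", "op")


def parse_netlink_line(line):
    """Split one netlinks entry into its target and recognised keywords.

    The target is the first field: an interface name or a CIDR expression.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty netlinks entry")
    target, options = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if sep and key in PARSED_KEYWORDS:
            options[key] = value
    return target, options


print(parse_netlink_line("ens3 mtu:1500 speed:-1 desc:openstack_iface"))
```

In this example the `desc` keyword is dropped, matching the statement that unrecognised keywords are ignored.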
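The CPU governor option above compares one string against every scaling_governor file. A minimal sketch of such a comparison follows; `mismatched_governors` is a hypothetical function, and the real check may report its results differently.

```python
import glob

# Path pattern quoted in the CPU governor option description.
GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"
SUPPORTED = ("ondemand", "performance", "powersave")


def mismatched_governors(wanted, pattern=GOVERNOR_GLOB):
    """Return {path: governor} for every CPU not set to `wanted`."""
    if wanted not in SUPPORTED:
        raise ValueError("unsupported governor: {}".format(wanted))
    mismatches = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            current = handle.read().strip()
        if current != wanted:
            mismatches[path] = current
    return mismatches


# An empty dict means every CPU found already uses the wanted governor
# (or that the machine exposes no cpufreq files at all).
print(mismatched_governors("performance"))
```

The `pattern` argument exists only so the function can be pointed at a test directory instead of /sys.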