hw health #7

Supports: bionic xenial

Description

This addon installs hardware monitoring tools and configures Nagios checks
for system hardware and storage monitoring.

Vendors supported:

  • Dell (MegaRAID)

  • Supermicro (LSI SAS)

  • Huawei (LSI SAS)

Tools supported:

  • megacli

  • mdadm

  • sas2ircu

  • sas3ircu


Overview

This charm installs vendor-supplied system monitoring tools and configures
Nagios NRPE checks. It only works on bare-metal installations running on
specific hardware.

Currently supported hardware:

  • Dell: LSI Logic MegaRAID SAS (Broadcom MegaCLI utility)

  • Supermicro: LSI SAS3008 RAID card with sas3ircu (Broadcom's SAS3IRCU_P16),
    mpt3sas Linux driver

  • Huawei: LSI SAS2308 RAID card with sas2ircu (Huawei FusionServer Tools
    InfoCollect)

  • Linux software RAID (mdadm)

Still in the backlog: the hp-health logic needs to be backported to support
Hewlett-Packard equipment (HP Smart Array Controllers and MSA Controllers with
hpacucli, hpssacli, ssacli).

Other hardware on the roadmap:

  • Huawei's ES3000 V2 PCIe SSD Card with hio_info (Huawei ES3000 V2 Driver)

  • S.M.A.R.T. Monitoring tool (smartctl)

Usage

Step-by-step instructions for using the charm:

juju deploy ubuntu
juju deploy hw-health
juju deploy nrpe
juju add-relation ubuntu nrpe
juju add-relation ubuntu hw-health
juju add-relation hw-health nrpe

The Charm Store version already ships a resource. However, a new resource can
be attached:

  • Option 1: juju deploy hw-health --resource tools=/tmp/zipfile.zip

  • Option 2: juju attach-resource hw-health tools=/tmp/zipfile.zip

In both cases, zipfile.zip must be built in one of the following ways, for
example:

  zip /tmp/zipfile.zip megacli sas2ircu sas3ircu
  zip /tmp/zipfile.zip megacli

etc.

Known Limitations and Issues

The charm's only install method is via Juju resources. There are plans to
support snaps, but the Snap Store only supports strictly confined snaps, and
hardware monitoring tools need special permissions that are still under
development.

See https://forum.snapcraft.io/t/request-for-classic-confinement-sas2ircu/9023

The "tools" resource must be attached in ZIP format, and the hardware
monitoring tool(s) must be at the first level of the archive tree.
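
The first-level requirement can be checked before attaching the resource. The following is only a minimal sketch: all_top_level and build_tools_zip are hypothetical helper names, not part of the charm, and the sketch assumes Info-ZIP's zip and unzip are installed.

```shell
# Hypothetical helper (not part of the charm): reads archive member names,
# one per line, on stdin and fails if any member sits inside a directory.
all_top_level() {
    ! grep -q '/'
}

# Hypothetical helper: builds /tmp/zipfile.zip from the given tool binaries
# and then checks the layout. Assumes Info-ZIP's zip and unzip are installed.
# zip -j drops directory components, so each tool lands at the first level;
# unzip -Z1 lists the member names, one per line.
build_tools_zip() {
    zip -j /tmp/zipfile.zip "$@" &&
        unzip -Z1 /tmp/zipfile.zip | all_top_level
}
```

For example, build_tools_zip /path/to/megacli /path/to/sas3ircu (placeholder paths) would produce an archive that can then be passed to juju attach-resource as shown in the Usage section.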

Configuration

The manufacturer option needs to be left in auto mode.

Contact Information

Please contact the Nagios charmers via the "Submit a bug" link.


Configuration

debug
  (boolean) Enable debug logging.

manufacturer
  (string) Choose the tools to deploy (hp, dell, supermicro, huawei), or
  leave the charm to discover by itself the tools needed to run hardware
  health checks.
  Default: auto

nagios_context
  (string) Used by the nrpe subordinate charms. A string that will be
  prepended to the instance name to set the host name in Nagios. For
  instance, the hostname would be something like: juju-myservice-0. If you're
  running multiple environments with the same services in them, this allows
  you to differentiate between them.
  Default: juju

nagios_servicegroups
  (string) A comma-separated list of Nagios servicegroups. If left empty, the
  nagios_context will be used as the servicegroup.

snap_proxy
  (string) HTTP/HTTPS web proxy for Snappy to use when accessing the snap
  store.

snap_proxy_url
  (string) The address of a Snap Store Proxy to use for snaps, e.g.
  http://snap-proxy.example.com

snapd_refresh
  (string) How often snapd handles updates for installed snaps. The default
  (an empty string) is 4x per day. Set to "max" to check once per month based
  on the charm deployment date. You may also set a custom string as described
  in the 'refresh.timer' section here:
  https://forum.snapcraft.io/t/system-options/87

timeout
  (int) Amount of time allowed for scripts to run before exiting.
  Default: 30
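
The options above are set with juju config. The following is a minimal sketch only: hw_health_set is a hypothetical wrapper name, and the application name hw-health is assumed from the deploy command in the Usage section. It refuses any manufacturer value other than auto, in line with the Configuration note earlier.

```shell
# Hypothetical wrapper around `juju config` for the hw-health application.
# Rejects manufacturer values other than "auto", then passes every
# key=value pair through unchanged.
hw_health_set() {
    for _pair in "$@"; do
        case "$_pair" in
            manufacturer=auto) ;;
            manufacturer=*)
                echo "manufacturer should stay on auto; got: $_pair" >&2
                return 1
                ;;
        esac
    done
    juju config hw-health "$@"
}
```

For example, hw_health_set timeout=60 nagios_context=juju-prod would raise the check timeout and change the Nagios host name prefix; both values are illustrative.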