ceph osd #450

Supports: xenial bionic eoan focal trusty

Description

Ceph is a distributed storage and network file system designed to provide
excellent performance, reliability, and scalability.

This charm provides the Ceph OSD personality for expanding storage capacity
within a Ceph deployment.


Overview

Ceph is a unified, distributed storage system designed for
excellent performance, reliability, and scalability.

The ceph-osd charm deploys the Ceph object storage daemon (OSD) and manages its
volumes. It is used in conjunction with the ceph-mon charm.
Together, these charms can scale out the amount of storage available in a Ceph
cluster.

Usage

Storage devices

The list of all possible storage devices for the cluster is defined by the
osd-devices option (default value is /dev/vdb). Configuration is typically
provided via a YAML file, like ceph-osd.yaml. See the following examples:

  1. Block devices (regular)
    ceph-osd:
      options:
        osd-devices: /dev/vdb /dev/vdc /dev/vdd

Each regular block device must be an absolute path to a device node.

  2. Block devices (Juju storage)
    ceph-osd:
      storage:
        osd-devices: cinder,20G

See the Juju documentation for guidance on implementing Juju storage; a
deploy-time example is given after this list.

  3. Directory-backed OSDs
    ceph-osd:
      options:
        osd-devices: /var/tmp/osd-1

Note: OSD directories can no longer be created starting with Ceph
Nautilus. Existing OSD directories will continue to function after an upgrade
to Nautilus.
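
As a sketch of the Juju storage approach, assuming a 'cinder' storage pool is
available in the backing cloud, storage can be requested at deploy time with
the --storage flag:

juju deploy -n 3 ceph-osd --storage osd-devices=cinder,20G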

The list defined by option osd-devices may affect newly added ceph-osd units
as well as existing units (the option may be modified after units have been
added). The charm will attempt to activate as Ceph storage any listed device
that is visible to the unit's underlying machine. To prevent the activation of
volumes on existing units, the blacklist-add-disk action may be used.

The configuration option is modified in the usual way. For instance, to have it
consist solely of devices '/dev/sdb' and '/dev/sdc':

juju config ceph-osd osd-devices='/dev/sdb /dev/sdc'

The charm will go into a blocked state (visible in juju status output) if it
detects pre-existing data on a device. In this case the operator can either
instruct the charm to ignore the disk (blacklist-add-disk action) or have it
purge all data on the disk (zap-disk action).
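
For instance, assuming the offending device is /dev/sdb on unit ceph-osd/2
(both names are illustrative), one of the following could be run:

juju run-action --wait ceph-osd/2 blacklist-add-disk osd-devices=/dev/sdb
juju run-action --wait ceph-osd/2 zap-disk devices=/dev/sdb i-really-mean-it=true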

Deployment

A cloud with three MON nodes is a typical design whereas three OSD nodes are
considered the minimum. For example, to deploy a Ceph cluster consisting of
three OSDs and three MONs:

juju deploy --config ceph-osd.yaml -n 3 ceph-osd
juju deploy --to lxd:0 ceph-mon
juju add-unit --to lxd:1 ceph-mon
juju add-unit --to lxd:2 ceph-mon
juju add-relation ceph-osd ceph-mon

Here, a containerised MON is running alongside each OSD.

Note: Refer to the Install OpenStack page in the
OpenStack Charms Deployment Guide for instructions on installing the ceph-osd
application for use with OpenStack.

For each ceph-osd unit, the charm will scan for all the devices configured via
the osd-devices option and attempt to assign every device it finds to that
unit. The cluster's initial pool of available storage is the "sum" of all
these assigned devices.
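
One way to confirm the resulting raw capacity, assuming a ceph-mon unit named
ceph-mon/0, is to query the cluster directly:

juju ssh ceph-mon/0 sudo ceph df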

Network spaces

This charm supports the use of Juju network spaces (Juju
v.2.0). This feature optionally allows specific types of the application's
network traffic to be bound to subnets that the underlying hardware is
connected to.

Note: Spaces must be configured in the backing cloud prior to deployment.

The ceph-osd charm exposes the following Ceph traffic types (bindings):

  • 'public' (front-side)
  • 'cluster' (back-side)

For example, provided that spaces 'data-space' and 'cluster-space' exist, the
deploy command above could look like this:

juju deploy --config ceph-osd.yaml -n 3 ceph-osd \
   --bind "public=data-space cluster=cluster-space"

Alternatively, configuration can be provided as part of a bundle:

    ceph-osd:
      charm: cs:ceph-osd
      num_units: 1
      bindings:
        public: data-space
        cluster: cluster-space

Refer to the Ceph Network Reference to learn about the
implications of segregating Ceph network traffic.

Note: Existing ceph-osd units configured with the ceph-public-network
or ceph-cluster-network options will continue to honour them. Furthermore,
these options override any space bindings, if set.

AppArmor profiles

Although AppArmor is not enabled for Ceph by default, an AppArmor profile can
be generated by the charm by assigning a value of 'complain', 'enforce', or
'disable' (the default) to option aa-profile-mode.

Caution: Enabling an AppArmor profile is disruptive to a running Ceph
cluster as all ceph-osd processes must be restarted.

The new profile has a narrow supported use case, and it should always be
verified in pre-production against the specific configurations and topologies
intended for production.
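
For example, to generate the profile in complain mode on a test deployment
(a minimal, illustrative config change):

juju config ceph-osd aa-profile-mode=complain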

The profiles generated by the charm should not be used in the following
scenarios:

  • On any version of Ubuntu older than 16.04
  • On any version of Ceph older than Luminous
  • When OSD journal devices are in use
  • When Ceph BlueStore is enabled

Block device encryption

The ceph-osd charm supports encryption for OSD volumes that are backed by block
devices. To use Ceph's native key management framework, available since Ceph
Jewel, set option osd-encrypt for the ceph-osd charm:

    ceph-osd:
      options:
        osd-encrypt: True

Here, dm-crypt keys are stored in the MON sub-cluster.

Alternatively, since Ceph Luminous, encryption keys can be stored in Vault,
which is deployed and initialised via the vault charm. Set
options osd-encrypt and osd-encrypt-keymanager for the ceph-osd charm:

    ceph-osd:
      options:
        osd-encrypt: True
        osd-encrypt-keymanager: vault
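
In this case a relation to the vault application is also required once both
applications are deployed. Assuming the standard endpoint names, the relation
would typically be added with:

juju add-relation ceph-osd:secrets-storage vault:secrets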

Important: Post-deployment configuration changes will only affect block
devices associated with new ceph-osd units.

Actions

This section covers Juju actions supported by the charm.
Actions allow specific operations to be performed on a per-unit basis. To
display action descriptions run juju actions ceph-osd. If the charm is not
deployed then see file actions.yaml.

  • add-disk
  • blacklist-add-disk
  • blacklist-remove-disk
  • list-disks
  • osd-in
  • osd-out
  • security-checklist
  • zap-disk

Working with OSDs

Set OSDs to 'out'

Use the osd-out action to set all OSD volumes on a unit to 'out'.

Warning: This action has the potential of impacting your cluster
significantly. The Ceph documentation on this
topic is considered essential reading.

The osd-out action sets all OSDs on the unit as 'out'. Unless the cluster
itself is set to 'noout' this action will cause Ceph to rebalance data by
migrating PGs out of the unit's OSDs and onto OSDs available on other units.
The impact is twofold:

  1. The available space on the remaining OSDs is reduced. Not only is there less
    space for future workloads but there is a danger of exceeding the cluster's
    storage capacity.
  2. The traffic and CPU load on the cluster is increased.

Note: It has been reported that setting OSDs as 'out' may cause some PGs
to get stuck in the 'active+remapped' state. This is an upstream issue.

The ceph-mon charm has an action called set-noout that sets
'noout' for the cluster.

It may be perfectly fine to have data rebalanced. The decisive factor is
whether the OSDs are being paused temporarily (e.g. the underlying machine is
scheduled for maintenance) or whether they are being removed from the cluster
completely (e.g. the storage hardware is reaching EOL).

Example:

juju run-action --wait ceph-osd/4 osd-out

Set OSDs to 'in'

Use the osd-in action to set all OSD volumes on a unit to 'in'.

The osd-in action is reciprocal to the osd-out action. The OSDs are set to
'in'. It is typically used when the osd-out action was used in conjunction
with the cluster 'noout' flag.

Example:

juju run-action --wait ceph-osd/4 osd-in
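
Putting the two actions together, a temporary maintenance window might look
like the following sketch, assuming the ceph-mon charm's set-noout and
unset-noout actions are used (unit names are illustrative):

juju run-action --wait ceph-mon/0 set-noout
juju run-action --wait ceph-osd/4 osd-out
# ... perform maintenance on the underlying machine ...
juju run-action --wait ceph-osd/4 osd-in
juju run-action --wait ceph-mon/0 unset-noout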

Working with disks

List disks

Use the list-disks action to list disks known to a unit.

The action lists the unit's block devices by categorising them in three ways:

  • disks: visible (known by udev), unused (not mounted), and not designated as
    an OSD journal (via the osd-journal configuration option)

  • blacklist: like disks but blacklisted (see action blacklist-add-disk)

  • non-pristine: like disks but not eligible for use due to the presence of
    existing data

Example:

juju run-action --wait ceph-osd/4 list-disks

Add a disk

Use the add-disk action to add a disk to a unit.

A ceph-osd unit is automatically assigned OSD volumes based on the current
value of the osd-devices application option. The add-disk action allows the
operator to manually add OSD volumes (for disks that are not listed by
osd-devices) to an existing unit.

Parameters

  • osd-devices (required)
    A space-separated list of devices to format and initialise as OSD volumes.
  • bucket
    The name of a Ceph bucket to add these devices to.

Example:

juju run-action --wait ceph-osd/4 add-disk osd-devices=/dev/vde

Blacklist a disk

Use the blacklist-add-disk action to add a disk to a unit's blacklist.

The action allows the operator to add disks (that are visible to the unit's
underlying machine) to the unit's blacklist. A blacklisted device will not be
initialised as an OSD volume when the value of the osd-devices application
option changes. This action does not prevent a device from being activated via
the add-disk action.

Use the list-disks action to list the unit's blacklist entries.

Important: Neither this action nor the blacklist has any effect on current
OSD volumes.

Parameters

  • osd-devices (required)
    A space-separated list of devices to add to a unit's blacklist.

Example:

juju run-action --wait ceph-osd/0 \
   blacklist-add-disk osd-devices='/dev/vda /dev/vdf'

Un-blacklist a disk

Use the blacklist-remove-disk action to remove a disk from a unit's
blacklist.

Parameters

  • osd-devices (required)
    A space-separated list of devices to remove from a unit's blacklist.

Each device should have an existing entry in the unit's blacklist. Use the
list-disks action to list the unit's blacklist entries.

Example:

juju run-action --wait ceph-osd/1 \
   blacklist-remove-disk osd-devices=/dev/vdb

Zap a disk

Use the zap-disk action to purge a disk of all data.

In order to prevent unintentional data loss, the charm will not use a disk that
has existing data already on it. To forcibly make a disk available, the
zap-disk action can be used. Due to the destructive nature of this action the
i-really-mean-it option must be passed. This action is normally followed by
the add-disk action.

Parameters

  • devices (required)
    A space-separated list of devices to be recycled.
  • i-really-mean-it (required)
    An option that acts as a confirmation for performing the action.

Example:

juju run-action --wait ceph-osd/3 zap-disk i-really-mean-it=true devices=/dev/vdc
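
Having purged the device, it can then be returned to service with the
add-disk action, for example:

juju run-action --wait ceph-osd/3 add-disk osd-devices=/dev/vdc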

Bugs

Please report bugs on Launchpad.

For general charm questions refer to the OpenStack Charm Guide.


Configuration

aa-profile-mode
(string) Enable AppArmor profile. Valid settings: 'complain', 'enforce' or 'disable'. NOTE: changing the value of this option is disruptive to a running Ceph cluster as all ceph-osd processes must be restarted as part of changing the AppArmor profile enforcement mode. Always test in pre-production before enabling AppArmor on a live cluster.
Default: disable

autotune
(boolean) Enabling this option will attempt to tune your network card sysctls and hard drive settings. This changes hard drive read-ahead settings and max_sectors_kb. For the network card this will detect the link speed and make appropriate sysctl changes. Enabling this option should generally be safe.

availability_zone
(string) Custom availability zone to provide to Ceph for the OSD placement.

bdev-enable-discard
(string) Enables async discard on devices. This option will enable/disable both the bdev-enable-discard and bdev-async-discard options in the Ceph configuration at the same time. The default value "auto" will try to autodetect and should work in most cases. If you need to force a behaviour you can set it to "enable" or "disable". Only applies for Ceph Mimic or later.
Default: auto

bluestore
(boolean) Enable the BlueStore storage format for OSD devices. Only applies for Ceph Luminous or later.
Default: True

bluestore-block-db-size
(int) Size (in bytes) of a partition, file or LV to use for BlueStore metadata or RocksDB SSTs, provided on a per backend device basis. Example: a 128 GB device and 8 data devices provided in "osd-devices" gives 128 / 8 GB = 16 GB = 16000000000 bytes per device. A default value is not set as it is calculated by ceph-disk (before Luminous) or the charm itself, when ceph-volume is used (Luminous and above).

bluestore-block-wal-size
(int) Size (in bytes) of a partition, file or LV to use for the BlueStore WAL (RocksDB WAL), provided on a per backend device basis. Example: a 128 GB device and 8 data devices provided in "osd-devices" gives 128 / 8 GB = 16 GB = 16000000000 bytes per device. A default value is not set as it is calculated by ceph-disk (before Luminous) or the charm itself, when ceph-volume is used (Luminous and above).

bluestore-db
(string) Path to a BlueStore WAL db block device or file. If you have a separate physical device faster than the block device, this will store all of the filesystem metadata (RocksDB) there and also integrate the Write Ahead Log (WAL), unless a further separate bluestore-wal device is configured, which is not needed unless it is faster again than the bluestore-db device. This block device is used as an LVM PV and space is then allocated for each block device as needed, based on the bluestore-block-db-size setting.

bluestore-wal
(string) Path to a BlueStore WAL block device or file. Should only be set if using a separate physical device that is faster than the DB device (such as an NVDIMM or faster SSD). Otherwise BlueStore automatically maintains the WAL inside of the DB device. This block device is used as an LVM PV and space is then allocated for each block device as needed, based on the bluestore-block-wal-size setting.

ceph-cluster-network
(string) The IP address and netmask of the cluster (back-side) network (e.g. 192.168.0.0/24). If multiple networks are to be used, a space-delimited list of a.b.c.d/x can be provided.

ceph-public-network
(string) The IP address and netmask of the public (front-side) network (e.g. 192.168.0.0/24). If multiple networks are to be used, a space-delimited list of a.b.c.d/x can be provided.

config-flags
(string) User-provided Ceph configuration. Supports a string representation of a Python dictionary where each top-level key represents a section in the ceph.conf template. You may only use sections supported in the template. WARNING: this is not the recommended way to configure the underlying services that this charm installs and is used at the user's own risk. This option is mainly provided as a stop-gap for users who either want to test the effect of modifying some config or who have found a critical bug in the way the charm has configured their services and need it fixed immediately. We ask that whenever this is used, the user consider opening a bug on this charm at http://bugs.launchpad.net/charms providing an explanation of why the config was needed so that we may consider it for inclusion as a natively supported config in the charm.

crush-initial-weight
(float) The initial CRUSH weight for newly added OSDs in the crushmap. Use this option only if you wish to set the weight for newly added OSDs in order to gradually increase the weight over time. Be very aware that setting this overrides the default setting, which can lead to imbalance in the cluster, especially if there are OSDs of different sizes in use. By default, the initial CRUSH weight for a newly added OSD is set to its volume size in TB. Leave this option unset to use the default provided by Ceph itself. This option only affects NEW OSDs, not existing ones.

customize-failure-domain
(boolean) Setting this to true will tell Ceph to replicate across Juju's Availability Zone instead of specifically by host.

ephemeral-unmount
(string) Cloud instances provide ephemeral storage which is normally mounted on /mnt. Setting this option to the path of the ephemeral mountpoint will force an unmount of the corresponding device so that it can be used as an OSD storage device. This is useful for testing purposes (cloud deployment is not a typical use case).

harden
(string) Apply system hardening. Supports a space-delimited list of modules to run. Supported modules currently include os, ssh, apache and mysql.

ignore-device-errors
(boolean) By default, the charm will raise errors if a whitelisted device is found, but for some reason the charm is unable to initialize the device for use by Ceph. Setting this option to 'True' will result in the charm classifying such problems as warnings only and will not result in a hook error.

key
(string) Key ID to import to the apt keyring to support use with arbitrary source configuration from outside of Launchpad archives or PPAs.

loglevel
(int) OSD debug level. Max is 20.
Default: 1

max-sectors-kb
(int) This parameter will adjust every block device in your server to allow greater IO operation sizes. If you have a RAID card with cache on it, consider tuning this much higher than the 1 MB default. 1 MB is a safe default for spinning HDDs that don't have much cache.
Default: 1048576

nagios_context
(string) Used by the nrpe-external-master subordinate charm. A string that will be prepended to the instance name to set the hostname in Nagios, so that, for instance, the hostname would be something like juju-myservice-0. If you're running multiple environments with the same services in them this allows you to differentiate between them.
Default: juju

nagios_servicegroups
(string) A comma-separated list of Nagios servicegroups. If left empty, the nagios_context will be used as the servicegroup.

osd-devices
(string) The devices to format and set up as OSD volumes. These devices are the range of devices that will be checked for and used across all service units, in addition to any volumes attached via the --storage flag during deployment. For Ceph >= 0.56.6 these can also be directories instead of devices - the charm assumes anything not starting with /dev is a directory instead.
Default: /dev/vdb

osd-encrypt
(boolean) By default, the charm will not encrypt Ceph OSD devices; however, by setting osd-encrypt to True, Ceph's dmcrypt support will be used to encrypt OSD devices. Specifying this option on a running Ceph OSD node will have no effect until new disks are added, at which point new disks will be encrypted.

osd-encrypt-keymanager
(string) Keymanager to use for storage of dm-crypt keys used for OSD devices. By default 'ceph' itself will be used for storage of keys, making use of the key/value storage provided by the ceph-mon cluster. Alternatively 'vault' may be used for storage of dm-crypt keys. Both approaches ensure that keys are never written to the local filesystem. The 'vault' option also requires a relation to the vault charm.
Default: ceph

osd-format
(string) Format of filesystem to use for OSD devices. Supported formats include: xfs (default, >= 0.48.3), ext4 (only option < 0.48.3), btrfs (experimental and not recommended). Only supported with Ceph >= 0.48.3.
Default: xfs

osd-journal
(string) The device to use as a shared journal drive for all OSDs. By default a journal partition will be created on each OSD volume device for use by that OSD. Only supported with Ceph >= 0.48.3.

osd-journal-size
(int) Ceph OSD journal size. The journal size should be at least twice the product of the expected drive speed multiplied by the filestore max sync interval. However, the most common practice is to partition the journal drive (often an SSD), and mount it such that Ceph uses the entire partition for the journal. Only supported with Ceph >= 0.48.3.
Default: 1024

osd-max-backfills
(int) The maximum number of backfills allowed to or from a single OSD. Setting this option on a running Ceph OSD node will not affect running OSD devices, but will add the setting to ceph.conf for the next restart.

osd-recovery-max-active
(int) The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests place an increased load on the cluster. Setting this option on a running Ceph OSD node will not affect running OSD devices, but will add the setting to ceph.conf for the next restart.

prefer-ipv6
(boolean) If True, enables IPv6 support. The charm will expect network interfaces to be configured with an IPv6 address. If set to False (the default) IPv4 is expected. NOTE: these charms do not currently support the IPv6 privacy extension. In order for this charm to function correctly, the privacy extension must be disabled and a non-temporary address must be configured/available on your network interface.

source
(string) Optional configuration to support use of additional sources such as: ppa:myteam/ppa, cloud:xenial-proposed/ocata, or http://my.archive.com/ubuntu main. The last option should be used in conjunction with the key configuration option.

sysctl
(string) YAML-formatted associative array of sysctl key/value pairs to be set persistently. By default we set pid_max, max_map_count and threads-max to a high value to avoid problems with large numbers (>20) of OSDs recovering. Very large clusters should set those values even higher (e.g. the max for kernel.pid_max is 4194303).
Default: { kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max: 2097152 }

use-direct-io
(boolean) Configure use of direct IO for OSD journals.
Default: True

use-syslog
(boolean) If set to True, supporting services will log to syslog.