ubuntu repository cache #0
Description
This is a purpose-oriented charm that provides a caching proxy mirror of the Ubuntu archive on cloud platforms. This is a hybrid mirror. All data except /ubuntu/pool is synced from upstream Ubuntu mirrors every 2 hours. /ubuntu/pool is forwarded internally to squid-deb-proxy, which keeps a local cache of .deb files as they are requested from the upstream mirror.
- Tags:
- cache-proxy
Overview
This charm provides a partial caching proxy mirror of the Ubuntu Software Repository. This is intended for deployment in cloud environments to provide a cloud-local repository. Metadata will be updated every two hours.
This is a hybrid mirror / cache. Repository metadata (the data under the ubuntu/dists/ directory) is copied from an upstream Ubuntu mirror and checked to ensure that it is consistent. Requests for package files in /ubuntu/pool are forwarded internally to squid-deb-proxy, which keeps a local cache of .deb files as they are requested from the upstream mirror. This approach minimizes load on the upstream archive server, improves performance, and requires less disk space than a static archive mirror.
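The split described above can be sketched as a small shell function. This is only an illustration of the routing rule, assuming the default path base of ubuntu; the function name and the two labels are hypothetical, and the charm handles this internally rather than through a script like this.

```sh
# Illustrative only: classify a request path according to the hybrid
# mirror/cache split. route_request and its output labels are hypothetical.
route_request() {
    case "$1" in
        /ubuntu/pool/*)  echo "squid-deb-proxy" ;;  # package files: cached on demand
        /ubuntu/dists/*) echo "local-mirror" ;;     # metadata: synced copy
        *)               echo "local-mirror" ;;     # all other data is synced as well
    esac
}

route_request /ubuntu/pool/main/h/hello/hello_2.10-2_amd64.deb
route_request /ubuntu/dists/jammy/Release
```

The first call prints squid-deb-proxy and the second prints local-mirror.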
Usage
Deploy the charm with these example commands:
# Create cache units
juju deploy -n 3 ubuntu-repository-cache --constraints "mem=8G root-disk=80G"
juju set-constraints ubuntu-repository-cache mem=8G root-disk=80G
# Provide an haproxy front-end for the service
juju deploy -n 2 haproxy
juju add-relation haproxy:reverseproxy ubuntu-repository-cache:website
# Expose haproxy on the public network
juju expose haproxy
The ubuntu-repository-cache charm's constraint for disk size is optional; the intention is to allocate sufficient space for the metadata mirror (approximately 4GB) plus as much space as can be afforded for squid to cache package files.
Ideally this charm should be deployed with this disk space allocated as fast storage (which may require ephemeral storage depending on the provider); use of the manual provider may be necessary to achieve this. Alternatively, provider-specific constraints can be used to specify an instance type which provides ephemeral storage. The ephemeral storage device(s) would need to be specified in the charm configuration option 'ephemeral-devices'.
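The disk budget described above can be estimated with a little arithmetic. The sketch below subtracts the approximate 4GB metadata figure from a disk size in GB and prints what is left in MB. The function name is hypothetical, this is not the charm's own auto-calculation, and the result is an upper bound that ignores filesystem overhead.

```sh
# Rough sketch: space left for the squid cache after reserving room for the
# metadata mirror. METADATA_MB reflects the "approximately 4GB" figure;
# cache_budget_mb is a hypothetical helper, not part of the charm.
METADATA_MB=4096

cache_budget_mb() {
    disk_gb=$1
    total_mb=$((disk_gb * 1024))
    if [ "$total_mb" -le "$METADATA_MB" ]; then
        echo 0
    else
        echo $((total_mb - METADATA_MB))
    fi
}

cache_budget_mb 80   # root-disk=80G from the deploy example; prints 77824
```

A value derived this way could inform the cache-storage-size option, which is expressed in MBytes.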
Once the initial mirror sync has completed, you can browse to http://ip-address to view the repository. Alternatively, you may add a proxy in front of this service.
Disabling OS updates/upgrades
When deploying new units of this charm, Juju may update the operating system. The deployment could fail if this charm is itself being deployed as the package repository configured in the cloud image.
If that is the case, edit the Juju environment configuration to disable OS update/upgrade. The charm will change the apt source to point at the archive specified in the 'sync-host' charm configuration option and will perform an update during charm installation. (See https://bugs.launchpad.net/juju-core/+bug/1350493/comments/4)
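This README does not name the environment settings involved. As an assumption, they are likely 'enable-os-refresh-update' and 'enable-os-upgrade', which Juju documents as environment options; verify them against the Juju version in use. The hypothetical helper below only prints the fragment to merge into the relevant environment's configuration.

```sh
# Prints settings to merge into the environment's configuration.
# Assumption: enable-os-refresh-update and enable-os-upgrade are the options
# meant by "disable OS update/upgrade"; os_update_settings is hypothetical.
os_update_settings() {
    cat << EOF
enable-os-refresh-update: false
enable-os-upgrade: false
EOF
}

os_update_settings
```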
Monitoring
This charm provides the local-monitors relation to enable more detailed monitoring of the metadata service with Nagios through NRPE.
# Example
juju deploy nagios
juju deploy ubuntu-repository-cache
juju deploy nrpe
juju add-relation nagios:monitors nrpe:monitors
juju add-relation ubuntu-repository-cache:local-monitors nrpe:local-monitors
juju expose nagios
Scale out usage
When additional units are added, the content they serve will be synchronized from the lead unit. As the service is scaled, the use of haproxy in front of the mirror may be desirable to distribute load.
Sizing
Cache charm performance is sensitive to network throughput, system memory, and disk space.
Suggested minimum hardware per cache unit:
- 2 processors
- 24 GB RAM
- 200 GB storage (preferably fast, ephemeral storage)
Juju deployment constraints can be used to match these needs if the manual provider is not used. If the cloud provider supports the use of constraints to specify exact instance types, they should be used for consistent, repeatable deployment. An example of this is shown below. Exact instance types can be specified for EC2, which gives known network performance characteristics (networking cannot be specified by generic constraints).
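When exact instance types are not available, the suggested minimums can be expressed as generic constraints. The sketch below builds such a string; the function name and argument order are hypothetical, and 'cpu-cores' is the constraint name assumed for the Juju 1.x syntax used elsewhere in this document.

```sh
# Sketch: turn the suggested per-unit minimums into a Juju constraints string.
# cache_constraints is a hypothetical helper; defaults mirror the list above.
cache_constraints() {
    cores=${1:-2}
    mem_gb=${2:-24}
    disk_gb=${3:-200}
    echo "cpu-cores=${cores} mem=${mem_gb}G root-disk=${disk_gb}G"
}

cache_constraints   # prints: cpu-cores=2 mem=24G root-disk=200G
```

The output could then be passed to a deploy command, for example: juju deploy -n 3 ubuntu-repository-cache --constraints "$(cache_constraints)"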
The basic pattern for the repository cache deployment puts multiple units of ubuntu-repository-cache behind haproxy. The relationship between haproxy and ubuntu-repository-cache ensures that only active cache units are contacted by clients.
Testing configuration with explicit AWS instance types:
- Cache units (x2) -- c3.8xlarge instance type:
  - High network performance
  - 60 GB RAM
  - 320 GB SSD ephemeral storage
- HAProxy unit (x1) -- m3.xlarge instance type:
  - High network performance
  - 15 GB RAM
Example - Deployment with constraints
This example uses the ephemeral-devices configuration option of the ubuntu-repository-cache charm to provide access to a large, fast storage device. The value in this example is particular to the device name of ephemeral storage on an EC2 c3.8xlarge instance.
# Create a configuration file for the service
$ cat > urc-config.yaml << EOF
ubuntu-repository-cache:
  sync-host: archive.ubuntu.com
  sync-on-start: false
  ephemeral-devices: /dev/xvdb
EOF
# Set instance type constraints for each charm
$ juju set-constraints --service haproxy instance-type=m3.xlarge
$ juju set-constraints --service ubuntu-repository-cache instance-type=c3.8xlarge
# Deploy the charms
$ juju deploy --num-units 2 ubuntu-repository-cache --config=urc-config.yaml
$ juju deploy haproxy
# Add relationship between haproxy and the cache
$ juju add-relation haproxy:reverseproxy ubuntu-repository-cache:website
# Expose haproxy on the public network
$ juju expose haproxy
Example - Manual deployment
If the cloud has no Juju provider, or sufficient control of constraints is not possible, it may be necessary to use Juju's manual provider. In this case, instances would be configured with Ubuntu per the manual provisioning documentation, and each charm would then be deployed to a specific machine.
Machines:
- Machine #1 in Juju is sized for haproxy
- Machines #2 through #4 are sized for ubuntu-repository-cache with ephemeral storage device /dev/sdb
Steps:
# Create a configuration file for the service
$ cat > urc-config.yaml << EOF
ubuntu-repository-cache:
  sync-host: archive.ubuntu.com
  sync-on-start: false
  ephemeral-devices: /dev/sdb
EOF
# Deploy the charms
$ juju deploy --to 1 haproxy
$ juju deploy --to 2 ubuntu-repository-cache --config=urc-config.yaml
$ juju add-unit --to 3 ubuntu-repository-cache
$ juju add-unit --to 4 ubuntu-repository-cache
# Add relationship between haproxy and the cache
$ juju add-relation haproxy:reverseproxy ubuntu-repository-cache:website
# Expose haproxy on the public network
$ juju expose haproxy
Example - Non-Ubuntu-archive mirroring
ubuntu-repository-cache may be used to mirror Debian/Ubuntu-style repositories which are not archive.ubuntu.com. For example, to mirror ports.ubuntu.com:
ubuntu-repository-cache:
  sync-host: ports.ubuntu.com
  display-host: ports.ubuntu.com
  path-base: ubuntu-ports
  rsync-module: ubuntu-ports
  sync-on-start: false
  update-unit-apt-sources: false
  ephemeral-devices: /dev/xvdb
Or to mirror a Debian repository from a community mirror:
ubuntu-repository-cache:
  sync-host: mirrors.kernel.org
  display-host: ftp.debian.org
  path-base: debian
  rsync-module: debian-archive/debian
  sync-on-start: false
  update-unit-apt-sources: false
  ephemeral-devices: /dev/xvdb
"update-unit-apt-sources: false" is required when the repository being mirrored does not match the distribution/type running on the ubuntu-repository-cache units.
Known Limitations and Issues
- Find existing bugs or report new ones in Launchpad
Configuration
- sync-host - The host name or IP address of the archive which will be used to keep this mirror updated. The mirror must support 'rsync' access.
- sync-on-start - Pull data from the sync-host during initial charm deployment. This should be true if deploying a single unit and false if deploying multiple units to reduce initial startup time. When multiple units are deployed, they will choose a leader and pull data from the sync-host.
- ephemeral-devices - A comma-separated list of storage devices to use for metadata and squid cache storage. Leave this empty if only the root disk will be used. The device(s) will be formatted and mounted during charm installation. This option must be set at initial charm deployment. Changes after deployment will not affect running units, only newly added units. An example would be '/dev/xvdb,/dev/xvdc' to specify two ephemeral disks for cache storage.
- apache2_* - Apache2 configuration options for tuning of security and multi-processing.
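Because ephemeral-devices must be right at initial deployment and the listed devices are formatted, it can be worth checking the value before use. The sketch below splits a comma-separated value and checks the form of each entry. The function name is hypothetical and the /dev/ prefix rule is an assumption made for illustration; the charm's own validation, if any, is not described here.

```sh
# Sketch: split an ephemeral-devices value and check each entry's form.
# check_ephemeral_devices is hypothetical; it does not touch any device.
check_ephemeral_devices() {
    list=$1
    [ -z "$list" ] && { echo "root disk only"; return 0; }
    old_ifs=$IFS
    IFS=,
    set -- $list          # split on commas into positional parameters
    IFS=$old_ifs
    for dev in "$@"; do
        case "$dev" in
            /dev/?*) echo "device: $dev" ;;
            *)       echo "invalid entry: '$dev'" >&2; return 1 ;;
        esac
    done
}

check_ephemeral_devices /dev/xvdb,/dev/xvdc
```

An empty value reports that only the root disk will be used, matching the option description above.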
Contact Information
Questions and comments can be posted to ubuntu-cloud@lists.ubuntu.com; see https://lists.ubuntu.com/mailman/listinfo/ubuntu-cloud to subscribe to this mailing list.
Bugs can be viewed or reported at https://bugs.launchpad.net/ubuntu-repository-cache
Configuration
- apache2_mpm_maxconnectionsperchild
- (int) Maximum number of requests a server process serves
- apache2_mpm_maxrequestworkers
- (int) Maximum number of simultaneous client connections
- 16384
- apache2_mpm_maxsparethreads
- (int) Maximum number of worker threads which are kept spare
- 200
- apache2_mpm_minsparethreads
- (int) Minimum number of worker threads which are kept spare
- 100
- apache2_mpm_serverlimit
- (int) Upper limit on configurable number of processes
- 256
- apache2_mpm_startservers
- (int) Initial number of server processes to start
- 2
- apache2_mpm_threadlimit
- (int) Sets the upper limit on the configurable number of threads per child process
- 64
- apache2_mpm_threadsperchild
- (int) Constant number of worker threads in each server process
- 64
- apache2_mpm_type
- (string) Select the worker or prefork multi-processing module
- worker
- apache2_server_signature
- (string) Security setting. Set to one of On Off EMail
- On
- apache2_server_tokens
- (string) Security setting. Set to one of Full OS Minimal Minor Major Prod
- OS
- apache2_trace_enabled
- (string) Security setting. Set to one of On Off extended
- Off
- cache-storage-size
- (int) Configurable option (in MBytes) to tune/override storage used by squid cache. If 0 or unset, then auto-calculate.
- display-host
- (string) The hostname displayed in certain contexts, for example Apache directory listings. This is not required to be exactly the same as the logical hostname of the deployment (for example, region.cloud.archive.ubuntu.com can use the default archive.ubuntu.com), but should be changed if the archive type is non-default (e.g. ports.ubuntu.com).
- archive.ubuntu.com
- ephemeral-devices
- (string) Provide a comma-separated list of storage devices to use for metadata and squid cache storage. Leave this empty if only the root disk will be used. The device(s) will be formatted and mounted during charm installation. This option must be set at initial charm deployment. Changes after deployment will not affect running units, only newly added units. An example would be '/dev/xvdb,/dev/xvdc' to specify two ephemeral disks for cache storage.
- logrotate_count
- (int) The number of days we want to retain logs for
- 7
- logrotate_dateext
- (boolean) Use a date extension like YYYYMMDD instead of simply adding a number
- True
- logrotate_rotate
- (string) daily, weekly, monthly, or yearly?
- daily
- mirror-series
- (string) A space-separated list of ubuntu series metadata to mirror. An empty or blank string will mirror everything.
- nagios_context
- (string) A string that will be prepended to the instance name to set the host name in Nagios. For instance, the hostname would be something like juju-myservice-0. If you're running multiple environments with the same services in them, this allows you to differentiate between them.
- juju
- nagios_servicegroups
- (string) A comma-separated list of nagios servicegroups. If left empty, the nagios_context will be used as the servicegroup
- path-base
- (string) The base URI path of the site you want to mirror. At the moment, this may only be a single-level base (no directory slashes). Default is ubuntu.
- ubuntu
- remoteip_logging
- (boolean) Enables configuration that treats incoming connections to Apache from RFC1918 addresses as proxy connections and logs the contents of the X-Forwarded-For header (if any) to the Apache logs.
- rsync-module
- (string) The rsync module to sync from on sync-host, normally the same as path-base. Default is ubuntu.
- ubuntu
- squid_snmp
- (boolean) Enable SNMP for Squid (bound on localhost:3401, community "public")
- sync-age-crit
- (int) Age (in seconds) of CRITICAL level in Nagios check for cache sync.
- 21600
- sync-age-warn
- (int) Age (in seconds) of WARNING level in Nagios check for cache sync.
- 10800
- sync-host
- (string) The DNS or IP of the site you want to mirror. Default is archive.ubuntu.com.
- archive.ubuntu.com
- sync-on-start
- (boolean) Pull data from the sync-host during initial charm deployment. This should be true if deploying a single unit and false if deploying multiple units to reduce initial startup time.
- True
- update-unit-apt-sources
- (boolean) Whether to configure the units' sources.list to point to upstream sync-host directly, to avoid chicken/egg problems when bootstrapping a cloud. Default is true; disable if mirrored distro or archive type does not match the unit host.
- True
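To show how the sync-age-warn and sync-age-crit defaults relate to the two-hour sync interval, here is a minimal Nagios-style threshold check. The function name is hypothetical, the inclusive comparison is an assumption, and this is not the charm's actual check script; it only follows the usual Nagios convention of exit codes 0, 1 and 2 for OK, WARNING and CRITICAL.

```sh
# Sketch of a threshold check using the default sync-age values listed above.
# sync_age_status is a hypothetical helper that takes an age in seconds.
SYNC_AGE_WARN=10800   # 3 hours
SYNC_AGE_CRIT=21600   # 6 hours

sync_age_status() {
    age=$1
    if [ "$age" -ge "$SYNC_AGE_CRIT" ]; then
        echo "CRITICAL"; return 2
    elif [ "$age" -ge "$SYNC_AGE_WARN" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

sync_age_status 7200   # one sync interval old; prints OK
```

With these defaults, a single missed two-hour sync does not raise a warning, while a mirror three or more hours stale does.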