Apache Hadoop Client

Supports: trusty


Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
This charm manages a dedicated client node as a place to run mapreduce jobs. It can also host client services, such as Apache Hive or Apache Pig, which require access to Apache Hadoop.


The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.

This charm deploys a client node running Apache Hadoop 2.4.1,
from which workloads can be run either manually or via colocated
workload charms, such as Apache Hive or Apache Pig.


This charm is intended to be deployed as part of one of the Apache Hadoop
bundles. For example:

```shell
juju quickstart u/bigdata-dev/apache-analytics-sql
```

This will deploy the Apache Hadoop platform with a single client node
running Apache Hive, which you can use to perform SQL-like queries against your data.

If you also wanted to be able to analyze your data using Apache Pig,
you could deploy it on the same client:

```shell
juju deploy cs:~bigdata-dev/apache-pig pig
juju add-relation client pig
```

Note that horizontally scaling client nodes with juju add-unit will also
replicate all of the node's associated workload charms, since they are
subordinates. While this can be useful to provide HA fail-over, if you
actually intend to have separate client nodes for, e.g., Apache Hive and Apache
Pig, you should instead deploy a separate instance of apache-hadoop-client:

```shell
juju deploy cs:~bigdata-dev/apache-hadoop-client pig-client
juju add-relation pig-client yarn-master
juju add-relation pig-client hdfs-master
juju deploy cs:~bigdata-dev/apache-pig pig
juju add-relation pig-client pig
```

You can also manually load and run map-reduce jobs via the client:

```shell
juju scp my-job.jar client/0:
juju ssh client/0
hadoop jar my-job.jar
```
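
To make the map/shuffle/reduce model concrete, here is a pure-Python sketch of the classic word-count job that a jar like `my-job.jar` might implement. This is illustrative only: it runs locally and does not use Hadoop, but each phase mirrors its Hadoop counterpart.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # -> 2
```

Because each phase only depends on its input, Hadoop can run many map and reduce tasks in parallel across the cluster; the client node is simply where you submit the job.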

Deploying in Network-Restricted Environments

The Apache Hadoop charms can be deployed in environments with limited network
access. To deploy in this environment, you will need a local mirror to serve
the packages and resources required by these charms.

Mirroring Packages

You can set up a local mirror for apt packages using squid-deb-proxy.
For instructions on configuring Juju to use this, see the
Juju Proxy Documentation.
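
For illustration, once a squid-deb-proxy instance is reachable, pointing a model at it looks roughly like this. The address is a placeholder, and the flag names below are from newer Juju releases (older versions used `juju set-env` or `environments.yaml` keys); the proxy documentation has the authoritative details for your version.

```shell
# Placeholder proxy address; squid-deb-proxy listens on port 8000 by default.
juju model-config apt-http-proxy=http://10.0.0.1:8000 \
                  apt-https-proxy=http://10.0.0.1:8000
```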

Mirroring Resources

In addition to apt packages, the Apache Hadoop charms require a few binary
resources, which are normally hosted on Launchpad. If access to Launchpad
is not available, the jujuresources library makes it easy to create a mirror
of these resources:

```shell
sudo pip install jujuresources
juju resources fetch --all apache-hadoop-client/resources.yaml -d /tmp/resources
juju resources serve -d /tmp/resources
```

This will fetch all of the resources needed by this charm and serve them via a
simple HTTP server. You can then set the resources_mirror config option to
have the charm use this server for retrieving resources.
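
The serving step is just a simple HTTP file server over the mirror directory. If you want to understand, or stand in for, that step, an equivalent can be sketched with Python's standard library (the directory and port here are placeholders, and this is not the actual `jujuresources` implementation):

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def serve_mirror(directory, port=0):
    """Serve `directory` over plain HTTP, much like `juju resources serve`.

    Returns the running server; port=0 picks a free port, which is then
    available as server.server_address[1].
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("0.0.0.0", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Example: serve /tmp/resources and point resources_mirror at this host/port.
# server = serve_mirror("/tmp/resources", port=8080)
```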

You can fetch the resources for all of the Apache Hadoop charms
(apache-hadoop-hdfs-master, apache-hadoop-yarn-master,
apache-hadoop-compute-slave, apache-hadoop-client, etc) into a single
directory and serve them all with a single juju resources serve instance.


Configuration

resources_mirror
    (string) URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.
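
For example, with a mirror serving at a placeholder address, the option can be set like so (`juju set` is the Juju 1.x syntax; newer versions use `juju config`):

```shell
# Placeholder mirror address; use your mirror's host and port.
juju set apache-hadoop-client resources_mirror=http://10.0.0.1:8080/
```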