## Overview

Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data. This charm manages a dedicated client node from which to run MapReduce jobs. It can also host client services, such as Apache Hive or Apache Pig, which require access to Apache Hadoop.
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.

This charm deploys a client node running Apache Hadoop 2.4.1 from which
workloads can be run, either manually or via workload charms such as
Apache Hive and Apache Pig.
This charm is intended to be deployed via one of the Apache Hadoop bundles.
For example:

```
juju quickstart u/bigdata-dev/apache-analytics-sql
```
This will deploy the Apache Hadoop platform with a single client node
which is running Apache Hive to perform SQL-like queries against your data.
If you also wanted to be able to analyze your data using Apache Pig,
you could deploy it on the same client:

```
juju deploy cs:~bigdata-dev/apache-pig pig
juju add-relation client pig
```
Note that horizontally scaling client nodes with `juju add-unit` will also
replicate all of the node's associated workload charms, since they are
subordinates. While this can be useful to provide HA fail-over, if you
actually intend to have separate client nodes for, e.g., Apache Hive and Apache
Pig, you should instead deploy a separate instance of apache-hadoop-client:
```
juju deploy cs:~bigdata-dev/apache-hadoop-client pig-client
juju add-relation pig-client yarn-master
juju add-relation pig-client hdfs-master
juju deploy cs:~bigdata-dev/apache-pig pig
juju add-relation pig-client pig
```
You can also manually load and run MapReduce jobs via the client:

```
juju scp my-job.jar client/0:
juju ssh client/0 hadoop jar my-job.jar
```
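Here, `my-job.jar` stands in for any compiled MapReduce job. Purely as an illustration (not part of this charm), the same map/reduce logic can be sketched in Python in the style of a Hadoop Streaming mapper/reducer pair; the function names are hypothetical:

```python
# Sketch of word-count map/reduce logic in the Hadoop Streaming style.
# Function names are illustrative; a real Streaming job would read stdin,
# write "word\tcount" lines to stdout, and let Hadoop sort between phases.
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word on every line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: sum the counts for each word (input sorted by key)."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Local demonstration of the same map-sort-reduce pipeline:
counts = dict(reducer(sorted(mapper(["to be or not to be"]))))
# counts == {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

A real Streaming job would wire such scripts together on the client with `hadoop jar` and the streaming jar, rather than running them locally.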
## Deploying in Network-Restricted Environments
The Apache Hadoop charms can be deployed in environments with limited network
access. To deploy in this environment, you will need a local mirror to serve
the packages and resources required by these charms.
You can set up a local mirror for apt packages using squid-deb-proxy.
For instructions on configuring Juju to use it, see the
Juju Proxy Documentation.
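As a sketch only (the environment name, provider type, and address below are placeholders, and this assumes Juju 1.x-style environment settings), pointing apt traffic at a squid-deb-proxy instance looks like:

```yaml
# environments.yaml fragment -- all values are placeholders
myenv:
  type: maas
  # squid-deb-proxy listens on port 8000 by default
  apt-http-proxy: http://10.0.0.1:8000
```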
In addition to apt packages, the Apache Hadoop charms require a few binary
resources, which are normally hosted on Launchpad. If access to Launchpad
is not available, the
jujuresources library makes it easy to create a mirror
of these resources:
```
sudo pip install jujuresources
juju resources fetch --all apache-hadoop-client/resources.yaml -d /tmp/resources
juju resources serve -d /tmp/resources
```
This will fetch all of the resources needed by this charm and serve them via a
simple HTTP server. You can then set the `resources_mirror` config option to
have the charm use this server for retrieving resources.
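For example (the service name and mirror URL below are placeholders for your deployment):

```
juju set client resources_mirror=http://10.0.0.1:8080/
```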
You can fetch the resources for all of the Apache Hadoop charms
(apache-hadoop-client, etc.) into a single directory and serve them all with
a single `juju resources serve` instance.
## Contact Information

- Amir Sanjar email@example.com
- Cory Johns firstname.lastname@example.org
- Kevin Monroe email@example.com
## Configuration

- `resources_mirror` (string): URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.