hadoop #7

Supports: precise

A Hadoop Cluster

This bundle deploys a 7-node Hadoop cluster designed to scale out. It contains the following units:

  • One Hadoop Master Node
  • Two Hadoop Slave Cluster Nodes
  • Three Hive Nodes
  • One MySQL Node

Usage

Once you have a cluster running, just run:

juju run --unit hadoop-master/0 "sudo -u hdfs /usr/lib/hadoop/terasort.sh"

The above command runs terasort and shows its progress. To view the cluster's status page in a browser, run:

juju status hadoop-master
juju expose hadoop-master

These commands show the public address of the master node and open the required port. Then go to http://public-address:50070, replacing public-address with the address reported by juju status, to see the status page of the cluster.
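
If you would rather not copy the address by hand, the lookup can be scripted. The following is a minimal sketch, not part of the bundle: status_url is a hypothetical helper name, and it assumes the YAML form of the status output contains a "public-address:" line for the master unit, which may differ between Juju versions.

```shell
# Sketch: build the status-page URL from `juju status` YAML read on stdin.
# status_url is a hypothetical helper; the "public-address:" key is an
# assumption about the status output and may vary by Juju version.
status_url() {
    # Take the value of the first "public-address:" line found on stdin.
    addr=$(awk '$1 == "public-address:" { print $2; exit }')
    # Fail if no address was found (for example, the unit is not up yet).
    [ -n "$addr" ] || return 1
    # 50070 is the status-page port given above.
    printf 'http://%s:50070\n' "$addr"
}
```

With the helper defined in your shell, piping the output of juju status hadoop-master --format=yaml into status_url should print the URL to open, or return a non-zero status if no public address is listed yet.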

Scale Out Usage

In order to scale out you can add hadoop-slavecluster units:

juju add-unit hadoop-slavecluster
juju add-unit -n10 hadoop-slavecluster # this adds 10 units.

If you are on a public cloud, note that scaling out too quickly might trigger rate limiting. If you are going to deploy a large cluster, monitor your cloud provider's dashboard and metrics to make sure you are not hitting provider limits.
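
One way to avoid scaling too fast is to add units in small batches with a pause between them. The following is a minimal sketch under stated assumptions: add_units_in_batches and the JUJU variable are hypothetical names introduced here, and suitable batch sizes and pauses depend on your provider's limits.

```shell
# Sketch: add hadoop-slavecluster units in batches rather than all at once.
# JUJU and add_units_in_batches are illustrative names, not part of the bundle.
JUJU="${JUJU:-juju}"   # set JUJU="echo juju" to print the commands instead of running them

add_units_in_batches() {
    total=$1   # total number of units to add
    batch=$2   # maximum units per batch (must be greater than 0)
    pause=$3   # seconds to wait between batches
    [ "$batch" -gt 0 ] || return 1
    remaining=$total
    while [ "$remaining" -gt 0 ]; do
        # The last batch may be smaller than the others.
        if [ "$remaining" -lt "$batch" ]; then
            n=$remaining
        else
            n=$batch
        fi
        # Stop at the first failed add-unit call.
        $JUJU add-unit -n "$n" hadoop-slavecluster || return 1
        remaining=$((remaining - n))
        # Pause only if another batch is still to come.
        if [ "$remaining" -gt 0 ]; then
            sleep "$pause"
        fi
    done
}
```

For example, add_units_in_batches 10 3 60 would add 10 units as batches of 3, 3, 3, and 1, waiting 60 seconds between batches. The numbers are placeholders, not recommendations.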

We also recommend larger instances when scaling past 100 nodes; see the referenced blog post for config tips and tricks.

References

Bundle configuration
