hadoop #7
A Hadoop Cluster
This bundle is a 7-node Hadoop cluster designed to scale out. It contains the following units:
- One Hadoop Master Node
- Two Hadoop Slave Cluster Nodes
- Three Hive Nodes
- One MySQL Node
Usage
Once you have a cluster running, just run:
juju run --unit hadoop-master/0 "sudo -u hdfs /usr/lib/hadoop/terasort.sh"
The above command runs terasort and shows its progress. To view the cluster's status page in a browser, run:
juju status hadoop-master
juju expose hadoop-master
These commands show the public IP of the master node and open the correct port. Then go to http://public-address:50070, replacing public-address with the address reported by juju status.
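To confirm from a terminal that the status page is reachable after exposing the service, a minimal sketch like the following might help. The function names namenode_url and check_namenode are hypothetical and not part of the bundle, and the sketch assumes curl is installed; the address argument is the public-address value taken from juju status.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the bundle.
# Argument: the public-address of hadoop-master as reported by `juju status`.

# Build the status page URL (port 50070, as given in this README).
namenode_url() {
    printf 'http://%s:50070\n' "$1"
}

# Return 0 if the status page answers within 10 seconds, non-zero otherwise.
# -f: fail on HTTP errors, -s: no progress output, -o /dev/null: discard the body.
check_namenode() {
    curl -fs -o /dev/null --max-time 10 "$(namenode_url "$1")"
}
```

For example, check_namenode 203.0.113.10 && echo "status page is up" (the address here is only a placeholder).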
Scale Out Usage
To scale out, add hadoop-slavecluster units:
juju add-unit hadoop-slavecluster
juju add-unit -n10 hadoop-slavecluster # this adds 10 units.
If you are on a public cloud, note that scaling too fast might trigger rate limiting. If you are going to deploy a large cluster, it might help to monitor your cloud provider's dashboard and metrics to ensure you are not hitting provider limits.
We also recommend larger instances for scaling past 100 nodes; see the referenced blog post for config tips and tricks.
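One way to reduce the risk of rate limiting is to add units in smaller batches with a pause between them. The sketch below is only an illustration: the function add_units_in_batches, the JUJU_CMD variable, and the batch size and pause values are assumptions, not part of the bundle or of Juju itself. Suitable values depend on your cloud provider.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the bundle.
# JUJU_CMD can be overridden (for example with "echo") for a dry run.
JUJU_CMD="${JUJU_CMD:-juju}"

# Usage: add_units_in_batches TOTAL BATCH PAUSE_SECONDS
add_units_in_batches() {
    total="$1"      # total number of units to add
    batch="$2"      # maximum units per add-unit call
    pause="$3"      # seconds to wait between batches
    remaining="$total"
    while [ "$remaining" -gt 0 ]; do
        # The last batch may be smaller than the batch size.
        if [ "$remaining" -lt "$batch" ]; then
            n="$remaining"
        else
            n="$batch"
        fi
        $JUJU_CMD add-unit -n "$n" hadoop-slavecluster || return 1
        remaining=$((remaining - n))
        # Only pause if more batches follow.
        if [ "$remaining" -gt 0 ]; then
            sleep "$pause"
        fi
    done
}
```

For example, add_units_in_batches 50 10 60 would add 50 units in five batches of 10, waiting 60 seconds between batches. Setting JUJU_CMD=echo prints the commands instead of running them.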
References
- Scaling a 2000-node Hadoop Cluster on EC2/Ubuntu With Juju - Contains useful Hadoop information for providing config options when scaling to 1000+ nodes. Note that some Juju information is out of date, but the basic concepts still apply.