hdp storm #12
Description
Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Hortonworks Storm Overview
Apache Storm, as packaged in Hortonworks Data Platform (HDP) 2.1.3, is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees that your data will be processed, and is easy to set up and operate.
This charm builds a Storm cluster consisting of:
- A Nimbus master node running the storm-drpc, storm-logviewer, storm-nimbus, and storm-ui daemons
- One or more Storm worker nodes running the storm-logviewer and storm-supervisor daemons
Deployment
Start a three-node Hortonworks ZooKeeper quorum:
juju deploy hdp-zookeeper hdp-zookeeper
juju add-unit -n 2 hdp-zookeeper
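A ZooKeeper ensemble stays available only while a majority of its servers is up, which is why an odd size such as three is used. The arithmetic is sketched below; the function names are illustrative, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble: floor(n / 2) + 1 servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)
```

For the three-node quorum deployed above, `quorum_size(3)` is 2 and `tolerated_failures(3)` is 1. A fourth server would raise the quorum to 3 without tolerating any additional failure, so even ensemble sizes add cost but no resilience.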
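NOTE: ZooKeeper must be loaded and active before you continue. To verify:
$ echo ruok | nc {hdp-zookeeper/0 IP address} 2181
imok # the reply must be "imok" ("I'm OK")
$ echo stat | nc {hdp-zookeeper/0 IP address} 2181
Node count: 4 # number of znodes in the data tree; the "Mode:" line (leader or follower) confirms the server has joined the quorum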
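To repeat these checks for every ZooKeeper unit, they can be scripted. The sketch below is a minimal example using Python's standard `socket` module. It assumes, as the `nc` commands above already do, that each server answers four-letter-word commands on port 2181 and closes the connection after replying. The helper names are hypothetical.

```python
import socket
from typing import Dict


def four_letter_word(host: str, cmd: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command (e.g. "ruok", "stat") and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(reply: str) -> Dict[str, str]:
    """Collect the top-level "Key: value" lines of a `stat` reply into a dict.

    Indented lines (the per-client connection list) are skipped.
    """
    fields: Dict[str, str] = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key and not line.startswith(" "):
            fields[key.strip()] = value.strip()
    return fields
```

For example, `four_letter_word(ip, "ruok") == "imok"` confirms a server is alive, and `parse_stat(four_letter_word(ip, "stat")).get("Mode")` returns `"leader"` or `"follower"` once it has joined the quorum, where `ip` stands for each hdp-zookeeper unit's address.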
Start Apache Storm:
juju deploy hdp-storm nimbus-server
juju deploy hdp-storm storm-worker
juju add-relation nimbus-server:zookeeper hdp-zookeeper:zookeeper
juju add-relation storm-worker:zookeeper hdp-zookeeper:zookeeper
juju add-relation nimbus-server:nimbus storm-worker:slave
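The `zookeeper` relations tell every Storm node where the ZooKeeper quorum is, and the `nimbus`/`slave` relation tells the workers where Nimbus runs. The charm's actual configuration template is not shown here. As an assumption about the kind of `storm.yaml` fragment such relation data feeds, the sketch below renders one using standard Storm 0.9.x configuration keys. The function and the sample addresses are hypothetical.

```python
from typing import Iterable, Sequence


def render_storm_yaml(
    zk_hosts: Sequence[str],
    nimbus_host: str,
    zk_port: int = 2181,
    slot_ports: Iterable[int] = (6700, 6701, 6702, 6703),  # Storm's default worker slots
) -> str:
    """Render a storm.yaml fragment from (hypothetical) relation data."""
    lines = ["storm.zookeeper.servers:"]
    lines += [f'    - "{host}"' for host in zk_hosts]  # from the zookeeper relation
    lines.append(f"storm.zookeeper.port: {zk_port}")
    lines.append(f'nimbus.host: "{nimbus_host}"')  # from the nimbus/slave relation
    lines.append("supervisor.slots.ports:")
    lines += [f"    - {port}" for port in slot_ports]
    return "\n".join(lines) + "\n"
```

For example, `render_storm_yaml(["10.0.0.2", "10.0.0.3", "10.0.0.4"], "10.0.0.1")` lists the three ZooKeeper servers and points `nimbus.host` at the Nimbus unit.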
To verify a successful deployment:
Open the Storm UI at http://{nimbus-server IP address}:8080
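For scripted checks, for example in a deployment test, the same verification can be reduced to "does the Storm UI answer with HTTP 200 on port 8080". Below is a minimal sketch using only Python's standard library. The helper names are hypothetical, and the host argument stands for the nimbus-server address.

```python
import urllib.request


def ui_url(host: str, port: int = 8080) -> str:
    """URL of the Storm UI front page."""
    return f"http://{host}:{port}/"


def storm_ui_is_up(host: str, port: int = 8080, timeout: float = 5.0) -> bool:
    """Return True if the Storm UI responds with HTTP 200."""
    try:
        with urllib.request.urlopen(ui_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False
```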
Real-time usage
Example: deploying and managing Apache Storm topologies. The following steps demonstrate how to deploy the Storm WordCount application. The WordCount topology has two parts: a spout that randomly generates a stream of data, and bolts that process the generated stream.
- $ juju ssh nimbus-server/0
- $ storm jar /usr/lib/storm/contrib/storm-starter/storm-starter-0.9.1.2.1.3.0-563-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount
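To make the spout-to-bolts dataflow concrete, the sketch below mimics it in plain Python with generators. It assumes the usual word-count layout: one bolt splits sentences into words and a second keeps a running count per word. It is not Storm's Java API, the stream is finite rather than unbounded, and the sentence list is hypothetical sample data.

```python
import random
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical sample data for the spout.
SENTENCES = [
    "the cow jumped over the moon",
    "four score and seven years ago",
    "snow white and the seven dwarfs",
]


def random_sentence_spout(n: int, seed: Optional[int] = None) -> Iterator[str]:
    """Spout stage: emit n randomly chosen sentences."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.choice(SENTENCES)


def split_sentence_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """First bolt: split each incoming sentence into words."""
    for sentence in sentences:
        yield from sentence.split()


def word_count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Second bolt: keep a running count per word and emit (word, count)."""
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]
```

Chaining the stages, `word_count_bolt(split_sentence_bolt(random_sentence_spout(5, seed=42)))` yields `(word, count)` pairs as the words arrive. In a real topology each stage runs as parallel tasks spread across the storm-worker nodes.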
To monitor the deployment:
- Go to http://{nimbus-server IP address}:8080
- Under "Topology summary", click "WordCount"
- Monitor the spout and bolt tasks
Scale out usage
Example: adding five more worker nodes:
juju add-unit -n 5 storm-worker
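Each new storm-worker unit adds a supervisor with a fixed number of worker slots. Storm's default `supervisor.slots.ports` lists four ports, 6700 to 6703. A running topology keeps its current placement until it is rebalanced, which is why the verification steps use "Rebalance". The sketch below shows the slot arithmetic and a deliberately simplified round-robin placement. It is an illustration only: Storm's own scheduler is more involved and may place workers differently, and the host names are hypothetical.

```python
from typing import Dict, List, Sequence

# Storm's default supervisor.slots.ports has four entries (6700-6703).
DEFAULT_SLOTS_PER_SUPERVISOR = 4


def total_slots(num_supervisors: int, slots_per_supervisor: int = DEFAULT_SLOTS_PER_SUPERVISOR) -> int:
    """Worker processes the cluster can host at once."""
    return num_supervisors * slots_per_supervisor


def spread_workers(num_workers: int, hosts: Sequence[str]) -> Dict[str, List[int]]:
    """Simplified round-robin placement of worker indices onto supervisor hosts."""
    if not hosts:
        raise ValueError("need at least one supervisor host")
    if num_workers > total_slots(len(hosts)):
        raise ValueError("not enough worker slots")
    placement: Dict[str, List[int]] = {host: [] for host in hosts}
    for worker in range(num_workers):
        placement[hosts[worker % len(hosts)]].append(worker)
    return placement
```

With the original storm-worker unit plus the five added here, `total_slots(6)` gives 24 slots under the default configuration, and `spread_workers(3, hosts)` places one worker on each of the first three hosts.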
To verify a successful scale:
- Go to http://{nimbus-server IP address}:8080
- Under "Topology summary", click on "WordCount"
- Click on "Spout" link in "Spouts (All time)" section
- Note "Host" list under "Executors (All time)" section
- Go back to "Topology summary"
- Click "Rebalance" in the "Topology actions" section
- Click on "Spout" link in "Spouts (All time)" section
- Refresh the page and note that the executors are redistributed across more hosts as the new storm-worker nodes become available
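The "Rebalance" button has a command-line counterpart, `storm rebalance`. In the Storm 0.9.x client it accepts `-w` (seconds to wait after deactivating the topology before redistributing), `-n` (new number of workers), and repeated `-e component=parallelism` options. Verify these flags against the `storm` client's help on your installation. The helper below is a hypothetical sketch that builds and runs that command. It assumes the `storm` client is on the PATH, as on nimbus-server/0 after `juju ssh`.

```python
import subprocess
from typing import Dict, List, Optional


def rebalance_cmd(
    topology: str,
    wait_secs: Optional[int] = None,
    num_workers: Optional[int] = None,
    executors: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Build the argument list for `storm rebalance`."""
    cmd = ["storm", "rebalance", topology]
    if wait_secs is not None:
        cmd += ["-w", str(wait_secs)]  # wait time before redistributing
    if num_workers is not None:
        cmd += ["-n", str(num_workers)]  # new worker-process count
    for component, parallelism in (executors or {}).items():
        cmd += ["-e", f"{component}={parallelism}"]  # new parallelism per component
    return cmd


def rebalance(topology: str, **kwargs) -> None:
    """Run the rebalance; raises CalledProcessError if the storm client fails."""
    subprocess.run(rebalance_cmd(topology, **kwargs), check=True)
```

For example, `rebalance("WordCount", wait_secs=10, num_workers=6)` asks Storm to spread the WordCount topology over six worker processes after a ten-second wait.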
Contact Information
Amir Sanjar amir.sanjar@canonical.com