Real-Time Analytics with Storm
Hortonworks (HDP 2.1.3)
Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
This charm builds a Storm cluster consisting of:
A Nimbus master node, on which the following daemons are configured and loaded:
- storm-drpc
- storm-logviewer
- storm-nimbus
- storm-ui
One or more Storm worker nodes, on which the following daemons are configured and loaded:
- storm-logviewer
- storm-supervisor
Verify deployment
ZooKeeper must be deployed and active, and temporarily exposed so that its client port can be reached:
juju expose hdp-zookeeper
To verify that ZooKeeper is responding (a healthy server answers "ruok" with "imok"):
echo ruok | nc {hdp-zookeeper/0 IP address} 2181
echo stat | nc {hdp-zookeeper/0 IP address} 2181
To unexpose ZooKeeper once it has been verified:
juju unexpose hdp-zookeeper
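The "ruok" check above can also be scripted. The following is a minimal sketch, not part of the charm: the function names zk_reply_ok and check_zookeeper are hypothetical, and the host argument stands in for the hdp-zookeeper/0 IP address. It assumes nc is installed and that ZooKeeper is still exposed when it runs.

```shell
#!/bin/sh
# Interpret the reply to ZooKeeper's "ruok" four-letter command.
# A healthy server answers with the literal string "imok".
zk_reply_ok() {
    [ "$1" = "imok" ]
}

# Send "ruok" to a ZooKeeper server and report the result.
# $1 = host (placeholder for the hdp-zookeeper/0 IP address)
# $2 = client port, defaulting to 2181 as in the commands above
check_zookeeper() {
    reply=$(echo ruok | nc "$1" "${2:-2181}")
    if zk_reply_ok "$reply"; then
        echo "zookeeper at $1 is healthy"
    else
        echo "zookeeper at $1 did not answer imok" >&2
        return 1
    fi
}
```

For example, check_zookeeper 10.0.0.5 (an illustrative address) exits with a non-zero status if the server is unreachable or unhealthy, which makes it usable in a deployment script.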
To verify the status of the Apache Storm cluster, open the Storm UI at:
http://{nimbus-server ip address}:8080
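If a browser is not at hand, a quick reachability check of the Storm UI can be sketched with curl. This is an illustration only: http_ok and check_storm_ui are hypothetical names, and the sketch assumes port 8080 on the Nimbus node is reachable from the machine running it. It only confirms that the page loads, not that the cluster is healthy.

```shell
#!/bin/sh
# Succeed when an HTTP status code indicates the page loaded.
http_ok() {
    [ "$1" = "200" ]
}

# Request the Storm UI front page and check the status code.
# $1 = Nimbus host (placeholder for the nimbus-server IP address)
check_storm_ui() {
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s --max-time 10 -o /dev/null -w '%{http_code}' "http://$1:8080/")
    http_ok "$code"
}
```

Calling check_storm_ui with the Nimbus address returns success only when the UI answers with HTTP 200.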
Real-time usage
Example - Deploying and Managing Apache Storm Topologies:
The following step demonstrates how to deploy the Storm WordCount application. WordCount has two parts: a spout that randomly generates a stream of data, and bolts that process the generated stream.
juju run --service nimbus-server "storm jar /usr/lib/storm/contrib/storm-starter/storm-starter-0.9.1.2.1.3.0-563-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount"
How to monitor deployment:
- Go to http://{nimbus-server ip address}:8080
- Under "Topology summary", click on "WordCount"
- Monitor the spout and bolt tasks
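The topology can also be checked from the command line. The sketch below reuses the juju run pattern from the deploy command above and the storm CLI's list and kill subcommands. The function names and the JUJU variable are hypothetical conveniences; setting JUJU=echo gives a dry run that prints the arguments instead of contacting the cluster.

```shell
#!/bin/sh
# JUJU defaults to the real client; set JUJU=echo for a dry run.
JUJU=${JUJU:-juju}

# Run a storm CLI subcommand on the Nimbus service,
# mirroring the "juju run --service nimbus-server" call used to deploy.
storm_on_nimbus() {
    $JUJU run --service nimbus-server "storm $*"
}

# List the topologies known to Nimbus, with their status.
list_topologies() {
    storm_on_nimbus list
}

# Stop a running topology. $1 = topology name, e.g. WordCount.
kill_topology() {
    storm_on_nimbus kill "$1"
}
```

After submitting WordCount, list_topologies should show it, and kill_topology WordCount removes it when the demonstration is finished.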
Scale out usage
Example: add 5 more worker nodes:
juju add-unit -n 5 storm-worker
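Before turning to the UI, the number of worker units can be counted from the juju status output. This is a rough sketch that assumes the default YAML layout, in which each unit appears on its own line as "storm-worker/N:"; count_worker_units is a hypothetical helper, and the pattern would need adjusting if the output format differs.

```shell
#!/bin/sh
# Count storm-worker units in juju status output read from stdin.
# Assumes each unit is listed on its own line as "storm-worker/<number>:".
count_worker_units() {
    grep -c '^ *storm-worker/[0-9][0-9]*:'
}
```

Typical use is juju status storm-worker | count_worker_units, run before and after juju add-unit to confirm the count grew by 5. Units that are listed may still be starting, so the agent state in the full status output is worth checking as well.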
To verify a successful scale:
- Go to http://{nimbus-server ip address}:8080
- Under "Topology summary", click on "WordCount"
- Click on "Spout" link in "Spouts (All time)" section
- Note "Host" list under "Executors (All time)" section
- Go back to "Topology summary"
- Click on "Rebalance" in "Topology actions" section
- Click on "Spout" link in "Spouts (All time)" section
- Refresh the page and watch the job rebalance as more storm-worker threads become available
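The "Rebalance" action in the UI has a command-line counterpart, storm rebalance, which accepts a wait time (-w, in seconds) and a new worker count (-n). The sketch below wraps it in the same juju run pattern used earlier. The function name, the JUJU dry-run variable, and the example values are hypothetical.

```shell
#!/bin/sh
# JUJU defaults to the real client; set JUJU=echo for a dry run.
JUJU=${JUJU:-juju}

# Rebalance a topology across the available workers.
# $1 = topology name
# $2 = seconds to wait before the rebalance takes effect
# $3 = new number of worker processes for the topology
rebalance_topology() {
    $JUJU run --service nimbus-server "storm rebalance $1 -w $2 -n $3"
}
```

For instance, rebalance_topology WordCount 10 6 asks Nimbus to spread WordCount over 6 worker processes after a 10 second pause. The worker count should be chosen to match the slots actually available after scaling out.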