realtime analytics with storm #10

Supports: trusty

Real Time Analytics w/ Storm

Hortonworks (HDP 2.1.3) Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

This charm will build a Storm cluster consisting of:

A Nimbus master node with the following daemons configured and loaded:

  • storm-drpc
  • storm-logviewer
  • storm-nimbus
  • storm-ui

Storm worker node(s) with the following daemons configured and loaded:

  • storm-logviewer
  • storm-supervisor

Verify deployment

ZooKeeper must be deployed and running. To check it from outside the environment, expose it temporarily:

juju expose hdp-zookeeper

To verify that ZooKeeper is responding:

echo ruok | nc {hdp-zookeeper/0 IP address} 2181
echo stat | nc {hdp-zookeeper/0 IP address} 2181
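The `ruok` check above can be wrapped in a small script so the reply is tested rather than read by eye. This is a minimal sketch, not part of the charm: the function names `zk_reply_ok` and `zk_check` are illustrative, and it assumes `nc` is installed and that a healthy ZooKeeper server answers `ruok` with `imok`.

```shell
#!/bin/sh
# Sketch only: zk_reply_ok and zk_check are hypothetical helper names.

# Succeeds only if the reply is exactly "imok", the answer a healthy
# ZooKeeper server gives to the four-letter command "ruok".
zk_reply_ok() {
    [ "$1" = "imok" ]
}

# Sends "ruok" to host $1 (port $2, default 2181) and reports the result.
zk_check() {
    host="$1"
    port="${2:-2181}"
    # -w 5: give up after 5 seconds if the port does not respond.
    reply=$(echo ruok | nc -w 5 "$host" "$port")
    if zk_reply_ok "$reply"; then
        echo "ZooKeeper at $host:$port is healthy"
    else
        echo "ZooKeeper at $host:$port did not answer imok" >&2
        return 1
    fi
}
```

Call it as `zk_check {hdp-zookeeper/0 IP address}` while the service is exposed.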

To unexpose ZooKeeper:

juju unexpose hdp-zookeeper

To verify the status of the Apache Storm cluster, go to:

http://{nimbus-server ip address}:8080
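The same check can be made from a terminal. The sketch below is an assumption-laden convenience, not something the charm provides: `storm_ui_status` and `storm_ui_up` are made-up names, `curl` must be installed, and the Storm UI is assumed to answer its root page on port 8080 with HTTP 200 (after any redirect).

```shell
#!/bin/sh
# Sketch only: storm_ui_status and storm_ui_up are hypothetical helper names.

# Prints the HTTP status code returned by the Storm UI on host $1.
# -s: silent, -L: follow redirects, -m 10: overall timeout of 10 seconds,
# -o /dev/null: discard the page body, -w: print only the status code.
storm_ui_status() {
    curl -s -L -m 10 -o /dev/null -w '%{http_code}' "http://$1:8080/"
}

# Succeeds if the UI answered with HTTP 200 (assumed to mean "up").
storm_ui_up() {
    [ "$(storm_ui_status "$1")" = "200" ]
}
```

For example, `storm_ui_up {nimbus-server ip address} && echo "Storm UI is reachable"`.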

Real-time usage

Example: deploying and managing Apache Storm topologies

The following steps demonstrate how to deploy the Storm WordCount application. The application has two parts: a spout that randomly generates a data stream, and bolts that process the generated stream.

juju run --service nimbus-server "storm jar /usr/lib/storm/contrib/storm-starter/storm-starter-  storm.starter.WordCountTopology WordCount"

To monitor the deployment:

  • Go to http://{nimbus-server ip address}:8080
  • Under "Topology summary", click on "WordCount"
  • Monitor the spout and bolt tasks
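As a command-line complement to the UI steps above, the Storm CLI's `list` subcommand shows the topologies known to Nimbus. The sketch below runs it through `juju run` in the same style as the deployment command. The helper names `storm_on_nimbus` and `topology_listed` are illustrative, and the check assumes only that the topology name appears somewhere in the `storm list` output.

```shell
#!/bin/sh
# Sketch only: storm_on_nimbus and topology_listed are hypothetical helper names.

# Runs a storm CLI subcommand on the nimbus-server service.
storm_on_nimbus() {
    juju run --service nimbus-server "storm $*"
}

# Succeeds if the topology named $1 appears in the "storm list" output.
topology_listed() {
    storm_on_nimbus list | grep -q -- "$1"
}
```

For example, `topology_listed WordCount && echo "WordCount is registered with Nimbus"`.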

Scale out usage

Example: adding 5 more worker nodes

juju add-unit -n 5 storm-worker

To verify a successful scale-out:

  • Go to http://{nimbus-server ip address}:8080
  • Under "Topology summary", click on "WordCount"
  • Click on "Spout" link in "Spouts (All time)" section
  • Note "Host" list under "Executors (All time)" section
  • Go back to "Topology summary"
  • Click on "Rebalance" in "Topology actions" section
  • Click on "Spout" link in "Spouts (All time)" section
  • Refresh and note how the job is rebalanced as more storm-worker threads become available
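The rebalance step can also be triggered from the command line with the Storm CLI's `rebalance` subcommand instead of the UI button. This is a sketch under assumptions: `rebalance_topology` is a made-up helper name, the 10-second default wait is an arbitrary choice, and the new storm-worker units are assumed to be up already (rebalancing earlier gives Storm nothing new to use).

```shell
#!/bin/sh
# Sketch only: rebalance_topology is a hypothetical helper name.

# Asks Storm to rebalance topology $1 across the available workers.
# $2 is the number of seconds Storm waits before redistributing
# (passed to the rebalance subcommand's -w option; default 10 here).
rebalance_topology() {
    topology="$1"
    wait_secs="${2:-10}"
    juju run --service nimbus-server "storm rebalance $topology -w $wait_secs"
}
```

For example, `rebalance_topology WordCount`, followed by a refresh of the "Executors (All time)" host list in the UI.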

Contact Information

Apache & Hortonworks Storm
