
Supports: trusty

Big Data Ingestion with Apache Flume and RabbitMQ

This bundle is an 8 node cluster designed to scale out. Built around Apache Hadoop components, it contains the following units:

  • 1 HDFS Master
  • 1 HDFS Secondary Namenode
  • 1 YARN Master
  • 3 Compute Slaves
  • 1 Flume-HDFS
  • 1 Plugin (colocated on the Flume unit)
  • 1 RabbitMQ
  • 1 Flume-RabbitMQ (colocated on the RabbitMQ unit)

The Flume-HDFS unit provides an Apache Flume agent featuring an Avro source, memory channel, and HDFS sink. This agent supports a relation with the Flume-RabbitMQ charm (apache-flume-rabbitmq) to ingest messages published to a given RabbitMQ queue into HDFS.
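This source, channel, sink pipeline can be pictured with a minimal flume.conf sketch. The agent name, bind address, port, and HDFS path below are illustrative assumptions following common Flume conventions; the values the charm actually renders may differ:

```properties
# Hypothetical sketch of the agent the flume-hdfs charm configures:
# an Avro source receiving events from the flume-rabbitmq agent,
# a memory channel, and an HDFS sink. All names and values are illustrative.
a1.sources = avro-src
a1.channels = mem-ch
a1.sinks = hdfs-sink

a1.sources.avro-src.type = avro
a1.sources.avro-src.bind = 0.0.0.0
a1.sources.avro-src.port = 4141
a1.sources.avro-src.channels = mem-ch

a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 1000

a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.channel = mem-ch
a1.sinks.hdfs-sink.hdfs.path = hdfs://hdfs-master:8020/user/flume/%Y-%m-%d
```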


Deploy this bundle using juju-quickstart:

juju quickstart u/bigdata-dev/apache-ingestion-flume-rabbitmq

See juju quickstart --help for deployment options, including machine constraints and how to deploy a locally modified version of the apache-ingestion-flume-rabbitmq bundle.yaml.

Testing the deployment

Smoke test HDFS admin functionality

Once the deployment is complete and the cluster is running, ssh to the HDFS Master unit:

juju ssh hdfs-master/0

As the ubuntu user, create a temporary directory on the Hadoop file system. The steps below verify HDFS functionality:

hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -chmod -R 777 /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
hdfs dfs -rm -R /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed

Smoke test YARN and MapReduce

Run the terasort.sh script from the Flume unit to generate and sort data. The steps below verify that Flume is communicating with the cluster via the plugin and that YARN and MapReduce are working as expected:

juju ssh flume-hdfs/0
~/terasort.sh
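If the job completes successfully, its output lands in HDFS under the ubuntu user's home directory (the same tera_demo_out path that the next smoke test removes). You can confirm it from the same unit:

```shell
# still on the flume-hdfs unit; tera_demo_out is assumed to be the
# output directory created by terasort.sh
hdfs dfs -ls /user/ubuntu/tera_demo_out
```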

Smoke test HDFS functionality from user space

From the Flume unit, delete the MapReduce output previously generated by the terasort.sh script:

juju ssh flume-hdfs/0
hdfs dfs -rm -R /user/ubuntu/tera_demo_out

Smoke test Flume

SSH to the Flume unit and verify the flume-ng Java process is running:

juju ssh flume-hdfs/0
ps -ef | grep flume-ng # verify process is running
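To see ingested messages land in HDFS, publish a message to the related RabbitMQ queue and then list the sink's output directory. The path below is an assumption (a common Flume HDFS sink layout, matching the sketch above only by convention); check the agent's configured hdfs.path for the actual location:

```shell
# on the flume-hdfs unit; /user/flume is an assumed sink path,
# not confirmed by this bundle's documentation
hdfs dfs -ls -R /user/flume
```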

Scale Out Usage

This bundle was designed to scale out. To increase the number of compute slaves, add units to the compute-slave service. To add one unit:

juju add-unit compute-slave

You can also add multiple units at once; for example, to add four more compute slaves:

juju add-unit -n4 compute-slave
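After adding units, juju status shows the new compute-slave units coming up; wait for them to reach a started state before expecting the added capacity:

```shell
# watch the new units until they report started
juju status compute-slave
```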
