Big Data Ingestion with Apache Flume and RabbitMQ
This bundle is an 8-node cluster designed to scale out. Built around Apache Hadoop components, it contains the following units:
- 1 HDFS Master
- 1 HDFS Secondary Namenode
- 1 YARN Master
- 3 Compute Slaves
- 1 Flume-HDFS
- 1 Plugin (colocated on the Flume unit)
- 1 RabbitMQ
- 1 Flume-RabbitMQ (colocated on the RabbitMQ unit)
The Flume-HDFS unit provides an Apache Flume agent featuring an Avro source, memory channel, and HDFS sink. This agent supports a relation with the Flume-RabbitMQ charm (apache-flume-rabbitmq) to ingest messages published to a given RabbitMQ queue into HDFS.
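For reference, a minimal Flume agent configuration wiring an Avro source to an HDFS sink through a memory channel might look like the sketch below. The agent name, bind port, and HDFS path are illustrative assumptions; the charm generates the actual configuration from its relations:

# flume.conf (sketch; names, port, and path are assumptions)
agent.sources = avro-src
agent.channels = mem-ch
agent.sinks = hdfs-sink

# Avro source: receives events relayed by the Flume-RabbitMQ agent
agent.sources.avro-src.type = avro
agent.sources.avro-src.bind = 0.0.0.0
agent.sources.avro-src.port = 4141
agent.sources.avro-src.channels = mem-ch

# In-memory channel buffering events between source and sink
agent.channels.mem-ch.type = memory

# HDFS sink: writes buffered events into the cluster
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://hdfs-master:8020/user/flume/%y-%m-%d
agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfs-sink.channel = mem-ch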
Usage
Deploy this bundle using juju-quickstart:
juju quickstart u/bigdata-dev/apache-ingestion-flume-rabbitmq
See juju quickstart --help for deployment options, including machine constraints and how to deploy a locally modified version of the apache-ingestion-flume-rabbitmq bundle.yaml.
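For example, assuming you have downloaded and edited a local copy of the bundle, juju-quickstart can deploy it straight from the file:

juju quickstart ./bundle.yaml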
Testing the deployment
Smoke test HDFS admin functionality
Once the deployment is complete and the cluster is running, SSH to the HDFS Master unit:
juju ssh hdfs-master/0
As the ubuntu user, create a temporary directory on the Hadoop file system. The steps below verify HDFS functionality:
hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -chmod -R 777 /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
hdfs dfs -rm -R /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
exit
Smoke test YARN and MapReduce
Run the terasort.sh script from the Flume unit to generate and sort data. The steps below verify that Flume is communicating with the cluster via the plugin and that YARN and MapReduce are working as expected:
juju ssh flume-hdfs/0
~/terasort.sh
exit
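If the job succeeds, the sorted output should be visible in HDFS. You can list it before moving on; the path below is the one removed in the next section:

juju ssh flume-hdfs/0
hdfs dfs -ls /user/ubuntu/tera_demo_out # verify terasort output exists
exit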
Smoke test HDFS functionality from user space
From the Flume unit, delete the MapReduce output previously generated by the terasort.sh script:
juju ssh flume-hdfs/0
hdfs dfs -rm -R /user/ubuntu/tera_demo_out
exit
Smoke test Flume
SSH to the Flume unit and verify that the flume-ng Java process is running:
juju ssh flume-hdfs/0
ps -ef | grep flume-ng # verify process is running
exit
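To exercise the full ingestion path, you can publish a test message to RabbitMQ and then look for it in HDFS. The sketch below assumes the RabbitMQ management plugin is enabled (for rabbitmqadmin), that the relation uses a routing key of flume, and that the HDFS sink writes under /user/flume; all three are assumptions, so check the charm configuration and juju status for the actual service name, queue, and path:

juju ssh rabbitmq/0
sudo rabbitmqadmin publish routing_key=flume payload='hello from rabbitmq' # assumed routing key
exit
juju ssh flume-hdfs/0
hdfs dfs -ls -R /user/flume # look for newly written event files (path is charm-dependent)
exit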
Scale Out Usage
This bundle is designed to scale out. To increase the number of compute slaves, add units to the compute-slave service. To add one unit:
juju add-unit compute-slave
You can also add multiple units at once; for example, to add four more compute slaves:
juju add-unit -n4 compute-slave