Supports: trusty

Big Data Ingestion with Apache Flume

This bundle is a 7-node cluster designed to scale out. Built around Apache Hadoop components, it contains the following units:

  • 1 HDFS Master
  • 1 HDFS Secondary Namenode
  • 1 YARN Master
  • 3 Compute Slaves
  • 1 Flume-HDFS
  • 1 Plugin (colocated on the Flume unit)

The Flume-HDFS unit provides an Apache Flume agent featuring an Avro source, memory channel, and HDFS sink. This agent supports relations with the apache-flume-twitter and apache-flume-syslog charms to ingest Twitter and remote syslog data, respectively, into HDFS.
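
For reference, an agent with this source/channel/sink layout is typically described in a Flume properties file along the lines of the sketch below. The agent name, port, and HDFS path are illustrative assumptions; the charm renders its own configuration, which may differ.

# Minimal flume.conf sketch (agent name, port, and HDFS path are assumptions,
# not the charm's actual configuration)
a1.sources = avro-source
a1.channels = mem-channel
a1.sinks = hdfs-sink

a1.sources.avro-source.type = avro
a1.sources.avro-source.bind = 0.0.0.0
a1.sources.avro-source.port = 4141
a1.sources.avro-source.channels = mem-channel

a1.channels.mem-channel.type = memory
a1.channels.mem-channel.capacity = 1000

a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs:///user/flume/events
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.channel = mem-channel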

Usage

Deploy this bundle using juju-quickstart:

juju quickstart u/bigdata-dev/apache-ingestion-flume

See juju quickstart --help for deployment options, including machine constraints and how to deploy a locally modified version of the apache-ingestion-flume bundle.yaml.
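
For example, to deploy a locally modified copy, point juju-quickstart at the edited bundle file and then watch the environment come up (the file path below is a placeholder):

juju quickstart ./bundle.yaml # deploy a local copy of the bundle
watch juju status             # monitor until all units report "started"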

Testing the deployment

Smoke test HDFS admin functionality

Once the deployment is complete and the cluster is running, ssh to the HDFS Master unit:

juju ssh hdfs-master/0

As the ubuntu user, create a temporary directory on the Hadoop file system. The steps below verify HDFS functionality:

hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -chmod -R 777 /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
hdfs dfs -rm -R /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
exit

Smoke test YARN and MapReduce

Run the terasort.sh script from the Flume unit to generate and sort data. The steps below verify that Flume is communicating with the cluster via the plugin and that YARN and MapReduce are working as expected:

juju ssh flume-hdfs/0
~/terasort.sh
exit
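
The terasort.sh script is a convenience wrapper; roughly the same check can be run by hand with the stock MapReduce examples jar. The jar location, row count, and input directory below are assumptions and may differ on your units:

juju ssh flume-hdfs/0
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 100000 /user/ubuntu/tera_demo_in
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /user/ubuntu/tera_demo_in /user/ubuntu/tera_demo_out
exit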

Smoke test HDFS functionality from user space

From the Flume unit, delete the MapReduce output previously generated by the terasort.sh script:

juju ssh flume-hdfs/0
hdfs dfs -rm -R /user/ubuntu/tera_demo_out
exit

Smoke test Flume

SSH to the Flume unit and verify that the flume-ng Java process is running:

juju ssh flume-hdfs/0
ps -ef | grep flume-ng # verify process is running
exit
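
Beyond checking the process, you can push a test event into the agent's Avro source with Flume's built-in avro-client. The port below is an assumption; use whatever port the charm configured for the Avro source:

juju ssh flume-hdfs/0
echo "hello flume" > /tmp/flume-test-event
flume-ng avro-client -H localhost -p 4141 -F /tmp/flume-test-event # send one test event
exit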

Scale Out Usage

This bundle was designed to scale out. To increase the number of compute slaves, add units to the compute-slave service. To add one unit:

juju add-unit compute-slave

You can also add multiple units at once; for example, to add four more compute slaves:

juju add-unit -n4 compute-slave
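
If you need to control where a new unit lands, juju add-unit also accepts a --to placement directive; the machine number below is a placeholder:

juju add-unit --to 5 compute-slave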

Contact Information

Help
