Data Analytics with Pig Latin
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language, Pig Latin, for expressing data analysis programs, coupled with infrastructure for evaluating those programs.
Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, which are executed on the large-scale parallel implementation provided by HDP Hadoop 2.4.1.
This bundle deploys a Hortonworks HDP 2.1 Hadoop master node, which runs the YARN ResourceManager and HDFS NameNode servers, together with compute nodes that form the cluster. By default the bundle uses two units: one for the master and one for compute.
From the bundle home directory:
juju quickstart bundles.yaml
Scale Out Usage
To increase the number of slaves, add units. To add one unit:
juju add-unit compute-node
Or you can add multiple units at once:
juju add-unit -n4 compute-node
Smoke Test after deployment
1) juju ssh hdp-pig/0
2) sudo su $HDFS_USER
3) hadoop version <= verifies that the Hadoop client is installed
4) hdfs dfsadmin -report <= verifies that the Pig client is connected to the remote HDFS server
5) yarn rmadmin -getGroups <= verifies that the Pig client is connected to the remote ResourceManager server

Run a Pig Script Test:
1) hdfs dfs -mkdir -p /user/hduser
2) hdfs dfs -copyFromLocal /etc/passwd /user/hduser/passwd
3) vim /tmp/id.pig
4) Add the following Pig script commands, then save and exit:
A = load '/user/hduser/passwd' using PigStorage(':');
B = foreach A generate $0 as id;
store B into '/tmp/id.out';
5) pig -l /tmp/pig.log /tmp/id.pig
6) hadoop fs -cat /tmp/id.out/part-m-00000 <= check the result on the Hadoop cluster
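As a convenience, the Pig script from steps 3-4 above can also be written non-interactively with a heredoc instead of vim, and the transformation it expresses can be previewed without a cluster: PigStorage(':') splits each line on colons and $0 selects the first field, which `cut -d: -f1` reproduces locally. This is a sketch of the same steps, not a replacement for running the script through Pig.

```shell
# Write the same Pig script as smoke-test steps 3-4, without vim.
# The quoted 'EOF' delimiter keeps $0 literal (no shell expansion).
cat > /tmp/id.pig <<'EOF'
A = load '/user/hduser/passwd' using PigStorage(':');
B = foreach A generate $0 as id;
store B into '/tmp/id.out';
EOF

# Local preview of what the script computes: PigStorage(':') splits on
# colons and $0 is the first field, i.e. the user-name column of passwd.
cut -d: -f1 /etc/passwd
```

Running the heredoc on the hdp-pig unit leaves /tmp/id.pig identical to the file produced with vim, so step 5 (pig -l /tmp/pig.log /tmp/id.pig) works unchanged.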
Following the Development of this Charm:
By default this bundle deploys the stable version of the Hadoop charm, but if you want to follow development you can: