Apache Spark Standalone Notebook #1

Supports: trusty

Spark Standalone 1.3.x cluster + IPython Notebook

Apache Spark™ is a fast and general-purpose engine for large-scale data processing.
The IPython Notebook is an interactive computational environment in which you
can combine code execution, rich text, mathematics, plots and rich media.

Key features:

Speed:
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing.

Ease of Use:
Write applications quickly in Java, Scala or Python.
Spark offers over 80 high-level operators that make it easy to build parallel apps,
and you can use them interactively from the Scala and Python shells.

General Purpose Engine:
Combine SQL, streaming, and complex analytics.
Spark powers a stack of high-level tools including Shark for SQL, MLlib for
machine learning, GraphX, and Spark Streaming. You can combine these frameworks
seamlessly in the same application.


Deployment

To deploy, run the following from the bundle's home directory:
juju quickstart bundles.yaml
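The bundles.yaml file describes the services to deploy and the relations between them. A minimal sketch of what such a file might contain is shown below; the bundle name, charm URLs and service names here are illustrative assumptions, not the bundle's actual contents (only the spark-slave unit count of 4 is stated by this page):

```yaml
# Hypothetical sketch of a Juju bundles.yaml for a Spark standalone cluster.
# Charm URLs and service names are assumptions for illustration.
spark-standalone-notebook:
  services:
    spark-master:
      charm: "cs:trusty/spark-master"
      num_units: 1
    spark-slave:
      charm: "cs:trusty/spark-slave"
      num_units: 4
  relations:
    - ["spark-master", "spark-slave"]
```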

Scale Out Usage

To increase the number of Spark slaves, simply add units. For example, to add
four more units to the spark-slave service (the current bundle deploys 4
spark-slave units):
juju add-unit -n 4 spark-slave

To add a single unit, omit the -n flag:
juju add-unit spark-slave

Smoke tests after deployment

# Spark admins use ssh to access the Spark console from the master node
1) juju ssh spark-master/0   <== ssh to the Spark master
2) Use spark-submit to run your application:
spark-submit --class org.apache.spark.examples.SparkPi /usr/lib/spark/lib/spark-examples*.jar 10
The job should print an estimate of pi close to 3.14.
Alternatively, execute demo.sh from /home/ubuntu.
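The SparkPi example estimates pi by Monte Carlo sampling: it scatters random points over the unit square and counts how many land inside the quarter circle. A plain-Python sketch of the same computation (without Spark, for illustration; in the real job the trailing argument 10 controls how many partitions share the sampling work):

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi, mirroring what SparkPi computes per partition."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # Area of the quarter circle is pi/4, so pi ~= 4 * (inside / total)
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # close to 3.14
```

Spark parallelizes exactly this loop across the cluster's partitions and sums the per-partition counts, which is why the example scales with the number of slaves.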

3) Spark's shell provides a simple way to learn the API, as well as a powerful
tool to analyze data interactively. It is available in either Scala or Python.
Start it by running the following in the Spark directory:
$ spark-shell   <== for interaction using Scala
$ pyspark       <== for interaction using Python
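Inside the pyspark shell, the entry point is a predefined SparkContext named sc, and the core operations are functional transformations such as map and reduce over distributed collections. The same pattern, sketched in plain Python so it runs anywhere (in the real shell, sc.parallelize(...) would play the role of the list here):

```python
from functools import reduce

# Stand-in for sc.parallelize(range(1, 11)) in the pyspark shell
data = list(range(1, 11))

# map: square every element (rdd.map(lambda x: x * x) in pyspark)
squares = [x * x for x in data]

# reduce: sum the results (rdd.reduce(lambda a, b: a + b) in pyspark)
total = reduce(lambda a, b: a + b, squares)
print(total)  # 385
```

In pyspark the map step is lazy and only executes when the reduce action runs, but the result is the same.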

Access the IPython Notebook web UI

From any web browser, load: http://{spark-node-ip}:8880
Note: use the "juju status" command to discover the Spark node's IP address.
