Apache Spark Notebook

Supports: trusty


The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots, and rich media. IPython Notebook and Spark’s Python API are a powerful combination for data science.


IPython Notebook is a web-based notebook that enables interactive data analytics for Spark. The developers of Apache Spark have given thoughtful consideration to Python as a language of choice for data analysis. They have developed the PySpark API for working with RDDs in Python, and further support using the powerful IPython shell instead of the built-in Python REPL.

The developers of IPython have invested considerable effort in building the IPython Notebook, a system inspired by Mathematica that allows you to create "executable documents." IPython Notebooks can integrate formatted text (Markdown), executable code (Python), mathematical formulae (LaTeX), and graphics/visualizations (matplotlib) into a single document that captures the flow of an exploration and can be exported as a formatted report or an executable script.


This is a subordinate charm that requires the apache-spark interface. This means that you will need to deploy a base Apache Spark cluster to use IPython Notebook. An easy way to deploy the recommended environment is to use the apache-hadoop-spark-notebook bundle. This will deploy the Apache Hadoop platform with an Apache Spark + IPython Notebook unit that communicates with the cluster by relating to the apache-hadoop-plugin subordinate charm:

juju-quickstart apache-hadoop-spark-notebook

Alternatively, you may manually deploy the recommended environment as follows:

juju deploy apache-hadoop-hdfs-master hdfs-master
juju deploy apache-hadoop-yarn-master yarn-master
juju deploy apache-hadoop-compute-slave compute-slave
juju deploy apache-hadoop-plugin plugin
juju deploy apache-spark spark
juju deploy apache-spark-notebook notebook

juju add-relation yarn-master hdfs-master
juju add-relation compute-slave yarn-master
juju add-relation compute-slave hdfs-master
juju add-relation plugin yarn-master
juju add-relation plugin hdfs-master
juju add-relation spark plugin
juju add-relation notebook spark

Once deployment is complete, expose the notebook service:

juju expose notebook

You may now access the web interface at http://{spark_unit_ip_address}:8880. The IP address can be found by running:

juju status spark | grep public-address
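
If you want to script against the deployment rather than read the status by eye, the address can be extracted from the `juju status` output. A minimal parsing sketch, using a hypothetical excerpt of the status output (the real output for your environment will differ):

```python
# Hypothetical excerpt of `juju status spark` output; real output will differ.
status_excerpt = """\
    spark/0:
      agent-state: started
      public-address: 10.0.3.15
"""

# Pull the value from the first public-address line,
# mirroring what `grep public-address` would match.
address = next(
    line.split(":", 1)[1].strip()
    for line in status_excerpt.splitlines()
    if "public-address" in line
)
notebook_url = "http://{}:8880".format(address)
print(notebook_url)  # http://10.0.3.15:8880
```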

Testing the deployment

From the IPython Notebook web interface, click the "New Notebook" button. In a notebook cell, type "sc." and then press the "Tab" key. The Spark API completion menu should appear, which verifies that the notebook can communicate with the Spark unit.
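
Beyond tab completion, you can run a small job to confirm the connection end to end. A minimal sketch of such a smoke-test cell, assuming the notebook session provides the usual `sc` SparkContext (the stand-in classes below are purely illustrative fallbacks so the snippet also runs outside a notebook; in a real session `sc` is already defined and the fallback is skipped):

```python
# In the notebook, `sc` is the SparkContext provided by the PySpark kernel.
# Outside a notebook, fall back to a plain-Python stand-in that mimics the
# two RDD calls used below, so the snippet stays self-contained.
try:
    sc  # defined by the notebook's PySpark session
except NameError:
    class _LocalRDD:
        def __init__(self, data):
            self.data = data
        def filter(self, f):
            return _LocalRDD([x for x in self.data if f(x)])
        def count(self):
            return len(self.data)

    class _LocalStandIn:
        def parallelize(self, data):
            return _LocalRDD(list(data))

    sc = _LocalStandIn()

# Count the even numbers in 0..999; a successful result means
# the job was distributed and collected correctly.
evens = sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()
print(evens)  # 500
```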

Contact Information