apache spark notebook #4
Description
The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots, and rich media. IPython Notebook and Spark’s Python API are a powerful combination for data science.
Overview
IPython Notebook is a web-based notebook that enables interactive data analytics for Spark. The developers of Apache Spark have given thoughtful consideration to Python as a language of choice for data analysis. They have developed the PySpark API for working with RDDs in Python, and they also support using the powerful IPython shell instead of the built-in Python REPL.
The developers of IPython have invested considerable effort in building the IPython Notebook, a system inspired by Mathematica that allows you to create "executable documents." IPython Notebooks can integrate formatted text (Markdown), executable code (Python), mathematical formulae (LaTeX), and graphics/visualizations (matplotlib) into a single document that captures the flow of an exploration and can be exported as a formatted report or an executable script.
Usage
This is a subordinate charm that requires the apache-spark interface. This means that you will need to deploy a base Apache Spark cluster to use the Notebook. An easy way to deploy the recommended environment is to use the apache-hadoop-spark-notebook bundle. This will deploy the Apache Hadoop platform with an Apache Spark + Notebook unit that communicates with the cluster by relating to the apache-hadoop-plugin subordinate charm:
juju-quickstart apache-hadoop-spark-notebook
Once deployment is complete, expose the Notebook:
juju expose notebook
You may now access the web interface at http://{spark_unit_ip_address}:9090. The IP address can be found by running:
juju status spark | grep public-address
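As a small illustration of the step above, the following Python sketch builds the web interface URL from the text printed by the status command. It assumes the output contains a line of the form "public-address: <address>"; the helper name notebook_url and the sample address are hypothetical and are not part of the charm. Port 9090 is the one given in this README.

```python
import re

NOTEBOOK_PORT = 9090  # port stated in this README


def notebook_url(status_text, port=NOTEBOOK_PORT):
    """Return the notebook URL for the first public-address in status_text.

    Assumption: status_text contains a line such as
    'public-address: 10.0.0.5', as printed by
    `juju status spark | grep public-address`.
    """
    match = re.search(r"public-address:\s*(\S+)", status_text)
    if match is None:
        raise ValueError("no public-address line found in status output")
    return "http://{}:{}".format(match.group(1), port)


# Made-up sample line, not real command output.
sample = "        public-address: 10.0.0.5\n"
print(notebook_url(sample))
```

The text passed in can be pasted from the terminal or captured from the command with the subprocess module; the parsing itself makes no call to Juju.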
Verify the deployment
Status
The services provide extended status reporting to indicate when they are ready:
juju status --format=tabular
This is particularly useful when combined with watch to track the ongoing progress of the deployment:
watch -n 0.5 juju status --format=tabular
The message for each unit will provide information about that unit's state.
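For scripted deployments, the same polling idea can be sketched in Python. This is a minimal sketch, not part of the charm: the names juju_status and wait_until_ready are hypothetical, and because this README does not list the exact status messages, the readiness check is left to the caller. The word "Ready" in the demo is an assumption made only for illustration.

```python
import subprocess
import time


def juju_status():
    """Run the status command from this README and return its text output."""
    return subprocess.check_output(
        ["juju", "status", "--format=tabular"], universal_newlines=True
    )


def wait_until_ready(is_ready, get_status=juju_status, interval=0.5, timeout=600.0):
    """Poll get_status() until is_ready(text) is true, then return that text.

    Raises TimeoutError if the units are still not ready once the
    timeout (in seconds) has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = get_status()
        if is_ready(text):
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("units not ready after {} seconds".format(timeout))
        time.sleep(interval)


# Demo with canned, made-up status lines so the sketch runs without Juju.
canned = iter(["spark/0  Installing", "spark/0  Ready"])
final = wait_until_ready(
    lambda text: "Ready" in text,      # assumed readiness message
    get_status=lambda: next(canned),   # stand-in for juju_status
    interval=0,
)
print(final)
```

Against a real deployment, omit the get_status argument so that the default juju_status is used, and supply an is_ready check that matches the messages your units actually report.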