Apache Drill and Apache Bigtop
Analyse and explore data.
Apache Drill lets you query a range of less traditional data sources. These do not have to be SQL databases; they might be CSV files, JSON files, data stored in Hadoop, or a combination of all three and more. Drill lets users query multiple data sources as a single entity, so you can combine customer data from your CRM with sales export data stored on a shared file server in a single view and gain better insight into your data.
This bundle builds on the Hadoop processing bundle, adding Drill nodes and HDFS connectivity. This lets you run SQL queries against files inside HDFS and scale those queries as your data grows. Apache Drill executes each query as close to the data as possible, which improves performance and increases the amount of data that can be processed.
SQL Analysis For Big Data.
This bundle is a full Hadoop deployment with Apache Drill, designed to make it easy to deploy a scalable SQL-over-Hadoop setup. Deploying it creates the following units:
- 3 Apache Drill
- 3 Apache Zookeeper
- 1 Hadoop Namenode
- 3 Hadoop Slaves
- 1 Hadoop Resource Manager
- 1 Ganglia Server
- 1 Rsyslog Server
There are two easy ways to deploy this bundle.
Click the Add to model button at the top of this page, then click the Deploy changes button and follow the on-screen instructions.
Deploy this bundle using juju:
juju deploy ~spiculecharms/drill-hadoop
juju expose apache-drill
Interacting with the bundle
Getting data into Hadoop
The first task is, of course, getting some queryable data into Hadoop. To access the HDFS file system, SSH into the namenode and use the hdfs command-line tool. For example:
hdfs dfs -put parquet/userdata* /user/ubuntu/sample/
hdfs dfs -ls /user/ubuntu/sample
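If you load data regularly, the upload and the listing check can be wrapped in a small helper. The following is only a sketch: the function name upload_to_hdfs and the HDFS_BIN variable are hypothetical and not part of the bundle, and it assumes the hdfs client is on the PATH of the machine you run it on (for example the namenode).

```shell
# Upload local files into an HDFS directory, creating the directory first.
# HDFS_BIN (hypothetical) defaults to the real hdfs client; set HDFS_BIN=echo
# to print the commands instead of running them.
upload_to_hdfs() {
  target="$1"
  shift
  "${HDFS_BIN:-hdfs}" dfs -mkdir -p "$target" &&
    "${HDFS_BIN:-hdfs}" dfs -put "$@" "$target" &&
    "${HDFS_BIN:-hdfs}" dfs -ls "$target"
}
```

Calling it as `upload_to_hdfs /user/ubuntu/sample parquet/userdata*` would create the directory if needed, copy the files, and list the result. Running it once with `HDFS_BIN=echo` shows the exact commands before anything touches the cluster.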
Querying the data
From the Drill web interface you should then be able to interrogate your data with something like this:
select * from `juju_hdfs_namenode`.`root`.`/sample/`
This will then run your SQL query over your data.
The juju_hdfs_namenode part is the storage plugin name, root is the workspace, and the last part is the directory in which your data is stored.
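The same three-part name can also be used outside the web interface. As a rough sketch, the helpers below build the backtick-quoted table reference and send a query to Drill's REST endpoint (POST to /query.json with a JSON body). The function names and the DRILL_URL variable are hypothetical; the sketch assumes the web port is the Drill default of 8047, that curl is installed, and that the SQL text contains no double quotes or backslashes.

```shell
# Build a table reference of the form `plugin`.`workspace`.`path`.
drill_table() {
  printf '`%s`.`%s`.`%s`' "$1" "$2" "$3"
}

# Wrap a SQL string in the JSON body used by Drill's REST query endpoint.
# Assumes the SQL contains no double quotes or backslashes.
drill_payload() {
  printf '{"queryType": "SQL", "query": "%s"}' "$1"
}

# POST the query to a Drill node. DRILL_URL is a hypothetical placeholder;
# point it at the address of your exposed apache-drill unit.
drill_query() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(drill_payload "$1")" \
    "${DRILL_URL:-http://localhost:8047}/query.json"
}
```

For example, `drill_query "select * from $(drill_table juju_hdfs_namenode root /sample/) limit 10"` would send a query against the sample directory and print the JSON response.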
- Spicule’s solutions can solve your Big Data challenges
- Supported analytics
- Streaming data platforms