apache drill #132
Description
Apache Drill Drillbit
- Tags:
- big_data
- hadoop
Overview
Query any non-relational datastore (well, almost....)
Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.
Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality, so it's a good idea to co-locate Drill and the datastore on the same nodes.
Usage
To deploy this charm simply run:
juju deploy apache-zookeeper zookeeper
juju add-unit -n 2 zookeeper  # optional, but recommended for a quorum
juju deploy cs:~spiculecharms/apache-drill
juju add-relation apache-drill zookeeper
juju expose apache-drill
There is a webconsole running on http://
HDFS connectivity
If you are running a Hadoop setup, you can also test the HDFS connectivity.
juju add-relation apache-drill namenode
This will add a datasource entry for your Hadoop namenode. You can then query CSV/JSON/Parquet files.
Security
To enable security, first set an administrative user. Inside the Drill console, run the following:
ALTER SYSTEM SET `security.admin.users`='<myuser>'
Then enable the basic_auth and basic_security_auth options in the charm configuration. Once Drill has restarted, you should see a user login in the top corner. To create a user, either create a standard Unix user with useradd or run the following action:
juju run-action apache-drill/0 adduser username="<myuser>" password="<mypword>"
You should then be able to log in with this user.
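The basic_auth and basic_security_auth options mentioned above are charm configuration keys, so one way to enable them is with `juju config`. The following is a minimal sketch, assuming a Juju 2.x client and that the application was deployed under the name apache-drill. The `run` helper and the `DRY_RUN` variable are illustrative and not part of the charm; by default the command is only printed so it can be reviewed first.

```shell
# Illustrative helper (not part of the charm): print each command by default,
# and only execute it when DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: the application name is "apache-drill", as in the deploy step.
run juju config apache-drill basic_auth=true basic_security_auth=true
```

Running the script with `DRY_RUN=0` applies the change instead of printing it.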
Scale out Usage
You can simply add new units and they will be added to the cluster automatically:
juju add-unit -n 2 apache-drill
Configuration
drill_url: Sets an alternative download URL for Apache Drill.
cluster_id: Sets an alternative cluster ID for ZooKeeper.
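As a sketch of how one of these options might be supplied at deploy time, the snippet below builds a `juju deploy` command that passes cluster_id through `--config`. It assumes a Juju 2.x client that accepts `key=value` pairs for `--config`; the value `my-drill-cluster` is a hypothetical placeholder. The command is printed rather than executed so it can be checked first.

```shell
# Hypothetical value; choose your own cluster ID.
CLUSTER_ID="my-drill-cluster"

# Build the deploy command with the option overridden at deploy time.
deploy_cmd="juju deploy cs:~spiculecharms/apache-drill --config cluster_id=${CLUSTER_ID}"

# Print the command for review before running it by hand.
echo "$deploy_cmd"
```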
Contact Information
Apache Drill
Charm Support
If you require commercial support for this charm or Apache Drill, please contact us and we'd be happy to help. Email us at info@spicule.co.uk and we can arrange a call to discuss your requirements.
Configuration
- auth_mechanism
- (string) Alter the authentication mechanism for Drill
- PLAIN
- auth_profiles
- (string) Authentication profile inputs for Drill
- sudo, login
- basic_auth
- (boolean) Enable/disable basic authentication for Drill
- basic_security_auth
- (boolean) Enable/disable basic security authentication for Drill
- cluster_id
- (string) Cluster ID for ZooKeeper
- drill-cluster
- drill_heap
- (string) Drill maximum heap size. By default this is set to 25% of total system RAM. You can provide a percentage or a gigabyte value, for example 3G.
- 25%
- drill_max_direct_memory
- (string) Drill maximum available RAM (direct memory). By default this is set to 75% of total system RAM. You can provide a percentage or a gigabyte value, for example 8G.
- 75%
- extra_packages
- (string) Space separated list of extra deb packages to install.
- hdfs_formats
- (string)
- psv:
    type: text
    extensions:
      - tbl
    delimiter: "|"
  csv:
    type: text
    extensions:
      - csv
    delimiter: ","
  parquet:
    type: parquet
  json:
    type: json
    extensions:
      - json
  avro:
    type: avro
  sequencefile:
    type: sequencefile
    extensions:
      - seq
  csvh:
    type: text
    extensions:
      - csvh
    extractHeader: "true"
    delimiter: ","
- hdfs_path
- (string) Default path for HDFS connections.
- /user/ubuntu
- hdfs_writeable
- (boolean) Is the HDFS path writable?
- True
- install_keys
- (string) List of signing keys for install_sources package sources, per charmhelpers standard format (a yaml list of strings encoded as a string). The keys should be the full ASCII armoured GPG public keys. While GPG key ids are also supported and looked up on a keyserver, operators should be aware that this mechanism is insecure. null can be used if a standard package signing key is used that will already be installed on the machine, and for PPA sources where the package signing key is securely retrieved from Launchpad.
- install_sources
- (string) List of extra apt sources, per charm-helpers standard format (a yaml list of strings encoded as a string). Each source may be either a line that can be added directly to sources.list(5), or in the form ppa:<user>/<ppa-name> for adding Personal Package Archives, or a distribution component to enable.
- package_status
- (string) The status of service-affecting packages will be set to this value in the dpkg database. Valid values are "install" and "hold".
- install
- snap_proxy
- (string) HTTP/HTTPS web proxy for Snappy to use when accessing the snap store.
- snap_proxy_url
- (string) The address of a Snap Store Proxy to use for snaps e.g. http://snap-proxy.example.com
- snapd_refresh
- (string) How often snapd handles updates for installed snaps. The default (an empty string) is 4x per day. Set to "max" to check once per month based on the charm deployment date. You may also set a custom string as described in the 'refresh.timer' section here: https://forum.snapcraft.io/t/system-options/87
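The drill_heap and drill_max_direct_memory options in the list above accept either a percentage of total system RAM or a gigabyte value. The sketch below illustrates what such values amount to on a given machine. It is not the charm's own code: the `resolve_mem` function, the 16384 MB machine size, and the treatment of 1G as 1024 MB are all assumptions made for illustration.

```shell
# Illustrative only; this is not the charm's own code.
# resolve_mem VALUE TOTAL_MB: print VALUE as a number of megabytes.
# VALUE is a percentage of total RAM ("25%") or a gigabyte figure ("3G").
# Assumption: 1G is treated as 1024 MB.
resolve_mem() {
    value=$1
    total_mb=$2
    number=${value%?}   # strip the trailing "%" or "G"
    case "$value" in
        *%) echo $(( total_mb * number / 100 )) ;;
        *G) echo $(( number * 1024 )) ;;
        *)  echo "unsupported value: $value" >&2; return 1 ;;
    esac
}

# Example: a hypothetical machine with 16384 MB of RAM.
TOTAL_MB=16384
echo "drill_heap=25%              -> $(resolve_mem "25%" "$TOTAL_MB") MB"
echo "drill_max_direct_memory=75% -> $(resolve_mem "75%" "$TOTAL_MB") MB"
echo "drill_heap=3G               -> $(resolve_mem "3G" "$TOTAL_MB") MB"

# With a Juju 2.x client, values like these would typically be applied with:
#   juju config apache-drill drill_heap=3G drill_max_direct_memory=8G
```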