apache hbase #19

Supports: trusty

Description

HBase is an open source, non-relational, distributed database modeled after Google's BigTable.

Learn more at http://hbase.apache.org.


Overview

HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop.

Features

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables.
  • Automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
  • Easy to use Java API for client access.
  • Block cache and Bloom Filters for real-time queries.
  • Query predicate push down via server-side Filters.
  • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
  • Extensible JRuby-based (JIRB) shell.
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.

When Would I Use Apache HBase?

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

How Our Apache HBase Solution Works

Apache HBase scales linearly by requiring all tables to have a primary key. The key space is divided into sequential blocks that are then allotted to a region. RegionServers own one or more regions, so the load is spread uniformly across the cluster. If the keys within a region are frequently accessed, Apache HBase can further subdivide the region by splitting it automatically, so that manual data sharding is not necessary.

Apache ZooKeeper and HMaster servers make information about the cluster topology available to clients. Clients connect to these nodes using Juju relations and download a list of RegionServers, the regions contained within those RegionServers and the key ranges hosted by the regions. Clients know exactly where any piece of data is in HDFS and can contact the RegionServer directly without any need for a central coordinator.
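
The lookup described above can be sketched in a few lines of shell. This is only a toy illustration, not HBase's real metadata format: the region map, the start keys, and the server names (rs-0, rs-1, rs-2) are all hypothetical, and "-" stands for the empty start key of the first region. Each region owns every row key from its start key up to the next region's start key, and one RegionServer may own several regions.

```shell
# Toy region map, one "start_key server" pair per line, sorted by start key.
# All names here are made up for illustration.
region_map() {
  cat <<'EOF'
- rs-0
g rs-1
p rs-2
t rs-0
EOF
}

# Print the server that owns the region containing the given row key.
locate_region() {
  region_map | LC_ALL=C awk -v key="$1" '
    # Treat "-" as the empty start key of the first region.
    { start = ($1 == "-") ? "" : $1 }
    # Concatenating with "" forces a string (not numeric) comparison.
    (key "") >= (start "") { owner = $2; next }
    # Start keys are sorted, so stop at the first region that begins after the key.
    { exit }
    END { print owner }
  '
}

locate_region hbase
```

With this map, a client that holds the table locally can pick the right server for any key without asking a coordinator, which is the property the paragraph above describes.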

Usage

This charm leverages our pluggable Hadoop model with the hadoop-plugin interface and the Apache Zookeeper charm. This means that you will need to deploy a base Apache Hadoop cluster and an Apache Zookeeper quorum to run HBase. The suggested deployment method is to use the apache-hadoop-hbase bundle. This will deploy the Apache Hadoop/Zookeeper platform with a single Apache HBase HMaster and two scalable Apache HBase RegionServer units that communicate with the cluster by relating to the apache-hadoop-plugin subordinate charm:

juju-quickstart u/bigdata-dev/apache-hadoop-hbase

Alternatively, you may manually deploy the recommended environment as follows:

juju deploy apache-hadoop-hdfs-master hdfs-master
juju deploy apache-hadoop-yarn-master yarn-master
juju deploy apache-hadoop-compute-slave compute-slave
juju deploy apache-hadoop-plugin plugin
juju deploy -n 3 apache-zookeeper zookeeper
juju deploy apache-hbase hbase-master
juju deploy apache-hbase hbase-regionserver

juju add-relation yarn-master hdfs-master
juju add-relation compute-slave yarn-master
juju add-relation compute-slave hdfs-master
juju add-relation plugin yarn-master
juju add-relation plugin hdfs-master
juju add-relation hbase-master plugin
juju add-relation hbase-regionserver plugin
juju add-relation hbase-master:master hbase-regionserver:regionserver
juju add-relation zookeeper hbase-master
juju add-relation zookeeper hbase-regionserver

Once deployment is complete, you can run the HBase shell manually or access the web interface at http://{hbase_master_ip}:60010

  • Apache HBase shell

The Apache HBase Shell is (J)Ruby's IRB with some HBase-specific commands added. Anything you can do in IRB, you should be able to do in the HBase Shell. Type help and then press Return to see a listing of shell commands and options. To run the HBase shell, do as follows:

   juju ssh hbase-master/0
   ./bin/hbase shell
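
As a rough sketch of a first session, the commands below create a table, write and read one cell, and clean up again. The table name demo_table, the column family cf, and the cell contents are hypothetical; the commands themselves (create, put, get, scan, disable, drop) are standard HBase shell commands. The sketch assumes it is run on the hbase-master/0 unit from the directory that contains bin/hbase, and it does nothing if that file is not present.

```shell
# Print a short HBase shell session; all table and column names are examples.
demo_commands() {
  cat <<'EOF'
create 'demo_table', 'cf'
put 'demo_table', 'row1', 'cf:greeting', 'hello'
get 'demo_table', 'row1'
scan 'demo_table'
disable 'demo_table'
drop 'demo_table'
EOF
}

# Feed the commands to the HBase shell on standard input, if it is available here.
if [ -x ./bin/hbase ]; then
  demo_commands | ./bin/hbase shell
fi
```

The same commands can also be typed one at a time at the interactive prompt.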

Testing the deployment

Smoke test HBase

SSH to the HBase unit and run the smoke test as follows:

juju ssh hbase-master/0
~/hbase_test.sh test_table

Verify the Web Interface

Verify that the HBase Master web interface shows the results of the previous test by visiting http://{hbase_master_ip}:60010
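
If you prefer to check the web interface from a script, a minimal sketch follows. It assumes curl is installed on the machine you run it from and that port 60010 is reachable from there; the function names are hypothetical helpers, not part of the charm.

```shell
# Port of the web interface, as given in this README.
HBASE_MASTER_UI_PORT=60010

# Build the web interface URL for a given master IP address.
hbase_master_ui_url() {
  printf 'http://%s:%s\n' "$1" "$HBASE_MASTER_UI_PORT"
}

# Succeed if the web interface answers within 10 seconds (requires curl).
check_hbase_master_ui() {
  curl --silent --fail --output /dev/null --max-time 10 "$(hbase_master_ui_url "$1")"
}
```

For example, with a made-up address: check_hbase_master_ui 10.0.0.5 && echo "web interface is up"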

Contact Information

Help


Configuration

resources_mirror
(string) URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.
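
One way to supply this option is through a config file passed to juju deploy with --config. The sketch below is an assumption-laden example: the mirror URL and file path are placeholders you provide, the helper function names are hypothetical, and the service names match the manual deployment shown under Usage.

```shell
# Write a Juju config file that sets resources_mirror for both HBase services.
# Arguments: mirror URL, output file path.
write_mirror_config() {
  cat > "$2" <<EOF
hbase-master:
  resources_mirror: "$1"
hbase-regionserver:
  resources_mirror: "$1"
EOF
}

# Deploy both HBase services using that config file.
# This is not called automatically; run it when you are ready to deploy.
deploy_hbase_with_mirror() {
  juju deploy --config "$1" apache-hbase hbase-master
  juju deploy --config "$1" apache-hbase hbase-regionserver
}
```

For example, write_mirror_config "http://mirror.example.com/resources" hbase-mirror.yaml followed by deploy_hbase_with_mirror hbase-mirror.yaml, where the URL is a placeholder for your own mirror.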