apache hbase #19

Supports: trusty

Description

HBase is an open-source, non-relational, distributed database modeled after
Google's Bigtable.

Learn more at http://hbase.apache.org.


Overview

HBase is an open-source, non-relational, distributed database modeled after
Google's Bigtable and written in Java. It began as part of the Apache Hadoop
project and is now a top-level Apache Software Foundation project. It runs on
top of HDFS (Hadoop Distributed File System), providing Bigtable-like
capabilities for Hadoop.

Features

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables.
  • Automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase
    tables.
  • Easy-to-use Java API for client access.
  • Block cache and Bloom filters for real-time queries.
  • Query predicate push-down via server-side Filters.
  • Thrift gateway and a RESTful web service that supports XML, Protobuf, and
    binary data encoding options.
  • Extensible JRuby-based (JIRB) shell.
  • Support for exporting metrics via the Hadoop metrics subsystem to files or
    Ganglia, or via JMX.

When Would I Use Apache HBase?

Use Apache HBase™ when you need random, real-time read/write access to your
Big Data. The project's goal is the hosting of very large tables (billions of
rows by millions of columns) atop clusters of commodity hardware. Apache HBase
is an open-source, distributed, versioned, non-relational database modeled
after Google's Bigtable, as described in "Bigtable: A Distributed Storage
System for Structured Data" by Chang et al. Just as Bigtable leverages the
distributed data storage provided by the Google File System, Apache HBase
provides Bigtable-like capabilities on top of Hadoop and HDFS.

How Does Apache HBase Work?

Apache HBase scales out by keying every table on a row key, which plays the
role of a primary key, and storing rows sorted by that key. The key space is
divided into contiguous key ranges, each of which is served as a region. Each
RegionServer hosts one or more regions, so data and request load are spread
across the cluster. When a region grows beyond a configured size, Apache HBase
splits it automatically into two regions, so manual data sharding is not
necessary.
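As a rough illustration of the region layout described above, the sketch below treats a table's regions as a sorted list of start keys: each region covers the keys from its own start key up to the next region's start key, and the first region starts at the empty key. Splitting a region then amounts to adding one new start key inside it. The function name split_region and the example keys are hypothetical, and the split key is passed in by hand; HBase chooses split points itself and handles the data and region assignment, none of which is modeled here.

```shell
#!/bin/sh
# Illustrative sketch only: a table's regions modeled as a sorted list of
# start keys. The first region starts at the empty key "", and each region
# covers keys from its start key up to (not including) the next start key.

# split_region SPLIT_KEY START_KEY...: print the region start keys after a
# split at SPLIT_KEY, one per line, in sorted order. Adding SPLIT_KEY as a new
# start key divides the region that contained it into two regions.
split_region() {
  # LC_ALL=C makes sort compare bytes, like HBase's byte-wise row key order;
  # -u drops a duplicate if SPLIT_KEY is already a region boundary.
  printf '%s\n' "$@" | LC_ALL=C sort -u
}
```

For example, split_region p "" g n t prints an empty line followed by g, n, p and t: the region that covered n up to t now ends at p, and a new region starts at p.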

Apache ZooKeeper and the HMaster make information about the cluster topology
available to clients; in this charm, the HBase units are connected to the
ZooKeeper quorum through Juju relations. A client looks up the location of the
hbase:meta catalog table in ZooKeeper, then reads from it the list of regions,
their key ranges, and the RegionServers hosting them, and caches this
information locally. Clients therefore know which RegionServer serves any given
row and can contact that RegionServer directly, without going through a central
coordinator.
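The client-side lookup described above can be sketched as a search over cached region start keys: a row belongs to the region with the largest start key that sorts at or before the row key. In this sketch the cache is a hypothetical variable REGION_CACHE, filled in by hand rather than read from hbase:meta, and the function name locate_row is also hypothetical. It assumes keys made of lowercase ASCII letters, so that awk's string comparison agrees with HBase's byte ordering in any locale.

```shell
#!/bin/sh
# Illustrative sketch only: a client-side cache of region locations, one
# region per line as "<start key>|<RegionServer host>", sorted by start key.
# The first region starts at the empty key. Keys and host names are
# hypothetical; a real client reads this information from hbase:meta.
REGION_CACHE='|regionserver-1
g|regionserver-2
n|regionserver-1
t|regionserver-3'

# locate_row ROWKEY: print the RegionServer hosting ROWKEY, i.e. the host of
# the region with the largest start key that sorts at or before ROWKEY.
locate_row() {
  printf '%s\n' "$REGION_CACHE" | awk -F'|' -v row="$1" '
    # Lines are sorted by start key, so the last matching line wins.
    $1 <= row { host = $2 }
    END { print host }'
}
```

For example, locate_row melon prints regionserver-2, because melon sorts after g and before n. A linear scan is used for simplicity; a real client would keep the cache in a sorted structure and do a floor lookup.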

Usage

This charm uses our pluggable Hadoop model, connecting to Hadoop through the
hadoop-plugin interface, and relies on the Apache ZooKeeper charm. This means
that you will need to deploy a base Apache Hadoop cluster and an Apache
ZooKeeper quorum to run HBase. The suggested deployment method is the
apache-hadoop-hbase bundle. It deploys the Apache Hadoop/ZooKeeper platform
with a single Apache HBase HMaster and two scalable Apache HBase RegionServer
units that communicate with the cluster by relating to the
apache-hadoop-plugin subordinate charm:

juju-quickstart u/bigdata-dev/apache-hadoop-hbase

Alternatively, you may manually deploy the recommended environment as follows:

juju deploy apache-hadoop-hdfs-master hdfs-master
juju deploy apache-hadoop-yarn-master yarn-master
juju deploy apache-hadoop-compute-slave compute-slave
juju deploy apache-hadoop-plugin plugin
juju deploy -n 3 apache-zookeeper zookeeper
juju deploy apache-hbase hbase-master
juju deploy apache-hbase hbase-regionserver

juju add-relation yarn-master hdfs-master
juju add-relation compute-slave yarn-master
juju add-relation compute-slave hdfs-master
juju add-relation plugin yarn-master
juju add-relation plugin hdfs-master
juju add-relation hbase-master plugin
juju add-relation hbase-regionserver plugin
juju add-relation hbase-master:master hbase-regionserver:regionserver
juju add-relation zookeeper hbase-master
juju add-relation zookeeper hbase-regionserver

Once deployment is complete, you can run the HBase shell on the master unit or
access the HBase Master web interface at http://{hbase_master_ip}:60010

  • Apache HBase shell

The Apache HBase Shell is (J)Ruby's IRB with some HBase-specific commands
added. Anything you can do in IRB, you should be able to do in the HBase Shell.
Type help and then press <RETURN> to see a listing of shell commands and
options. To run the HBase shell, do as follows:

   juju ssh hbase-master/0
   ./bin/hbase shell
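To try a few shell commands without typing them one by one, the helper below prints a short HBase shell session that creates a table, writes and reads one cell, scans it, and then disables and drops it. The function name hbase_demo_commands, the table name demo_table, and the column family cf are hypothetical; create, put, get, scan, disable, and drop are standard HBase shell commands.

```shell
#!/bin/sh
# Prints a short HBase shell session. 'demo_table' and the column family
# 'cf' are hypothetical names chosen for this example.
hbase_demo_commands() {
  cat <<'EOF'
create 'demo_table', 'cf'
put 'demo_table', 'row1', 'cf:greeting', 'hello'
get 'demo_table', 'row1'
scan 'demo_table'
disable 'demo_table'
drop 'demo_table'
EOF
}
```

On the master unit, the commands can be piped into the shell with hbase_demo_commands | ./bin/hbase shell, or typed at the shell prompt one at a time. Reading commands from standard input is a common way to script the HBase shell, but it is worth checking against the HBase version the charm installs.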

Testing the deployment

Smoke test HBase

SSH to the HBase master unit and run the smoke test as follows:

juju ssh hbase-master/0
~/hbase_test.sh test_table

Verify the HBase web UI

Verify that the HBase Master web UI shows the cluster and its RegionServers by
visiting http://{hbase_master_ip}:60010

Configuration

resources_mirror
(string) URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.