hdp accumulo #9

Supports: trusty

Description

Apache Accumulo is a highly scalable, structured, distributed key/value store for high-performance data storage and retrieval.


Hortonworks HDP 2.1 Accumulo Overview

Apache™ Accumulo is a high-performance data storage and retrieval system with
cell-level access control. It is a scalable implementation of Google’s Bigtable
design that works on top of Apache Hadoop® and Apache ZooKeeper.

Cell-level access control is important for organizations with complex policies
governing who is allowed to see data. It enables the intermingling of different
data sets with different access control policies and proper handling of
individual data sets that have some sensitive portions.

Without Accumulo, those policies are difficult to enforce systematically.
Accumulo encodes those rules for each individual data cell and allows fine-grained
access control.

Accumulo is well suited as a data storage component in Big Data healthcare,
financial, and security solutions.

Usage

Hortonworks Accumulo requires access to a Hadoop cluster and a ZooKeeper quorum.

Deploy Hadoop cluster:

juju deploy hdp-hadoop yarn-hdfs-master
juju deploy hdp-hadoop compute-node
juju add-relation yarn-hdfs-master:namenode compute-node:datanode
juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager

Deploy ZooKeeper quorum:

juju deploy hdp-zookeeper
juju add-unit -n 2 hdp-zookeeper

Deploy Accumulo Cluster:

juju deploy hdp-accumulo accumulo-master
juju deploy hdp-accumulo tablet-servers
juju add-relation accumulo-master:accumulo-server tablet-servers:tabletserver
juju add-relation accumulo-master:zookeeper hdp-zookeeper:zookeeper
juju add-relation tablet-servers:zookeeper hdp-zookeeper:zookeeper
juju add-relation accumulo-master:namenode yarn-hdfs-master:namenode
juju add-relation tablet-servers:namenode yarn-hdfs-master:namenode
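
The three deployment steps above can be collected into one shell function so they always run in the same order. This is a minimal sketch, not part of the charm: the function name deploy_accumulo_stack and the JUJU override variable are hypothetical, and the sketch assumes a bootstrapped Juju environment. The juju commands themselves are exactly the ones listed above.

```shell
# JUJU is a hypothetical override: set JUJU=echo to preview the commands
# without touching a real environment.
JUJU="${JUJU:-juju}"

# Runs the Hadoop, ZooKeeper and Accumulo steps in order and stops at the
# first command that fails.
deploy_accumulo_stack() {
    # Hadoop cluster
    $JUJU deploy hdp-hadoop yarn-hdfs-master || return 1
    $JUJU deploy hdp-hadoop compute-node || return 1
    $JUJU add-relation yarn-hdfs-master:namenode compute-node:datanode || return 1
    $JUJU add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager || return 1

    # ZooKeeper quorum (three units in total)
    $JUJU deploy hdp-zookeeper || return 1
    $JUJU add-unit -n 2 hdp-zookeeper || return 1

    # Accumulo master and tablet servers
    $JUJU deploy hdp-accumulo accumulo-master || return 1
    $JUJU deploy hdp-accumulo tablet-servers || return 1
    $JUJU add-relation accumulo-master:accumulo-server tablet-servers:tabletserver || return 1
    $JUJU add-relation accumulo-master:zookeeper hdp-zookeeper:zookeeper || return 1
    $JUJU add-relation tablet-servers:zookeeper hdp-zookeeper:zookeeper || return 1
    $JUJU add-relation accumulo-master:namenode yarn-hdfs-master:namenode || return 1
    $JUJU add-relation tablet-servers:namenode yarn-hdfs-master:namenode || return 1
}
```

Calling deploy_accumulo_stack after setting JUJU=echo prints the thirteen commands instead of running them, which is a cheap way to review the sequence first.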

Scale out usage

Hadoop cluster:

juju add-unit -n 2 compute-node

Accumulo cluster:

juju add-unit -n 3 tablet-servers

Verify Deployment

From any command line:

$juju ssh accumulo-master/0

HDFS validation: check the health of the remote HDFS cluster

$su hdfs -c 'hdfs dfsadmin -report'

Accumulo install and configuration

Accumulo must be initialized to create the structures it uses internally to locate
data across the cluster. HDFS must be configured and running before
Accumulo can be initialized.
Once HDFS is started, initialization can be performed by executing:

$/usr/lib/accumulo/bin/accumulo init

This script will prompt for a name for this instance of Accumulo. The instance
name is used to identify a set of tables and instance-specific settings. The
script will then write some information into HDFS so Accumulo can start properly.
The initialization script will prompt you to set a root password. Once Accumulo
is initialized it can be started.
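
For scripted setups the two prompts can be answered on the command line instead. The sketch below assumes the installed Accumulo release accepts the --instance-name and --password options of the init command (they are present in the 1.5 and later releases as far as I know; check the init help output on your version). The function name init_accumulo and the ACCUMULO_BIN variable are hypothetical helpers, not part of the charm.

```shell
# Path to the accumulo launcher used elsewhere in this README.
# ACCUMULO_BIN is a hypothetical override for testing or non-default installs.
ACCUMULO_BIN="${ACCUMULO_BIN:-/usr/lib/accumulo/bin/accumulo}"

# Initializes Accumulo without interactive prompts.
# $1: instance name, $2: root password (both required).
init_accumulo() {
    instance_name="$1"
    root_password="$2"
    if [ -z "$instance_name" ] || [ -z "$root_password" ]; then
        echo "usage: init_accumulo INSTANCE_NAME ROOT_PASSWORD" >&2
        return 2
    fi
    # Assumption: --instance-name and --password are supported by this release.
    "$ACCUMULO_BIN" init --instance-name "$instance_name" --password "$root_password"
}
```

A password passed this way is visible in the process list and in shell history, so the interactive prompt remains the safer choice on shared machines.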

Start Accumulo:

$/usr/lib/accumulo/bin/start-all.sh

View the Accumulo native UI:

http://<accumulo-master_IP_address>:50095
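
The same check can be made from a terminal. This is a small sketch that assumes curl is installed on the machine you run it from; monitor_url and check_monitor are hypothetical helper names, and the address passed in is whatever IP the accumulo-master unit has in your environment.

```shell
# Port of the Accumulo native UI, as given in the URL above.
MONITOR_PORT=50095

# Builds the monitor URL for a given host or IP address.
monitor_url() {
    printf 'http://%s:%s\n' "$1" "$MONITOR_PORT"
}

# Returns 0 if the monitor answers with a successful HTTP status, 1 otherwise.
check_monitor() {
    url=$(monitor_url "$1")
    # -s: quiet, -f: treat HTTP errors as failure, --max-time: give up after 10 s
    if curl -sf -o /dev/null --max-time 10 "$url"; then
        echo "monitor reachable at $url"
    else
        echo "monitor not reachable at $url" >&2
        return 1
    fi
}
```

For example, check_monitor followed by the master's IP address reports whether the UI is reachable. The monitor may need a short while after start-all.sh before it responds.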

Contact Information

Amir Sanjar amir.sanjar@canonical.com

Upstream Accumulo

  • Upstream website: http://hortonworks.com/, https://github.com/apache/accumulo
  • Upstream bug tracker: https://issues.apache.org/jira/browse/ACCUMULO

Configuration

instance_dfs_dir
(string) HDFS directory Accumulo will write to.
/user/ubuntu/data
instance_secret
(string) A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later, use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], and then update this file.
DEFAULT
instance_zookeeper_host
(string) Comma-separated list of ZooKeeper servers.
trace_token_property_password
(string) TBD.
accumulo
trace_user
(string) TBD.
accumulo
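
Because every server must know the same instance_secret, the option has to be set to the same value on both services deployed from this charm. The sketch below uses the Juju 1.x "juju set" command, which matches the trusty-era commands in this README; Juju 2.x and later renamed it to "juju config". The function name set_instance_secret and the JUJU override variable are hypothetical, and the service names are the ones used in the Usage section.

```shell
# JUJU is a hypothetical override: set JUJU=echo to preview the commands.
JUJU="${JUJU:-juju}"

# Applies the same instance_secret to the master and the tablet servers.
# $1: the new secret (required). Run this before "accumulo init".
set_instance_secret() {
    secret="$1"
    if [ -z "$secret" ]; then
        echo "usage: set_instance_secret SECRET" >&2
        return 2
    fi
    for service in accumulo-master tablet-servers; do
        # Juju 1.x syntax; on Juju 2.x use "config" instead of "set".
        $JUJU set "$service" instance_secret="$secret" || return 1
    done
}
```

Setting the secret on only one of the two services would leave the master and the tablet servers unable to communicate, which is why the loop covers both.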