This document provides an introduction to HBase internals and schema design for HBase users. It discusses the logical and physical views of HBase, including how tables are split into regions and stored across region servers. It covers best practices for schema design, such as using row keys efficiently and avoiding redundancy. The document also briefly discusses advanced topics like coprocessors and compression. The overall goal is to help HBase users optimize performance and scalability based on its internal architecture.
Intro to HBase Internals & Schema Design (for HBase users)
1. Intro to HBase Internals & Schema Design (for HBase Users)
Alex Baranau, Sematext International, 2012
Monday, July 9, 12
2. About Me
Software Engineer at Sematext International
http://blog.sematext.com/author/abaranau
@abaranau
http://github.com/sematext (abaranau)
4. Why?
Why should I (an HBase user) care about HBase internals?
- HBase will not adjust cluster settings to optimal based on usage patterns automatically
- Schema design, table settings (defined upon creation), etc. depend on HBase implementation aspects
6. Logical View: Regions
- HBase cluster serves multiple tables, distinguished by name
- Each table consists of rows
- Each row contains cells: (row key, column family, column, timestamp) -> value
- Table is split into Regions (table shards, each contains full rows), defined by start and end row keys
7. Logical View: Regions are Shards
- Regions are “atoms of distribution”
- Each region is assigned to a single RegionServer (HBase cluster slave)
- Rows of a particular Region are served by a single RS (cluster slave)
- Regions are distributed evenly across RSs
- A Region has a configurable max size
- When a region reaches max size (or on request) it is split into two smaller regions, which can be assigned to different RSs
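The split rule above can be sketched as a toy model (illustrative Python, not HBase code; the `Region` class, `MAX_REGION_SIZE` value, and the in-memory row map are assumptions made for the sketch):

```python
# Illustrative sketch (not HBase code): a region covering a key range splits
# at a middle row key once it grows past its configured max size.

MAX_REGION_SIZE = 1024  # bytes here; the real setting is far larger

class Region:
    def __init__(self, start_key, end_key):
        self.start_key, self.end_key = start_key, end_key
        self.rows = {}  # row key -> value

    def size(self):
        return sum(len(k) + len(v) for k, v in self.rows.items())

    def maybe_split(self):
        """Split into two daughter regions at the middle row key."""
        if self.size() <= MAX_REGION_SIZE:
            return [self]
        keys = sorted(self.rows)
        mid = keys[len(keys) // 2]
        left, right = Region(self.start_key, mid), Region(mid, self.end_key)
        for k, v in self.rows.items():
            (left if k < mid else right).rows[k] = v
        return [left, right]
```

The two daughters cover adjacent key ranges (they meet at the split key), which is what lets HBase assign them to different RegionServers afterwards.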
8. Logical View: Regions on Cluster
[Diagram: a client talks to the ZooKeeper quorum and the HMaster(s); Regions are spread across multiple RegionServers]
9. Logical View: Regions Load
- It is essential for Regions under load to be evenly distributed across the cluster
- It is the HBase user’s job to make sure the above is true. Note: even distribution of Regions over the cluster doesn’t imply that the load is evenly distributed
10. Logical View: Regions Load
- Take into account that rows are stored in an ordered manner
- Make sure you don’t write rows with sequential keys to avoid RS hotspotting*
  - When writing data with monotonically increasing/decreasing keys, data is written to one RS at a time
- Use pre-splitting of the table upon creation
  - Starting with a single region means using one RS for some time
- In general, splitting can be expensive
  - Increase max region size
* see https://github.com/sematext/HBaseWD
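The hotspotting point above can be illustrated with a minimal sketch of the bucket-prefix ("salting") idea in the spirit of HBaseWD — these function names are assumptions for the sketch, not the library's API. Sequential keys get a small deterministic prefix so consecutive writes land in different regions; the cost is that one logical scan becomes one scan per bucket:

```python
# Illustrative sketch of key salting (in the spirit of HBaseWD, not its API):
# prefix monotonically increasing keys with a deterministic bucket id so that
# consecutive writes spread over several regions instead of hammering one RS.
import zlib

NUM_BUCKETS = 8

def salted_key(row_key: str) -> str:
    """Prefix the key with a deterministic bucket id (0..NUM_BUCKETS-1)."""
    bucket = zlib.crc32(row_key.encode()) % NUM_BUCKETS
    return "%02d_%s" % (bucket, row_key)

def scan_prefixes(original_prefix: str):
    """One logical scan over the original range becomes NUM_BUCKETS scans."""
    return ["%02d_%s" % (b, original_prefix) for b in range(NUM_BUCKETS)]
```

With a pre-split table (one region per bucket prefix), writes with time-based keys are spread over NUM_BUCKETS RegionServers instead of one.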
11. Logical View: Slow RSs
- When load is distributed evenly, watch for the slowest RSs (HBase slaves)
- Since every region is served by a single RS, one slow RS can slow down cluster performance, e.g. when:
  - data is written into multiple RSs at an even pace (random value-based row keys)
  - data is being read from many RSs when doing a scan
13. Physical View: Write/Read Flow
[Diagram: clients write and read via HTable (with an optional client-side write buffer) to a RegionServer; each Region holds a Store per CF with a MemStore that flushes to HFiles on HDFS; writes also go to the Write Ahead Log]
14. Physical: Speed up Writing
- Enabling & increasing the client-side buffer reduces the number of RPC operations
  - warn: possible loss of buffered data in case of client failure; design for failover
  - in case of write failure (networking/server-side issues): can be handled on the client
- Disabling WAL increases write speed
  - warn: possible data loss in case of RS failure
- Use bulk import functionality (writes HFiles directly, which can later be added to HBase)
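The client-side buffer trade-off can be modeled with a toy sketch (illustrative Python, not the HBase client API; `BufferedTable` and the list-based "server" are assumptions): puts accumulate locally and go out in one RPC per batch, and whatever is still buffered is lost if the client dies before flushing.

```python
# Illustrative model of a client-side write buffer: fewer RPCs,
# but buffered (unflushed) puts are lost if the client fails.

class BufferedTable:
    def __init__(self, server, buffer_size=100):
        self.server = server          # stand-in for a RegionServer: a list of batches
        self.buffer_size = buffer_size
        self.buffer = []
        self.rpc_count = 0

    def put(self, row_key, value):
        self.buffer.append((row_key, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.server.append(list(self.buffer))  # one RPC for the whole batch
            self.rpc_count += 1
            self.buffer = []
```

With buffer_size=1 every put is an RPC; with buffer_size=100 the RPC count drops 100x, at the price of up to 99 puts living only in client memory between flushes.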
15. Physical: Memstore Flushes
- When a memstore is flushed, N HFiles are created (one per CF)
- The memstore size which causes flushing is configured at two levels:
  - per RS: % of heap occupied by memstores
  - per table: size in MB of a single memstore (per CF) of a Region
- When a Region’s memstores flush, memstores of all CFs are flushed
- Uneven data amount between CFs causes too many flushes & creation of too many HFiles (one per CF every time)
- In most cases having one CF is the best design
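The "uneven CFs flush together" behavior above can be sketched as a toy model (illustrative Python, not HBase internals; the class, the byte-based `FLUSH_SIZE`, and the per-CF lists are assumptions):

```python
# Illustrative sketch: flushing is per-region, so when one big CF triggers a
# flush, every CF holding any data writes an HFile -- even a tiny one.

FLUSH_SIZE = 64  # bytes here; the real per-region setting is in MB

class RegionMemstores:
    def __init__(self, cf_names):
        self.memstores = {cf: [] for cf in cf_names}
        self.hfiles = {cf: [] for cf in cf_names}

    def put(self, cf, cell):
        self.memstores[cf].append(cell)
        total = sum(len(c) for cells in self.memstores.values() for c in cells)
        if total >= FLUSH_SIZE:
            self.flush()

    def flush(self):
        """All CFs flush together: one new HFile per CF that holds any data."""
        for cf, cells in self.memstores.items():
            if cells:
                self.hfiles[cf].append(list(cells))
                self.memstores[cf] = []
```

A CF that received a single tiny cell still produces its own (tiny) HFile each time the big CF forces a flush — which is exactly why uneven CFs lead to many small files, and why one CF is usually the best design.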
16. Physical: Memstore Flushes
- Important: there are memstore size thresholds which cause writes to be blocked, so slow memstore flushes and overuse of memory by memstores can cause write perf degradation
- Hint: watch the flush queue size metric on RSs
- At the same time, the more memory the memstore uses, the better for writing/reading perf (unless it reaches those “write blocking” thresholds)
17. Physical: Memstore Flushes
Example of a good situation*
* http://sematext.com/spm/index.html
18. Physical: HFiles Compaction
- HFiles are periodically compacted into bigger HFiles containing the same data
  - Reading from fewer HFiles is faster
- Important: there’s a configured max number of files in a Store which, when reached, causes writes to block
- Hint: watch the compaction queue size metric on RSs
[Diagram: reads are served from the Store’s MemStore (per CF) and its HFiles; compaction merges the HFiles]
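The merge step can be sketched as follows (illustrative Python, not HBase's actual compaction code; the `(row_key, seq, value)` tuple layout and "higher seq = newer" convention are assumptions for the sketch):

```python
# Illustrative sketch of compaction: several sorted HFiles are merged into
# one sorted file; the newest value for each row key wins.
import heapq

def compact(hfiles):
    """hfiles: lists of (row_key, seq, value) tuples, each sorted by row key;
    a higher seq means a newer write. Returns one sorted list with a single
    (newest) value per row key."""
    merged = heapq.merge(*hfiles, key=lambda kv: (kv[0], -kv[1]))
    out, last_key = [], None
    for key, seq, value in merged:
        if key != last_key:  # first entry per key is the newest one
            out.append((key, value))
            last_key = key
    return out
```

After compaction a read consults one file instead of several, which is the "reading from fewer HFiles is faster" point above.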
19. Physical: Data Locality
RSs are usually collocated with HDFS DataNodes
[Diagram: each slave node runs an HBase RegionServer, a MapReduce TaskTracker, and an HDFS DataNode]
20. Physical: Data Locality
- HBase tries to assign Regions to RSs so that Region data is stored physically on the same node. But sometimes it fails:
  - after Region splits there’s no guarantee that there’s a node that has all blocks (HDFS level) of the new Region
  - no guarantee that HBase will not re-assign this Region to a different RS in the future (even distribution of Regions takes preference over data locality)
- There’s ongoing work towards better preserving data locality
21. Physical: Data Locality
- Also, data locality can break when:
  - Adding new slaves to the cluster
  - Removing slaves from the cluster (incl. node failures)
- Hint: look at networking IO between slaves when writing/reading data; it should be minimal
- Important:
  - make sure HDFS is well balanced (use the balancer tool)
  - try to rebalance Regions in the HBase cluster if possible (an HBase Master restart will do that) to regain data locality
  - pre-split tables on creation to limit (ideally avoid) splits and region movement; managing splits manually sometimes helps
23. Schema: row keys
- Using a row key (or a range of keys) is the most efficient way to retrieve data from HBase
- Row key design is a major part of schema design
  - Note: no secondary indices are available out of the box

Row Key                        Data
‘login_2012-03-01.00:09:17’    d:{‘user’:‘alex’}
...                            ...
‘login_2012-03-01.23:59:35’    d:{‘user’:‘otis’}
‘login_2012-03-02.00:00:21’    d:{‘user’:‘david’}
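Because rows are stored sorted by key, time-prefixed keys like those above make "all logins for a day" a single contiguous range scan. A minimal sketch of that semantics (illustrative Python over a sorted key list, not the HBase Scan API):

```python
# Illustrative sketch: rows live in row-key order, so a Scan(start, stop)
# is just a contiguous slice of the sorted key space.
import bisect

def range_scan(sorted_keys, start, stop):
    """Return keys k with start <= k < stop (Scan semantics: stop exclusive)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]
```

E.g. scanning from ‘login_2012-03-01’ to ‘login_2012-03-02’ selects exactly the first day's logins from the table above — no index needed, the key order is the index.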
24. Schema: row keys
- Redundancy is OK!
  - warn: changing two rows in HBase is not an atomic operation

Row Key                              Data
‘login_2010-01-01.00:09:17’          d:{‘user’:‘alex’}
...                                  ...
‘login_2012-03-01.23:59:35’          d:{‘user’:‘otis’}
‘alex_2010-01-01.00:09:17’           d:{‘action’:‘login’}
...                                  ...
‘otis_2012-03-01.23:59:35’           d:{‘action’:‘login’}
‘alex_login_2010-01-01.00:09:17’     d:{‘device’:‘pc’}
...                                  ...
‘otis_login_2012-03-01.23:59:35’     d:{‘device’:‘mobile’}
25. Schema: Relations
- Not relational
- No joins
- Denormalization is OK! Use ‘nested entities’

Row Key                 Data
‘student_abaranau’      d:{
                          student_firstname: Alex,
                          student_lastname: Baranau,
                          professor_math_firstname: David,
                          professor_math_lastname: Smart,
                          professor_cs_firstname: Jack,
                          professor_cs_lastname: Weird,
                        }
‘prof_dsmart’           d:{...}
26. Schema: row key/CF/qual size
- HBase stores cells individually
  - great for “sparse” data
- row key, CF name and column name are stored with each cell, which may affect the amount of data to be stored and managed
  - keep them short
  - serialize and store many values into a single cell

Row Key         Data
‘s_abaranau’    d:{
                  s: Alex#Baranau#cs#2009,
                  p_math: David#Smart,
                  p_cs: Jack#Weird,
                }
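The packing trick in the table above can be sketched as follows (illustrative Python; the ‘#’ delimiter comes from the slide, the function names are assumptions):

```python
# Illustrative sketch: since row key, CF and qualifier are repeated for every
# stored cell, packing several values into one cell (here with the '#'
# delimiter from the slide) cuts per-cell overhead. Values must not contain
# the delimiter; a real schema would pick one that cannot occur in the data.

DELIM = "#"

def pack(values):
    for v in values:
        assert DELIM not in v, "delimiter must not appear in values"
    return DELIM.join(values)

def unpack(cell):
    return cell.split(DELIM)
```

The trade-off: a packed cell must be read and rewritten whole, so this fits values that are always accessed together.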
28. Advanced: Co-Processors
The CoProcessors API (HBase 0.92.0+) allows you to:
- execute logic (querying/aggregation/etc.) on the server side (you may think of it as stored procedures in an RDBMS)
- perform auditing of actions performed on the server side (you may think of it as triggers in an RDBMS)
- apply security rules for data access
- and many more cool things
29. Other: Use Compression
- Using compression:
  - reduces the amount of data to be stored on disks
  - reduces the amount of data to be transferred when an RS reads data not from a local replica
  - increases CPU usage, but CPU isn’t usually a bottleneck
- Favor compression speed over compression ratio
  - SNAPPY is good
- Use wisely:
  - e.g. avoid wasting CPU cycles on compressing images
  - compression can be configured on a per-CF basis, so storing non-compressible data in a separate CF sometimes helps
  - data blocks are uncompressed in memory; avoid letting this cause an OOME
  - note: when scanning (seeking data to return for a scan), many data blocks can be uncompressed even if none of the data from those blocks will be returned
30. Other: Use Monitoring
Ganglia, Cacti, others*. Just use it!
* http://sematext.com/spm/index.html