HBase is an open-source, distributed, column-oriented database modeled after Google's Bigtable that runs on top of Hadoop. The presenter discusses HBase's architecture, performance improvements in version 0.20 (including major gains from a new file format and compression), and StumbleUpon's extensive use of HBase, including a single table of over 9 billion rows with high import and read speeds.
3. Now
• Personally rewritten large portions of HBase for 0.20
– Code easy to work with, understand, modify
• Recently voted to committer status (thanks!)
• Now giving presentations (hi!)
4. Four Point Agenda
• What is HBase?
• Why HBase?
• HBase 0.20
• HBase at StumbleUpon
5. What is HBase?
• Clone of Bigtable:
http://labs.google.com/papers/bigtable.html
• Created originally at Powerset in 2007
• Hadoop subproject
– The usual ASF things apply (license, JIRA, etc.)
7. Table & Regions
• Rows stored in byte-lexicographic sorted order
• Table dynamically split into “regions”
• Each region contains values [startKey, endKey)
• Regions hosted on a regionserver
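
To make the region picture concrete, here is a minimal standalone sketch (not HBase code; the start keys are made up, and it needs Java 9+ for Arrays.compareUnsigned) of how a sorted set of half-open [startKey, endKey) regions partitions the row space, using a binary search much like the client's region lookup:

  import java.util.Arrays;

  public class RegionLookup {
    // Sorted region start keys; region i covers [startKeys[i], startKeys[i+1]),
    // and the first region starts at the empty key.
    static final byte[][] startKeys = {
      new byte[0], "g".getBytes(), "n".getBytes(), "t".getBytes()
    };

    // Find the last region whose startKey <= row (byte-lexicographic order).
    static int regionFor(byte[] row) {
      int lo = 0, hi = startKeys.length - 1;
      while (lo < hi) {
        int mid = (lo + hi + 1) >>> 1;
        if (Arrays.compareUnsigned(startKeys[mid], row) <= 0) lo = mid;
        else hi = mid - 1;
      }
      return lo;
    }

    public static void main(String[] args) {
      System.out.println(regionFor("apple".getBytes()));    // 0
      System.out.println(regionFor("stumble".getBytes()));  // 2
    }
  }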
11. Column Families
• Table consists of 1+ “column families”
• Column family is unit of performance tuning
• Stored in separate set of files
• Column names scoped like so:
– “Family:qualifier”
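
As an illustration (not from the slides), writing and reading one cell scoped by family and qualifier with the 0.20-era client API might look like the following; the table name, row key, and column names are made up:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.*;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(new HBaseConfiguration(), "mytable");

  // Write one cell into family "default", qualifier "0"
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("default"), Bytes.toBytes("0"), Bytes.toBytes("value"));
  table.put(put);

  // Read back only that family:qualifier
  Get get = new Get(Bytes.toBytes("row1"));
  get.addColumn(Bytes.toBytes("default"), Bytes.toBytes("0"));
  byte[] value = table.get(get).getValue(Bytes.toBytes("default"), Bytes.toBytes("0"));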
12. Sorting
• Rows stored in byte-lexicographic order (row keys are raw bytes, not just strings)
• Furthermore, within a row, columns are stored in sorted order
• Fast, cheap, and easy to scan adjacent rows & columns
13. Sorting (but there's more!)
• Not just scanning, but can do partial-key lookups
• When combined with compound keys, has the same properties as leading left-edge indexes in a standard RDBMS
– (Except your index is distributed, of course)
• Can use a second table to index a primary table.
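
A sketch of such a partial-key lookup (not from the slides): with a compound row key that leads with a user id, every row for that user comes back from one range scan, exactly like hitting the leading edge of a compound index. The key layout here is hypothetical, and the stop-row trick assumes the last prefix byte is not 0xFF:

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  // Compound key: "<userId>/<timestamp>", so one user's rows sort together.
  byte[] prefix = Bytes.toBytes("user42/");

  // Increment the last byte of the prefix to get the half-open range
  // [prefix, prefix') covering every key that starts with the prefix.
  byte[] stopRow = prefix.clone();
  stopRow[stopRow.length - 1]++;

  Scan scan = new Scan(prefix, stopRow).addFamily(Bytes.toBytes("family"));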
16. API Example
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// 0.20-era client API; Entity and dser are the presenter's own
// application classes (a Thrift-style entity and deserializer).
Scan scan = new Scan(startRow, endRow).addFamily(Bytes.toBytes("family"));
ResultScanner scanner = table.getScanner(scan);
Result result;
while ((result = scanner.next()) != null) {
  Entity e = new Entity();
  dser.deserialize(e, result.getValue(Bytes.toBytes("default"), Bytes.toBytes("0")));
}
scanner.close();
17. Why HBase?
• Community is highly active, diverse, helpful
• User list email activity for May: 78 threads
• IRC channel #hbase highly active
• Helpful people in multiple timezones; email answered all hours of the day/night/weekend
18. Why HBase?
• Committer & contributor base is broad:
– PSet, Streamy, SU, Trend Micro, Openplaces, and more!
• No monopoly on experts – deep knowledge at these companies and more!
• (We're really friendly… honest!)
24. HBase 0.20 vs 0.19
• Master – 0.19: single master; if it fails, so does the cluster. 0.20: master election and membership via ZK
• Compression – 0.19: not really. 0.20: GZ, LZO
• Memory usage – 0.19: small values cause big indexes and OOM. 0.20: new file format limits index size (800kB for 10m entries)
• Scan speed – 0.19: 300–600ms per 500 rows. 0.20: 20–30ms per 500 rows
27. Performance
• Significant performance gains in 0.20
• New file format with zero-copy infrastructure
• Scan and get improvements
• LZO compression
• Block caching
• Speed increases as much as 30x!
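
A minimal sketch (not from the deck) of opting a column family into block caching with the 0.20-era admin API; the table and family names are made up, and exact method names varied slightly across releases:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  HTableDescriptor desc = new HTableDescriptor("mytable");
  HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("default"));
  family.setBlockCacheEnabled(true);  // serve hot HFile blocks from the block cache
  desc.addFamily(family);
  new HBaseAdmin(new HBaseConfiguration()).createTable(desc);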
29. Performance Numbers
• 1m rows, 1 column per row (~16 bytes)
– Sequential insert: 24s, 0.024ms/row
– Random read: 1.42ms/row (avg)
– Full scan: 11s (117ms per 10k rows)
• Performance from cache is very high:
– 1ms to get a single row
– 20ms to read 550 rows
– 75ms to get 5500 rows
31. Big accomplishments @ SU
• Over 9b small rows in a single table
– Sustained import performance: 3–4 days to import 9b rows (the MySQL source was the limiting factor)
• 1.2m row reads/sec on 19 nodes (!!)
– That is 60–100k reads/sec/node, sustained over 2hrs
– Scalable with more nodes
– HBase has been improved since then
34. HBase deployment trivia
• Nodes are 8x16 w/2TB (best price point)
– Don't use RAID1; use RAID0 or JBOD
• Ganglia allows overall cluster performance monitoring
• Clusters won't span datacenters
– We want fully duplicated data for DR anyway
• Update master with code & config
– rsync to other nodes (1 dir, very easy)
– Controlled restart for rolling upgrade
35. HBase deployment trivia
• HDFS: set the xciever limit (dfs.datanode.max.xcievers) to 2048, Xmx2000m
– Never get HDFS problems, even under heavy load
• For the 9b-row import, randomized key insert order gives a substantial speedup
• Give HBase enough RAM – you wouldn't starve MySQL!
• Import speeds of 200k ops/sec on 19 machines are possible!
– Hard to provide a SQL-based source fast enough
– 100k ops/sec typical for sustained imports
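
One minimal way to get that randomized insert order (a sketch, not StumbleUpon's importer; rowKeys and fetchValue are hypothetical stand-ins for the SQL source): shuffle the keys up front so writes spread across many regions instead of hammering whichever region hosts the range currently being imported:

  import java.util.Collections;
  import java.util.List;

  Collections.shuffle(rowKeys);  // rowKeys: List<byte[]> of keys from the source
  for (byte[] key : rowKeys) {
    Put put = new Put(key);
    // fetchValue is a hypothetical lookup against the SQL source
    put.add(Bytes.toBytes("default"), Bytes.toBytes("0"), fetchValue(key));
    table.put(put);
  }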
37. HBase future @ SU
• Latency-sensitive cluster
• Batch/analytics cluster
• Use replication to keep the latter up to date
• Allows batch jobs to go full throttle against reasonably up-to-date data without risking the website