This document contains an agenda for a presentation on using Hadoop and HBase for social media monitoring. The presentation covers why Hadoop and HBase are suitable technologies, challenges and lessons learned, and resources for getting started. It includes sections on the speaker's background, the social media monitoring process, using coprocessors in HBase, and testing performed on a test cluster.
Near Real Time Processing of Social Media Data with HBase (Christian Gügi)
Monitoring social media, and news media in general, amounts to building a specific kind of search engine: a news agent. Where classical search engines aim to cover all kinds of content, a news agent focuses on news and social media only. This is where content freshness matters: results have to be near real time.
To process several million news items, amounting to hundreds of megabytes per day, the system architecture has to be both reliable and massively scalable. These requirements lead to a distributed architecture and a NoSQL approach.
This talk gives a brief overview of the media monitoring use case, the system architecture based on Hadoop and HBase, and the challenges and lessons learned.
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords - June 2012
1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
2. Agenda (5 June 2012)
• Why Hadoop and HBase?
• Social Media Monitoring
• Prospective Search and Coprocessors
• Challenges & Lessons Learned
• Resources to get started
3. About me
Software Architect @ sentric
Co-founder and organizer of the Swiss HUG
Contact:
christian.guegi@sentric.ch
http://www.sentric.ch
@chrisgugi
4. About sentric
• Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
• Big Data expert, focused on Hadoop, HBase and Solr
• Objective: transforming data into insights
6. Why Hadoop and HBase?
Social Media Monitoring Process:
Information Gathering → Information Processing → Analysis & Interpretation → Insight Presentation
7. Why Hadoop and HBase?
Requirements for Social Media Monitoring (SMM):
• Cost effective
• Highly scalable
• Reliable
• Real-time alerting
• Analytical capabilities
9. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
10. Social Media Monitoring: Overview
Downloaded articles are matched ("match?") against the stored Search Agents.
Output: Web-UI, Reports, RT Alerts
Icons by http://dryicons.com
11. Social Media Monitoring: Solution Architecture
n Crawlers → REST → HBase (RowLog, Coprocessor)
HBase feeds the downstream systems: MySQL, Solr, Web-UI, RT Alerts
Icons by http://dryicons.com
12. Short Primer on Coprocessors: Overview
• Inspired by Google Bigtable coprocessors
• Available since HBase version 0.92
• Embed code directly into server processes
• High-level call interface for clients
• Automatic scaling, load balancing, request routing
13. Short Primer on Coprocessors: Observer Classes
• Like a database trigger
• Provides event-based hooks
• Concrete implementations:
  • RegionObserver: CRUD or DML type operations
  • MasterObserver: DDL or metadata operations and cluster administration
  • WALObserver: write-ahead-log appending and restoration
14. Short Primer on Coprocessors: Observer Execution
On the RegionServer, a client Get() runs through the observer chain:
Client:Get() → CP1:preGet() → CP2:preGet() → CP3:preGet() → HRegion:Get() → CP1:postGet() → CP2:postGet() → CP3:postGet() → client response
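The hook chain above can be sketched in plain Java. This is a simulation of the execution order only, not the real HBase API; the `RegionObserver`/`preGet`/`postGet` names simply mirror the slide:

```java
import java.util.ArrayList;
import java.util.List;

// Simulates how a region server threads a Get through every loaded
// coprocessor: all preGet() hooks fire before the region operation,
// all postGet() hooks fire after it, in registration order.
public class ObserverChain {
    public interface RegionObserver {
        void preGet(String row, List<String> trace);
        void postGet(String row, List<String> trace);
    }

    public static class LoggingObserver implements RegionObserver {
        private final String name;
        public LoggingObserver(String name) { this.name = name; }
        public void preGet(String row, List<String> trace)  { trace.add(name + ":preGet");  }
        public void postGet(String row, List<String> trace) { trace.add(name + ":postGet"); }
    }

    // Runs the chain exactly as on the slide: pre hooks, region get, post hooks.
    public static List<String> get(String row, List<RegionObserver> coprocessors) {
        List<String> trace = new ArrayList<>();
        for (RegionObserver cp : coprocessors) cp.preGet(row, trace);
        trace.add("HRegion:get");                 // the actual region operation
        for (RegionObserver cp : coprocessors) cp.postGet(row, trace);
        return trace;
    }

    public static void main(String[] args) {
        List<RegionObserver> cps = new ArrayList<>();
        cps.add(new LoggingObserver("CP1"));
        cps.add(new LoggingObserver("CP2"));
        cps.add(new LoggingObserver("CP3"));
        System.out.println(get("row-1", cps));
        // [CP1:preGet, CP2:preGet, CP3:preGet, HRegion:get,
        //  CP1:postGet, CP2:postGet, CP3:postGet]
    }
}
```

In real HBase the same ordering applies, with priorities deciding the position of each coprocessor in the chain.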
15. Short Primer on Coprocessors: Endpoint Classes
• Comparable to stored procedures
• Custom RPC protocol, used between client and region server
• Loaded in the region server
• Clients call the API over a single row or a row range
• The framework translates row keys to region locations
• Parallel execution
16. Short Primer on Coprocessors: Endpoint Call Routine
Client code (HTable.coprocessorExec() invokes the endpoint once per region in the row range and collects the per-region results):

    Batch.Call<CountProtocol, Integer> call =
        new Batch.Call<CountProtocol, Integer>() {
          public Integer call(CountProtocol p) {
            return p.getRowCount();
          }
        };
    Map<byte[], Integer> countsByRegion =
        table.coprocessorExec(CountProtocol.class, startRow, endRow, call);

Regions hosting the CountProtocol endpoint:
Region Server 1: table,,12345678 | table,bbb,12345678
Region Server 2: table,ccc,12345678 | table,ddd,12345678
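The scatter-gather shape of that call routine can be simulated in plain Java: one endpoint invocation per region, run in parallel, merged into a per-region map (region names and row data below are made up for illustration; in HBase the fan-out is done by coprocessorExec() itself):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simulates the endpoint call routine: the client fans one call out to
// every region in parallel, each "region" returns its local row count,
// and the results are gathered into a map keyed by region.
public class EndpointScatterGather {
    // Stand-in for the per-region CountProtocol.getRowCount() endpoint.
    static int getRowCount(int[] regionRows) { return regionRows.length; }

    public static Map<String, Integer> countsByRegion(Map<String, int[]> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        Map<String, Future<Integer>> pending = new TreeMap<>();
        for (Map.Entry<String, int[]> e : regions.entrySet()) {
            // one parallel endpoint invocation per region
            pending.put(e.getKey(), pool.submit(() -> getRowCount(e.getValue())));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Future<Integer>> e : pending.entrySet()) {
            try {
                counts.put(e.getKey(), e.getValue().get());   // gather
            } catch (InterruptedException | ExecutionException ex) {
                throw new RuntimeException(ex);
            }
        }
        pool.shutdown();
        return counts;
    }

    public static void main(String[] args) {
        Map<String, int[]> regions = new TreeMap<>();
        regions.put("table,,12345678",    new int[]{1, 2, 3});
        regions.put("table,bbb,12345678", new int[]{4, 5});
        System.out.println(countsByRegion(regions));
        // {table,,12345678=3, table,bbb,12345678=2}
    }
}
```

The client-side merge step (summing the per-region counts, for instance) is exactly what the slide's `Map<byte[], Integer> countsByRegion` result enables.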
17. Short Primer on Coprocessors: Use Cases
• HBase security (version 0.94)
• Aggregate operations avg(), sum(): AggregatorProtocol
• HBASE-3529: embedded search
18. Social Media Monitoring: Prospective Search with Coprocessors
Put operations are processed in the HRegion (on the HRegionServer); the Prospective Search coprocessor matches each incoming write and emits RT Alerts.
Icons by http://dryicons.com
19. Social Media Monitoring: Testing Setup
• Standard, virtualized test cluster: 4 RegionServers/DataNodes, 1 HBase Master, 1 NameNode, 3 ZooKeeper nodes
• Test dataset created from 2h of the live index (1 GB)
• Drive load on the RegionServers/DataNodes
20. Social Media Monitoring: Test Results
[Chart: write throughput in writes/sec (0 to 1800) plotted against the number of deployed search agents (0, 10, 50, 100, 200, 400, 800)]
22. Challenges & Lessons Learned: Challenges
• Everyone is still learning
• Some issues only appear at scale
• Production cluster configuration
• Hardware issues
• Tuning cluster configuration to our workloads
• HBase stability
• Monitoring the health of HBase
23. Challenges & Lessons Learned: Lessons
• Be careful with expensive operations in coprocessors
• At scale, nothing works as advertised
• Monitoring/operational tooling is most important
• Play with all the configurations and benchmark for tuning
24. Resources to get started
• https://blogs.apache.org/hbase/entry/coprocessor_introduction
• http://hbase.apache.org/apidocs/index.html
• http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
• http://www.github.com/sentric/HBasePS
25. Questions?
Christian Gügi
christian.guegi@sentric.ch
Berlin Buzzwords 2012
Thank you!