Apache Phoenix with the Actor Model (Akka.io) for a real-time Big Data programming stack. Why do we still need SQL for Big Data? How can we make Big Data more responsive and faster?
Both Spark and HBase are widely used, but how to use them together with high performance and simplicity is a very challenging topic. The Spark HBase Connector (SHC) provides feature-rich and efficient access to HBase through Spark SQL. It bridges the gap between the simple HBase key-value store and complex relational SQL queries, and enables users to perform complex data analytics on top of HBase using Spark. SHC implements the standard Spark data source APIs and leverages the Spark Catalyst engine for query optimization. To achieve high performance, SHC constructs the RDD from scratch instead of using the standard HadoopRDD. With the customized RDD, all critical techniques can be applied and fully implemented, such as partition pruning, column pruning, predicate pushdown and data locality. The design makes maintenance easy while achieving a good tradeoff between performance and simplicity. In addition to fully supporting all the Avro schemas natively, SHC also integrates natively with Phoenix data types. With SHC, Spark can execute batch jobs to read/write data from/into Phoenix tables. Phoenix can also read/write data from/into HBase tables created by SHC. For example, users can run a complex SQL query on top of an HBase table created by Phoenix inside Spark, perform a table join against a DataFrame which reads the data from a Hive table, or integrate with Spark Streaming to implement a more complicated system. In this talk, apart from explaining why SHC is of great use, we will also demo how SHC works, how to use SHC in secure/non-secure clusters, how SHC works with multiple secure HBase clusters, etc. This talk will also benefit people who use Spark and other data sources (besides HBase), as it inspires them with ideas of how to support high-performance data source access at the Spark DataFrame level.
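To make the DataFrame-level access concrete, here is a minimal sketch of reading an HBase table through SHC from Java. The catalog JSON, table and column names are placeholders, and SHC is assumed to be on the Spark classpath; the "catalog" option key corresponds to SHC's HBaseTableCatalog.tableCatalog constant.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ShcReadSketch {
    public static void main(String[] args) {
        // JSON catalog mapping a Spark SQL schema onto an HBase table;
        // namespace, table, family and column names here are illustrative.
        String catalog = "{\n"
            + "  \"table\": {\"namespace\": \"default\", \"name\": \"Contacts\"},\n"
            + "  \"rowkey\": \"key\",\n"
            + "  \"columns\": {\n"
            + "    \"id\":   {\"cf\": \"rowkey\",   \"col\": \"key\",  \"type\": \"string\"},\n"
            + "    \"name\": {\"cf\": \"personal\", \"col\": \"name\", \"type\": \"string\"}\n"
            + "  }\n"
            + "}";

        SparkSession spark = SparkSession.builder().appName("shc-demo").getOrCreate();

        // SHC registers itself as a Spark data source; partition pruning and
        // predicate pushdown happen inside the connector's custom RDD.
        Dataset<Row> df = spark.read()
            .option("catalog", catalog)
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load();

        // The filter below is pushed down to HBase rather than scanned in Spark.
        df.filter("id = '42'").show();
    }
}
```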
Zeppelin has become a popular way to unlock the value of the data lake thanks to its user interface and appeal to business users. These business users ask their IT departments for access to Zeppelin. Enterprise IT departments want to help their business users, but they have several enterprise concerns, such as security, integration with their corporate LDAP/AD, scalability and multi-user environments, and integration with Ranger and Kerberos. This session will walk through these enterprise concerns and how they can be handled with Zeppelin.
This document summarizes Salesforce's use of HBase and Phoenix for storing and querying large amounts of unstructured data at scale. Some key details include:
- Salesforce uses over 100 HBase clusters to store both customer and internal data, handling over 4 billion write requests and 600 million read requests per day.
- This includes storing login data, archived relational data, user activity, machine metrics and more, totaling over 80 terabytes written and 500 gigabytes read daily.
- An internal metrics database collects data from over 80,000 machines, storing 11.4 trillion metrics and growing, with 2.8 trillion metrics added in the last 6 months alone.
This document discusses Apache Falcon and its Pipeline Designer tool. It provides an overview of key concepts in Pipeline Designer including feeds, processes, actions, transforms, and deployment. Pipeline Designer allows composing ETL workflows visually with a graphical interface and handles orchestration, monitoring, and execution on Hadoop clusters. Transformation actions are compiled into Pig scripts and the entire workflow is deployed as a Falcon process.
This talk will give an overview of two exciting releases, Apache HBase 2.0 and Phoenix 5.0. HBase provides a NoSQL column store on Hadoop for random, real-time read/write workloads. Phoenix provides SQL on top of HBase. HBase 2.0 contains a large number of features that were a long time in development, including rewritten region assignment, performance improvements (RPC, a rewritten write pipeline, etc.), async clients and WAL, a C++ client, off-heaping of the memstore and other buffers, shading of dependencies, as well as a lot of other fixes and stability improvements. We will go into detail on some of the most important improvements in the release, as well as the implications for users in terms of APIs and upgrade paths. Phoenix 5.0 is the next big Phoenix release because of its integration with HBase 2.0 and its many performance improvements in support of secondary indexes. It has many important new features such as encoded columns, Kafka and Hive integration, and many other performance improvements. This session will also describe the use cases that HBase and Phoenix are a good architectural fit for. Speaker: Alan Gates, Co-Founder, Hortonworks
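For a flavor of the async client mentioned above, here is a minimal sketch against the HBase 2.0 Java client API; the table, row and column names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.AsyncConnection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.util.Bytes;

public class AsyncGetDemo {
    public static void main(String[] args) throws Exception {
        // HBase 2.0 adds a non-blocking client: the connection and every
        // table operation return CompletableFutures instead of blocking.
        CompletableFuture<AsyncConnection> connFuture =
            ConnectionFactory.createAsyncConnection(HBaseConfiguration.create());

        connFuture
            .thenCompose(conn -> conn.getTable(TableName.valueOf("events"))
                                     .get(new Get(Bytes.toBytes("row-1"))))
            .thenAccept(result -> System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("payload")))))
            .join();  // block only here, at the end of the demo

        connFuture.join().close();
    }
}
```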
The document discusses Microsoft's Azure IoT platform for connecting, managing, and analyzing Internet of Things devices and data. It provides an overview of the key components of Azure IoT including Azure IoT Hub for device connectivity and management, analytics services like Azure Machine Learning and Stream Analytics, and connectivity to other Azure services. It also highlights aspects of Azure IoT like its open ecosystem, support for open standards, and global infrastructure running on Microsoft's Azure cloud.
The document discusses getting involved with open source projects at the Apache Software Foundation. It provides an overview of the ASF, how it works, and how to contribute to Apache projects. The key points are:
- The ASF is a non-profit organization that oversees hundreds of open source projects and thousands of volunteers. Popular projects include Hadoop, Hive, and Pig.
- To get involved, individuals can start by joining mailing lists, reviewing documentation, reporting issues, and submitting code patches. More responsibilities come with becoming a committer or PMC member.
- Projects follow an open development process based on consensus. Voting on decisions helps include contributors from different time zones.
- Contributing is rewarding.
The document discusses sharing metadata across data lakes and streams. It proposes unifying the Hive Metastore (HMS) and Schema Registry so that batch and streaming systems can see each other's metadata. This would reduce the number of separate metadata systems administrators need to maintain. The document also describes making the HMS standalone, so that it can be used without installing Hive, enabling other systems like Spark and Impala to use HMS independently. Finally, it provides use cases where streaming applications need access to batch data in Hive tables and vice versa.
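As a sketch of what a standalone HMS enables, a non-Hive system can talk to the metastore directly over Thrift using the metastore client. The host and port below are placeholders, and the metastore client JAR is assumed to be on the classpath.

```java
import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;

public class HmsListTables {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // Point directly at the (standalone) metastore service; no Hive
        // server or execution engine is involved in this call path.
        conf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://metastore-host:9083");

        IMetaStoreClient client = new HiveMetaStoreClient(conf);
        List<String> tables = client.getAllTables("default");
        tables.forEach(System.out::println);
        client.close();
    }
}
```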
When interacting with analytics dashboards, two key requirements for a smooth user experience are sub-second response times and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases. In this talk we will present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store, designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, sub-second, ad-hoc data exploration queries. Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases. Agenda:
1) Introduction and ideal use cases for Druid
2) Data architecture
3) Streaming ingestion with Kafka
4) Demo using Druid, Kafka and Superset
5) Recent improvements in Druid, moving from the lambda architecture to exactly-once ingestion
6) Future work
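One concrete way to issue those sub-second exploratory queries is Druid SQL over Druid's Avatica JDBC endpoint. A minimal sketch, assuming a broker reachable on localhost:8082, a datasource named "wikipedia", and the Avatica JDBC driver on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidSqlQuery {
    public static void main(String[] args) throws Exception {
        // Druid exposes SQL through an Apache Calcite Avatica endpoint on the broker.
        String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT channel, COUNT(*) AS edits "
               + "FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("channel") + ": " + rs.getLong("edits"));
            }
        }
    }
}
```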
Presented Apache Falcon at Hadoop Summit 2013, SJC. Delves into the motivation behind Falcon, overview of the architecture, and looking forward into the future.
This document discusses various tools and techniques for diagnosing slow Hadoop jobs, including metrics and monitoring, logging and correlation, and tracing and analysis. It describes how to use Ambari metrics and Grafana dashboards to monitor cluster health and performance. It also explains how to leverage Hadoop audit logs and caller context to correlate job activity. Techniques for application tracing using YARN timeline service and tools like Zeppelin and Tez analyzer are presented to enable deep performance analysis of Hadoop jobs.
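To give a feel for the caller-context correlation mentioned above, the snippet below tags subsequent HDFS RPCs from the current thread so they appear in the NameNode audit log. This is a sketch assuming Hadoop 2.8 or later with caller context enabled; the context string itself is arbitrary.

```java
import org.apache.hadoop.ipc.CallerContext;

public class TaggedJob {
    public static void main(String[] args) {
        // Every HDFS RPC issued from this thread after this call carries the
        // context string, which the NameNode records in its audit log, letting
        // operators correlate low-level filesystem activity back to the job.
        CallerContext.setCurrent(
            new CallerContext.Builder("nightly-etl,run=42").build());

        // ... submit the job / perform HDFS operations here ...
    }
}
```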
Apache Hive is an enterprise data warehouse built on top of Hadoop. Hive supports Insert/Update/Delete SQL statements with transactional semantics and read operations that run at snapshot isolation. This talk will describe the intended use cases, the architecture of the implementation, new features such as the SQL MERGE statement, and recent improvements. The talk will also cover the Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL. This API is used by Apache NiFi, Storm and Flume to stream data directly into Hive tables and make it visible to readers in near real time.
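A minimal sketch of the Streaming Ingest API flow (the hive-hcatalog-streaming variant): the metastore URI, table and column names are placeholders, and the target table is assumed to be transactional, bucketed, and unpartitioned.

```java
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamEvents {
    public static void main(String[] args) throws Exception {
        // Endpoint identifies the target table; null = no partition values.
        HiveEndPoint endPoint = new HiveEndPoint(
            "thrift://metastore-host:9083", "default", "events", null);
        StreamingConnection conn = endPoint.newConnection(true);
        DelimitedInputWriter writer = new DelimitedInputWriter(
            new String[]{"id", "msg"}, ",", endPoint);

        // Writes are grouped into transaction batches; each commit makes the
        // rows visible to readers at snapshot isolation.
        TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
        batch.beginNextTransaction();
        batch.write("1,hello".getBytes());
        batch.write("2,world".getBytes());
        batch.commit();
        batch.close();
        conn.close();
    }
}
```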
As organizations pursue Big Data initiatives to capture new opportunities for data-driven insights, data governance has become table stakes, both for external regulatory compliance and for business value extraction within an enterprise. This session will introduce Apache Atlas, a project that was incubated by Hortonworks along with a group of industry leaders across several verticals, including financial services, healthcare, pharma, oil and gas, retail and insurance, to help address data governance and metadata needs with an open, extensible platform governed under the aegis of the Apache Software Foundation. Apache Atlas empowers organizations to harvest metadata across the data ecosystem and to govern and curate data lakes by applying consistent data classification with a centralized metadata catalog. In this talk, we will present the underpinnings of the architecture of Apache Atlas and conclude with a tour of its governance capabilities as we showcase various features for open metadata modeling, data classification, and visualizing cross-component lineage and impact. We will also demo how Apache Atlas delivers a complete view of data movement across several analytic engines such as Apache Hive, Apache Storm and Apache Kafka, along with capabilities to effectively classify and discover datasets.
As HBase and Hadoop continue to become routine across enterprises, these enterprises inevitably shift priorities from effective deployments to cost-efficient operations. Consolidation of infrastructure, the sum of hardware, software, and system-administrator effort, is the most common strategy to reduce costs. As a company grows, the number of business organizations, development teams, and individuals accessing HBase grows commensurately, creating a not-so-simple requirement: HBase must effectively service many users, each with a variety of use cases. This problem is known as multi-tenancy. While multi-tenancy isn’t a new problem, it also isn’t a solved one, in HBase or otherwise. This talk will present a high-level view of the common issues organizations face when multiple users and teams share a single HBase instance and how certain HBase features were designed specifically to mitigate the issues created by the sharing of finite resources.
Apache Ambari is now the preferred way of provisioning, managing and monitoring Hadoop clusters. Ambari helps users manage Hadoop clusters, simplifying actions such as upgrades, configuration management, service management, etc. From release 2.0, Ambari started supporting automated Rolling Upgrades. This was further enhanced in release 2.2.0.0 with support for Express Upgrades, which allow users to upgrade large-scale clusters faster, but require cluster downtime. This talk will cover planning and execution of Hadoop cluster upgrades from an operational perspective. The talk will also cover the internals of the upgrade process, including the various stages such as pre-upgrade, backup, service checks, configuration upgrades, and finalization. Finally, the talk will cover troubleshooting upgrade failures, monitoring services during upgrades, and post-upgrade actions. The presentation will conclude with a case study that covers how the upgrade process works on a large cluster (including aspects such as planning the upgrade, the amount of time required for the various stages, and troubleshooting).
This document describes the implementation of data replication in Apache Accumulo. It discusses justifying the need for replication to handle failures, describes how replication is implemented using write-ahead logs, and outlines future work including replicating to other systems and improving consistency.
This document provides an overview of using Apache NiFi to build data pipelines that index data into Apache Solr. It introduces NiFi and its capabilities for data routing, transformation and monitoring. It describes how Solr accepts data through different update handlers like XML, JSON and CSV. It demonstrates how NiFi processors can be used to stream data to Solr via these update handlers. Example use cases are presented for indexing tweets, commands, logs and databases into Solr collections. Future enhancements are discussed like parsing documents and distributing commands across a Solr cluster.
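For context on what NiFi's Solr processors do over HTTP, here is the same indexing operation expressed with SolrJ; the Solr URL, collection name and fields are illustrative, and SolrJ is used here only as a stand-in for the update handlers the NiFi flow would post to.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexTweet {
    public static void main(String[] args) throws Exception {
        // A NiFi processor such as PutSolrContentStream posts to the same
        // update handlers that SolrJ uses under the covers.
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/tweets").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "tweet-1");
            doc.addField("text_t", "hello from nifi");
            solr.add(doc);
            solr.commit();  // make the document visible to searchers
        }
    }
}
```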
Apache HBase is an open source, non-relational, distributed datastore modeled after Google’s Bigtable, that runs on top of the Apache Hadoop Distributed Filesystem and provides low-latency random-access storage for HDFS-based compute platforms like Apache Hadoop and Apache Spark. Apache Phoenix is a high performance relational database layer over HBase optimized for low latency applications. This session will explore how the Data Platform and Services group at Salesforce.com supports teams of application developers accustomed to structured relational data access, while surfacing additional advantages of the underlying flexible scale-out datastore.
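As a sketch of the relational access pattern Phoenix layers over HBase (the ZooKeeper quorum, table and column names below are placeholders, and the Phoenix client JAR is assumed to be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQuickstart {
    public static void main(String[] args) throws Exception {
        // The thick JDBC driver locates HBase via the ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS users ("
                + "id BIGINT NOT NULL PRIMARY KEY, email VARCHAR, name VARCHAR)");
            stmt.execute("UPSERT INTO users VALUES (1, 'a@example.com', 'Ada')");
            conn.commit();  // Phoenix buffers mutations until commit
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT name FROM users WHERE id = 1")) {
                while (rs.next()) System.out.println(rs.getString(1));
            }
        }
    }
}
```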
This document discusses secondary indexing in HBase. It introduces HBase and how scans work. Secondary indexes are created to avoid full table scans by storing the row keys corresponding to indexed column values in a separate table. Coprocessors can be used to implement secondary indexes by assigning index regions and keeping them in sync with data regions. Benchmark results show that secondary indexing improves query performance at the cost of extra storage and the processing overhead of the coprocessors. Challenges include indexing across regions and handling region splits.
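A deliberately simplified sketch of the coprocessor-based approach, written against the HBase 1.x observer API: the tables, family and qualifier names are invented, and a production design would also handle deletes, updates of old index entries, failures, and region splits.

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Keeps an "email -> user row key" index table in sync with puts on the
// data table, so queries by email avoid a full table scan.
public class EmailIndexObserver extends BaseRegionObserver {
    private static final byte[] CF = Bytes.toBytes("d");
    private static final byte[] EMAIL = Bytes.toBytes("email");
    private static final TableName INDEX = TableName.valueOf("users_by_email");

    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                       Put put, WALEdit edit, Durability durability) throws IOException {
        List<Cell> cells = put.get(CF, EMAIL);
        if (cells.isEmpty()) return;
        byte[] email = CellUtil.cloneValue(cells.get(0));
        try (Table index = ctx.getEnvironment().getTable(INDEX)) {
            // The index row key is the indexed value; the data row key is
            // stored as a column so lookups can jump back to the data table.
            Put indexPut = new Put(email);
            indexPut.addColumn(CF, Bytes.toBytes("rowkey"), put.getRow());
            index.put(indexPut);
        }
    }
}
```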
Transform Salesforce into the system of engagement for your big data. Discuss best practices and lessons learned in accessing external data sets in Hadoop or Spark using Salesforce Connect. Leave the big data sets behind the firewall, and get on-demand access for your users to big data insights using external objects with Salesforce Connect. In this session we will cover:
- Intro to Salesforce Connect
- Intro to the Big Data landscape
- How to connect Salesforce to Big Data using External Data Sources
- Lessons learned accessing Big Data using External Objects for native reporting, writes, lookups, search and more
- Resources (how to learn more)
Vladimir Rodionov (Hortonworks). Time-series applications (sensor data, application/system logging events, user interactions, etc.) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present recent developments in Apache HBase that make it a good fit for time-series applications.
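Independent of the specific HBase features the talk covers, a common pattern for time-series row keys combines a salt byte (to spread hot sequential writes across regions) with a reversed timestamp (so the newest points sort first). A sketch with invented names:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class TsRowKey {
    static final int NUM_BUCKETS = 16;

    // Row key layout: [salt][metricId][Long.MAX_VALUE - timestamp]
    static byte[] rowKey(String metricId, long timestampMillis) {
        byte salt = (byte) (Math.abs(metricId.hashCode()) % NUM_BUCKETS);
        long reversedTs = Long.MAX_VALUE - timestampMillis;  // newest first
        return Bytes.add(new byte[]{salt},
                         Bytes.toBytes(metricId),
                         Bytes.toBytes(reversedTs));
    }

    public static void main(String[] args) {
        System.out.println(Bytes.toStringBinary(
            rowKey("host1.cpu.user", System.currentTimeMillis())));
    }
}
```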
HBase, as the NoSQL database of choice in the Hadoop ecosystem, has already proven itself at scale and in many mission-critical workloads in hundreds of companies. Phoenix, as the SQL layer on top of HBase, has increasingly become the tool of choice and the perfect complement to HBase. Phoenix is now being used more and more for super-low-latency querying and fast analytics across a large number of users in production deployments. In this talk, we will cover what makes Phoenix attractive among current and prospective HBase users, like SQL support, JDBC, data modeling, secondary indexing, UDFs, and also go over recent improvements like the Query Server, ODBC drivers, ACID transactions, Spark integration, etc. We will conclude by looking into items in the pipeline and how Phoenix and HBase interact with other engines like Hive and Spark.
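To make the secondary-indexing support concrete: Phoenix lets you declare an index in SQL and maintains it transparently on writes. A sketch, reusing the hypothetical users table and connection URL from the earlier Phoenix example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePhoenixIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // A covered index: queries that filter on email and select only
            // name can be served entirely from the index table.
            stmt.execute("CREATE INDEX IF NOT EXISTS users_email_idx "
                + "ON users (email) INCLUDE (name)");
        }
    }
}
```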
The document summarizes Apache Phoenix and HBase as an enterprise data warehouse solution. It discusses how Phoenix provides OLTP and analytics capabilities over HBase. It then covers various use cases where companies are using Phoenix and HBase, including for web analytics and time series data. Finally, it discusses optimizations that can be made to the schema design, queries, and writes in Phoenix to improve performance.
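As one example of the schema-design optimizations mentioned, salting pre-splits a table and prefixes row keys with a hash byte, which avoids region hotspotting when keys (such as timestamps) increase monotonically. The DDL and names below are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SaltedTimeSeriesTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // SALT_BUCKETS spreads sequential writes across region servers at
            // the cost of fanning out range scans over the salt buckets.
            stmt.execute("CREATE TABLE IF NOT EXISTS metrics ("
                + "host VARCHAR NOT NULL, ts DATE NOT NULL, val DOUBLE "
                + "CONSTRAINT pk PRIMARY KEY (host, ts)) SALT_BUCKETS = 16");
        }
    }
}
```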
This document provides an overview of Apache HBase and Apache Phoenix. It discusses how HBase is a scalable, non-relational database that can store large volumes of data across commodity servers. Phoenix provides a SQL interface for HBase, allowing users to interact with HBase data using familiar SQL queries and functions. The document outlines new features in Phoenix for HDP 2.2, including improved support for secondary indexes and basic window functions.