Apache Phoenix with Actor Model for Real-time Big Data Programming Stack
Why do we still need SQL for Big Data ?
How can we make Big Data more responsive and faster ?
Tech Lead at eClick team - FPT Online
1. What is Big Data and why ?
2. When a standard relational database (Oracle, MySQL, ...) is not good enough
3. Common problems in big data systems
4. Introducing open-source tools in Big Data System
a. Apache Phoenix for ad-hoc query
b. Actor Model for reactive data processing
What Does Big Data Actually Mean? 
“Big data means data that cannot fit easily into a standard relational database.”
Hal Varian, Chief Economist, Google
When a standard relational database (Oracle, MySQL, ...) is not good enough
Example: the MySQL database behind the “analytic system” of a startup, tracking all actions in mobile games (iOS, Android, ...)

Complex analytic system and the “scale” pain
Definition from the crowd 
“Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”
Jonathan Stuart Ward and Adam Barker
“Chaotic” fact and the demand 
80% of that data is unstructured or “chaotic”: photos, videos and social media posts. This data says so much about us, but it cannot be analyzed via traditional methods.
Demand: “Finding order among chaos”
3 common problems in Big Data System 
1. Size: the volume of the datasets is a critical factor.
2. Complexity: the structure, behaviour and permutations of the datasets is a critical factor.
3. Technologies: the tools and techniques which are used to process a sizable or complex dataset is a critical factor.

Introducing open-source tools in Big Data System 
Apache Phoenix as SQL ad-hoc query engine
Actor Model as nano-service for reactive data computation in the dawn of “Fast data”
Some innovative tools were born in the dawn of the Big Data age
But could an elephant fly without wings ?
But a phoenix can fly !
What is Apache Phoenix ? 
Apache Phoenix is a SQL skin over HBase.
This means Phoenix scales the same way HBase does: by scaling up and scaling out the HBase cluster.
SQL Engine
Interesting features of Apache Phoenix 
● Embedded JDBC driver implements the majority of java.sql interfaces, including the metadata APIs.
● Allows columns to be modeled as a multi-part row key or key/value cells.
● Full query support with predicate push down and optimal scan key formation.
● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
● Versioned schema repository. Snapshot queries use the schema that was in place when data was written.
● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
● Limited transaction support through client-side batching.
● Single table only - no joins yet and secondary indexes are a work in progress.
● Follows ANSI SQL standards whenever possible
● Requires HBase v 0.94.2 or above
● 100% Java
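
To make the JDBC, DDL and DML features above more concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the table game_events, its columns, and the connection URL jdbc:phoenix:localhost (a ZooKeeper quorum on localhost) are hypothetical and do not come from the slides, and the Phoenix client jar is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch only: table, columns and URL are hypothetical, not taken from the slides. */
final class PhoenixJdbcSketch {

    // Multi-part row key: (user_id, event_time) together form the HBase row key.
    static final String CREATE_TABLE_SQL =
        "CREATE TABLE IF NOT EXISTS game_events ("
        + " user_id VARCHAR NOT NULL,"
        + " event_time BIGINT NOT NULL,"
        + " platform VARCHAR,"
        + " event_name VARCHAR"
        + " CONSTRAINT pk PRIMARY KEY (user_id, event_time))";

    // Phoenix uses UPSERT (insert or update) instead of INSERT.
    static final String UPSERT_SQL =
        "UPSERT INTO game_events (user_id, event_time, platform, event_name)"
        + " VALUES (?, ?, ?, ?)";

    // Single-table ad-hoc aggregate query.
    static final String COUNT_BY_PLATFORM_SQL =
        "SELECT platform, COUNT(*) FROM game_events GROUP BY platform";

    /** Creates the table, writes one row, and prints an aggregate. */
    static void run(Connection conn) throws SQLException {
        try (Statement ddl = conn.createStatement()) {
            ddl.execute(CREATE_TABLE_SQL);
        }
        try (PreparedStatement upsert = conn.prepareStatement(UPSERT_SQL)) {
            upsert.setString(1, "user-1");
            upsert.setLong(2, System.currentTimeMillis());
            upsert.setString(3, "Android");
            upsert.setString(4, "level_up");
            upsert.executeUpdate();
        }
        // Phoenix connections have auto-commit off by default;
        // mutations are batched on the client until commit().
        conn.commit();
        try (Statement query = conn.createStatement();
             ResultSet rs = query.executeQuery(COUNT_BY_PLATFORM_SQL)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No cluster given: just show the statements that would be sent.
            System.out.println(CREATE_TABLE_SQL);
            System.out.println(UPSERT_SQL);
            System.out.println(COUNT_BY_PLATFORM_SQL);
            return;
        }
        // args[0] is a Phoenix JDBC URL, e.g. jdbc:phoenix:localhost (assumed).
        try (Connection conn = DriverManager.getConnection(args[0])) {
            run(conn);
        }
    }
}
```

Run without arguments, the program only prints the SQL. Passing a Phoenix JDBC URL as the first argument executes the statements against that cluster.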

the Phoenix table schema
Setting JDBC Phoenix Driver
Phoenix and SQL tool in Eclipse 4

What is actor model ? 
● Carl Hewitt defined the Actor Model in 1973 as a mathematical theory that treats “Actors” as the universal primitives of concurrent digital computation.
● A fitting model for heavily-parallel processing in a cloud environment
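
The core idea can be sketched in plain Java without any framework: an actor owns private state, receives messages through a mailbox, and handles them one at a time. The classes MiniActor and ActorDemo below are hypothetical names invented for this illustration. This is not the Akka API and not the Rfx framework, only a rough sketch of the model.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Minimal actor: a mailbox plus one thread that handles one message at a time. */
final class MiniActor<M> {
    private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    MiniActor(Consumer<M> behavior) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    // Messages are processed sequentially, so the behavior
                    // can mutate its own state without locks.
                    behavior.accept(mailbox.take());
                }
            } catch (InterruptedException stopped) {
                // Interruption is used as the stop signal.
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Asynchronous send: enqueue the message and return immediately. */
    void tell(M message) {
        mailbox.add(message);
    }

    void stop() {
        worker.interrupt();
    }
}

final class ActorDemo {
    /** Several threads send messages to one counting actor; returns the final count. */
    static int countWithActor(int senders, int perSender) {
        int[] count = {0}; // actor state, only touched by the actor's thread
        CountDownLatch done = new CountDownLatch(senders * perSender);
        MiniActor<String> counter = new MiniActor<>(msg -> {
            count[0]++;
            done.countDown();
        });
        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perSender; j++) {
                    counter.tell("event");
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("actor did not drain its mailbox in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            counter.stop();
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithActor(4, 1000)); // prints 4000
    }
}
```

Although four threads send concurrently, the counter needs no synchronization because only the actor's own thread ever touches it. That isolation is what makes the model attractive for heavily-parallel processing.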
What actor model ?
Akka is the framework for implementing Actor computation
Inspired by MillWheel of Google and Storm of Twitter, I have developed my own framework, the “Rfx” (Reactive Functor Extension), with Akka as its core



  • 1. Apache Phoenix with Actor Model for Real-time Big Data Programming Stack. Why do we still need SQL for Big Data ? How can we make Big Data more responsive and faster ? By Tech Lead at eClick team - FPT Online
  • 2. Contents 1. What is Big Data and why ? 2. When a standard relational database (Oracle, MySQL, ...) is not good enough 3. Common problems in big data systems 4. Introducing open-source tools in Big Data System a. Apache Phoenix for ad-hoc query b. Actor Model for reactive data processing
  • 3. What Does Big Data Actually Mean? “Big data means data that cannot fit easily into a standard relational database.” Hal Varian- Chief Economist, Google
  • 4. When standard relational database (Oracle,MySQL, ...) is not good enough the “analytic system” MySQL database from a startup, tracking all actions in mobile games: iOS, Android, ...
  • 5. Complex analytic system and the “scale” pain
  • 6. Definition from the crowd “Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.” Jonathan Stuart Ward and Adam Barker Source: it/
  • 7. “Chaotic” fact and the demand 80% of that data is unstructured or “chaotic” Photos, videos and social media posts - data that says so much about us - but cannot be analyzed via traditional methods Demand: “Finding order among chaos”
  • 8. 3 common problems in Big Data System 1. Size: the volume of the datasets is a critical factor. 2. Complexity: the structure, behaviour and permutations of the datasets is a critical factor. 3. Technologies: the tools and techniques which are used to process a sizable or complex dataset is a critical factor.
  • 9. Introducing open-source tools in Big Data System Apache Phoenix as SQL ad-hoc query engine Actor Model as nano-service for reactive data computation in the dawn of “Fast data”
  • 10. Some innovative tools were born in the dawn of Big Data Age
  • 11. But could an elephant fly without wings ?
  • 13. But a phoenix can fly !
  • 14. What is Apache Phoenix ? Apache Phoenix is a SQL skin over HBase. This means Phoenix scales the same way HBase does: by scaling up and scaling out the HBase cluster.
  • 16. Interesting features of Apache Phoenix ● Embedded JDBC driver implements the majority of java.sql interfaces, including the metadata APIs. ● Allows columns to be modeled as a multi-part row key or key/value cells. ● Full query support with predicate push down and optimal scan key formation. ● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns. ● Versioned schema repository. Snapshot queries use the schema that was in place when data was written. ● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows. ● Limited transaction support through client-side batching. ● Single table only - no joins yet and secondary indexes are a work in progress. ● Follows ANSI SQL standards whenever possible ● Requires HBase v 0.94.2 or above ● 100% Java
  • 20. Phoenix and SQL tool in Eclipse 4
  • 21. Phoenix vs Hive (running over HDFS and HBase)
  • 22. Actor Model in the dawn of “Fast data”
  • 23. - Google I/O 2014 - The dawn of "Fast Data"
  • 24. The paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
  • 25. What is actor model ? ● Carl Hewitt defined the Actor Model in 1973 as a mathematical theory that treats “Actors” as the universal primitives of concurrent digital computation. ● A fitting model for heavily-parallel processing in a cloud environment
  • 27. Akka is the framework for implementing Actor computation
  • 28. Inspired by MillWheel of Google and Storm of Twitter, I have developed my own framework, the “Rfx” (Reactive Functor Extension) with Akka as core
  • 29. The pipeline of finding social trends in real-time analytics
  • 30. Facebook Social Trending from a website
  • 31. Quick demo Using Akka (Rfx) and Apache Phoenix for Social Media Real-time Analytics
  • 32. Links for self-study and research Actor Model and Programming: ● reactive-actor-model ● ● ● Apache Phoenix ● ● Big Data and Data Science ● and ● ● ● ●