Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Enis Soztutar (enis@hortonworks.com)
About Me
Enis Soztutar
Committer and PMC member in Apache HBase, Phoenix, and Hadoop
HBase/Phoenix team @Hortonworks
Twitter @enissoz
Disclaimer: Not a SQL expert!
Outline
PART I – The Past (a.k.a. All the existing stuff)
• Phoenix: the basics
• Architecture
• Overview of existing Phoenix features
PART II – The Present (a.k.a. All the recent stuff)
• Look at recent releases
• Transactions
• Phoenix Query Server
• Other features
PART III – The Future (a.k.a. All the upcoming stuff)
• Calcite integration
• Phoenix – Hive
Part I – The Past
All the existing stuff!
Obligatory Slide - Who uses Phoenix
Phoenix – The Basics
• Hope everybody is familiar with HBase
  • Otherwise you are in the wrong talk!
• What is wrong with pure HBase?
  • HBase is a powerful, flexible and extensible “engine”
  • Too low level
  • Have to write Java code to do anything!
• Phoenix is a relational layer over HBase
  • Also described as a SQL skin
  • Looking more and more like a generic SQL engine
• Why not Hive / Spark SQL / other SQL-over-Hadoop?
  • OLTP versus OLAP
  • As fast as HBase: about 1 ms per query, 10K-1M qps
Why SQL?
From CDK Global slides: https://phoenix.apache.org/presentations/StrataHadoopWorld.pdf
HBase Architecture
[Diagram: an application uses the HBase client to reach RegionServers 1-3. Each RegionServer is colocated with a DataNode and serves several regions of tables foo and bar. A ZooKeeper quorum is part of the cluster.]
Phoenix Architecture
[Diagram: the same cluster with Phoenix added. The application uses the Phoenix client / JDBC driver on top of the HBase client. Each RegionServer hosts a Phoenix RPC endpoint and Phoenix (px) components next to its regions of foo and bar. The SYSTEM.CATALOG table is hosted on RegionServer 3. The ZooKeeper quorum is unchanged.]
Phoenix Goodies
SQL DataTypes
Schemas / DDL / HBase table properties
Composite Types (Composite Primary Key)
Map existing HBase tables
Write from HBase, read from Phoenix
Salting
Parallel Scan
Skip scan
Filter push down
Statistics Collection / Guideposts
DDL Example
CREATE TABLE IF NOT EXISTS METRIC_RECORD (
  METRIC_NAME VARCHAR,
  HOSTNAME VARCHAR,
  SERVER_TIME UNSIGNED_LONG NOT NULL,
  METRIC_VALUE DOUBLE,
  …
  CONSTRAINT pk PRIMARY KEY (METRIC_NAME, HOSTNAME, SERVER_TIME))
  DATA_BLOCK_ENCODING='FAST_DIFF', TTL=604800, COMPRESSION='SNAPPY'
  SPLIT ON ('a', 'k', 'm');
METRIC_NAME                    HOSTNAME               SERVER_TIME  METRIC_VALUE
Regionserver.readRequestCount  cn011.hortonworks.com  1396743589   92045759
Regionserver.readRequestCount  cn011.hortonworks.com  1396767589   93051916
Regionserver.readRequestCount  cn011.hortonworks.com  ….           …
Regionserver.readRequestCount  cn012.hortonworks.com  1396743589
…..                            …                      …            …
Regionserver.wal.bytesWritten  cn011.hortonworks.com
Regionserver.wal.bytesWritten  ….                     ….           …

HBase row key (rows are stored in this sort order): METRIC_NAME, HOSTNAME, SERVER_TIME
Other columns: METRIC_VALUE
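The effect of a composite primary key on storage order can be sketched in a few lines of Python. This is only an illustration: a dict with tuple keys stands in for the HBase table, tuple comparison stands in for HBase's byte-wise row key comparison, and the metric values other than the two shown on the slide are made up.

```python
# Hypothetical rows: (METRIC_NAME, HOSTNAME, SERVER_TIME) -> METRIC_VALUE.
# Only the two cn011 readRequestCount values come from the slide; the rest are made up.
rows = {
    ("Regionserver.wal.bytesWritten", "cn011.hortonworks.com", 1396743589): 10.0,
    ("Regionserver.readRequestCount", "cn012.hortonworks.com", 1396743589): 5.0,
    ("Regionserver.readRequestCount", "cn011.hortonworks.com", 1396767589): 93051916.0,
    ("Regionserver.readRequestCount", "cn011.hortonworks.com", 1396743589): 92045759.0,
}


def scan(table, start=None, stop=None):
    """Yield (row_key, value) in row-key order, optionally within [start, stop)."""
    for key in sorted(table):
        if start is not None and key < start:
            continue
        if stop is not None and key >= stop:
            break
        yield key, table[key]


ordered_keys = [key for key, _ in scan(rows)]

# A prefix of the key selects one contiguous range: all rows for a single metric.
read_requests = list(scan(rows,
                          start=("Regionserver.readRequestCount",),
                          stop=("Regionserver.readRequestCount\x00",)))
```

Because the leading key column is METRIC_NAME, all rows for one metric sit next to each other, which is why queries that constrain the leading key columns can be served by a short range scan.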
Parallel Scan
SELECT * FROM METRIC_RECORD;
CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD
[Diagram: the client issues four scans in parallel, one each to Region1 through Region4, which are hosted on RS1, RS2 and RS3.]
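A minimal sketch of the idea behind the 4-chunk parallel plan, assuming each region can be scanned independently. The region contents and the helper names (scan_region, parallel_full_scan) are hypothetical, and a thread pool stands in for the Phoenix client's scan parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Four hypothetical regions, each holding a sorted slice of the table.
regions = [
    [("a1", 1), ("a2", 2)],
    [("c1", 3)],
    [("k1", 4), ("k2", 5)],
    [("t1", 6)],
]


def scan_region(region):
    """Stand-in for one region scan: return every row stored in the region."""
    return list(region)


def parallel_full_scan(all_regions):
    """Scan all regions concurrently and concatenate the chunks in region order."""
    with ThreadPoolExecutor(max_workers=len(all_regions)) as pool:
        chunks = list(pool.map(scan_region, all_regions))
    return [row for chunk in chunks for row in chunk]


result = parallel_full_scan(regions)
```

Since regions cover disjoint, ordered key ranges, concatenating the chunks in region order yields rows in row-key order without a merge step.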
Filter push down
SELECT * FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7;
CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD
    SERVER FILTER BY SERVER_TIME > DATE '2016-04-06 09:09:05.978'
[Diagram: the client issues four parallel scans to Region1 through Region4 on RS1, RS2 and RS3; the filter is applied on the server side of each scan.]
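The benefit of evaluating the predicate on the server can be sketched as follows. The row layout, the cutoff value and the function name scan_with_server_filter are assumptions made for illustration, not Phoenix internals.

```python
def scan_with_server_filter(region, predicate):
    """Apply the predicate where the data lives and return only matching rows."""
    return [row for row in region if predicate(row)]


# Two hypothetical regions with made-up rows.
regions = [
    [{"host": "h1", "server_time": 100}, {"host": "h1", "server_time": 900}],
    [{"host": "h2", "server_time": 200}, {"host": "h2", "server_time": 950}],
]
cutoff = 500

rows_stored = sum(len(region) for region in regions)
returned = [
    row
    for region in regions
    for row in scan_with_server_filter(region, lambda r: r["server_time"] > cutoff)
]
```

Here four rows are stored but only two cross the (simulated) network, which is the point of SERVER FILTER BY in the plan.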
Skip Scan
SELECT * FROM METRIC_RECORD
WHERE METRIC_NAME LIKE 'abc%'
AND HOSTNAME IN ('host1', 'host2');
CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']
[Diagram: the client runs a skip scan over Region1 through Region4, hosted on RS1, RS2 and RS3.]
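One way to picture a skip scan is as a series of seeks over sorted keys: jump to each candidate (metric, host) position, read the matching run, then jump past the rest of that metric. The sketch below uses binary search on an in-memory list; the keys and the function name skip_scan are hypothetical, and the real implementation works on HBase row keys inside a server-side filter.

```python
from bisect import bisect_left


def skip_scan(sorted_keys, prefix, hosts):
    """Return keys whose metric starts with prefix and whose host is in hosts.

    Rather than reading every key, seek to each (metric, host) position and
    then seek past all remaining keys of that metric.
    """
    wanted_hosts = sorted(hosts)
    out = []
    i = bisect_left(sorted_keys, (prefix,))
    while i < len(sorted_keys) and sorted_keys[i][0].startswith(prefix):
        metric = sorted_keys[i][0]
        for host in wanted_hosts:
            j = bisect_left(sorted_keys, (metric, host))
            while j < len(sorted_keys) and sorted_keys[j][:2] == (metric, host):
                out.append(sorted_keys[j])
                j += 1
        # Seek to the first key belonging to the next metric.
        i = bisect_left(sorted_keys, (metric + "\x00",))
    return out


keys = sorted([
    ("aaa", "host1", 1),
    ("abc1", "host1", 1), ("abc1", "host1", 2), ("abc1", "host3", 1),
    ("abc2", "host2", 1), ("abc2", "host9", 1),
    ("abd", "host1", 1),
    ("xyz", "host2", 1),
])
matches = skip_scan(keys, "abc", {"host1", "host2"})
```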
TopN
SELECT * FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7
ORDER BY HOSTNAME LIMIT 5;
CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER METRIC_RECORD
    SERVER FILTER BY SERVER_TIME > …
    SERVER TOP 5 ROWS SORTED BY [HOSTNAME]
CLIENT MERGE SORT
[Diagram: the client issues four parallel scans to Region1 through Region4 on RS1, RS2 and RS3; each scan sorts by HOSTNAME on the server and returns only 5 rows.]
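The two-phase TopN in the plan (server-side top 5 per scan, then a client merge sort) can be sketched with Python's heapq module. The row data and the function names are made up for illustration.

```python
import heapq
from itertools import islice


def by_host(row):
    return row[0]


def server_top_n(region_rows, n):
    """Each region sorts locally and returns at most n rows."""
    return heapq.nsmallest(n, region_rows, key=by_host)


def client_merge_sort(sorted_chunks, n):
    """Merge the already-sorted per-region chunks and keep the first n rows."""
    return list(islice(heapq.merge(*sorted_chunks, key=by_host), n))


# Four hypothetical regions with (HOSTNAME, METRIC_VALUE) rows.
regions = [
    [("h07", 1.0), ("h02", 2.0), ("h11", 3.0)],
    [("h05", 4.0), ("h09", 5.0), ("h01", 6.0)],
    [("h03", 7.0), ("h12", 8.0), ("h08", 9.0)],
    [("h04", 10.0), ("h10", 11.0), ("h06", 12.0)],
]
top5 = client_merge_sort([server_top_n(region, 5) for region in regions], 5)
```

No region ever ships more than 5 rows, so the client merges at most 20 rows however large the table is.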
Aggregation
SELECT METRIC_NAME, HOSTNAME,
AVG(METRIC_VALUE)
FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7
GROUP BY METRIC_NAME, HOSTNAME
ORDER BY METRIC_NAME, HOSTNAME;
CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD
    SERVER FILTER BY SERVER_TIME > …
    SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [METRIC_NAME, HOSTNAME]
CLIENT MERGE SORT
[Diagram: the client issues four parallel scans to Region1 through Region4 on RS1, RS2 and RS3; each scan returns only data aggregated by METRIC_NAME, HOSTNAME.]
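A sketch of how a grouped AVG can be split between servers and client, under the assumption that each region returns a partial (sum, count) per group. Averages cannot simply be averaged again, so the client combines sums and counts. The function names and data are hypothetical and this is not claimed to be Phoenix's exact wire format.

```python
from collections import defaultdict


def server_aggregate(region_rows):
    """Per-region partial aggregate: (metric, host) -> [sum, count]."""
    partial = defaultdict(lambda: [0.0, 0])
    for metric, host, value in region_rows:
        acc = partial[(metric, host)]
        acc[0] += value
        acc[1] += 1
    return partial


def client_merge(partials):
    """Combine partials from all regions and emit averages in group-key order."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (group_sum, group_count) in partial.items():
            total[key][0] += group_sum
            total[key][1] += group_count
    return [(key, s / c) for key, (s, c) in sorted(total.items())]


# The group ("m1", "h1") spans two hypothetical regions.
regions = [
    [("m1", "h1", 1.0), ("m1", "h1", 3.0)],
    [("m1", "h1", 8.0), ("m2", "h1", 5.0)],
]
averages = client_merge(server_aggregate(region) for region in regions)
```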
Joins and subqueries in Phoenix
Grammar
• Inner, Left, Right, Full outer join, Cross join
• Semi-join / Anti-join
Algorithms
• Hash-join, sort-merge join
• Hash-join table is computed and pushed to each regionserver from client
Optimizations
• Predicate push-down
• PK-to-FK join optimization
• Global index with missing columns
• Correlated query rewrite
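The broadcast hash join described above (the hash table is built once and pushed to every region server) can be sketched like this. The tables, column positions and function names are assumptions for illustration, and only the inner-join case is shown.

```python
def build_hash_table(small_rows, key_index):
    """Build join-key -> rows from the smaller side of the join."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key_index], []).append(row)
    return table


def probe_region(region_rows, hash_table, key_index):
    """Inner-join the rows of one region against the broadcast hash table."""
    return [
        big + small
        for big in region_rows
        for small in hash_table.get(big[key_index], [])
    ]


# Hypothetical small table (host, rack) and a larger table split into two regions.
hosts = [("h1", "rack-a"), ("h2", "rack-b")]
metric_regions = [
    [("h1", 10), ("h3", 7)],
    [("h2", 20), ("h1", 30)],
]

broadcast = build_hash_table(hosts, key_index=0)  # computed once, sent to every region
joined = [row for region in metric_regions
          for row in probe_region(region, broadcast, key_index=0)]
```

The approach works well while the hashed side fits in memory on each server, which matches the later remark that very big joins are not Phoenix's strength.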
Joins and subqueries in Phoenix
Phoenix can execute most of the TPC-H queries!
No nested loop join
With Calcite support, more improvements soon
No statistics-guided join selection yet
Not very good at executing very big joins
• No generic YARN / Tez execution layer
• But Hive / Spark support for generic DAG execution
Secondary Indexes
HBase table is a sorted map
• Everything in HBase is sorted in primary key order
• Full or partial scans in sort order are very efficient in HBase
• Sort data differently with secondary index dimensions
Two types
• Global index
• Local index
Query
• Indexes are “covered”
• Indexes are automatically selected from queries
• Only covered columns are returned from index without going back to data table
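A covered index can be pictured as a second sorted map whose key starts with the indexed column and whose entries carry copies of the covered columns. The sketch below assumes a tiny table keyed by (metric, host) and an index on host that covers the value column; all names and data are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data table: primary key (metric, host) -> columns.
data_table = {
    ("cpu", "h2"): {"value": 0.7, "unit": "ratio"},
    ("cpu", "h1"): {"value": 0.2, "unit": "ratio"},
    ("mem", "h2"): {"value": 512.0, "unit": "MB"},
}


def build_covered_index(table, covered):
    """Entries sorted by (host, primary key...), each carrying the covered columns."""
    entries = []
    for pk, cols in table.items():
        index_key = (pk[1],) + pk  # indexed column first, then the PK for uniqueness
        entries.append((index_key, {name: cols[name] for name in covered}))
    entries.sort(key=lambda entry: entry[0])
    return entries


def query_by_host(index, host):
    """Range-scan the index for one host without reading the data table."""
    keys = [key for key, _ in index]
    i = bisect_left(keys, (host,))
    out = []
    while i < len(index) and index[i][0][0] == host:
        out.append(index[i])
        i += 1
    return out


index = build_covered_index(data_table, covered=("value",))
h2_rows = query_by_host(index, "h2")
```

A query that needs only host and value is answered from the index alone; asking for unit would require going back to the data table, because unit is not covered.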
Global and Local Index
Global Index
• A single instance for all table data in a different sort order
• A different HBase table per index
• Optimized for read-heavy use cases
• Can be one edit “behind” the actual primary data
• Indexes on transactional tables have ACID guarantees
• Different consistency / durability for mutable / immutable tables
Local Index
• Multiple mini-instances per region
• Uses the same HBase table, different column family
• Optimized for write-heavy use cases
• Atomic commit and visibility (coming soon)
• Queries have to ask all regions for relevant data from the index
Part II – The Present
All the recent stuff!
Release Note Highlights
4.4
• Functional Indexes
• UDFs
• Query Server
• UNION ALL
• MR Index Build
• Spark Integration
• Date built-in functions
4.5
• Client-side per-statement metrics
• SELECT without FROM
• ALTER TABLE with VIEWS
• Math and Array built-in functions
Release Note Highlights
4.6
• ROW_TIMESTAMP for HBase native timestamps
• Support for correlate variable
• Support for un-nesting arrays
• Web-app for visualizing trace info (alpha)
4.7
• Transaction support
• Enhanced secondary index consistency guarantees
• Statistics improvements
• Perf improvements
Row Timestamps
A pseudo-column for HBase native timestamps (versions)
Enables setting and querying cell timestamps
Perfect for time-series use cases
• Combine with FIFO / Date Tiered Compaction policies
• And HBase scan file pruning based on min-max ts for very efficient scans
CREATE TABLE METRICS_TABLE (
  CREATED_DATE DATE NOT NULL,
  METRIC_ID CHAR(15) NOT NULL,
  METRIC_VALUE LONG
  CONSTRAINT PK PRIMARY KEY (CREATED_DATE ROW_TIMESTAMP, METRIC_ID))
  SALT_BUCKETS = 8;
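The scan file pruning mentioned above relies on each store file recording the minimum and maximum cell timestamp it contains. A rough sketch of the overlap test, with hypothetical file metadata and function name:

```python
def files_to_read(store_files, query_min_ts, query_max_ts):
    """Keep only files whose [min_ts, max_ts] range overlaps the query time range."""
    return [
        f["name"]
        for f in store_files
        if f["max_ts"] >= query_min_ts and f["min_ts"] <= query_max_ts
    ]


# Hypothetical store files with non-overlapping timestamp ranges, as time-based
# compaction policies tend to produce for time-series data.
store_files = [
    {"name": "f1", "min_ts": 0, "max_ts": 99},
    {"name": "f2", "min_ts": 100, "max_ts": 199},
    {"name": "f3", "min_ts": 200, "max_ts": 299},
]
selected = files_to_read(store_files, query_min_ts=150, query_max_ts=250)
```

Mapping a date column to the cell timestamp with ROW_TIMESTAMP is what lets a time-range predicate in SQL translate into this kind of file-level skipping.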
Transactions
Uses Tephra
Snapshot isolation semantics
Completely optional.
• Can be enabled per-table (TRANSACTIONAL=true)
• Transactional and non-transactional tables can live side by side
Transactions see their own uncommitted data
Released in 4.7, will GA in 5.0
Optimistic Concurrency Control
• No locking for rows
• Transactions have to roll back and undo their writes in case of conflict
• Cost of conflict is higher
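The optimistic scheme above can be illustrated with a toy in-memory transaction manager. This is a sketch under simplifying assumptions, not Tephra's design: the classes TxManager and Tx are hypothetical, writes are buffered in memory rather than persisted and undone, and a single counter serves as the timestamp source.

```python
class ConflictError(Exception):
    """Raised when another transaction committed a write to the same row first."""


class TxManager:
    def __init__(self):
        self.clock = 0
        self.store = {}        # row -> list of (commit_ts, value), oldest first
        self.last_commit = {}  # row -> commit_ts of the latest committed write

    def begin(self):
        return Tx(self, self.clock)


class Tx:
    def __init__(self, manager, start_ts):
        self.manager = manager
        self.start_ts = start_ts
        self.writes = {}

    def get(self, row):
        if row in self.writes:  # a transaction sees its own uncommitted data
            return self.writes[row]
        visible = [v for ts, v in self.manager.store.get(row, []) if ts <= self.start_ts]
        return visible[-1] if visible else None

    def put(self, row, value):
        self.writes[row] = value  # no row lock is taken

    def commit(self):
        for row in self.writes:
            if self.manager.last_commit.get(row, 0) > self.start_ts:
                self.writes.clear()  # roll back: discard this transaction's writes
                raise ConflictError(row)
        self.manager.clock += 1
        for row, value in self.writes.items():
            self.manager.store.setdefault(row, []).append((self.manager.clock, value))
            self.manager.last_commit[row] = self.manager.clock


manager = TxManager()
t1, t2 = manager.begin(), manager.begin()
t1.put("row1", "a")
t2.put("row1", "b")
own_write = t2.get("row1")     # t2 reads its own uncommitted value
snapshot_read = manager.begin().get("row1")  # nothing committed yet
t1.commit()
try:
    t2.commit()
    conflict_detected = False
except ConflictError:
    conflict_detected = True
after_commit = manager.begin().get("row1")
```

Nothing blocks while the two transactions run; the price is paid only at commit time, when the loser has to roll back and retry.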
Tephra Architecture
[Diagram: a Tephra / HBase client, RegionServers 1-3, a ZooKeeper quorum, and two Tephra transaction managers (one active, one standby).]
Transaction Lifecycle
From Tephra presentation: http://www.slideshare.net/alexbaranau/transactions-over-hbase
Phoenix Query Server
Similar to HBase REST Server / Hive Server 2
Built on top of Calcite’s Avatica Server with Phoenix bindings
Embeds a Phoenix thick client inside
No client side sorting / join!
Protobuf-3.0 over HTTP protocol
Has a (thin) JDBC driver
Allows ODBC driver for Phoenix
Phoenix architecture revisited (thick client)
[Diagram: the application embeds the Phoenix client / JDBC driver and the HBase client, and talks directly to the Phoenix RPC endpoints on RegionServers 1-3.]
Phoenix Query Server
Phoenix Query Server (thin client)
[Diagram: the application uses the Phoenix thin client / JDBC driver to talk to Phoenix Query Server instances. Each Query Server embeds a Phoenix client / JDBC driver and an HBase client, which talk to the Phoenix RPC endpoints on RegionServers 1-3.]
Other new features (4.8+)
Shaded client by default. No more library dependency problems!
Phoenix schema mapping to HBase namespace
• Allows using isolation and security features of HBase namespaces
• Standard SQL syntax:
CREATE SCHEMA FOO;
USE FOO;
LIMIT / OFFSET
• We already had LIMIT. Now we have OFFSET
• Together with row value constructors, covers most cursor use cases
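The two paging styles mentioned above can be compared on a sorted in-memory list. In this sketch the data and function names are hypothetical; page_with_offset mirrors LIMIT / OFFSET, while page_after mirrors a row value constructor predicate of the form (HOSTNAME, SERVER_TIME) > (last_host, last_time) combined with LIMIT.

```python
from bisect import bisect_right

# Hypothetical rows, already in primary key order (host, ts).
rows = sorted((host, ts) for host in ("h1", "h2", "h3") for ts in (1, 2, 3))


def page_with_offset(all_rows, limit, offset):
    """LIMIT / OFFSET: skip `offset` rows, then return `limit` rows."""
    return all_rows[offset:offset + limit]


def page_after(all_rows, limit, last_key=None):
    """Keyset paging: return `limit` rows that sort strictly after last_key."""
    start = 0 if last_key is None else bisect_right(all_rows, last_key)
    return all_rows[start:start + limit]


page1 = page_after(rows, limit=4)
page2_keyset = page_after(rows, limit=4, last_key=page1[-1])
page2_offset = page_with_offset(rows, limit=4, offset=4)
```

Both return the same second page here. The keyset form can seek straight to the last seen key, whereas an offset generally has to walk past the skipped rows first.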
Part III – The Future
All the upcoming stuff!
Local Index
• Local index re-implemented
• Instead of a different table, local index data is now kept within the same data table
• Local index data goes into a different column family
• Index and data are committed together atomically without external transactions
• A bunch of stability improvements with region splits and merges
Calcite Integration
Calcite is a framework for:
• Query parser
• Compiler
• Planner
• Cost based optimizer
SQL-92 compliant
Based on relational algebra
Cost based optimizer with default rules + pluggable rules per-backend
Used by Hive / Drill / Kylin / Samza, etc.
Calcite Integration
Phoenix - Hive integration
Hive is a very rich and generic execution engine
Uses Tez + YARN to execute arbitrary DAG
Hive integration enables big joins and other Hive features
Phoenix DDL with HiveQL
Data insert / update / delete (DML) with HiveQL
Predicate pushdown, salting, partitioning, partition pruning, etc.
Can use secondary indexes as well since it uses Phoenix compiler
https://issues.apache.org/jira/browse/PHOENIX-2743
Future<Phoenix>
JSON support
TPC-H / Microstrategy / Tableau queries
Sqoop integration
Support Omid based transactions
Dogfooding within the Hadoop-ecosystem
• Ambari Metrics Service (AMS) uses Phoenix
• YARN will soon use HBase / Phoenix (ATS)
STRUCT type
Improvements to cost based optimization
Security and other HBase features used from Phoenix
See https://phoenix.apache.org/roadmap.html
Further Reference
Even more info on https://phoenix.apache.org
• New Features: https://phoenix.apache.org/recent.html
• Roadmap: https://phoenix.apache.org/roadmap.html
Get involved in mailing lists
• user@phoenix.apache.org
• dev@phoenix.apache.org
Thanks
Q & A

( Call  ) Girls Vasant Kunj Just 9873940964 High Class Model Shneha Patil
kinni singh$A17
 
Paharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Paharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model SafePaharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Paharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
aarusi sexy model
 
Response & Safe AI at Summer School of AI at IIITH
Response & Safe AI at Summer School of AI at IIITHResponse & Safe AI at Summer School of AI at IIITH
Response & Safe AI at Summer School of AI at IIITH
IIIT Hyderabad
 
Literature Reivew of Student Center Design
Literature Reivew of Student Center DesignLiterature Reivew of Student Center Design
Literature Reivew of Student Center Design
PriyankaKarn3
 

Recently uploaded (20)

South Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
South Mumbai @Call @Girls Whatsapp 9930687706 With High Profile ServiceSouth Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
South Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
 
Biology for computer science BBOC407 vtu
Biology for computer science BBOC407 vtuBiology for computer science BBOC407 vtu
Biology for computer science BBOC407 vtu
 
Bangalore @ℂall @Girls ꧁❤ 0000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Bangalore @ℂall @Girls ꧁❤ 0000000000 ❤꧂@ℂall @Girls Service Vip Top Model SafeBangalore @ℂall @Girls ꧁❤ 0000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Bangalore @ℂall @Girls ꧁❤ 0000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
 
LeetCode Database problems solved using PySpark.pdf
LeetCode Database problems solved using PySpark.pdfLeetCode Database problems solved using PySpark.pdf
LeetCode Database problems solved using PySpark.pdf
 
Software Engineering and Project Management - Introduction to Project Management
Software Engineering and Project Management - Introduction to Project ManagementSoftware Engineering and Project Management - Introduction to Project Management
Software Engineering and Project Management - Introduction to Project Management
 
Quadcopter Dynamics, Stability and Control
Quadcopter Dynamics, Stability and ControlQuadcopter Dynamics, Stability and Control
Quadcopter Dynamics, Stability and Control
 
system structure in operating systems.pdf
system structure in operating systems.pdfsystem structure in operating systems.pdf
system structure in operating systems.pdf
 
How to Manage Internal Notes in Odoo 17 POS
How to Manage Internal Notes in Odoo 17 POSHow to Manage Internal Notes in Odoo 17 POS
How to Manage Internal Notes in Odoo 17 POS
 
IWISS Catalog 2024
IWISS Catalog 2024IWISS Catalog 2024
IWISS Catalog 2024
 
GUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdf
GUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdfGUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdf
GUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdf
 
CONVEGNO DA IRETI 18 giugno 2024 | PASQUALE Donato
CONVEGNO DA IRETI 18 giugno 2024 | PASQUALE DonatoCONVEGNO DA IRETI 18 giugno 2024 | PASQUALE Donato
CONVEGNO DA IRETI 18 giugno 2024 | PASQUALE Donato
 
Rohini @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model SafeRohini @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
 
Introduction to neural network (Module 1).pptx
Introduction to neural network (Module 1).pptxIntroduction to neural network (Module 1).pptx
Introduction to neural network (Module 1).pptx
 
UNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-ID
UNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-IDUNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-ID
UNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-ID
 
Understanding Cybersecurity Breaches: Causes, Consequences, and Prevention
Understanding Cybersecurity Breaches: Causes, Consequences, and PreventionUnderstanding Cybersecurity Breaches: Causes, Consequences, and Prevention
Understanding Cybersecurity Breaches: Causes, Consequences, and Prevention
 
Lecture 6 - The effect of Corona effect in Power systems.pdf
Lecture 6 - The effect of Corona effect in Power systems.pdfLecture 6 - The effect of Corona effect in Power systems.pdf
Lecture 6 - The effect of Corona effect in Power systems.pdf
 
( Call  ) Girls Vasant Kunj Just 9873940964 High Class Model Shneha Patil
( Call  ) Girls Vasant Kunj Just 9873940964 High Class Model Shneha Patil( Call  ) Girls Vasant Kunj Just 9873940964 High Class Model Shneha Patil
( Call  ) Girls Vasant Kunj Just 9873940964 High Class Model Shneha Patil
 
Paharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Paharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model SafePaharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Paharganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
 
Response & Safe AI at Summer School of AI at IIITH
Response & Safe AI at Summer School of AI at IIITHResponse & Safe AI at Summer School of AI at IIITH
Response & Safe AI at Summer School of AI at IIITH
 
Literature Reivew of Student Center Design
Literature Reivew of Student Center DesignLiterature Reivew of Student Center Design
Literature Reivew of Student Center Design
 

Apache Phoenix: Past, Present and Future of SQL over HBase

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Apache Phoenix and HBase: Past, Present and Future of SQL over HBase Enis Soztutar (enis@hortonworks.com)
  • 2. About Me: Enis Soztutar. Committer and PMC member in Apache HBase, Phoenix, and Hadoop. HBase/Phoenix team @Hortonworks. Twitter @enissoz. Disclaimer: Not a SQL expert!
  • 3. Outline. PART I – The Past (a.k.a. all the existing stuff): Phoenix, the basics; architecture; overview of existing Phoenix features. PART II – The Present (a.k.a. all the recent stuff): look at recent releases; transactions; Phoenix Query Server; other features. PART III – The Future (a.k.a. all the upcoming stuff): Calcite integration; Phoenix – Hive.
  • 4. Part I – The Past: All the existing stuff!
  • 5. Obligatory Slide: Who uses Phoenix
  • 6. Phoenix – The Basics • Hope everybody is familiar with HBase • Otherwise you are in the wrong talk! • What is wrong with pure HBase? • HBase is a powerful, flexible and extensible “engine” • Too low level • Have to write Java code to do anything! • Phoenix is a relational layer over HBase • Also described as a SQL skin • Looking more and more like a generic SQL engine • Why not Hive / Spark SQL / other SQL-over-Hadoop? • OLTP versus OLAP • As fast as HBase: 1 ms queries, 10K-1M qps
  • 7. Why SQL?
  • 8. From CDK Global slides: https://phoenix.apache.org/presentations/StrataHadoopWorld.pdf
  • 9. HBase Architecture [diagram]: Application with HBase client; ZooKeeper Quorum; three RegionServers, each drawn with a DataNode. RegionServer 1: T:foo region:c, T:bar region:14, T:foo region:d. RegionServer 2: T:foo region:a, T:bar region:54, T:foo region:t. RegionServer 3: T:bar region:32, T:foo region:k.
  • 10. Phoenix Architecture [diagram]: Application with Phoenix client / JDBC on top of the HBase client; ZooKeeper Quorum; three RegionServers, each drawn with a DataNode and a Phoenix RPC endpoint (px). RegionServer 1: T:foo region:c, T:bar region:14, T:foo region:d. RegionServer 2: T:foo region:c, T:bar region:54, T:foo region:t. RegionServer 3: T:SYSTEM.CATALOG, T:bar region:32, T:foo region:k.
  • 11. Phoenix Goodies • SQL data types • Schemas / DDL / HBase table properties • Composite types (composite primary key) • Map existing HBase tables • Write from HBase, read from Phoenix • Salting • Parallel scan • Skip scan • Filter push down • Statistics collection / guideposts
  • 12. DDL Example:
      CREATE TABLE IF NOT EXISTS METRIC_RECORD (
        METRIC_NAME VARCHAR,
        HOSTNAME VARCHAR,
        SERVER_TIME UNSIGNED_LONG NOT NULL,
        METRIC_VALUE DOUBLE,
        …
        CONSTRAINT pk PRIMARY KEY (METRIC_NAME, HOSTNAME, SERVER_TIME))
        DATA_BLOCK_ENCODING='FAST_DIFF', TTL=604800, COMPRESSION='SNAPPY'
        SPLIT ON ('a', 'k', 'm');
  • 13. METRIC_RECORD rows in sort order. METRIC_NAME, HOSTNAME and SERVER_TIME form the HBase row key; METRIC_VALUE belongs to the other columns.
      METRIC_NAME | HOSTNAME | SERVER_TIME | METRIC_VALUE
      Regionserver.readRequestCount | cn011.hortonworks.com | 1396743589 | 92045759
      Regionserver.readRequestCount | cn011.hortonworks.com | 1396767589 | 93051916
      Regionserver.readRequestCount | cn011.hortonworks.com | … | …
      Regionserver.readRequestCount | cn012.hortonworks.com | 1396743589 | …
      … | … | … |
      Regionserver.wal.bytesWritten | cn011.hortonworks.com | |
      Regionserver.wal.bytesWritten | … | … | …
  • 14. Parallel Scan. Query: SELECT * FROM METRIC_RECORD; Plan: CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD. [Diagram: the client issues one scan each to Region1 through Region4, hosted on RS1 through RS3.]
  • 15. Filter push down. Query: SELECT * FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7; Plan: CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD; SERVER FILTER BY SERVER_TIME > DATE '2016-04-06 09:09:05.978'. [Diagram: the client issues one scan each to Region1 through Region4 on RS1 through RS3, with a server-side filter.]
  • 16. Skip Scan. Query: SELECT * FROM METRIC_RECORD WHERE METRIC_NAME LIKE 'abc%' AND HOSTNAME IN ('host1', 'host2'); Plan: CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']. [Diagram: client, Region1 through Region4 on RS1 through RS3, skip scan.]
  • 17. TopN. Query: SELECT * FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7 ORDER BY HOSTNAME LIMIT 5; Plan: CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER METRIC_RECORD; SERVER FILTER BY SERVER_TIME > …; SERVER TOP 5 ROWS SORTED BY [HOSTNAME]; CLIENT MERGE SORT. [Diagram: the scans over Region1 through Region4 on RS1 through RS3 sort by HOSTNAME and return only 5 rows.]
  • 18. Aggregation. Query: SELECT METRIC_NAME, HOSTNAME, AVG(METRIC_VALUE) FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7 GROUP BY METRIC_NAME, HOSTNAME ORDER BY METRIC_NAME, HOSTNAME; Plan: CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD; SERVER FILTER BY SERVER_TIME > …; SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [METRIC_NAME, HOSTNAME]; CLIENT MERGE SORT. [Diagram: the scans over Region1 through Region4 on RS1 through RS3 return only data aggregated by METRIC_NAME, HOSTNAME.]
  • 19. Joins and subqueries in Phoenix. Grammar • Inner, left, right, full outer and cross joins • Semi-join / anti-join. Algorithms • Hash join, sort-merge join • The hash-join table is computed and pushed to each RegionServer from the client. Optimizations • Predicate push-down • PK-to-FK join optimization • Global index with missing columns • Correlated query rewrite
  • 20. Joins and subqueries in Phoenix • Phoenix can execute most TPC-H queries! • No nested loop join • With Calcite support, more improvements soon • No statistics-guided join selection yet • Not very good at executing very big joins • No generic YARN / Tez execution layer • But Hive / Spark support for generic DAG execution
  • 21. Secondary Indexes. An HBase table is a sorted map • Everything in HBase is sorted in primary key order • Full or partial scans in sort order are very efficient in HBase • Sort data differently with secondary index dimensions. Two types • Global index • Local index. Query • Indexes are “covered” • Indexes are automatically selected from queries • Only covered columns are returned from the index without going back to the data table
  • 22. Global and Local Index. Global Index • A single instance for all table data in a different sort order • A different HBase table per index • Optimized for read-heavy use cases • Can be one edit “behind” the actual primary data • Indexes on transactional tables have ACID guarantees • Different consistency / durability for mutable / immutable tables. Local Index • Multiple mini-instances per region • Uses the same HBase table, different column family • Optimized for write-heavy use cases • Atomic commit and visibility (coming soon) • Queries have to ask all regions for relevant data from the index
  • 23. Part II – The Present: All the recent stuff!
  • 24. Release Note Highlights. 4.4 • Functional indexes • UDFs • Query Server • UNION ALL • MR index build • Spark integration • Date built-in functions. 4.5 • Client-side per-statement metrics • SELECT without FROM • ALTER TABLE with VIEWS • Math and array built-in functions
  • 25. Release Note Highlights. 4.6 • ROW_TIMESTAMP for HBase native timestamps • Support for correlate variable • Support for un-nesting arrays • Web app for visualizing trace info (alpha). 4.7 • Transaction support • Enhanced secondary index consistency guarantees • Statistics improvements • Perf improvements
  • 26. Row Timestamps • A pseudo-column for HBase native timestamps (versions) • Enables setting and querying cell timestamps • Perfect for time-series use cases • Combine with FIFO / Date Tiered Compaction policies • And HBase scan file pruning based on min-max ts for very efficient scans. Example:
      CREATE TABLE METRICS_TABLE (
        CREATED_DATE DATE NOT NULL,
        METRIC_ID CHAR(15) NOT NULL,
        METRIC_VALUE BIGINT
        CONSTRAINT PK PRIMARY KEY (CREATED_DATE ROW_TIMESTAMP, METRIC_ID))
        SALT_BUCKETS = 8;
  • 27. Transactions • Uses Tephra • Snapshot isolation semantics • Completely optional • Can be enabled per table (TRANSACTIONAL=true) • Transactional and non-transactional tables can live side by side • Transactions see their own uncommitted data • Released in 4.7, will GA in 5.0 • Optimistic concurrency control • No locking for rows • Transactions have to roll back and undo their writes in case of conflict • Cost of conflict is higher
  • 28. Tephra Architecture [diagram]: Tephra / HBase client; HBase client; RegionServer 1, RegionServer 2, RegionServer 3; ZooKeeper Quorum; Tephra Trx Manager (active) and Tephra Trx Manager (standby).
  • 29. Transaction Lifecycle. From the Tephra presentation: http://www.slideshare.net/alexbaranau/transactions-over-hbase
  • 30. Phoenix Query Server • Similar to HBase REST Server / Hive Server 2 • Built on top of Calcite’s Avatica server with Phoenix bindings • Embeds a Phoenix thick client inside • No client-side sorting / join! • Protobuf 3.0 over HTTP protocol • Has a (thin) JDBC driver • Allows an ODBC driver for Phoenix
  • 31. Phoenix architecture revisited (thick client) [diagram]: Application with Phoenix client / JDBC and HBase client; RegionServer 1, RegionServer 2 and RegionServer 3, each with T:foo region:d and a Phoenix RPC endpoint (px).
  • 32. Phoenix Query Server (thin client) [diagram]: Application with Phoenix thin client / JDBC; Phoenix Query Server instances, each with a Phoenix client / JDBC and an HBase client; RegionServer 1, RegionServer 2 and RegionServer 3, each with T:foo region:d and a Phoenix RPC endpoint (px).
  • 33. Other new features (4.8+) • Shaded client by default. No more library dependency problems! • Phoenix schema mapping to HBase namespace • Allows using isolation and security features of HBase namespaces • Standard SQL syntax: CREATE SCHEMA FOO; USE FOO; • LIMIT / OFFSET • We already had LIMIT. Now we have OFFSET • Together with row value constructors, covers most cursor use cases
  • 34. Part III – The Future: All the upcoming stuff!
  • 35. Local Index • Local index re-implemented • Instead of a different table, local index data is now kept within the same data table • Local index data goes into a different column family • Index and data are committed together atomically without external transactions • A bunch of stability improvements with region splits and merges
  • 36. Calcite Integration. Calcite is a framework for: • Query parser • Compiler • Planner • Cost-based optimizer. Properties: • SQL-92 compliant • Based on relational algebra • Cost-based optimizer with default rules + pluggable rules per backend • Used by Hive / Drill / Kylin / Samza, etc.
  • 37. Calcite Integration
  • 38. Phoenix - Hive integration • Hive is a very rich and generic execution engine • Uses Tez + YARN to execute arbitrary DAGs • Hive integration enables big joins and other Hive features • Phoenix DDL with HiveQL • Data insert / update / delete (DML) with HiveQL • Predicate pushdown, salting, partitioning, partition pruning, etc. • Can use secondary indexes as well since it uses the Phoenix compiler • https://issues.apache.org/jira/browse/PHOENIX-2743
  • 39. Future<Phoenix> • JSON support • TPC-H / Microstrategy / Tableau queries • Sqoop integration • Support Omid-based transactions • Dogfooding within the Hadoop ecosystem • Ambari Metrics Service (AMS) uses Phoenix • YARN will soon use HBase / Phoenix (ATS) • STRUCT type • Improvements to cost-based optimization • Security and other HBase features used from Phoenix • See https://phoenix.apache.org/roadmap.html
  • 40. Further Reference. Even more info on https://phoenix.apache.org • New features: https://phoenix.apache.org/recent.html • Roadmap: https://phoenix.apache.org/roadmap.html. Get involved in the mailing lists • user@phoenix.apache.org • dev@phoenix.apache.org
  • 41. Thanks. Q & A
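
Salting appears on slide 11 and as SALT_BUCKETS = 8 on slide 26. The idea, as far as I understand it, is to prepend one byte derived from a hash of the row key so that sequential keys spread over several buckets instead of all landing in the last region. Below is a minimal Python sketch of that idea. The function names are hypothetical and CRC32 is only a stand-in hash, not the function Phoenix actually uses.

```python
import zlib

SALT_BUCKETS = 8  # same value as the SALT_BUCKETS = 8 option on slide 26


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Pick a bucket from a hash of the row key.

    Assumption: CRC32 is a stand-in; Phoenix uses its own hash function.
    """
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend the one-byte salt so sequential keys are spread across buckets."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


# Monotonically increasing keys (for example timestamps) would otherwise be
# written next to each other; after salting they fall into several buckets.
timestamps = [str(1396743589 + i).encode() for i in range(1000)]
buckets_used = {salted_key(k)[0] for k in timestamps}
```

The price of salting is on the read side: a range scan has to be issued once per bucket and the results merged, which is one reason the bucket count is a table-level design choice.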
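
The TopN plan on slide 17 has two halves: SERVER TOP 5 ROWS SORTED BY [HOSTNAME] runs per scan, and CLIENT MERGE SORT combines the already sorted chunks. The following Python sketch mimics that split on in-memory lists. It is an illustration only: the function names and sample rows are made up, and nothing here is Phoenix code.

```python
import heapq
from itertools import islice
from operator import itemgetter

by_hostname = itemgetter(0)  # rows are (HOSTNAME, METRIC_VALUE) tuples


def server_top_n(region_rows, n, key=by_hostname):
    """Server side: sort only this region's rows and return at most n of them."""
    return heapq.nsmallest(n, region_rows, key=key)


def client_merge_sort(sorted_chunks, n, key=by_hostname):
    """Client side: lazily merge the sorted chunks and stop after n rows."""
    return list(islice(heapq.merge(*sorted_chunks, key=key), n))


# Hypothetical sample data: four regions, as in the 4-chunk plan.
regions = [
    [("host07", 1.0), ("host02", 2.0), ("host11", 3.0)],
    [("host05", 4.0), ("host01", 5.0), ("host09", 6.0), ("host03", 6.5)],
    [("host10", 7.0), ("host04", 8.0)],
    [("host08", 9.0), ("host06", 10.0), ("host12", 11.0)],
]
chunks = [server_top_n(rows, 5) for rows in regions]
top5 = client_merge_sort(chunks, 5)
```

However large a region is, it ships at most 5 rows to the client, which is the point of pushing the sort and limit to the server.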
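
The aggregation plan on slide 18 returns only aggregated data from each scan and merges it on the client. For AVG this only works if each scan returns a partial (sum, count) pair per group, because averaging per-region averages gives the wrong answer when a group spans regions. Here is a small Python sketch of that reasoning; the names and sample rows are hypothetical, and how Phoenix actually encodes its partial aggregates is not shown in the slides.

```python
from collections import defaultdict


def server_partial_aggregate(region_rows):
    """Server side: one (sum, count) pair per (METRIC_NAME, HOSTNAME) group."""
    partial = defaultdict(lambda: [0.0, 0])
    for metric, host, value in region_rows:
        acc = partial[(metric, host)]
        acc[0] += value
        acc[1] += 1
    return dict(partial)


def client_merge(partials):
    """Client side: add up the partial sums and counts, then divide once."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for group, (total, count) in partial.items():
            merged[group][0] += total
            merged[group][1] += count
    return [(group, total / count) for group, (total, count) in sorted(merged.items())]


# Hypothetical sample data: the cn011 group spans two regions.
region1 = [("readRequestCount", "cn011", 10.0), ("readRequestCount", "cn011", 20.0)]
region2 = [("readRequestCount", "cn011", 30.0), ("readRequestCount", "cn012", 4.0)]
averages = client_merge([server_partial_aggregate(r) for r in (region1, region2)])
```

In this example the correct average for cn011 is 60 / 3 = 20, whereas averaging the two per-region averages (15 and 30) would give 22.5.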
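
Slide 27 describes optimistic concurrency control: no row locks, and a transaction has to roll back if it conflicts. A toy Python sketch of the conflict check follows, assuming the usual snapshot-isolation rule that a commit fails when another transaction that committed after this one started wrote to the same row. This is not the Tephra API; every class and method name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    start_ts: int
    write_set: set = field(default_factory=set)

    def put(self, row_key):
        # Only the written row keys are tracked; values are omitted in this toy.
        self.write_set.add(row_key)


class ToyTransactionManager:
    """Hypothetical stand-in for a transaction manager; not Tephra code."""

    def __init__(self):
        self._clock = 0
        self._committed = []  # (commit_ts, frozenset of written row keys)

    def start(self):
        self._clock += 1
        return Transaction(start_ts=self._clock)

    def try_commit(self, tx):
        # Conflict: someone committed after tx started and touched the same rows.
        for commit_ts, keys in self._committed:
            if commit_ts > tx.start_ts and keys & tx.write_set:
                return False  # the caller must roll back and undo its writes
        self._clock += 1
        self._committed.append((self._clock, frozenset(tx.write_set)))
        return True


manager = ToyTransactionManager()
t1, t2 = manager.start(), manager.start()
t1.put("row1")
t2.put("row1")
first = manager.try_commit(t1)   # no earlier writer, so this commits
second = manager.try_commit(t2)  # overlaps with t1's commit, so this fails
t3 = manager.start()             # starts after t1 committed
t3.put("row1")
third = manager.try_commit(t3)
```

The sketch shows why the slide says the cost of conflict is higher: the losing transaction finds out only at commit time, after it has already done its work.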

Editor's Notes

  1. - What is HBase? - What is it good at? - How do I use it in my applications? Context, first principles
  2. Understand the world it lives in and its building blocks