1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Phoenix + Apache HBase
An Enterprise Grade Data Warehouse
Ankit Singhal, Rajeshbabu, Josh Elser
June 30, 2016
About us
Ankit Singhal
– Committer and member of Apache Phoenix PMC
– MTS at Hortonworks
Rajeshbabu
– Committer and member of Apache Phoenix PMC
– Committer in Apache HBase
– MTS at Hortonworks
Josh Elser
– Committer in Apache Phoenix
– Committer and member of Apache Calcite PMC
– MTS at Hortonworks
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Phoenix Query Server
Data Warehouse
EDW helps organize and aggregate analytical data from various functional domains and
serves as a critical repository for organizations’ operations.
[Diagram labels: ETL; Data Warehouse; Visualization or BI]
Phoenix Offerings and Interoperability
[Diagram labels: ETL; Data Warehouse; Visualization & BI]
HBase & Phoenix
HBase: a distributed NoSQL store
Phoenix: provides OLTP and analytics over HBase
[Diagram labels: Phoenix client; HBase client; RegionServers, each with a Phoenix coprocessor (Phx coproc)]
Open Source Data Warehouse
[Chart labels: Hardware cost (Specialized H/W, Commodity H/W); Licensing cost (No Cost); SMP; MPP; Source MPP]
Phoenix & HBase as a Data Warehouse
Run on
True MPP
O/S and
OLTP and
Phoenix & HBase as a Data Warehouse
for storage
for memory
Open to
Third party
Phoenix & HBase as a Data Warehouse
for disaster
Fully ACID
for Data
Phoenix & HBase as a Data Warehouse
Modeling &
Or upgrade
Data Backup
and recovery
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Who uses Phoenix?
Analytics Use Case (Web Advertising Company)
• Functional requirements
– Create a single source of truth
– Cross-dimensional queries on 50+ dimensions and 80+ metrics
– Support fast Top-N queries
• Non-functional requirements
– Response time under 3 seconds for slice-and-dice queries
– 250+ concurrent users
– 100k+ analytics queries/day
– Highly available
– Linear scalability
Data Warehouse Capacity
• Data size (ETL input)
– 24 TB/day of raw data system-wide
– 25 billion impressions
• HBase input (cube)
– 6 billion rows of aggregated data (100 GB/day)
• HBase cluster size
– 65 HBase nodes
– 520 TB of disk
– 4.1 TB of memory
Use Case Architecture
[Diagram labels: Click Tracking; Data Ingestion; ETL, Filter, Aggregate (In-Memory); ETL, Filter, Aggregate (Batch Processing); Analytics]
Analytics Data Warehouse Architecture
[Diagram; surviving label fragments: "Cubes are stored in", "slice and dice query", "to SQL"]
Time Series Use Case (Apache Ambari)
Ambari Metrics System (AMS)
• Functional requirements
– Store all cluster metrics collected every second (10k to 100k metrics/second)
– Optimize storage and access for time series data
• Non-functional requirements
– Near-real-time response time
– Scalable
– Real-time ingestion
AMS architecture
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Schema Design
Primary key design
• The most important factor in the overall performance of queries on a table
• Compose the primary key from the columns used most often in query predicates
• In most cases, the leading part of the primary key should let Phoenix convert queries into point lookups or range scans in HBase
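The primary key guidance above might look like the following in Phoenix SQL. This is a minimal sketch, not taken from the talk: the table `web_stat` and its columns are hypothetical, chosen so that the columns assumed to appear most often in predicates lead the key.

```sql
-- Hypothetical table: the columns filtered on most often lead the primary key.
CREATE TABLE IF NOT EXISTS web_stat (
    host        VARCHAR NOT NULL,
    domain      VARCHAR NOT NULL,
    event_date  DATE    NOT NULL,
    page_views  BIGINT,
    CONSTRAINT pk PRIMARY KEY (host, domain, event_date)
);

-- The leading key columns are bound by the predicate, so the query can be
-- served from one contiguous slice of the row key space.
SELECT event_date, page_views
FROM web_stat
WHERE host = 'host1'
  AND domain = 'example.com'
  AND event_date >= TO_DATE('2016-06-01', 'yyyy-MM-dd');

-- EXPLAIN reports whether the plan is a point lookup, a range scan, or a full scan.
EXPLAIN
SELECT page_views FROM web_stat WHERE host = 'host1';
```

Running EXPLAIN on the queries an application actually issues is a quick way to check whether a candidate key order produces range scans rather than full scans.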
Schema Design
Salting vs. pre-split
• Use salting to alleviate write hot-spotting
– The number of buckets should equal the number of RegionServers
• Otherwise, try to pre-split the table if you know the row key data set
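As a rough illustration of the two options, the sketch below creates one salted table and one pre-split table. The table names, columns, bucket count of 8 (standing in for an assumed 8 RegionServers), and split points are all hypothetical.

```sql
-- Option 1: salting. Phoenix prepends a hash-derived byte to each row key,
-- spreading otherwise sequential writes across 8 buckets.
CREATE TABLE IF NOT EXISTS metrics_salted (
    metric_name  VARCHAR NOT NULL,
    event_time   DATE    NOT NULL,
    metric_value DOUBLE,
    CONSTRAINT pk PRIMARY KEY (metric_name, event_time)
) SALT_BUCKETS = 8;

-- Option 2: pre-splitting. Region boundaries are chosen up front from
-- knowledge of the key distribution (the split points here are made up).
CREATE TABLE IF NOT EXISTS metrics_presplit (
    metric_name  VARCHAR NOT NULL,
    event_time   DATE    NOT NULL,
    metric_value DOUBLE,
    CONSTRAINT pk PRIMARY KEY (metric_name, event_time)
) SPLIT ON ('g', 'n', 't');
```

Salting is fixed at table creation time, so the bucket count is worth deciding before data is loaded.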
Schema Design
Table properties
• Use block encoding and/or compression for better performance
• Use region replication for read high availability
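Block encoding and compression can be passed as table options in the Phoenix DDL. The sketch below is one possible combination, not a recommendation from the talk: the table is hypothetical, and it assumes the Snappy codec is installed on the cluster.

```sql
-- HBase column family options are given after the column list.
-- FAST_DIFF and SNAPPY are example choices; the right ones depend on the data.
CREATE TABLE IF NOT EXISTS events_compressed (
    event_id VARCHAR NOT NULL PRIMARY KEY,
    payload  VARCHAR
) DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY';
```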
Schema Design
Table properties
• Set UPDATE_CACHE_FREQUENCY to a larger value so that clients contact the server less often for metadata updates
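A minimal sketch of setting this property, assuming a hypothetical table `session_log` and an illustrative interval of 15 minutes (the value is in milliseconds):

```sql
-- Set the property at creation time ...
CREATE TABLE IF NOT EXISTS session_log (
    session_id VARCHAR NOT NULL PRIMARY KEY,
    user_name  VARCHAR
) UPDATE_CACHE_FREQUENCY = 900000;

-- ... or change it later on an existing table.
ALTER TABLE session_log SET UPDATE_CACHE_FREQUENCY = 900000;
```

The trade-off is that clients may see schema changes up to that interval late, so a larger value suits tables whose schema rarely changes.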
Schema Design
• Divide columns into multiple column families if some columns are rarely accessed
– HBase reads only the files of the column families referenced in the query, which reduces I/O
[Diagram: a row keyed by pk1 and pk2 whose columns Col1 to Col7 are split between a family of frequently accessed columns and a family of rarely accessed columns]
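In Phoenix DDL, a column is placed in a column family by prefixing its name with the family name. The sketch below assumes hypothetical family names `hot` and `cold` and an arbitrary assignment of columns to them; the slide does not say which columns belong where.

```sql
-- Frequently read columns go in family "hot", rarely read ones in "cold".
CREATE TABLE IF NOT EXISTS wide_rows (
    pk1 VARCHAR NOT NULL,
    pk2 VARCHAR NOT NULL,
    hot.col1  VARCHAR,
    hot.col2  VARCHAR,
    cold.col6 VARCHAR,
    cold.col7 VARCHAR,
    CONSTRAINT pk PRIMARY KEY (pk1, pk2)
);

-- Only columns from the "hot" family are referenced, so the store files
-- of the "cold" family do not need to be read for this query.
SELECT col1, col2
FROM wide_rows
WHERE pk1 = 'a';
```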
Secondary Indexes
• Global indexes
– Optimized for read-heavy use cases
CREATE INDEX idx ON table(…)
• Local indexes
– Optimized for write-heavy and space-constrained use cases
CREATE LOCAL INDEX idx ON table(…)
• Functional indexes
– Allow you to create indexes on arbitrary expressions
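The three index types could be written out as follows. This is a sketch on a hypothetical `customers` table; the index names and the indexed columns are assumptions for illustration only.

```sql
CREATE TABLE IF NOT EXISTS customers (
    customer_id BIGINT NOT NULL PRIMARY KEY,
    last_name   VARCHAR,
    city        VARCHAR
);

-- Global index: kept in its own table, suited to read-heavy workloads.
CREATE INDEX idx_city ON customers (city);

-- Local index: index data is stored alongside the regions of the data table,
-- suited to write-heavy or space-constrained workloads.
CREATE LOCAL INDEX idx_last_name ON customers (last_name);

-- Functional index: built on an expression rather than a plain column.
CREATE INDEX idx_upper_last_name ON customers (UPPER(last_name));
```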
Secondary Indexes
• Use covered indexes so that queries can scan the index table instead of the primary table
CREATE INDEX idx ON table(…) INCLUDE(…)
• Pass an index hint to guide the query optimizer to the right index for a query
SELECT /*+ INDEX(<table> <index>) */ ..
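A filled-in version of the two statements above might read as follows. The `orders` table, its columns, and the index names are hypothetical.

```sql
CREATE TABLE IF NOT EXISTS orders (
    order_id    BIGINT NOT NULL PRIMARY KEY,
    customer_id BIGINT,
    status      VARCHAR,
    total       DECIMAL(10, 2)
);

-- Covered index: "total" is stored in the index, so a query that filters on
-- customer_id and reads total can be answered from the index alone.
CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (total);

-- A second, non-covered index on status.
CREATE INDEX idx_orders_status ON orders (status);

-- The hint names the table and the index the optimizer should use.
SELECT /*+ INDEX(orders idx_orders_status) */ order_id, total
FROM orders
WHERE status = 'SHIPPED';
```

Without the hint, the optimizer may decide a full scan is cheaper than using `idx_orders_status`, because `total` is not covered by that index.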
Row Timestamp Column
• Maps the native HBase row timestamp to a Phoenix column
• Leverages HBase optimizations such as setting the minimum and maximum time range on scans, so that store files outside that time range are skipped entirely
• Well suited to time series use cases
• Syntax
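The syntax shown on the original slide did not survive extraction. As a stand-in, the sketch below follows the form used in the Phoenix documentation, with a hypothetical table and column names: the ROW_TIMESTAMP keyword is attached to one primary key column of a date, time, or long type.

```sql
-- sample_time is mapped to the HBase cell timestamp of each row.
CREATE TABLE IF NOT EXISTS metric_samples (
    sample_time  DATE     NOT NULL,
    metric_id    CHAR(15) NOT NULL,
    metric_value BIGINT,
    CONSTRAINT pk PRIMARY KEY (sample_time ROW_TIMESTAMP, metric_id)
) SALT_BUCKETS = 8;

-- A time predicate on the row timestamp column lets the scan carry a time
-- range, so store files entirely outside the last 7 days can be skipped.
SELECT metric_id, metric_value
FROM metric_samples
WHERE sample_time > CURRENT_DATE() - 7;
```

The table is salted here because a time-leading key would otherwise send all new writes to the same region.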
Use of Statistics
[Diagram labels: Regions A, F, L, R; Chunks A, C, F, I, L, O, R, U; two clients]
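The slide itself shows only the diagram. For context, statistics in Phoenix can be collected on demand and their granularity tuned, roughly as sketched below. The table `web_stat` is hypothetical and assumed to exist already, and the guidepost width is an illustrative value in bytes.

```sql
-- Collect statistics for the table; Phoenix uses them to divide regions
-- into smaller chunks that can be scanned in parallel.
UPDATE STATISTICS web_stat;

-- Adjust how much data lies between guideposts for this table.
-- A smaller width means more, smaller chunks.
ALTER TABLE web_stat SET GUIDE_POSTS_WIDTH = 10000000;
```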
Skip Scan
• Phoenix supports skip scans, which jump directly to matching keys when the query predicate contains sets of keys
AND HOSTNAME IN ('host1', 'host2');
['abc','host1'] - ['abd','host2']
Skip scan
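The query on the slide is only partly preserved. The sketch below is a guess at a query of the same shape, on a hypothetical table whose key is `(cluster_id, hostname)`: a range on the leading key column combined with an IN list on the next one.

```sql
CREATE TABLE IF NOT EXISTS host_metrics (
    cluster_id VARCHAR NOT NULL,
    hostname   VARCHAR NOT NULL,
    cpu_pct    DOUBLE,
    CONSTRAINT pk PRIMARY KEY (cluster_id, hostname)
);

-- EXPLAIN shows the chosen scan type and key ranges; for a predicate like
-- this one the plan is a candidate for a skip scan, though the exact plan
-- text depends on the Phoenix version.
EXPLAIN
SELECT cluster_id, hostname, cpu_pct
FROM host_metrics
WHERE cluster_id >= 'abc' AND cluster_id <= 'abd'
  AND hostname IN ('host1', 'host2');
```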
Join optimizations
• Hash join
– A hash join outperforms other join algorithms when one of the relations is small, or when the records matching the predicate fit into memory
• Sort-merge join
– Use the sort-merge join algorithm when the relations are very large
– For multiple inner-join queries, Phoenix applies a star-join optimization by default. A hint can turn it off in the query if the overall size of all right-hand-side tables would exceed the memory size limit.
– A further hint prevents the usage of the child-parent-join optimization.
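The hint names are missing from the extracted slide text. The sketch below uses the join hints listed in the Phoenix documentation, which appear to be the ones meant; the tables and columns are hypothetical.

```sql
CREATE TABLE IF NOT EXISTS customers (
    customer_id BIGINT NOT NULL PRIMARY KEY,
    name        VARCHAR
);
CREATE TABLE IF NOT EXISTS orders (
    order_id    BIGINT NOT NULL PRIMARY KEY,
    customer_id BIGINT,
    total       DECIMAL(10, 2)
);

-- Force a sort-merge join when neither side is expected to fit in memory.
SELECT /*+ USE_SORT_MERGE_JOIN */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Turn off the default star-join optimization for multi-way inner joins.
SELECT /*+ NO_STAR_JOIN */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Turn off the child-parent-join optimization.
SELECT /*+ NO_CHILD_PARENT_JOIN_OPTIMIZATION */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```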
Optimize Writes
• UPSERT VALUES
– Call it multiple times before commit to batch mutations
– Use a prepared statement when you run the query multiple times
• UPSERT SELECT
– Configure phoenix.mutate.batchSize based on row size
– Set auto-commit to true to write scan results directly to HBase
– Set auto-commit to true while running upsert selects on the same table so that writes happen at
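The two statement forms discussed above look roughly like this. The tables are hypothetical. Commit and auto-commit are controlled on the client connection (for example through JDBC `Connection.setAutoCommit` and `Connection.commit`), so they appear here only as comments.

```sql
CREATE TABLE IF NOT EXISTS page_hits (
    page_id BIGINT NOT NULL PRIMARY KEY,
    hits    BIGINT
);
CREATE TABLE IF NOT EXISTS page_hits_archive (
    page_id BIGINT NOT NULL PRIMARY KEY,
    hits    BIGINT
);

-- UPSERT VALUES: with auto-commit off, several of these are buffered on the
-- client and sent together when the connection commits. In a prepared
-- statement the literals would be replaced by ? parameter markers.
UPSERT INTO page_hits (page_id, hits) VALUES (1, 10);
UPSERT INTO page_hits (page_id, hits) VALUES (2, 25);

-- UPSERT SELECT: copies the result of a query into a table.
UPSERT INTO page_hits_archive (page_id, hits)
SELECT page_id, hits FROM page_hits WHERE hits > 20;
```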
Some important hints
Additional References
• For more optimizations, refer to these documents
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Phoenix Query Server
Apache Phoenix Query Server
• A standalone service that proxies user requests to HBase/Phoenix
– Optional
• Reference client implementation via JDBC
– “Thick” versus “Thin”
• First introduced in Apache Phoenix 4.4.0
• Built on Apache Calcite’s Avatica
– “A framework for building database drivers”
Traditional Apache Phoenix RPC Model
[Diagram labels: Phoenix client; HBase client; RegionServers, each with a Phoenix coprocessor (Phx coproc)]
Query Server Model
[Diagram labels: Query Server; Phoenix client; HBase client; RegionServers, each with a Phoenix coprocessor (Phx coproc)]
Query Server Technology
• HTTP server and wire API definition
• Pluggable serialization
– Google Protocol Buffers
• “Thin” JDBC driver (over HTTP)
• Other goodies!
– Pluggable metrics system
– TCK (technology compatibility kit)
– SPNEGO for Kerberos authentication
– Horizontally scalable with load balancing
Query Server Clients
Client enablement
• Go language database/sql/driver
• .NET driver
– Built by, also available from Hortonworks
• Python DB API v2.0 (not “battle tested”)
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Phoenix Query Server
We hope to see you all migrating to Phoenix & HBase, and we expect more questions on the user mailing lists
Get involved in mailing lists:
You can reach us on:
Phoenix & HBase
Thank You

How Hadoop Makes the Natixis Pack More Efficient
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes

Recently uploaded

20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
Matthew Sinclair
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
BookNet Canada
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design ApproachesKnowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Earley Information Science
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecGDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
James Anderson
K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024
The Digital Insurer
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
Aurora Consulting
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair

Recently uploaded (20)

20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design ApproachesKnowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecGDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024

Apache Phoenix + Apache HBase

  • 1. Apache Phoenix + Apache HBase: An Enterprise Grade Data Warehouse
      Ankit Singhal, Rajeshbabu, Josh Elser
      June 30, 2016
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. About us
      ▪ Ankit Singhal
        – Committer and member of Apache Phoenix PMC
        – MTS at Hortonworks
      ▪ RajeshBabu
        – Committer and member of Apache Phoenix PMC
        – Committer in Apache HBase
        – MTS at Hortonworks
      ▪ Josh Elser
        – Committer in Apache Phoenix
        – Committer and member of Apache Calcite PMC
        – MTS at Hortonworks
  • 3. Agenda
      ▪ Phoenix & HBase as an Enterprise Data Warehouse
      ▪ Use Cases
      ▪ Optimizations
      ▪ Phoenix Query Server
      ▪ Q&A
  • 4. Data Warehouse
      An enterprise data warehouse (EDW) helps organize and aggregate analytical data from various functional domains and serves as a critical repository for an organization's operations.
      [Diagram labels: Files, IoT data, OLTP, ETL, Staging, Data Warehouse, Mart, Visualization or BI]
  • 5. Phoenix Offerings and Interoperability
      [Diagram columns: ETL, Data Warehouse, Visualization & BI]
  • 6. HBase & Phoenix
      ▪ HBase, a distributed NoSQL store
      ▪ Phoenix, provides OLTP and analytics over HBase
      [Diagram labels: Application, Phoenix client, HBase client, ZooKeeper, HDFS, and three RegionServers, each with a Phoenix coprocessor (Phx coproc) and table regions Table,,123, Table,a,123, Table,b,123, Table,c,123]
  • 7. Open Source Data Warehouse
      [Chart: hardware cost (commodity H/W to specialized H/W) against software cost (no cost to licensing cost); plotted: SMP, MPP, Open Source MPP, HBase + Phoenix]
  • 8. Phoenix & HBase as a Data Warehouse: Architecture
      ▪ Runs on commodity H/W
      ▪ True MPP
      ▪ O/S and H/W flexibility
      ▪ Supports OLTP and ROLAP
  • 9. Phoenix & HBase as a Data Warehouse: Scalability
      ▪ Linear scalability for storage
      ▪ Linear scalability for memory
      ▪ Open to third-party storage
  • 10. Phoenix & HBase as a Data Warehouse: Reliability
      ▪ Highly available
      ▪ Replication for disaster recovery
      ▪ Fully ACID for data integrity
  • 11. Phoenix & HBase as a Data Warehouse: Manageability
      ▪ Performance tuning
      ▪ Data modeling & schema evolution
      ▪ Data pruning
      ▪ Online expansion or upgrade
      ▪ Data backup and recovery
  • 12. Agenda
      ▪ Phoenix & HBase as an Enterprise Data Warehouse
      ▪ Use Cases
  • 13. Who uses Phoenix?
  • 14. Analytics Use Case (Web Advertising Company)
      ▪ Functional requirements
        – Create a single source of truth
        – Cross-dimensional queries on 50+ dimensions and 80+ metrics
        – Support fast Top-N queries
      ▪ Non-functional requirements
        – Less than 3 seconds response time for slice and dice
        – 250+ concurrent users
        – 100k+ analytics queries/day
        – Highly available
        – Linear scalability
  • 15. Data Warehouse Capacity
      ▪ Data size (ETL input)
        – 24 TB/day of raw data system wide
        – 25 billion impressions
      ▪ HBase input (cube)
        – 6 billion rows of aggregated data (100 GB/day)
      ▪ HBase cluster size
        – 65 HBase nodes
        – 520 TB of disk
        – 4.1 TB of memory
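To put the capacity figures in perspective, the following back-of-the-envelope sketch derives per-node and per-row numbers from them. It assumes decimal units (1 TB = 1000 GB), an even spread of disk and memory across the 65 nodes, and that the 6 billion aggregated rows are a daily figure like the 100 GB; none of these assumptions is stated on the slide.

```python
# Figures quoted on the capacity slide.
RAW_TB_PER_DAY = 24
CUBE_GB_PER_DAY = 100
CUBE_ROWS_PER_DAY = 6_000_000_000   # assumed to be a per-day figure
NODES = 65
DISK_TB = 520
MEMORY_TB = 4.1

# Assumption: decimal units and resources spread evenly over all nodes.
disk_per_node_tb = DISK_TB / NODES
memory_per_node_gb = MEMORY_TB * 1000 / NODES
reduction_factor = RAW_TB_PER_DAY * 1000 / CUBE_GB_PER_DAY
bytes_per_cube_row = CUBE_GB_PER_DAY * 1e9 / CUBE_ROWS_PER_DAY

print(f"disk per node:      {disk_per_node_tb:.1f} TB")
print(f"memory per node:    {memory_per_node_gb:.1f} GB")
print(f"raw-to-cube ratio:  {reduction_factor:.0f}x")
print(f"bytes per cube row: {bytes_per_cube_row:.1f}")
```

Under these assumptions each node holds about 8 TB of disk and roughly 63 GB of memory, and aggregation shrinks the daily input by a factor of about 240.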
  • 16. Use Case Architecture
      [Architecture diagram labels. Data ingestion: AdServer, Click Tracking, Kafka input (Apache Kafka). Real-time: ETL (filter, aggregate), in-memory store. Batch processing: Kafka, Camus, HDFS, ETL, HDFS, Data Uploader. Analytics: HBase, views, Data API, Analytics UI]
  • 17. Data Warehouse Architecture
      [Architecture diagram labels: HDFS, ETL, cube generation, bulk load, HBase ("Cubes are stored in HBase"), Data API ("Convert slice and dice query to SQL query"), Analytics UI, backup and recovery]
  • 18. Time Series Use Case (Apache Ambari): Ambari Metrics System (AMS)
      ▪ Functional requirements
        – Store all cluster metrics collected every second (10k to 100k metrics/second)
        – Optimize storage/access for time series data
      ▪ Non-functional requirements
        – Near real-time response time
        – Scalable
        – Real-time ingestion
  • 19. AMS architecture
      [Diagram labels: Hosts, Metric Monitors, Hadoop Sinks, Metric Collector, Phoenix, HBase, Ambari Server]
  • 20. Agenda
      ▪ Phoenix & HBase as an Enterprise Data Warehouse
      ▪ Use Cases
      ▪ Optimizations
  • 21. Schema Design: Primary key design
      ▪ The most important factor in the overall performance of queries on the table
      ▪ The primary key should be composed of the predicate columns used most often in queries
      ▪ In most cases, the leading part of the primary key should let Phoenix convert queries into point lookups or range scans in HBase
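The effect of key order can be illustrated without HBase. The sketch below is a toy model, not Phoenix code: rows are kept sorted by a hypothetical composite key `(host, ts)`, a predicate on the leading column is answered with two binary searches (a range scan), and a predicate on the trailing column alone has to examine every row.

```python
from bisect import bisect_left

# Toy table sorted by the composite key (host, ts); both names are illustrative.
rows = sorted((host, ts) for host in ("h1", "h2", "h3") for ts in range(100))

def range_scan(rows, host, ts_from, ts_to):
    """Leading key column is fixed: seek directly to the start and stop keys."""
    lo = bisect_left(rows, (host, ts_from))
    hi = bisect_left(rows, (host, ts_to))
    return rows[lo:hi]

def full_scan_filter(rows, ts_from, ts_to):
    """Only the trailing column is constrained: every row must be examined."""
    return [r for r in rows if ts_from <= r[1] < ts_to]

by_leading_key = range_scan(rows, "h2", 10, 20)
by_trailing_only = full_scan_filter(rows, 10, 20)
print(len(by_leading_key), "rows returned after touching only the matching key range")
print(len(by_trailing_only), "rows returned after examining all", len(rows), "rows")
```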
  • 22. Schema Design: Salting vs pre-split
      ▪ Use salting to alleviate write hot-spotting
        CREATE TABLE …( … ) SALT_BUCKETS = N
        – The number of buckets should be equal to the number of RegionServers
      ▪ Otherwise, try to pre-split the table if you know the row key data set
        CREATE TABLE …( … ) SPLITS(…)
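Salting works by prepending one byte, derived from a hash of the row key modulo the bucket count, so that consecutive keys land in different key ranges. The sketch below shows the idea only: the hash is a simple stand-in chosen for illustration and is not claimed to be the function Phoenix uses, and the timestamp-like keys are made up.

```python
from collections import Counter

def salt_byte(row_key: bytes, buckets: int) -> int:
    """Stand-in polynomial hash (an assumption, not Phoenix's actual hash)."""
    h = 0
    for b in row_key:
        h = (31 * h + b) & 0x7FFFFFFF
    return h % buckets

def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with its one-byte bucket number."""
    return bytes([salt_byte(row_key, buckets)]) + row_key

# Monotonically increasing keys: unsalted, they would all hit the last region.
keys = [f"2016-06-30T00:00:{s:02d}".encode() for s in range(60)]
per_bucket = Counter(salt_byte(k, 4) for k in keys)
print(sorted(per_bucket.items()))
```

The trade-off is that a range scan over the original key order now has to fan out across all buckets and merge the results.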
  • 23. Schema Design: Table properties
      ▪ Use block encoding and/or compression for better performance
        CREATE TABLE …( … ) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'
      ▪ Use region replication for read high availability
        CREATE TABLE …( … ) "REGION_REPLICATION" = "2"
  • 24. Schema Design: Table properties
      ▪ Set UPDATE_CACHE_FREQUENCY to a larger value to avoid frequently contacting the server for metadata updates
        CREATE TABLE …( … ) UPDATE_CACHE_FREQUENCY = 300000
  • 25. Schema Design
      ▪ Divide columns into multiple column families if some columns are rarely accessed
        – HBase reads only the files of the column families specified in the query, which reduces I/O
      [Diagram: primary key columns pk1, pk2; column families CF1 and CF2 over columns Col1 to Col7, separating frequently accessed columns from rarely accessed columns]
  • 26. Secondary Indexes
      ▪ Global indexes
        – Optimized for read-heavy use cases
          CREATE INDEX idx ON table(…)
      ▪ Local indexes
        – Optimized for write-heavy and space-constrained use cases
          CREATE LOCAL INDEX idx ON table(…)
      ▪ Functional indexes
        – Allow you to create indexes on arbitrary expressions
          CREATE INDEX UPPER_NAME_INDEX ON EMP(UPPER(FIRSTNAME||' '||LASTNAME))
  • 27. Secondary Indexes
      ▪ Use covered indexes to scan efficiently over the index table instead of the primary table
        CREATE INDEX idx ON table(…) INCLUDE(…)
      ▪ Pass an index hint to guide the query optimizer to select the right index for the query
        SELECT /*+INDEX(<table> <index>)*/..
  • 28. Row Timestamp Column
      ▪ Maps the HBase native row timestamp to a Phoenix column
      ▪ Leverages optimizations provided by HBase, such as setting the minimum and maximum time range for scans to entirely skip the store files that don't fall in that time range
      ▪ Perfect for time series use cases
      ▪ Syntax
        CREATE TABLE …(CREATED_DATE DATE NOT NULL … CONSTRAINT PK PRIMARY KEY(CREATED_DATE ROW_TIMESTAMP… ))
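The store-file pruning mentioned on this slide can be modelled in a few lines. This is a conceptual sketch only: `StoreFile` and `files_to_read` are hypothetical names, and the model assumes each file records the minimum and maximum cell timestamp it contains, which is what allows a time-bounded scan to skip it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoreFile:
    name: str      # illustrative file name
    min_ts: int    # smallest cell timestamp in the file
    max_ts: int    # largest cell timestamp in the file

def files_to_read(files, start_ts, end_ts):
    """Keep only files whose [min_ts, max_ts] overlaps the scan range [start_ts, end_ts)."""
    return [f for f in files if f.max_ts >= start_ts and f.min_ts < end_ts]

files = [
    StoreFile("hfile-1", 0, 999),
    StoreFile("hfile-2", 1000, 1999),
    StoreFile("hfile-3", 2000, 2999),
]

selected = files_to_read(files, 1500, 1800)
print([f.name for f in selected])
```

A query that filters on the row timestamp column for the range 1500 to 1800 would only need the middle file in this model.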
  • 29. Use of Statistics
      [Diagram: a client scanning four regions (A, F, L, R) compared with a client scanning the same data split into eight chunks (A, C, F, I, L, O, R, U)]
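One reading of this diagram is that collected statistics give the client extra split points inside each region, so a scan can be broken into more parallel chunks than there are regions. The sketch below encodes that reading with the letters from the slide; `scan_chunks` is a hypothetical helper and the chunk boundaries are treated as plain sortable strings.

```python
def scan_chunks(region_starts, guideposts=()):
    """Return (start, stop) key ranges; stop is None for the open-ended last chunk."""
    starts = sorted(set(region_starts) | set(guideposts))
    return list(zip(starts, starts[1:] + [None]))

regions = ["A", "F", "L", "R"]
stats_split_points = ["C", "I", "O", "U"]   # extra boundaries taken from the slide

without_stats = scan_chunks(regions)
with_stats = scan_chunks(regions, stats_split_points)
print(len(without_stats), "parallel scans without statistics")
print(len(with_stats), "parallel scans with statistics")
```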
  • 30. Skip Scan
      ▪ Phoenix supports skip scan to jump directly to matching keys when the query has key sets in the predicate
        SELECT * FROM METRIC_RECORD WHERE METRIC_NAME LIKE 'abc%' AND HOSTNAME IN ('host1', 'host2');
        CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']
      [Diagram: a client issuing a skip scan across Region1 to Region4 hosted on RS1, RS2 and RS3]
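The idea behind a skip scan can be sketched over an in-memory sorted list. This is a simplified model rather than Phoenix's implementation: rows are `(metric_name, hostname, value)` tuples in key order, and whenever the current row cannot match, the scan seeks with a binary search to the next key that could match instead of reading every row in between.

```python
from bisect import bisect_left

def skip_scan(rows, metric_prefix, hosts):
    """rows must be sorted (metric_name, hostname, value) tuples."""
    hosts = sorted(hosts)
    matches = []
    pos = bisect_left(rows, (metric_prefix,))       # seek to first candidate metric
    while pos < len(rows):
        metric, host, _ = rows[pos]
        if not metric.startswith(metric_prefix):
            break                                   # past the metric range: done
        i = bisect_left(hosts, host)
        if i < len(hosts) and hosts[i] == host:
            matches.append(rows[pos])               # wanted host: emit and advance
            pos += 1
        elif i < len(hosts):
            # Jump forward to the next wanted host within the same metric.
            pos = bisect_left(rows, (metric, hosts[i]), pos)
        else:
            # No wanted host left for this metric: jump to the next metric.
            pos = bisect_left(rows, (metric + "\0",), pos)
    return matches

rows = sorted([
    ("aaa", "host1", 0),
    ("abc.cpu", "host1", 1), ("abc.cpu", "host2", 2), ("abc.cpu", "host3", 3),
    ("abc.mem", "host1", 4), ("abc.mem", "host4", 5),
    ("abd.io", "host1", 6),
])
result = skip_scan(rows, "abc", ["host1", "host2"])
print(result)
```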
  • 31. Join optimizations
      ▪ Hash join
        – Outperforms other join algorithms when one of the relations is small, or when the records matching the predicate fit into memory
      ▪ Sort-merge join
        – Use the sort-merge join algorithm when the relations are very large
      ▪ NO_STAR_JOIN hint
        – For multiple inner-join queries, Phoenix applies a star-join optimization by default. Use this hint in the query if the overall size of all right-hand-side tables would exceed the memory size limit.
      ▪ NO_CHILD_PARENT_OPTIMIZATION hint
        – Prevents the use of the child-parent-join optimization
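The memory condition for hash joins follows from how the algorithm works: the smaller relation is loaded into an in-memory hash table, and the larger one is streamed past it. A minimal generic sketch follows; the table and column names are invented for illustration and this is not Phoenix's join code.

```python
from collections import defaultdict

def hash_join(small, large, small_key, large_key):
    """Build a hash table on the small relation, then probe it with the large one."""
    table = defaultdict(list)
    for row in small:                       # build phase: must fit in memory
        table[row[small_key]].append(row)
    for row in large:                       # probe phase: streamed row by row
        for match in table.get(row[large_key], ()):
            yield {**match, **row}

hosts = [{"host_id": 1, "rack": "r1"}, {"host_id": 2, "rack": "r2"}]
metrics = [
    {"host_id": 1, "cpu": 0.5},
    {"host_id": 2, "cpu": 0.7},
    {"host_id": 1, "cpu": 0.9},
    {"host_id": 3, "cpu": 0.1},             # no matching host: dropped by the inner join
]

joined = list(hash_join(hosts, metrics, "host_id", "host_id"))
print(joined)
```

When neither side fits in memory, the build phase is no longer possible, which is the case where the slide recommends sort-merge instead.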
  • 32. Optimize Writes
      ▪ UPSERT VALUES
        – Call it multiple times before commit to batch mutations
        – Use a prepared statement when you run the query multiple times
      ▪ UPSERT SELECT
        – Configure phoenix.mutate.batchSize based on row size
        – Set auto-commit to true to write scan results directly to HBase
        – Set auto-commit to true while running UPSERT SELECT on the same table so that the writes happen on the server
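The batching advice for UPSERT VALUES can be sketched with any DB-API connection. The example below uses Python's built-in `sqlite3` purely as a runnable stand-in for a Phoenix connection, so the statement is SQLite's `INSERT OR REPLACE` rather than Phoenix's `UPSERT`; the table, the helper name `upsert_in_batches` and the batch size are all illustrative assumptions.

```python
import sqlite3

# Stand-in for a parameterised "UPSERT INTO metrics VALUES (?, ?)" in Phoenix.
STATEMENT = "INSERT OR REPLACE INTO metrics (host, value) VALUES (?, ?)"

def upsert_in_batches(conn, rows, batch_size):
    """Execute one parameterised statement per row, committing once per batch."""
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute(STATEMENT, row)     # same statement text reused for every row
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:                         # flush the final partial batch
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (host TEXT PRIMARY KEY, value REAL)")
rows = [(f"host{i}", float(i)) for i in range(10)]

commits = upsert_in_batches(conn, rows, batch_size=4)
stored = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(commits, "commits for", stored, "rows")
```

Committing once per batch instead of once per row trades a little latency for far fewer round trips.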
  • 33. Hints: Some important hints
      ▪ SERIAL SCAN, RANGE SCAN
      ▪ SERIAL
      ▪ SMALL SCAN
  • 34. Additional References
      ▪ For more optimizations, you can refer to these documents
        –
        –
  • 35. Agenda
      ▪ Phoenix & HBase as an Enterprise Data Warehouse
      ▪ Use Cases
      ▪ Optimizations
      ▪ Phoenix Query Server
  • 36. Apache Phoenix Query Server
      ▪ A standalone service that proxies user requests to HBase/Phoenix
        – Optional
      ▪ Reference client implementation via JDBC
        – "Thick" versus "Thin"
      ▪ First introduced in Apache Phoenix 4.4.0
      ▪ Built on Apache Calcite's Avatica
        – "A framework for building database drivers"
  • 37. Traditional Apache Phoenix RPC Model
      [Diagram labels: Application, Phoenix client, HBase client, ZooKeeper, HDFS, and three RegionServers, each with a Phoenix coprocessor (Phx coproc) and table regions Table,,123, Table,a,123, Table,b,123, Table,c,123]
  • 38. Query Server Model
      [Diagram labels: Application, Query Server, Phoenix client, HBase client, ZooKeeper, HDFS, and three RegionServers, each with a Phoenix coprocessor (Phx coproc) and table regions Table,,123, Table,a,123, Table,b,123, Table,d,123]
  • 39. Query Server Technology
      ▪ HTTP server and wire API definition
      ▪ Pluggable serialization
        – Google Protocol Buffers
      ▪ "Thin" JDBC driver (over HTTP)
      ▪ Other goodies!
        – Pluggable metrics system
        – TCK (technology compatibility kit)
        – SPNEGO for Kerberos authentication
        – Horizontally scalable with load balancing
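The practical difference between the two drivers shows up in the connection string: the thick driver is pointed at ZooKeeper, while the thin driver is pointed at the Query Server's HTTP endpoint. The helpers below only build the strings. The URL shapes, the default ports (2181 for ZooKeeper, 8765 for the Query Server) and the `/hbase` znode reflect commonly documented defaults and are assumptions to verify against the Phoenix version in use; the host names are placeholders.

```python
def thick_url(zk_quorum, zk_port=2181, znode="/hbase"):
    """Thick driver: the client talks to ZooKeeper and the RegionServers directly."""
    return f"jdbc:phoenix:{zk_quorum}:{zk_port}:{znode}"

def thin_url(host, port=8765, serialization="PROTOBUF"):
    """Thin driver: the client sends requests over HTTP to the Query Server."""
    return f"jdbc:phoenix:thin:url=http://{host}:{port};serialization={serialization}"

print(thick_url("zk1.example.com"))
print(thin_url("pqs.example.com"))
```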
  • 40. Query Server Clients: Client enablement
      ▪ Go language database/sql/driver
        –
      ▪ .NET driver
        –
        –
      ▪ ODBC
        – Built by, also available from Hortonworks
      ▪ Python DB API v2.0 (not "battle tested")
        –
  • 41. Agenda
      ▪ Phoenix & HBase as an Enterprise Data Warehouse
      ▪ Use Cases
      ▪ Optimizations
      ▪ Phoenix Query Server
      ▪ Q&A
  • 42. Phoenix & HBase
      We hope to see you all migrating to Phoenix & HBase, and we expect more questions on the user mailing lists.
      Get involved in the mailing lists:
      You can reach us on:
  • 43. Thank You