Strongly Consistent Global Indexes for
Apache Phoenix
Kadir Ozdemir
September 2019
Why Phoenix at Salesforce?
● Massive data scale with a familiar interface
● Trusted storage
● Consistent
● Multi-cloud
● Salesforce multi-tenancy
[Architecture diagram. Labels: Application Server (Phoenix Application, Phoenix Client, HBase Client); HBase Region Servers (Phoenix Server, Table Region); HDFS; a label truncated in the source as "HBase Server (Da"; SQL; Table Scans/Mutations; RPC.]
Secondary Indexing
ID (Primary Key)  Name    City (Secondary Key)
1234              Ashley  Seattle
2345              Kadir   San Francisco
Secondary Indexing
Data Table (Primary Key: ID)
ID    Name    City
1234  Ashley  Seattle
2345  Kadir   San Francisco
Index Table (Secondary Key: City)
City           ID    Name
San Francisco  2345  Kadir
Seattle        1234  Ashley
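A minimal sketch of how the index table on this slide relates to the data table, using plain Python dictionaries rather than Phoenix or HBase. It assumes the index row key is the secondary key (City) followed by the primary key (ID), matching the column order shown; `build_index` and the dictionary layout are hypothetical names for illustration only.

```python
# Data table keyed by the primary key (ID); values hold the other columns.
data_table = {
    1234: {"name": "Ashley", "city": "Seattle"},
    2345: {"name": "Kadir", "city": "San Francisco"},
}


def build_index(data):
    """Return index rows keyed by (city, id), ordered by that composite key."""
    index = {}
    for row_id, row in data.items():
        # Appending the primary key keeps index keys unique when cities repeat.
        index[(row["city"], row_id)] = {"name": row["name"]}
    return dict(sorted(index.items()))


index_table = build_index(data_table)
for (city, row_id), cols in index_table.items():
    print(city, row_id, cols["name"])
```

Because the rows are ordered by city first, a lookup by City becomes a short range scan over the index instead of a full scan of the data table.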
Secondary Indexing - Update
Data Table (Primary Key: ID)
ID    Name    City
1234  Ashley  Seattle
2345  Kadir   San Francisco
Index Table (Primary Key starts with City)
City           ID    Name
San Francisco  2345  Kadir
Seattle        1234  Ashley
Secondary Indexing - Update
Data Table (Primary Key: ID)
ID    Name    City
1234  Ashley  Seattle
2345  Kadir   San Francisco
Index Table (Primary Key starts with City)
City     ID    Name
Seattle  1234  Ashley
Global Secondary Indexing - Update
Data Table (Primary Key: ID)
ID    Name    City
1234  Ashley  Seattle
2345  Kadir   Seattle
Index Table (Primary Key starts with City)
City     ID    Name
Seattle  1234  Ashley
Seattle  2345  Kadir
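The update walked through on these slides can be sketched as follows. This is an illustrative model with in-memory dictionaries, not Phoenix code: `update_city` is a hypothetical helper, and it deliberately has no failure handling, so a crash between its steps would leave the two tables out of sync.

```python
data_table = {
    1234: {"name": "Ashley", "city": "Seattle"},
    2345: {"name": "Kadir", "city": "San Francisco"},
}
index_table = {
    ("San Francisco", 2345): {"name": "Kadir"},
    ("Seattle", 1234): {"name": "Ashley"},
}


def update_city(row_id, new_city):
    """Change the indexed column of one data row and maintain the index."""
    row = data_table[row_id]
    # The old index row is keyed by the old city, so it must be removed.
    index_table.pop((row["city"], row_id), None)
    # Update the data row, then write the index row under the new city.
    row["city"] = new_city
    index_table[(new_city, row_id)] = {"name": row["name"]}


update_city(2345, "Seattle")
print(sorted(index_table))  # [('Seattle', 1234), ('Seattle', 2345)]
```

With a global index the old and new index rows can live on different servers from the data row, which is why a single logical update turns into several separate writes.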
Current Design Challenges
● Tries to keep tables consistent at write time by relying on client retries
○ May not handle correlated failures and may leave the data table inconsistent with its indexes
● Needs external tools to detect and repair inconsistencies
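To make the second bullet concrete, here is a rough sketch of the kind of check an external tool would have to run. It is not the actual Phoenix tooling; `find_inconsistencies` and the dictionary layout are assumptions chosen for illustration.

```python
def find_inconsistencies(data, index):
    """Compare the index rows that should exist with the ones that do."""
    expected = {(row["city"], row_id) for row_id, row in data.items()}
    actual = set(index)
    missing = expected - actual    # data rows with no index row
    orphaned = actual - expected   # index rows with no matching data row
    return missing, orphaned


# A failed update left the index pointing at the old city for ID 2345.
data_table = {
    1234: {"name": "Ashley", "city": "Seattle"},
    2345: {"name": "Kadir", "city": "Seattle"},
}
index_table = {
    ("San Francisco", 2345): {"name": "Kadir"},
    ("Seattle", 1234): {"name": "Ashley"},
}

missing, orphaned = find_inconsistencies(data_table, index_table)
print(missing)   # {('Seattle', 2345)}
print(orphaned)  # {('San Francisco', 2345)}
```

A check like this has to read both tables in full, which is part of the cost the new design tries to avoid.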
Design Objectives
● Secondary indexes should always be in sync with their data tables
● Strong consistency should not result in significant performance impact
● Strong consistency should not impact scalability significantly
Observations
● Data must be consistent at read time
○ An index table row can be repaired from the corresponding data table row at read time
● In HBase, writes are fast
○ We can add an extra write phase without severely impacting write performance
Strongly Consistent Design
Read
1. Read the index rows and check their status
2. Repair the unverified rows from the data table
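The read path can be sketched as below. This is a simplified in-memory model, not the Phoenix implementation: the `status` field, `scan_index`, and `repair` are hypothetical names, and the repair step here only confirms or drops an index row, whereas a real repair may also have to rebuild a row under a different key.

```python
VERIFIED, UNVERIFIED = "verified", "unverified"

data_table = {1234: {"name": "Ashley", "city": "Seattle"}}
index_table = {
    ("Seattle", 1234): {"name": "Ashley", "status": VERIFIED},
    # Left behind by a write that never reached the data table.
    ("Seattle", 2345): {"name": "Kadir", "status": UNVERIFIED},
}


def repair(key):
    """Rebuild one index row from its data row, or drop it if none matches."""
    city, row_id = key
    data_row = data_table.get(row_id)
    if data_row is None or data_row["city"] != city:
        del index_table[key]
        return None
    index_table[key] = {"name": data_row["name"], "status": VERIFIED}
    return index_table[key]


def scan_index(city):
    """Return (id, name) pairs for a city, repairing unverified rows first."""
    results = []
    for key in sorted(k for k in index_table if k[0] == city):
        index_row = index_table[key]
        if index_row["status"] == UNVERIFIED:
            index_row = repair(key)
            if index_row is None:
                continue
        results.append((key[1], index_row["name"]))
    return results


print(scan_index("Seattle"))  # [(1234, 'Ashley')]
```

Only unverified rows pay the cost of a data table lookup, so reads stay cheap as long as unverified rows are rare.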
Write
1. Set the status of the existing index rows to unverified and write the new index rows with the unverified status
2. Write the data table rows
3. Delete the existing index rows and set the status of the new rows to verified
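A minimal sketch of the three write phases, again over in-memory dictionaries. The `upsert` helper and the `status` field are hypothetical, and the real system performs these steps as separate server-side writes rather than in one function.

```python
VERIFIED, UNVERIFIED = "verified", "unverified"

data_table = {2345: {"name": "Kadir", "city": "San Francisco"}}
index_table = {("San Francisco", 2345): {"name": "Kadir", "status": VERIFIED}}


def upsert(row_id, name, city):
    """Write one data row and maintain its index row in three phases."""
    old = data_table.get(row_id)
    old_key = (old["city"], row_id) if old else None
    new_key = (city, row_id)
    replaces_old = old_key is not None and old_key != new_key

    # Phase 1: mark the existing index row unverified, write the new one
    # as unverified.
    if replaces_old and old_key in index_table:
        index_table[old_key]["status"] = UNVERIFIED
    index_table[new_key] = {"name": name, "status": UNVERIFIED}

    # Phase 2: write the data table row.
    data_table[row_id] = {"name": name, "city": city}

    # Phase 3: delete the existing index row and verify the new one.
    if replaces_old:
        index_table.pop(old_key, None)
    index_table[new_key]["status"] = VERIFIED


upsert(2345, "Kadir", "Seattle")
print(index_table)
```

If the function stopped after phase 1 or phase 2, every index row touched would still be unverified, so a later read would check it against the data table instead of trusting it.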
Delete
1. Set the status of the index table rows to unverified
2. Delete the data table rows
3. Delete the index table rows
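The delete path can be sketched the same way. The `stop_after` parameter is a hypothetical hook used only to simulate a failure partway through; it is not part of any real API.

```python
VERIFIED, UNVERIFIED = "verified", "unverified"

data_table = {
    1234: {"name": "Ashley", "city": "Seattle"},
    2345: {"name": "Kadir", "city": "San Francisco"},
}
index_table = {
    ("San Francisco", 2345): {"name": "Kadir", "status": VERIFIED},
    ("Seattle", 1234): {"name": "Ashley", "status": VERIFIED},
}


def delete_row(row_id, stop_after=None):
    """Delete a data row in three phases; stop_after simulates a crash."""
    row = data_table.get(row_id)
    if row is None:
        return
    key = (row["city"], row_id)

    # Phase 1: mark the index row unverified.
    index_table[key]["status"] = UNVERIFIED
    if stop_after == 1:
        return
    # Phase 2: delete the data table row.
    del data_table[row_id]
    if stop_after == 2:
        return
    # Phase 3: delete the index table row.
    del index_table[key]


delete_row(1234)                # completes all three phases
delete_row(2345, stop_after=2)  # simulated failure before phase 3
print(data_table)   # {}
print(index_table)  # only the unverified San Francisco row remains
```

The interrupted delete leaves an index row behind, but it is unverified, so it is never served as a valid result.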
Correctness Without Concurrent Row Updates
● A missing index row is not possible
○ An index row is updated before its data row
■ If the index update fails, the data row update is not attempted
○ An index row is deleted only after its data table row is deleted
● A verified index row implies the existence of the corresponding data row
○ The status of an index row is set to verified only after the corresponding data row is written
○ The status of an index row is set to unverified before the corresponding data row is deleted
● Unverified index rows are not used for serving user queries
○ An unverified index row is repaired from its data row during scans
Correctness With Concurrent Row Updates
● The third phase is skipped for concurrent updates
○ Detect concurrent updates and leave the affected index rows in the unverified state
● Use two-phase row locking to detect concurrent updates on a data row
[Diagram: read the data table → (phase 1) index table update → (phase 2) update the data table → (phase 3) index table update, with rows added to and removed from a Pending Rows set along the way.]
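One way to picture the detection mechanism is a per-row counter of in-flight updates, guarded by a row lock that is taken twice per update. This is a sketch under assumptions: the slide does not say exactly when rows enter and leave the pending set, and `begin_update`, `finish_update`, and the counter layout are hypothetical rather than taken from the Phoenix code.

```python
import threading
from collections import defaultdict

row_locks = defaultdict(threading.Lock)
pending = defaultdict(int)   # row id -> number of in-flight updates
overlapped = set()           # row ids whose in-flight updates overlapped


def begin_update(row_id):
    """First lock phase: read the data row (omitted) and register as pending."""
    with row_locks[row_id]:
        pending[row_id] += 1
        if pending[row_id] > 1:
            overlapped.add(row_id)
    # The lock is released here; phase 1 (index update) runs without it.


def finish_update(row_id):
    """Second lock phase: write the data row (omitted) and deregister.

    Returns True if phase 3 may run, False if it must be skipped.
    """
    with row_locks[row_id]:
        concurrent = row_id in overlapped
        pending[row_id] -= 1
        if pending[row_id] == 0:
            del pending[row_id]
            overlapped.discard(row_id)
        return not concurrent


# Updates A and B overlap on row 2345; update C runs alone afterwards.
begin_update(2345)           # A
begin_update(2345)           # B
a_ok = finish_update(2345)   # A
b_ok = finish_update(2345)   # B
begin_update(2345)           # C
c_ok = finish_update(2345)   # C
print(a_ok, b_ok, c_ok)      # False False True
```

When phase 3 is skipped the index rows stay unverified, and the read path settles them against the data table later.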
Performance Impact of Strong Consistency
● Setup: a data table with two indexes on a 10-node cluster
○ 1 billion large rows with random primary keys
○ Top-N queries on the indexes, where N is 50
● Less than 25% increase in write latency
○ Due to setting row status in phase 3
● No noticeable increase in read latency
○ The number of unverified rows due to pending updates on a given table region is limited by the number of RPC threads and the mutation batch size
Questions?
