Apache
Phoenix
Put the SQL back in NoSQL
1
Osama Hussein, March 2021
Agenda
● History
● Overview
● Architecture
2
● Capabilities
● Code
● Scenarios
1.
History
From open-source repo to top Apache project
Overview (Apache Phoenix)
4
● Began as an internal project at Salesforce (salesforce.com).
● JAN 2014: Originally open-sourced on GitHub.
● MAY 2014: Became a top-level Apache project.

2.
Overview
UDF, Transactions and Schema
Overview (Apache Phoenix)
6
● HBase was developed as part of Apache Hadoop.
● It runs on top of the Hadoop Distributed File System (HDFS).
● HBase scales linearly and shards automatically.
Overview (Apache Phoenix)
7
● Support for late-bound, schema-on-read.
● SQL and JDBC API support.
● Access to data stored and produced in other components such as Apache Spark and Apache Hive.
● Apache Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface.
● Implements best-practice optimizations to enable software engineers to develop next-generation, data-driven applications based on HBase.
● Create and interact with tables in the form of typical DDL/DML statements using the standard JDBC API (a minimal sketch follows).
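A minimal sketch of that workflow, assuming the Phoenix JDBC driver is on the classpath, a ZooKeeper quorum at localhost:2181, and a hypothetical US_POPULATION table; only the standard JDBC API is used.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class PhoenixJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Connection URL is an assumption: jdbc:phoenix:<zookeeper quorum>.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // DDL and DML are issued through plain JDBC statements.
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS US_POPULATION ("
                + " STATE CHAR(2) NOT NULL, CITY VARCHAR NOT NULL, POPULATION BIGINT"
                + " CONSTRAINT PK PRIMARY KEY (STATE, CITY))");
            conn.createStatement().executeUpdate(
                "UPSERT INTO US_POPULATION VALUES ('CA', 'San Jose', 1030000)");
            conn.commit(); // Phoenix connections do not auto-commit by default
            ResultSet rs = conn.createStatement().executeQuery(
                "SELECT STATE, CITY, POPULATION FROM US_POPULATION");
            while (rs.next()) {
                System.out.println(rs.getString("CITY") + ": " + rs.getLong("POPULATION"));
            }
        }
    }
}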
Overview (Apache Phoenix)
8
● Written in Java and SQL.
● Provides Atomicity, Consistency, Isolation and Durability (ACID) guarantees.
● Fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.

Overview (Apache Phoenix)
9
● Included in:
○ Cloudera Data Platform 7.0 and above.
○ Hortonworks distribution for HDP 2.1 and above.
○ Available as part of Cloudera Labs.
○ Part of the Hadoop ecosystem.
Overview (SQL Support)
10
● Compiles SQL into native HBase scans and orchestrates their execution.
● Produces standard JDBC result sets.
● All standard SQL query constructs are supported.
Overview (SQL Support)
11
● Direct use of the HBase API, along with coprocessors and custom filters, yields:
○ Millisecond performance for small queries.
○ Seconds for tens of millions of rows.
Overview (Bulk Loading)
12
● MapReduce-based:
○ CSV and JSON
○ Via the Phoenix MapReduce library
● Single-threaded:
○ CSV
○ Via the Phoenix psql.py utility (PSQL)
○ HBase on the local machine
A sketch of the MapReduce-based path follows.
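A rough sketch of invoking the MapReduce-based loader programmatically, assuming the CsvBulkLoadTool from the Phoenix MapReduce library; the table name, input path, and ZooKeeper quorum are placeholders. The single-threaded path would instead run the psql.py script from the command line.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // Runs the Phoenix CSV bulk-load MapReduce job; all argument values are placeholders.
        int exitCode = ToolRunner.run(new Configuration(), new CsvBulkLoadTool(), new String[] {
            "--table", "SERVER_METRICS",            // target Phoenix table
            "--input", "/data/server_metrics.csv",  // CSV file(s) on HDFS
            "--zookeeper", "zk-host:2181"           // HBase ZooKeeper quorum
        });
        System.exit(exitCode);
    }
}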

Overview (User Defined Functions)
13
● Temporary UDFs exist for the current session only.
● Permanent UDFs are stored in the SYSTEM.FUNCTION table.
● UDFs can be used in SQL statements and in indexes.
● Tenant-specific UDF usage is supported.
● Updating a UDF jar requires a cluster bounce.
A sketch of registering a permanent UDF follows.
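A sketch of registering a permanent UDF, assuming the CREATE FUNCTION syntax from the Phoenix UDF documentation; the function name, implementing class, and jar location are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;

public class UdfSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Permanent UDF: the implementing class extends Phoenix's ScalarFunction
            // and ships in the jar referenced below (class and jar path are placeholders).
            conn.createStatement().execute(
                "CREATE FUNCTION MY_REVERSE(VARCHAR) RETURNS VARCHAR"
                + " AS 'com.example.udf.MyReverseFunction'"
                + " USING JAR 'hdfs://namenode:8020/hbase/lib/my-udf.jar'");
            // Once registered, the UDF can be used like any built-in function:
            // SELECT MY_REVERSE(HOST) FROM SERVER_METRICS;
        }
    }
}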
Overview (Transactions)
14
● Uses Apache Tephra to provide cross-row and cross-table ACID support.
● Create tables with the flag 'TRANSACTIONAL=true'.
● Enable transactions, configure the transaction snapshot directory, and set the timeout value in 'hbase-site.xml'.
● A transaction starts with the first statement issued against a transactional table.
● A transaction ends with a commit or a rollback (see the sketch below).
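A sketch of the transactional flow, assuming transactions have already been enabled in 'hbase-site.xml' (Tephra snapshot directory and timeout) and using placeholder table and column names.
import java.sql.Connection;
import java.sql.DriverManager;

public class TransactionSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Declare the table transactional at creation time.
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS ACCOUNTS (ID BIGINT PRIMARY KEY, BALANCE BIGINT)"
                + " TRANSACTIONAL=true");
            try {
                // The transaction starts with the first statement against the table...
                conn.createStatement().executeUpdate("UPSERT INTO ACCOUNTS VALUES (1, 100)");
                conn.createStatement().executeUpdate("UPSERT INTO ACCOUNTS VALUES (2, 200)");
                // ...and ends with a commit (both rows become visible atomically)...
                conn.commit();
            } catch (Exception e) {
                // ...or a rollback, which discards all uncommitted changes.
                conn.rollback();
                throw e;
            }
        }
    }
}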
Overview (Transactions)
15
● Typically, applications let HBase manage timestamps.
● If the application needs to control the timestamp itself, the 'CurrentSCN' property must be specified at connection time (see the sketch below).
● 'CurrentSCN' controls the timestamp used for any DDL, DML, or query.
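A sketch of pinning the timestamp via 'CurrentSCN', assuming it is passed as a JDBC connection property; the URL, table, and values are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class CurrentScnSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Every DDL, DML, or query on this connection runs at this HBase timestamp.
        props.setProperty("CurrentSCN", Long.toString(System.currentTimeMillis()));
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181", props)) {
            conn.createStatement().executeUpdate(
                "UPSERT INTO SERVER_METRICS VALUES ('sf1.example.com', CURRENT_DATE(), 42)");
            conn.commit();
        }
    }
}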
Overview (Schema)
16
● Table metadata is stored in a versioned HBase table (up to 1,000 versions).
● 'UPDATE_CACHE_FREQUENCY' lets the user declare how often the server is checked for metadata updates (see the sketch below). Values:
○ Always
○ Never
○ Millisecond value
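A sketch of setting that property, assuming it can be supplied as a table option via ALTER TABLE; the table name and the 15-minute value are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;

public class UpdateCacheFrequencySketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Check the server for metadata changes at most once every 15 minutes
            // (900000 ms); ALWAYS and NEVER are the other documented values.
            conn.createStatement().execute(
                "ALTER TABLE SERVER_METRICS SET UPDATE_CACHE_FREQUENCY = 900000");
        }
    }
}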

Overview (Schema)
17
● A Phoenix table can be:
○ Built from scratch.
○ Mapped to an existing HBase table.
■ Read-Write Table
■ Read-Only View
Overview (Schema)
18
● Read-Write Table:
○ Column families will be created automatically if they don't already exist.
○ An empty key value will be added to the first column family of each existing row to minimize the size of the projection for queries.
Overview (Schema)
19
● Read-Only View:
○ All column families must already exist.
○ The only change to the HBase table is the addition of the Phoenix coprocessor used for query processing.
A sketch of both mappings follows.
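A sketch of mapping Phoenix onto an existing HBase table, assuming a table named "t1" with a column family "cf1"; quoted identifiers preserve the lower-case HBase names, and all names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;

public class HBaseMappingSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Read-only view over an existing HBase table: column families must already exist.
            conn.createStatement().execute(
                "CREATE VIEW \"t1\" (PK VARCHAR PRIMARY KEY, \"cf1\".\"val\" VARCHAR)");
            // Read-write table mapped to the same layout: missing column families are created
            // automatically, and Phoenix adds an empty key value to each existing row.
            conn.createStatement().execute(
                "CREATE TABLE \"t2\" (PK VARCHAR PRIMARY KEY, \"cf1\".\"val\" VARCHAR)");
        }
    }
}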
3.
Architecture
Architecture, Phoenix Data Model, Query Execution and Environment

Architecture
21
Architecture
22
Architecture (Phoenix Data Model)
23
Architecture (Server Metrics Example)
24

Architecture (Server Metrics Example)
25
● Example:
Architecture (Query Execution)
26
1. Identify row key ranges from the query
2. Overlay row key ranges with regions
3. Execute parallel scans
4. Filter using skip scan
5. Intercept the scan in a coprocessor
6. Perform final merge sort
Architecture (Environment)
27
Data Warehouse
Extract, Transform, Load (ETL)
BI and Visualization
4.
Code
Commands and Sample Code

Code (Commands)
29
● DML Commands:
○ UPSERT VALUES
○ UPSERT SELECT
○ DELETE
● DDL Commands:
○ CREATE TABLE
○ CREATE VIEW
○ DROP TABLE
○ DROP VIEW
A short sketch of UPSERT SELECT and DELETE follows.
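A brief sketch of UPSERT SELECT and DELETE; the archive table, the DATE column, and the 30-day cutoff are placeholders for illustration.
import java.sql.Connection;
import java.sql.DriverManager;

public class DmlSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // UPSERT SELECT copies rows from one table into another.
            conn.createStatement().executeUpdate(
                "UPSERT INTO SERVER_METRICS_ARCHIVE"
                + " SELECT * FROM SERVER_METRICS WHERE DATE < CURRENT_DATE() - 30");
            // DELETE removes the rows that were just archived.
            conn.createStatement().executeUpdate(
                "DELETE FROM SERVER_METRICS WHERE DATE < CURRENT_DATE() - 30");
            conn.commit();
        }
    }
}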
30
Connection:
● Long Running
Connection conn = DriverManager.getConnection("jdbc:phoenix:my_server:longRunning", longRunningProps);
● Short Running
Connection conn = DriverManager.getConnection("jdbc:phoenix:my_server:shortRunning", shortRunningProps);
31
@Test
public void createTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    String ddl = "CREATE TABLE " + tableName
        + " (K VARCHAR NOT NULL PRIMARY KEY, V VARCHAR)"
        + " SALT_BUCKETS = " + numSaltBuckets;
    Connection conn = DriverManager.getConnection(getUrl());
    conn.createStatement().execute(ddl);
}
Transactions:
● Create Table
32
@Test
public void readTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    long numRows = 1000;
    long numExpectedTasks = numSaltBuckets;
    insertRowsInTable(tableName, numRows);
    String query = "SELECT * FROM " + tableName;
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(query);
    PhoenixResultSet resultSetBeingTested = rs.unwrap(PhoenixResultSet.class);
    changeInternalStateForTesting(resultSetBeingTested);
    while (resultSetBeingTested.next()) {}
    resultSetBeingTested.close();
    Set<String> expectedTableNames = Sets.newHashSet(tableName);
    assertReadMetricValuesForSelectSql(Lists.newArrayList(numRows),
        Lists.newArrayList(numExpectedTasks),
        resultSetBeingTested, expectedTableNames);
}
Transactions:
● Read Table

33
@Override
public void getRowCount(ResultSet resultSet) throws SQLException {
    Tuple row = resultSet.unwrap(PhoenixResultSet.class).getCurrentRow();
    Cell kv = row.getValue(0);
    ImmutableBytesWritable tmpPtr = new ImmutableBytesWritable(
        kv.getValueArray(), kv.getValueOffset(), kv.getValueLength());
    // A single Cell will be returned with the count(*) - we decode that here
    rowCount = PLong.INSTANCE.getCodec().decodeLong(tmpPtr, SortOrder.getDefault());
}
Transactions:
● Row Count
34
private void changeInternalStateForTesting(PhoenixResultSet rs) {
    // Get and set the internal state for testing purposes.
    ReadMetricQueue testMetricsQueue = new TestReadMetricsQueue(LogLevel.OFF, true);
    StatementContext ctx = (StatementContext) Whitebox.getInternalState(rs, "context");
    Whitebox.setInternalState(ctx, "readMetricsQueue", testMetricsQueue);
    Whitebox.setInternalState(rs, "readMetricsQueue", testMetricsQueue);
}
Transactions:
● Internal State
5.
Capabilities
Features and Capabilities
Capabilities
● Overlays on top of HBase Data Model
● Keeps a Versioned Schema Repository
● Query Processor
36

Capabilities
● Cost-based query optimizer.
● Enhance existing statistics collection.
● Generate histograms to drive query
optimization decisions and join ordering.
37
Capabilities
● Secondary indexes:
○ Boost the speed of queries without relying on specific row-key designs.
○ Enable users to use star schemas.
○ Leverage SQL tools and online analytics (see the sketch below).
38
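A sketch of a covered secondary index, assuming the SERVER_METRICS table used in the scenarios later in the deck; the index and column names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;

public class SecondaryIndexSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Queries filtering or sorting on GC_TIME can use the index table instead of a
            // full scan over the row key; INCLUDE makes HOST available without a lookback.
            conn.createStatement().execute(
                "CREATE INDEX IF NOT EXISTS SERVER_METRICS_GC_IDX"
                + " ON SERVER_METRICS (GC_TIME DESC) INCLUDE (HOST)");
        }
    }
}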
Capabilities
● Row timestamp column (see the sketch below).
● Sets minimum and maximum time ranges for scans.
● Improves performance, especially when querying the tail end of the data.
39
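A sketch of declaring a row timestamp column, assuming the ROW_TIMESTAMP declaration described in the Phoenix documentation; the table and column names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;

public class RowTimestampSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Mapping CREATED_DATE to the HBase cell timestamp lets Phoenix set min/max
            // time ranges on scans, which helps queries over the most recent data.
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS METRIC_EVENTS ("
                + " CREATED_DATE DATE NOT NULL, METRIC_ID CHAR(15) NOT NULL, METRIC_VALUE BIGINT"
                + " CONSTRAINT PK PRIMARY KEY (CREATED_DATE ROW_TIMESTAMP, METRIC_ID))");
        }
    }
}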
6.
Scenarios
Use Cases

Scenarios (Server Metrics Example)
41
SELECT substr(host, 1, 3), trunc(date, 'DAY'), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
  AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date, 'DAY')
42
Scenarios (Chart Response Time Per Cluster)
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
  AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
43
Scenarios (Find 5 Longest GC Times)
Thanks!
Any questions?
You can find me at:
GitHub: @sxaxmz
LinkedIn: linkedin.com/in/husseinosama
44

 
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model SafeDelhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
 
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any TimeBangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
 
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
 
@Call @Girls Mira Bhayandar phone 9920874524 You Are Serach A Beautyfull Doll...
@Call @Girls Mira Bhayandar phone 9920874524 You Are Serach A Beautyfull Doll...@Call @Girls Mira Bhayandar phone 9920874524 You Are Serach A Beautyfull Doll...
@Call @Girls Mira Bhayandar phone 9920874524 You Are Serach A Beautyfull Doll...
 
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
 
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
 

Apache Phoenix
  • 1. Apache Phoenix Put the SQL back in NoSQL 1 Osama Hussein, March 2021
  • 2. Agenda ● History ● Overview ● Architecture 2 ● Capabilities ● Code ● Scenarios
  • 3. 1. History From open-source repo to top Apache project
  • 4. Overview (Apache Phoenix) 4 ● Began as an internal project by the company (salesforce.com). ● JAN 2014: Originally open-sourced on GitHub. ● MAY 2014: Became a top-level Apache project.
  • 6. Overview (Apache Phoenix) 6 ● Support for late-bound, schema-on-read. ● SQL and JDBC API support. ● Access to data stored and produced in other components such as Apache Spark and Apache Hive. ● Developed as part of Apache Hadoop. ● Runs on top of the Hadoop Distributed File System (HDFS). ● HBase scales linearly and shards automatically.
  • 7. Overview (Apache Phoenix) 7 ● Apache Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface. ● Implements best-practice optimizations to enable software engineers to develop next-generation data-driven applications based on HBase. ● Create and interact with tables in the form of typical DDL/DML statements using the standard JDBC API.
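  To make the DDL/DML flow above concrete, here is a minimal sketch in Phoenix SQL; the table and column names are hypothetical, not from the slides. A table is created, populated with UPSERT, and queried through the standard JDBC API.

        -- Hypothetical table; Phoenix maps it onto an HBase table automatically.
        CREATE TABLE IF NOT EXISTS web_stat (
            host       VARCHAR NOT NULL,
            created    DATE    NOT NULL,
            usage_core BIGINT,
            CONSTRAINT pk PRIMARY KEY (host, created)
        );

        -- Phoenix uses UPSERT (insert-or-update) rather than INSERT.
        UPSERT INTO web_stat (host, created, usage_core)
            VALUES ('sf1.example.com', CURRENT_DATE(), 120);

        -- Standard SQL reads return ordinary JDBC result sets.
        SELECT host, usage_core FROM web_stat WHERE created >= CURRENT_DATE() - 7;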
  • 8. Overview (Apache Phoenix) 8 ● Written in Java and SQL. ● Atomicity, Consistency, Isolation and Durability (ACID). ● Fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.
  • 9. Overview (Apache Phoenix) 9 ● Included in: ○ Cloudera Data Platform 7.0 and above. ○ Hortonworks distribution for HDP 2.1 and above. ○ Available as part of Cloudera Labs. ○ Part of the Hadoop ecosystem.
  • 10. Overview (SQL Support) 10 ● Compiles SQL into native HBase scans and orchestrates their execution. ● Produces standard JDBC result sets. ● All standard SQL query constructs are supported.
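  A quick way to see the scan plan Phoenix compiles a query into is the EXPLAIN statement; a small sketch against the hypothetical web_stat table introduced earlier:

        -- Shows whether the query becomes a full, range or skip scan,
        -- how it is parallelized, and whether aggregation runs server-side.
        EXPLAIN SELECT count(*) FROM web_stat WHERE host LIKE 'sf%';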
  • 11. Overview (SQL Support) 11 ● Direct use of the HBase API, along with coprocessors and custom filters, yields performance of: ○ Milliseconds for small queries. ○ Seconds for tens of millions of rows.
  • 12. Overview (Bulk Loading) 12 ● MapReduce-based: ○ CSV and JSON. ○ Via the Phoenix MapReduce library. ● Single-threaded: ○ CSV. ○ Via the PSQL command-line tool (psql.py). ○ HBase on the local machine.
  • 13. Overview (User Defined Functions) 13 ● Temporary UDFs for sessions only. ● Permanent UDFs stored in the system functions table. ● UDFs can be used in SQL statements and indexes. ● Tenant-specific UDF usage and support. ● Updating a UDF jar requires a cluster bounce.
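  As a sketch of the UDF registration described above (the class name, jar path and function name are hypothetical), a permanent UDF is declared with CREATE FUNCTION and can then be referenced like any built-in function:

        -- Registers a Java scalar function; class and jar location are assumptions.
        CREATE FUNCTION url_shorten(VARCHAR) RETURNS VARCHAR
            AS 'com.example.udf.UrlShortenFunction'
            USING JAR 'hdfs://namenode:8020/hbase/lib/example-udf.jar';

        SELECT url_shorten(host) FROM web_stat;

        DROP FUNCTION url_shorten;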
  • 14. Overview (Transactions) 14 ● Uses Apache Tephra for cross-row/cross-table ACID support. ● Create tables with the flag 'transactional=true'. ● Enable transactions, the snapshot directory and the timeout value in 'hbase-site.xml'. ● A transaction starts with a statement against the table. ● A transaction ends with a commit or rollback.
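  A minimal sketch of this flow, assuming transactions have already been enabled in hbase-site.xml as the slide describes (the table name is hypothetical):

        -- Declaring the table as transactional turns on Tephra-backed ACID semantics.
        CREATE TABLE txn_demo (k VARCHAR PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true;

        -- The first statement against the table implicitly starts a transaction...
        UPSERT INTO txn_demo VALUES ('row1', 'value1');
        UPSERT INTO txn_demo VALUES ('row2', 'value2');
        -- ...which ends when the client issues a commit or rollback
        -- (e.g. conn.commit() / conn.rollback() through JDBC).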
  • 15. Overview (Transactions) 15 ● Applications typically let HBase manage timestamps. ● In case the application needs to control the timestamp, the 'CurrentSCN' property must be specified at connection time. ● 'CurrentSCN' controls the timestamp for any DDL, DML, or query.
  • 16. Overview (Schema) 16 ● The table metadata is stored in a versioned HBase table (up to 1000 versions). ● 'UPDATE_CACHE_FREQUENCY' allows the user to declare how often the server will be checked for metadata updates. Values: ○ ALWAYS ○ NEVER ○ Millisecond value
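  For example (hypothetical table name), the property is set as a table option at creation time:

        -- Re-check the server for metadata changes at most once every 15 minutes;
        -- ALWAYS and NEVER are the other accepted values listed above.
        CREATE TABLE metrics_cache_demo (k VARCHAR PRIMARY KEY, v VARCHAR)
            UPDATE_CACHE_FREQUENCY = 900000;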
  • 17. Overview (Schema) 17 ● Phoenix table can be: ○ Built from scratch. ○ Mapped to an existing HBase table. ■ Read-Write Table ■ Read-Only View
  • 18. Overview (Schema) 18  Read-Write Table: ○ column families will be created automatically if they don’t already exist. ○ An empty key value will be added to the first column family of each existing row to minimize the size of the projection for queries.
  • 19. Overview (Schema) 19  Read-Only View: ○ All column families must already exist. ○ Addition of the Phoenix coprocessors used for query processing (the only change made to the HBase table).
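  A rough sketch of the two mapping options above, assuming an existing HBase table named "t1" with a column family "cf1" (both names are hypothetical); one form or the other would be used, not both:

        -- Read-write table: missing column families are created and an empty key
        -- value is added to the first column family of each existing row.
        CREATE TABLE "t1" (pk VARCHAR PRIMARY KEY, "cf1"."col1" VARCHAR);

        -- Read-only view: all column families must already exist; only the Phoenix
        -- coprocessors are added to the underlying HBase table.
        CREATE VIEW "t1" (pk VARCHAR PRIMARY KEY, "cf1"."col1" VARCHAR);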
  • 20. 3. Architecture Architecture, Phoenix Data Model, Query Execution and Environment
  • 25. Architecture (Server Metrics Example) 25 ● Example:
  • 26. Architecture (Query Execution) 26 ● Identify Row Key Ranges from Query -> Overlay Row Key Ranges with Regions -> Execute Parallel Scans -> Filter using Skip Scan -> Intercept Scan in Coprocessor -> Perform Final Merge Sort
  • 27. Architecture (Environment) 27 ● Data Warehouse ● Extract, Transform, Load (ETL) ● BI and Visualization
  • 29. Code (Commands) 29 ● DML Commands: ○ UPSERT VALUES ○ UPSERT SELECT ○ DELETE ● DDL Commands: ○ CREATE TABLE ○ CREATE VIEW ○ DROP TABLE ○ DROP VIEW
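  A short sketch exercising the commands listed above (all table, view and column names are hypothetical):

        CREATE TABLE servers (host VARCHAR PRIMARY KEY, region VARCHAR, cpu BIGINT);
        CREATE TABLE servers_archive (host VARCHAR PRIMARY KEY, region VARCHAR, cpu BIGINT);
        CREATE VIEW west_servers AS SELECT * FROM servers WHERE region = 'us-west';

        UPSERT INTO servers VALUES ('sf1', 'us-west', 42);     -- UPSERT VALUES
        UPSERT INTO servers_archive SELECT * FROM servers;     -- UPSERT SELECT
        DELETE FROM servers WHERE host = 'sf1';                -- DELETE

        DROP VIEW west_servers;
        DROP TABLE servers_archive;
        DROP TABLE servers;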
  • 30. 30 Connection: ● Long Running ● Short Running
        Connection conn = DriverManager.getConnection("jdbc:phoenix:my_server:longRunning", longRunningProps);
        Connection conn = DriverManager.getConnection("jdbc:phoenix:my_server:shortRunning", shortRunningProps);
  • 31. 31 Transactions: ● Create Table
        @Test
        public void createTable() throws Exception {
            String tableName = generateUniqueName();
            long numSaltBuckets = 6;
            String ddl = "CREATE TABLE " + tableName
                + " (K VARCHAR NOT NULL PRIMARY KEY, V VARCHAR)"
                + " SALT_BUCKETS = " + numSaltBuckets;
            Connection conn = DriverManager.getConnection(getUrl());
            conn.createStatement().execute(ddl);
        }
  • 32. 32 Transactions: ● Read Table
        @Test
        public void readTable() throws Exception {
            String tableName = generateUniqueName();
            long numSaltBuckets = 6;
            long numRows = 1000;
            long numExpectedTasks = numSaltBuckets;
            insertRowsInTable(tableName, numRows);
            String query = "SELECT * FROM " + tableName;
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(query);
            PhoenixResultSet resultSetBeingTested = rs.unwrap(PhoenixResultSet.class);
            changeInternalStateForTesting(resultSetBeingTested);
            while (resultSetBeingTested.next()) {}
            resultSetBeingTested.close();
            Set<String> expectedTableNames = Sets.newHashSet(tableName);
            assertReadMetricValuesForSelectSql(Lists.newArrayList(numRows),
                Lists.newArrayList(numExpectedTasks), resultSetBeingTested, expectedTableNames);
        }
  • 33. 33 Transactions: ● Row Count
        @Override
        public void getRowCount(ResultSet resultSet) throws SQLException {
            Tuple row = resultSet.unwrap(PhoenixResultSet.class).getCurrentRow();
            Cell kv = row.getValue(0);
            ImmutableBytesWritable tmpPtr = new ImmutableBytesWritable(kv.getValueArray(),
                kv.getValueOffset(), kv.getValueLength());
            // A single Cell will be returned with the count(*) - we decode that here
            rowCount = PLong.INSTANCE.getCodec().decodeLong(tmpPtr, SortOrder.getDefault());
        }
  • 34. 34 Transactions: ● Internal State
        private void changeInternalStateForTesting(PhoenixResultSet rs) {
            // get and set the internal state for testing purposes.
            ReadMetricQueue testMetricsQueue = new TestReadMetricsQueue(LogLevel.OFF, true);
            StatementContext ctx = (StatementContext) Whitebox.getInternalState(rs, "context");
            Whitebox.setInternalState(ctx, "readMetricsQueue", testMetricsQueue);
            Whitebox.setInternalState(rs, "readMetricsQueue", testMetricsQueue);
        }
  • 36. Capabilities 36 ● Overlays on top of the HBase data model. ● Keeps a versioned schema repository. ● Query processor.
  • 37. Capabilities ● Cost-based query optimizer. ● Enhance existing statistics collection. ● Generate histograms to drive query optimization decisions and join ordering. 37
  • 38. Capabilities 38 ● Secondary indexes: ○ Boost the speed of queries without relying on specific row-key designs. ○ Enable users to use star schemas. ○ Leverage SQL tools and online analytics.
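  For illustration, a secondary index on the server_metrics table used in the scenario slides might look like the following (the index name and covered column are assumptions):

        -- Global index suited to the read-heavy GC-time query; INCLUDE adds host
        -- as a covered column so the data table does not have to be consulted for it.
        CREATE INDEX gc_time_idx ON server_metrics (gc_time DESC) INCLUDE (host);

        -- A write-optimized variant would instead be declared as a LOCAL index:
        -- CREATE LOCAL INDEX gc_time_local_idx ON server_metrics (gc_time);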
  • 39. Capabilities 39 ● Row timestamp column. ● Sets the minimum and maximum time range for scans. ● Improves performance, especially when querying the tail-end of the data.
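  A sketch of a table declaration using this feature (names are hypothetical); the ROW_TIMESTAMP designation maps the leading primary-key column onto the native HBase cell timestamp so scans can be pruned to the requested time range:

        CREATE TABLE server_metrics_ts (
            created DATE NOT NULL,
            host    VARCHAR NOT NULL,
            gc_time BIGINT,
            CONSTRAINT pk PRIMARY KEY (created ROW_TIMESTAMP, host)
        );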
  • 42. 42 Scenarios (Chart Response Time Per Cluster)
        SELECT substr(host, 1, 3), trunc(date, 'DAY'), avg(response_time)
        FROM server_metrics
        WHERE date > CURRENT_DATE() - 7
          AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
        GROUP BY substr(host, 1, 3), trunc(date, 'DAY')
  • 43. 43 Scenarios (Find 5 Longest GC Times)
        SELECT host, date, gc_time
        FROM server_metrics
        WHERE date > CURRENT_DATE() - 7
          AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
        ORDER BY gc_time DESC
        LIMIT 5
  • 44. Thanks! Any questions? You can find me at: Github: @sxaxmz Linkedin: linkedin.com/in/husseinosama 44
Editor's Notes
  1. Apache Phoenix -> A scale-out RDBMS with evolutionary schema built on Apache HBase
  2. Internal project out of a need to support a higher level, well understood, SQL language.
  3. Apache HBase -> an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. Used to provide random, real-time read/write access to Big Data. A column-oriented NoSQL database built on top of Hadoop.
  4. Apache Phoenix -> an open-source, massively parallel relational database engine supporting Online Transactional Processing (OLTP) and operational analytics in Hadoop. Provides a JDBC interface enabling users to create, delete and alter SQL tables, views and indexes, and query data through SQL. Apache Phoenix is a relational layer over HBase, an SQL skin for HBase. Provides a JDBC driver that hides the intricacies of the NoSQL store.
  5. ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. All changes to data are performed as if they were a single operation. 1. Atomicity preserves the "completeness" of the business process (all-or-nothing behavior). 2. Consistency refers to the state of the data both before and after the transaction is executed (a transaction maintains the consistency of the state of the data). 3. Isolation means that transactions can run at the same time as if there were no concurrency (a locking mechanism is required). 4. Durability refers to the impact of an outage or a failure on a running transaction (data survives any failures). To summarize, a transaction will either complete, producing correct results, or terminate with no effect.
  6. Bulk loading for tables created in Phoenix is easier compared to tables created in the HBase shell.
  7. (Server bounce) An administrator/technician removes power to the device in a "non-controlled shutdown", the "down" part of the bounce. Once the server is completely off and all activity has ceased, the administrator restarts the server.
  8. Set the phoenix.transactions.enabled property to true, along with running the transaction manager (included in the distribution), to enable full ACID transactions (tables may optionally be declared as transactional). A concurrency model is used to detect row-level conflicts with first-commit-wins semantics; the later commit would produce an exception indicating that a conflict was detected. A transaction is started implicitly when a transactional table is referenced in a statement, at which point no updates can be seen from other connections until either a commit or rollback occurs. Non-transactional tables will not see their updates until after a commit has occurred.
  9. Phoenix uses the value of this connection property as the max timestamp of scans. Timestamps may not be controlled for transactional tables. Instead, the transaction manager assigns timestamps which become the HBase cell timestamps after a commit. Timestamps are multiplied by 1,000,000 to ensure enough granularity for uniqueness across the cluster.
  10. Snapshot queries over older data will pick up and use the correct schema based on the time of connection (based on CurrentSCN). Metadata updates include the addition or removal of a table column or updates to table statistics. 1. The ALWAYS value will cause the client to check with the server each time a statement that references the table is executed (or once per commit for an UPSERT VALUES statement). 2. A millisecond value indicates how long the client will hold on to its cached version of the metadata before checking back with the server for updates.
  11. From scratch -> HBase table and column families will be created automatically. Mapped to existing -> The binary representation of the row key and key values must match that of the Phoenix data types
  12. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table could also be declared as salted to prevent HBase region hot spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modifications are not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to add columns to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
  13. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table could also be declared as salted to prevent HBase region hot spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modifications are not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to add columns to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
  14. Phoenix chunks up the query using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
  15. Phoenix chunks up the query using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
  16. Phoenix chunks up the query using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
  17. ETL is a type of data integration that refers to the three steps used to blend data from multiple sources. It's often used to build a data warehouse.
  18. Data Manipulation Language (DML). Data Definition Language (DDL). For CREATE TABLE: 1. Any HBase metadata (table, column families) that doesn't already exist will be created. 2. The KEEP_DELETED_CELLS option is enabled so that flashback queries work correctly. 3. An empty key value will also be added for each row so that queries behave as expected (without requiring all columns to be projected during scans). For CREATE VIEW: instead, the existing HBase metadata must match the metadata specified in the DDL statement (otherwise an error is raised). For UPSERT VALUES: use it multiple times before committing to batch mutations. For UPSERT SELECT: configure phoenix.mutate.batchSize based on row size, and set auto-commit to true to write scan results directly to HBase on the server side when running UPSERT SELECT on the same table.
  19. Enhance existing statistics collection by enabling further query optimizations based on the size and cardinality of the data. Generate histograms to drive query optimization decisions such as secondary index usage and join ordering based on cardinalities to produce the most efficient query plan.
  20. Secondary index types: global index (optimized for read-heavy use cases), local index (optimized for write-heavy, space-constrained use cases) and functional index (create an index on an arbitrary expression). HBase tables are sorted maps. The star schema is the simplest style of data mart schema (it separates business process data into facts); the approach is widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. A fact table contains measurements, metrics, and facts about a business process, while a dimension table is a companion to the fact table that contains descriptive attributes to be used for query constraints. Types of dimension table: slowly changing dimension, conformed dimension, junk dimension, degenerate dimension, role-playing dimension.
  21. Maps the native HBase timestamp to a Phoenix column, taking advantage of various optimizations that HBase provides for time ranges. The ROW_TIMESTAMP column needs to be a primary key column in a date or time format (see the documentation for details). Only one primary key column can be designated as ROW_TIMESTAMP, and it must be declared upon table creation (no null or negative values allowed).
  22. Cache content on the server through two main parts (SQL Read and SQL Write), serving end users and collecting content from content providers.
  23. Cache content on the server through two main parts (SQL Read and SQL Write), serving end users and collecting content from content providers.