De-Mystifying the Apache Phoenix QueryServer

Josh Elser
MTS
2016-04-13

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
About me
• (Recent) Apache Phoenix Committer
• Apache Calcite Committer and PMC
• Long-time NoSQL developer, re-learning SQL
Apache Calcite and Apache Phoenix are projects at the Apache Software Foundation.
These names are trademarks of the Foundation.

Agenda
What?
Why?
How?
Apache Phoenix QueryServer

“What” is Apache Phoenix?
• Been called many things [1]
– “We put the SQL back in NoSQL!”
– “A SQL skin on HBase”
– “A relational layer on HBase”
– “Online transaction processing and operational analytics for Hadoop”
• Built on HDFS and HBase
– Clients use a JDBC driver
– Lots of server-side “magic” through HBase Coprocessors
• A query system capable of both OLAP and OLTP workloads
– More or less
[1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5

“What” is the Apache Phoenix QueryServer?
• An HTTP abstraction of a JDBC Driver
– Built on Apache Calcite’s Avatica sub-project
• A standalone service to be run on each node in a cluster
– An HTTP server
– Configurable serialization mechanism
• A new JDBC Driver to use with the QueryServer
– A glorified HTTP client
– A new sqlline script
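To make "an HTTP abstraction of a JDBC driver" concrete, here is a minimal sketch that assumes the JSON serialization and the request names from the Avatica JSON reference (`openConnection`, `createStatement`, `prepareAndExecute`, `closeConnection`). Each JDBC-style step becomes one JSON message POSTed to the QueryServer. The URL (`localhost` on the QueryServer's default port 8765) and the helper names are hypothetical. The statement id is a placeholder that a real client would read from the `createStatement` response.

```python
import json
import urllib.request
import uuid

# Hypothetical QueryServer address (8765 is the QueryServer's default HTTP port).
QUERY_SERVER_URL = "http://localhost:8765/"


def avatica_request(message: dict, url: str = QUERY_SERVER_URL) -> urllib.request.Request:
    """Wrap one Avatica JSON message in an HTTP POST; nothing is sent here."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_messages(sql: str, statement_id: int = 1, max_rows: int = 100) -> list:
    """Roughly what JDBC connect -> createStatement -> execute -> close becomes."""
    connection_id = str(uuid.uuid4())  # in Avatica's protocol the client picks the connection id
    return [
        {"request": "openConnection", "connectionId": connection_id},
        {"request": "createStatement", "connectionId": connection_id},
        # statement_id is a placeholder: a real client reads it from the
        # createStatement response before sending this message.
        {"request": "prepareAndExecute", "connectionId": connection_id,
         "statementId": statement_id, "sql": sql, "maxRowCount": max_rows},
        {"request": "closeConnection", "connectionId": connection_id},
    ]
```

Against a live server, each request would be sent in order with `urllib.request.urlopen(req)`. Nothing Phoenix-specific is needed on the client side, which is why the new driver is little more than an HTTP client.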

“What” is Apache Calcite?
• SQL Parser
– One SQL implementation usable by everyone
• Cost-Based Optimizer
– “Optimizations are easy”
• Pluggable Data Sources
– Implement your own SQL engine
• Avatica
– Calcite sub-project
– Implements the JDBC-over-HTTP abstraction
– Written to the JDBC spec, not database-specific
The coolest project approximately one person can explain


“Why” should I care?
• A true “thin” client
– No required connection to HBase/ZooKeeper/HDFS
– Greatly simplifies definition of “Phoenix client”
• Offload computational resources to cluster
– QueryServers run on the cluster
– Not your laptop or some “edge” node
• Enables non-Java clients
– The big one
Because it’s friggin’ cool!

“Why” are non-Java clients important?
• “Native” bindings in any language
– HTTP clients are easily implemented
– Serialization approaches (often) have cross-language support
• Data in HBase is suddenly easily accessible
– Standardized table format through Phoenix
– Well-defined APIs: Python Database API, Ruby ActiveRecord, etc.
• ODBC and BI Tools
– The moonshot.
– The hopes and dreams of services people everywhere.
Not everyone wants to use Java.

“Why” not <insert RPC framework here> instead of HTTP?
• HTTP is simple
– “You have multiple versions of Thrift on the classpath”
– “You have to use Protobuf 2.4”
• Designed to be stateless
– JDBC doesn’t make this easy
– Can work around it via Avatica’s wire API
• Statelessness makes scaling easier
– Pull down any HTTP load balancer
– Deploy more Avatica servers to scale up
Because portability sucks
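Statelessness is the goal, but JDBC does not make it easy: an open connection still lives on the QueryServer that created it. Every Avatica message names its connection in a `connectionId` field, so a balancer can keep a connection's messages together by routing on that field alone. The sketch below is hypothetical (server list, function name, choice of hash). It assumes JSON serialization so the field is easy to read, and it is not a feature of Phoenix or of any particular load balancer.

```python
import hashlib
import json

# Hypothetical QueryServer instances behind the balancer.
SERVERS = [
    "http://qs1.example.com:8765/",
    "http://qs2.example.com:8765/",
    "http://qs3.example.com:8765/",
]


def pick_server(message_body: bytes, servers=SERVERS) -> str:
    """Route an Avatica JSON message to a server chosen by its connectionId.

    Every message for one connection hashes to the same server, so the
    server-side state for that connection stays in one place.
    """
    message = json.loads(message_body)
    connection_id = message.get("connectionId")
    if connection_id is None:
        # This sketch sends messages without a connectionId to the first server.
        return servers[0]
    digest = hashlib.sha256(connection_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

With plain modulo hashing, most connections move to a different server whenever the server list changes. Consistent hashing would limit that churn at the cost of a little more code.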


“How” does it work?
• HTTP Server
– Jetty
– Phoenix “thick” Driver
• Serialization mechanism
– Protocol Buffers
– JSON
• Metrics system
– Dropwizard Metrics
– Apache Hadoop Metrics2
• Authentication
– Kerberos via SPNEGO
– HTTP Basic or Digest
The QueryServer itself
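Of the authentication options listed, HTTP Basic is the simplest to illustrate. The client sends an `Authorization` header containing the base64 encoding of `user:password` (RFC 7617). The following is a minimal sketch with hypothetical helper names that attaches such a header to an Avatica JSON request. SPNEGO and Digest both rely on a challenge/response exchange and are not shown.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an RFC 7617 HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authenticated_request(url: str, message: dict, user: str, password: str) -> urllib.request.Request:
    """Create (but do not send) an Avatica JSON POST that carries Basic credentials."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )
```

Basic credentials are only encoded, not encrypted, so this belongs behind TLS. When a server issues a 401 challenge first, Python's `urllib.request.HTTPBasicAuthHandler` can supply the same header automatically.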

“How” does the serialization work?
• Google Protocol Buffers (v3)
– “think XML, but smaller, faster, and simpler” [1]
– 110% supported WRT compatibility
– Native bindings in almost every popular language
– Clients can use any version of protobuf3
• JSON
– Nice for testing
– 110% unsupported WRT compatibility
– You will run into issues with mismatched client/server versions
Please, please, please use Protocol Buffers
[1] https://developers.google.com/protocol-buffers/
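One reason Protocol Buffers is the recommended option is how little it puts on the wire. The sketch below assumes that Avatica wraps each protobuf request in a `WireMessage` envelope. In that envelope, `name` (field 1) holds the wrapped message's fully qualified class name and `wrapped_message` (field 2) holds its bytes, and `OpenConnectionRequest.connection_id` is field 1. Check these field numbers and the class-name string against the Avatica protobuf reference. Under those assumptions, the code hand-encodes an `openConnection` request using protobuf's varint and length-delimited rules.

```python
def encode_varint(value: int) -> bytes:
    """Protobuf base-128 varint encoding of a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one string/bytes field (wire type 2): key, then length, then payload."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


# Assumed, not verified here: WireMessage.name = 1, WireMessage.wrapped_message = 2,
# OpenConnectionRequest.connection_id = 1, and this class-name string.
OPEN_CONNECTION_CLASS = "org.apache.calcite.avatica.proto.Requests$OpenConnectionRequest"


def open_connection_wire_message(connection_id: str) -> bytes:
    """Serialize an OpenConnectionRequest and wrap it in a WireMessage envelope."""
    inner = encode_length_delimited(1, connection_id.encode("utf-8"))
    return (encode_length_delimited(1, OPEN_CONNECTION_CLASS.encode("utf-8"))
            + encode_length_delimited(2, inner))
```

A real client would generate classes from Avatica's `.proto` files with `protoc` instead. Hand-encoding only shows what the bytes look like and why the server can decode any message: the class name travels with the payload.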

“How” do I make a client?
• Choose a language
– Find an HTTP client supported with that language
– Install Protobuf bindings for that language
• Read the Avatica docs [1]
– Tell us when docs are incorrect/lacking/wrong/boring/lame
• Write tests
• Publish the client
– And tell us!
– And tell us!
Sit down and write code
[1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
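The steps above can be sketched as a tiny client whose transport is just a function from request bytes to response bytes, so the "Write tests" step needs no running QueryServer. This is a hypothetical illustration, not an official API. The class and helper names are invented, and the client uses JSON only to stay dependency-free; the advice to prefer Protocol Buffers still applies. The response fields it reads (`statementId`, `results[0].firstFrame.rows`) are assumed from the Avatica JSON reference.

```python
import json
import urllib.request
import uuid
from typing import Callable, List

# A transport takes a JSON request body and returns a JSON response body.
Transport = Callable[[bytes], bytes]


def http_transport(url: str) -> Transport:
    """Real transport: POST each message to the QueryServer at `url`."""
    def send(body: bytes) -> bytes:
        req = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"},
                                     method="POST")
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return send


def canned_transport(responses: List[dict], log: List[dict]) -> Transport:
    """Test double: record each request and replay canned responses in order."""
    replies = iter(responses)

    def send(body: bytes) -> bytes:
        log.append(json.loads(body))
        return json.dumps(next(replies)).encode("utf-8")
    return send


class TinyAvaticaClient:
    """Hypothetical minimal client; not an official Phoenix or Avatica API."""

    def __init__(self, transport: Transport):
        self._send = transport
        self.connection_id = str(uuid.uuid4())

    def _call(self, message: dict) -> dict:
        return json.loads(self._send(json.dumps(message).encode("utf-8")))

    def open(self) -> None:
        self._call({"request": "openConnection", "connectionId": self.connection_id})

    def query(self, sql: str, max_rows: int = 100) -> List[list]:
        stmt = self._call({"request": "createStatement",
                           "connectionId": self.connection_id})
        result = self._call({"request": "prepareAndExecute",
                             "connectionId": self.connection_id,
                             "statementId": stmt["statementId"],
                             "sql": sql, "maxRowCount": max_rows})
        # Only the first frame is read; a complete client would keep
        # fetching until a frame reports done.
        return result["results"][0]["firstFrame"]["rows"]

    def close(self) -> None:
        self._call({"request": "closeConnection", "connectionId": self.connection_id})
```

In production, `TinyAvaticaClient(http_transport("http://localhost:8765/"))` would talk to a real server. In tests, `canned_transport` replays scripted replies and records what was sent, so protocol mistakes show up without a cluster.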

“How” do I get involved?
• Provide servers for databases
– A simple project for a specific database
• Write some tests
• Proofread the docs
• Contribute a client
• Answer questions on Stack Overflow/mailing lists
Carpe diem

Thanks!
Email: elserj@apache.org
Twitter: @josh_elser
Mailing lists:
Phoenix: dev@phoenix.apache.org, user@phoenix.apache.org
Calcite: dev@calcite.apache.org
Project info:
https://phoenix.apache.org/server.html
https://calcite.apache.org/avatica/
