HAWQ
Architecture
Alexey Grishchenko
Who I am
• Enterprise Architect @ Pivotal
• 7 years in data processing
• 5 years of experience with MPP
• 4 years with Hadoop
• Using HAWQ since the first internal Beta
• Responsible for designing most of the EMEA HAWQ
and Greenplum implementations
• Spark contributor
• http://0x0fff.com
Agenda
• What is HAWQ
• Why you need it
• HAWQ Components
• HAWQ Design
• Query execution example
• Competitive solutions

What is
• Analytical SQL-on-Hadoop engine
• HAdoop With Queries
• Lineage (Postgres → Greenplum → HAWQ):
– 2005: Greenplum forked from Postgres 8.0.2
– 2009: Greenplum rebased on Postgres 8.2.15
– 2011: HAWQ forked from GPDB 4.2.0.0
– 2013: HAWQ 1.0.0.0
– 2015: HAWQ 2.0.0.0, Open Source
HAWQ is …
• 1’500’000 C and C++ lines of code
– 200’000 of them in headers only
• 180’000 Python LOC
• 60’000 Java LOC
• 23’000 Makefile LOC
• 7’000 Shell scripts LOC
• More than 50 enterprise customers
– More than 10 of them in EMEA
Apache HAWQ
• Apache HAWQ (incubating) since 09’2015
– http://hawq.incubator.apache.org
– https://github.com/apache/incubator-hawq
• What’s in Open Source
– Sources of HAWQ 2.0 alpha
– HAWQ 2.0 beta is planned for 2015’Q4
– HAWQ 2.0 GA is planned for 2016’Q1
• The community is still young – come and join!

Why do we need it?
• SQL interface for BI solutions over Hadoop
data, compliant with ANSI SQL-92, -99, -2003
– Example – a 5000-line query with a number of
window functions generated by Cognos
• Universal tool for ad hoc analytics on top of
Hadoop data
– Example – parse a URL to extract protocol, host
name, port, GET parameters (see the sketch below)
• Good performance
– How many times would the data hit the HDD during
a single Hive query?
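A minimal sketch of the URL-parsing example above, assuming PL/Python is enabled; the table and column names (web_logs, url) are hypothetical:

    -- Hypothetical helper: extract the host name from a URL (Python 2 stdlib)
    CREATE OR REPLACE FUNCTION url_host(url text) RETURNS text AS $$
        from urlparse import urlparse
        return urlparse(url).hostname
    $$ LANGUAGE plpythonu;

    -- Ad hoc analytics: top 10 hosts in the log data
    SELECT url_host(url) AS host, count(*) AS hits
    FROM web_logs
    GROUP BY 1
    ORDER BY hits DESC
    LIMIT 10;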
HAWQ Cluster
[Diagram: master servers host the HAWQ Master and HAWQ Standby alongside the HDFS NameNode, Secondary NameNode, ZooKeeper/JournalNode quorum, YARN Resource Manager and YARN App Timeline Server; each worker server (Server 5 … Server N) runs a HAWQ Segment next to an HDFS DataNode and a YARN NodeManager; all servers are connected by the interconnect.]
Master Servers
[Diagram: the HAWQ Master and the HAWQ Standby Master each contain a Query Parser, Query Optimizer, Global Resource Manager, Distributed Transactions Manager, Query Dispatcher and Metadata Catalog; the catalog is kept in sync through WAL replication from master to standby.]

Segments
[Diagram: each HAWQ Segment contains a Query Executor, libhdfs3 and PXF, and runs next to the HDFS DataNode and the YARN NodeManager; the local filesystem holds a temporary data directory and the logs.]
Metadata
• HAWQ metadata structure is similar to the
Postgres catalog structure
• Statistics
– Number of rows and pages in the table
– Most common values for each field
– Histogram of the value distribution for each field
– Number of unique values in the field
– Number of null values in the field
– Average width of the field in bytes
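Since HAWQ inherits the Postgres catalog, these statistics can be gathered and inspected the Postgres way; a minimal sketch, assuming the Postgres-style pg_stats view and a hypothetical sales table:

    -- Collect table- and field-level statistics
    ANALYZE sales;

    -- Inspect what the optimizer knows about each field
    SELECT attname, n_distinct, null_frac, avg_width, most_common_vals
    FROM pg_stats
    WHERE tablename = 'sales';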
Statistics
• No statistics: how many rows would the join of two
tables produce?
 From 0 to infinity
• Row count: how many rows would the join of two
1000-row tables produce?
 From 0 to 1’000’000
• Histograms and MCV: how many rows would the join
of two 1000-row tables produce, given the field
cardinality, value distribution histogram, number of
nulls and most common values?
 ~ From 500 to 1’500
Metadata
• Table structure information
– Distribution fields
– Number of hash buckets
– Partitioning (hash, list, range)
[Example table, distributed by hash(ID):
ID | Name       | Num | Price
1  | Apple      | 10  | 50
2  | Pear       | 20  | 80
3  | Banana     | 40  | 40
4  | Orange     | 25  | 50
5  | Kiwi       | 5   | 120
6  | Watermelon | 20  | 30
7  | Melon      | 40  | 100
8  | Pineapple  | 35  | 90]
• General metadata
– Users and groups
– Access privileges
• Stored procedures
– PL/pgSQL, PL/Java, PL/Python, PL/Perl, PL/R
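A minimal DDL sketch of how these table properties are declared, assuming Greenplum/HAWQ-style syntax (the table, the bucketnum value and the partition bounds are illustrative):

    CREATE TABLE fruits (
        id    int,
        name  text,
        num   int,
        price numeric
    )
    WITH (bucketnum = 6)          -- number of hash buckets
    DISTRIBUTED BY (id)           -- distribution field: hash(id)
    PARTITION BY RANGE (price)    -- range partitioning
    (
        START (0) END (200) EVERY (100)
    );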
Query Optimizer
• HAWQ uses cost-based query optimizers
• You have two options
– Planner – evolved from the Postgres query
optimizer
– ORCA (Pivotal Query Optimizer) – developed
specifically for HAWQ
• Optimizer hints work just like in Postgres
– Enable/disable specific operations
– Change the cost estimates for basic actions
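A minimal sketch of such hints, assuming Postgres-style session GUCs plus the Greenplum/HAWQ-specific optimizer switch (the sales table is hypothetical):

    SET optimizer = on;           -- use ORCA instead of the legacy Planner
    SET enable_nestloop = off;    -- disable a specific plan operation
    SET cpu_tuple_cost = 0.02;    -- change a basic cost estimate
    EXPLAIN SELECT count(*) FROM sales;  -- inspect the resulting plan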
Storage Formats
Which storage format is the most optimal?
 It depends on what you mean by “optimal”
– Minimal CPU usage for reading and writing the data
– Minimal disk space usage
– Minimal time to retrieve a record by key
– Minimal time to retrieve a subset of columns
– etc.
Storage Formats
• Row-based storage format
– Similar to Postgres heap storage
• No TOAST
• No ctid, xmin, xmax, cmin, cmax
– Compression
• No compression
• QuickLZ
• Zlib levels 1 - 9
Storage Formats
• Apache Parquet
– Mixed row-columnar table store: the data is split
into “row groups” stored in columnar format
– Compression
• No compression
• Snappy
• Gzip levels 1 - 9
– The size of a “row group” and the page size can be
set for each table separately
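A minimal sketch of declaring these storage options per table, assuming HAWQ/Greenplum append-only DDL (names and sizes are illustrative; orientation = row with compresstype = quicklz or zlib would give the row-based format above):

    CREATE TABLE events_parquet (
        event_id   bigint,
        event_time timestamp,
        payload    text
    )
    WITH (appendonly = true,
          orientation = parquet,
          compresstype = snappy,
          rowgroupsize = 8388608,  -- “row group” size in bytes
          pagesize = 1048576)      -- page size in bytes
    DISTRIBUTED BY (event_id);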

Resource Management
• Two main options
– Static resource split – HAWQ and YARN do not
know about each other
– YARN – HAWQ asks the YARN Resource Manager for
query execution resources
• Flexible cluster utilization
– A small query might run on just a subset of nodes
– A query might have many executors on each cluster
node to make it run faster
– You can control the parallelism of each query
(see the sketch below)
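As a sketch of the last point, per-query parallelism can be influenced with session GUCs; the names below follow my reading of the Apache HAWQ 2.0 documentation, so treat them as an assumption to verify:

    -- Request a fixed number of virtual segments for subsequent statements
    SET hawq_rm_stmt_nvseg = 10;
    -- Memory quota per virtual segment (illustrative value)
    SET hawq_rm_stmt_vseg_memory = '256mb';
    SELECT count(*) FROM sales;  -- hypothetical table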
Resource Management
• A Resource Queue can be set with
– Maximum number of parallel queries
– CPU usage priority
– Memory usage limits
– CPU cores usage limit
– MIN/MAX number of executors across the system
– MIN/MAX number of executors on each node
• Can be set up per user or group
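A minimal sketch of defining such a queue and assigning it to a role, assuming HAWQ 2.0-style syntax (names and limits are illustrative):

    CREATE RESOURCE QUEUE reports_queue WITH (
        PARENT = 'pg_root',
        ACTIVE_STATEMENTS = 10,      -- max number of parallel queries
        MEMORY_LIMIT_CLUSTER = 25%,  -- memory usage limit
        CORE_LIMIT_CLUSTER = 25%     -- CPU cores usage limit
    );
    ALTER ROLE report_user RESOURCE QUEUE reports_queue;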

External Data
• PXF
– Framework for external data access
– Easy to extend, many public plugins available
– Official plugins: CSV, SequenceFile, Avro, Hive,
HBase
– Open Source plugins: JSON, Accumulo,
Cassandra, JDBC, Redis, Pipe
• HCatalog
– HAWQ can query tables registered in HCatalog the
same way as native HAWQ tables
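Both access paths are plain SQL. The sketch below assumes a PXF service on the NameNode host at its default port 51200 and a Hive table named hive_sales; all host, path and object names are placeholders:

-- External table over HDFS text files via PXF (HdfsTextSimple is a stock profile)
CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
LOCATION ('pxf://namenode:51200/data/sales?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');

-- Querying a Hive table through the HCatalog integration, no DDL required:
SELECT * FROM hcatalog.default.hive_sales;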
Query Example
• Participants: the HAWQ Master (Query Parser, Query Optimizer, Query Dispatch, Resource Manager, Transaction Manager, Metadata), the YARN Resource Manager, the HDFS NameNode, and on each server (Server 1 .. Server N) a HAWQ Segment with its Postmaster, an HDFS Datanode and a local directory
• Each query passes through six phases: Plan, Resource, Prepare, Execute, Result, Cleanup
• Plan – the master parses and optimizes the query; the resulting parallel plan contains the operators Scan Bars b, Filter b.city = 'San Francisco', Scan Sells s, HashJoin b.name = s.bar, Motion Redist(b.name), Project s.beer, s.price, Motion Gather
• Resource – the master requests execution resources from the YARN Resource Manager: "I need 5 containers, each with 1 CPU core and 256 MB RAM"; YARN answers with a placement: "Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers"
• Prepare – the plan is dispatched to the segments, and Query Executor (QE) processes are started in the granted containers on each server
• Execute – the QEs run their slices of the plan, scanning data from the local HDFS Datanodes and shipping intermediate rows between nodes through the Motion operators
• Result – the results are gathered on the master (Motion Gather) and returned to the client
• Cleanup – the master asks to free the query resources ("Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers"); YARN confirms with "OK" and the QE processes are shut down
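The plan above corresponds to a query of roughly the following shape. This is a reconstruction from the operator names on the slide (table, column and literal names come from the plan; the exact SQL text was not shown):

SELECT s.beer, s.price
FROM Sells s
JOIN Bars b ON b.name = s.bar
WHERE b.city = 'San Francisco';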
Query Performance
• Data does not hit the disk unless this cannot be
avoided
• Data is not buffered on the segments unless
this cannot be avoided
• Data is transferred between the nodes over UDP
• HAWQ has a good cost-based query optimizer
• The C/C++ implementation is more efficient than
the Java implementations of competing solutions
• Query parallelism can be easily tuned (see the
sketch below)
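For instance, per-query parallelism can be capped per session with resource manager settings. The GUC names below are an assumption based on HAWQ 2.0 and should be verified on your build:

-- Assumed HAWQ 2.0 GUC names; check them with SHOW ALL on your version
SET hawq_rm_nvseg_perquery_limit = 64;        -- upper bound on virtual segments per query, cluster-wide
SET hawq_rm_nvseg_perquery_perseg_limit = 4;  -- upper bound on virtual segments per query on each host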

Competitive Solutions
• Hive, SparkSQL, Impala and HAWQ were rated side
by side on:
– Query Optimizer
– ANSI SQL
– Built-in Languages
– Disk IO
– Parallelism
– Distributions
– Stability
– Community

Roadmap
• AWS and S3 integration
• Mesos integration
• Better Ambari integration
• Native support for Cloudera, MapR and IBM
Hadoop distributions
• Make it the best SQL-on-Hadoop engine ever!
Summary
• Modern SQL-on-Hadoop engine
• For structured data processing and analysis
• Combines the best techniques of competing
solutions
• Just released to open source
• The community is still very young
Join our community and contribute!
Questions
Apache HAWQ
http://hawq.incubator.apache.org
dev@hawq.incubator.apache.org
user@hawq.incubator.apache.org
Reach me at http://0x0fff.com

More Related Content

What's hot

BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
Amazon Web Services
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
Chris Hoyean Song
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
Data Con LA
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
DATAVERSITY
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
C4Media
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
 
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
DataWorks Summit
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
Amazon Web Services
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
Databricks
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
DATAVERSITY
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
WinWire Technologies Inc
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
Databricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 

What's hot (20)

BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 

Similar to Apache HAWQ Architecture

Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
Amazon Web Services
 
aip-workshop1-dev-tutorial
aip-workshop1-dev-tutorialaip-workshop1-dev-tutorial
aip-workshop1-dev-tutorial
Matthew Vaughn
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
hadooparchbook
 
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...
HostedbyConfluent
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
hadooparchbook
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
Julian Hyde
 
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Aman Sinha
 
Architecting a next generation data platform
Architecting a next generation data platformArchitecting a next generation data platform
Architecting a next generation data platform
hadooparchbook
 
Architecting a next-generation data platform
Architecting a next-generation data platformArchitecting a next-generation data platform
Architecting a next-generation data platform
hadooparchbook
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
hadooparchbook
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
Julian Hyde
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
Swami Sundaramurthy
 
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL ServerThe Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
Sarah Dutkiewicz
 
APEX 5 IR Guts and Performance
APEX 5 IR Guts and PerformanceAPEX 5 IR Guts and Performance
APEX 5 IR Guts and Performance
Karen Cannell
 
APEX 5 IR: Guts & Performance
APEX 5 IR:  Guts & PerformanceAPEX 5 IR:  Guts & Performance
APEX 5 IR: Guts & Performance
Karen Cannell
 
Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)
Brian Brazil
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
Sean Chittenden
 
APEX 5 Interactive Reports: Guts and PErformance
APEX 5 Interactive Reports: Guts and PErformanceAPEX 5 Interactive Reports: Guts and PErformance
APEX 5 Interactive Reports: Guts and PErformance
Karen Cannell
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
Nathaniel Braun
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
SignalFx
 

Similar to Apache HAWQ Architecture (20)

Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
aip-workshop1-dev-tutorial
aip-workshop1-dev-tutorialaip-workshop1-dev-tutorial
aip-workshop1-dev-tutorial
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
 
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibili...
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
 
Architecting a next generation data platform
Architecting a next generation data platformArchitecting a next generation data platform
Architecting a next generation data platform
 
Architecting a next-generation data platform
Architecting a next-generation data platformArchitecting a next-generation data platform
Architecting a next-generation data platform
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
 
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL ServerThe Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
 
APEX 5 IR Guts and Performance
APEX 5 IR Guts and PerformanceAPEX 5 IR Guts and Performance
APEX 5 IR Guts and Performance
 
APEX 5 IR: Guts & Performance
APEX 5 IR:  Guts & PerformanceAPEX 5 IR:  Guts & Performance
APEX 5 IR: Guts & Performance
 
Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
APEX 5 Interactive Reports: Guts and PErformance
APEX 5 Interactive Reports: Guts and PErformanceAPEX 5 Interactive Reports: Guts and PErformance
APEX 5 Interactive Reports: Guts and PErformance
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 

More from Alexey Grishchenko

MPP vs Hadoop
MPP vs HadoopMPP vs Hadoop
MPP vs Hadoop
Alexey Grishchenko
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
Alexey Grishchenko
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
Alexey Grishchenko
 
Архитектура Apache HAWQ Highload++ 2015
Архитектура Apache HAWQ Highload++ 2015Архитектура Apache HAWQ Highload++ 2015
Архитектура Apache HAWQ Highload++ 2015
Alexey Grishchenko
 
Pivotal hawq internals
Pivotal hawq internalsPivotal hawq internals
Pivotal hawq internals
Alexey Grishchenko
 

More from Alexey Grishchenko (6)

MPP vs Hadoop
MPP vs HadoopMPP vs Hadoop
MPP vs Hadoop
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Архитектура Apache HAWQ Highload++ 2015
Архитектура Apache HAWQ Highload++ 2015Архитектура Apache HAWQ Highload++ 2015
Архитектура Apache HAWQ Highload++ 2015
 
Pivotal hawq internals
Pivotal hawq internalsPivotal hawq internals
Pivotal hawq internals
 

Recently uploaded

Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeKarol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
bookmybebe1
 
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
shivvichadda
 
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
Donghwan Lee
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
SanelaNikodinoska1
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
Amazon Web Services Korea
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
KiranKumar139571
 
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
punebabes1
 
₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You
₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You
₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You
model sexy
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile OfferHiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
$A19
 
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model SafeSaket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
shruti singh$A17
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
qemnpg
 
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers
Amazon Web Services Korea
 
LLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptxLLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptx
Jyotishko Biswas
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...
*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...
*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...
roobykhan02154
 
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
Amazon Web Services Korea
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdfOrange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
RealDarrah
 

Recently uploaded (20)

Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeKarol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
 
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
 
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
 
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
 
₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You
₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You
₹Call ₹Girls Mumbai Central 09930245274 Deshi Chori Near You
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile OfferHiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
 
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model SafeSaket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
 
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers
 
LLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptxLLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptx
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...
*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...
*Call *Girls in Hyderabad 🤣 8826483818 🤣 Pooja Sharma Best High Class Hyderab...
 
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
 
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdfOrange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
 

Apache HAWQ Architecture

  • 2. Who I am Enterprise Architect @ Pivotal • 7 years in data processing • 5 years of experience with MPP • 4 years with Hadoop • Using HAWQ since the first internal Beta • Responsible for designing most of the EMEA HAWQ and Greenplum implementations • Spark contributor • http://0x0fff.com
  • 4. Agenda • What is HAWQ • Why you need it
  • 5. Agenda • What is HAWQ • Why you need it • HAWQ Components
  • 6. Agenda • What is HAWQ • Why you need it • HAWQ Components • HAWQ Design
  • 7. Agenda • What is HAWQ • Why you need it • HAWQ Components • HAWQ Design • Query execution example
  • 8. Agenda • What is HAWQ • Why you need it • HAWQ Components • HAWQ Design • Query execution example • Competitive solutions
  • 9. What is • Analytical SQL-on-Hadoop engine
  • 10. What is • Analytical SQL-on-Hadoop engine • HAdoop With Queries
  • 11. What is • Analytical SQL-on-Hadoop engine • HAdoop With Queries Postgres Greenplum HAWQ 2005 Fork Postgres 8.0.2
  • 12. What is • Analytical SQL-on-Hadoop engine • HAdoop With Queries Postgres HAWQ 2005 Fork Postgres 8.0.2 2009 Rebase Postgres 8.2.15 Greenplum
  • 13. What is • Analytical SQL-on-Hadoop engine • HAdoop With Queries Postgres HAWQ 2005 Fork Postgres 8.0.2 2009 Rebase Postgres 8.2.15 2011 Fork GPDB 4.2.0.0 Greenplum
  • 14. What is • Analytical SQL-on-Hadoop engine • HAdoop With Queries Postgres HAWQ 2005 Fork Postgres 8.0.2 2009 Rebase Postgres 8.2.15 2011 Fork GPDB 4.2.0.0 2013 HAWQ 1.0.0.0 Greenplum
  • 15. What is • Analytical SQL-on-Hadoop engine • HAdoop With Queries Postgres HAWQ 2005 Fork Postgres 8.0.2 2009 Rebase Postgres 8.2.15 2011 Fork GPDB 4.2.0.0 2013 HAWQ 1.0.0.0 HAWQ 2.0.0.0 Open Source 2015 Greenplum
  • 16. HAWQ is … • 1’500’000 C and C++ lines of code
  • 17. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only
  • 18. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only • 180’000 Python LOC
  • 19. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only • 180’000 Python LOC • 60’000 Java LOC
  • 20. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only • 180’000 Python LOC • 60’000 Java LOC • 23’000 Makefile LOC
  • 21. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only • 180’000 Python LOC • 60’000 Java LOC • 23’000 Makefile LOC • 7’000 Shell scripts LOC
  • 22. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only • 180’000 Python LOC • 60’000 Java LOC • 23’000 Makefile LOC • 7’000 Shell scripts LOC • More than 50 enterprise customers
  • 23. HAWQ is … • 1’500’000 C and C++ lines of code – 200’000 of them in headers only • 180’000 Python LOC • 60’000 Java LOC • 23’000 Makefile LOC • 7’000 Shell scripts LOC • More than 50 enterprise customers – More than 10 of them in EMEA
  • 24. Apache HAWQ • Apache HAWQ (incubating) from 09’2015 – http://hawq.incubator.apache.org – https://github.com/apache/incubator-hawq • What’s in Open Source – Sources of HAWQ 2.0 alpha – HAWQ 2.0 beta is planned for 2015’Q4 – HAWQ 2.0 GA is planned for 2016’Q1 • Community is yet young – come and join!
  • 25. Why do we need it?
  • 26. Why do we need it? • SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003
  • 27. Why do we need it? • SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003 – Example - 5000-line query with a number of window function generated by Cognos
  • 28. Why do we need it? • SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003 – Example - 5000-line query with a number of window function generated by Cognos • Universal tool for ad hoc analytics on top of Hadoop data
  • 29. Why do we need it? • SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003 – Example - 5000-line query with a number of window function generated by Cognos • Universal tool for ad hoc analytics on top of Hadoop data – Example - parse URL to extract protocol, host name, port, GET parameters
  • 30. Why do we need it? • SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003 – Example - 5000-line query with a number of window function generated by Cognos • Universal tool for ad hoc analytics on top of Hadoop data – Example - parse URL to extract protocol, host name, port, GET parameters • Good performance
  • 31. Why do we need it? • SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003 – Example - 5000-line query with a number of window function generated by Cognos • Universal tool for ad hoc analytics on top of Hadoop data – Example - parse URL to extract protocol, host name, port, GET parameters • Good performance – How many times the data would hit the HDD during a single Hive query?
  • 32. HAWQ Cluster Server 1 SNameNode Server 4 ZK JM NameNode Server 3 ZK JM Server 2 ZK JM Server 6 Datanode Server N Datanode Server 5 Datanode interconnect …
  • 33. HAWQ Cluster Server 1 SNameNode Server 4 ZK JM NameNode Server 3 ZK JM Server 2 ZK JM Server 6 Datanode Server N Datanode Server 5 Datanode YARN NM YARN NM YARN NM YARN RM YARN App Timeline interconnect …
  • 34. HAWQ Cluster HAWQ Master Server 1 SNameNode Server 4 ZK JM NameNode Server 3 ZK JM HAWQ Standby Server 2 ZK JM HAWQ Segment Server 6 Datanode HAWQ Segment Server N Datanode HAWQ Segment Server 5 Datanode YARN NM YARN NM YARN NM YARN RM YARN App Timeline interconnect …
  • 35. Master Servers Server 1 SNameNode Server 4 ZK JM NameNode Server 3 ZK JM Server 2 ZK JM HAWQ Segment Server 6 Datanode HAWQ Segment Server N Datanode HAWQ Segment Server 5 Datanode YARN NM YARN NM YARN NM YARN RM YARN App Timeline interconnect … HAWQ Master HAWQ Standby
  • 36. Master Servers HAWQ Master Query Parser Query Optimizer Global Resource Manager Distributed Transactions Manager Query Dispatch Metadata Catalog HAWQ Standby Master Query Parser Query Optimizer Global Resource Manager Distributed Transactions Manager Query Dispatch Metadata Catalog WAL repl.
  • 37. HAWQ Master HAWQ Standby Segments Server 1 SNameNode Server 4 ZK JM NameNode Server 3 ZK JM Server 2 ZK JM Server 6 Datanode Server N Datanode Server 5 Datanode YARN NM YARN NM YARN NM YARN RM YARN App Timeline interconnect HAWQ Segment HAWQ SegmentHAWQ Segment …
  • 38. Segments HAWQ Segment Query Executor libhdfs3 PXF HDFS Datanode Local Filesystem Temporary Data Directory Logs YARN Node Manager
  • 39. Metadata • HAWQ metadata structure is similar to Postgres catalog structure
  • 40. Metadata • HAWQ metadata structure is similar to Postgres catalog structure • Statistics – Number of rows and pages in the table
  • 41. Metadata • HAWQ metadata structure is similar to Postgres catalog structure • Statistics – Number of rows and pages in the table – Most common values for each field
  • 42. Metadata • HAWQ metadata structure is similar to Postgres catalog structure • Statistics – Number of rows and pages in the table – Most common values for each field – Histogram of values distribution for each field
  • 43. Metadata • HAWQ metadata structure is similar to Postgres catalog structure • Statistics – Number of rows and pages in the table – Most common values for each field – Histogram of values distribution for each field – Number of unique values in the field
  • 44. Metadata • HAWQ metadata structure is similar to Postgres catalog structure • Statistics – Number of rows and pages in the table – Most common values for each field – Histogram of values distribution for each field – Number of unique values in the field – Number of null values in the field
  • 45. Metadata • HAWQ metadata structure is similar to Postgres catalog structure • Statistics – Number of rows and pages in the table – Most common values for each field – Histogram of values distribution for each field – Number of unique values in the field – Number of null values in the field – Average width of the field in bytes
  • 46. Statistics No Statistics How many rows would produce the join of two tables?
  • 47. Statistics No Statistics How many rows would produce the join of two tables?  From 0 to infinity
  • 48. Statistics No Statistics Row Count How many rows would produce the join of two tables?  From 0 to infinity How many rows would produce the join of two 1000- row tables?
  • 49. Statistics No Statistics Row Count How many rows would produce the join of two tables?  From 0 to infinity How many rows would produce the join of two 1000- row tables?  From 0 to 1’000’000
  • 50. Statistics No Statistics Row Count Histograms and MCV How many rows would produce the join of two tables?  From 0 to infinity How many rows would produce the join of two 1000- row tables?  From 0 to 1’000’000 How many rows would produce the join of two 1000- row tables, with known field cardinality, values distribution diagram, number of nulls, most common values?
  • 51. Statistics No Statistics Row Count Histograms and MCV How many rows would produce the join of two tables?  From 0 to infinity How many rows would produce the join of two 1000- row tables?  From 0 to 1’000’000 How many rows would produce the join of two 1000- row tables, with known field cardinality, values distribution diagram, number of nulls, most common values?  ~ From 500 to 1’500
  • 52. Metadata • Table structure information ID Name Num Price 1 Яблоко 10 50 2 Груша 20 80 3 Банан 40 40 4 Апельсин 25 50 5 Киви 5 120 6 Арбуз 20 30 7 Дыня 40 100 8 Ананас 35 90
  • 53. Metadata • Table structure information – Distribution fields ID Name Num Price 1 Яблоко 10 50 2 Груша 20 80 3 Банан 40 40 4 Апельсин 25 50 5 Киви 5 120 6 Арбуз 20 30 7 Дыня 40 100 8 Ананас 35 90 hash(ID)
  • 54. Metadata • Table structure information – Distribution fields – Number of hash buckets ID Name Num Price 1 Яблоко 10 50 2 Груша 20 80 3 Банан 40 40 4 Апельсин 25 50 5 Киви 5 120 6 Арбуз 20 30 7 Дыня 40 100 8 Ананас 35 90 hash(ID) ID Name Num Price 1 Яблоко 10 50 2 Груша 20 80 3 Банан 40 40 4 Апельсин 25 50 5 Киви 5 120 6 Арбуз 20 30 7 Дыня 40 100 8 Ананас 35 90
  • 55. Metadata • Table structure information – Distribution fields – Number of hash buckets – Partitioning (hash, list, range) ID Name Num Price 1 Яблоко 10 50 2 Груша 20 80 3 Банан 40 40 4 Апельсин 25 50 5 Киви 5 120 6 Арбуз 20 30 7 Дыня 40 100 8 Ананас 35 90 hash(ID) ID Name Num Price 1 Яблоко 10 50 2 Груша 20 80 3 Банан 40 40 4 Апельсин 25 50 5 Киви 5 120 6 Арбуз 20 30 7 Дыня 40 100 8 Ананас 35 90
  • 56. Metadata • Table structure information – Distribution fields – Number of hash buckets – Partitioning (hash, list, range) • General metadata – Users and groups
  • 57. Metadata • Table structure information – Distribution fields – Number of hash buckets – Partitioning (hash, list, range) • General metadata – Users and groups – Access privileges
  • 58. Metadata • Table structure information – Distribution fields – Number of hash buckets – Partitioning (hash, list, range) • General metadata – Users and groups – Access privileges • Stored procedures – PL/pgSQL, PL/Java, PL/Python, PL/Perl, PL/R
  • 59. Query Optimizer • HAWQ uses cost-based query optimizers
  • 60. Query Optimizer • HAWQ uses cost-based query optimizers • You have two options – Planner – evolved from the Postgres query optimizer – ORCA (Pivotal Query Optimizer) – developed specifically for HAWQ
  • 61. Query Optimizer • HAWQ uses cost-based query optimizers • You have two options – Planner – evolved from the Postgres query optimizer – ORCA (Pivotal Query Optimizer) – developed specifically for HAWQ • Optimizer hints work just like in Postgres – Enable/disable specific operation – Change the cost estimations for basic actions
• 67. Storage Formats
Which storage format is optimal?  It depends on what you mean by "optimal":
– Minimal CPU usage for reading and writing the data
– Minimal disk space usage
– Minimal time to retrieve a record by key
– Minimal time to retrieve a subset of columns
– etc.
• 69. Storage Formats
• Row-based storage format (see the sketch below)
– Similar to Postgres heap storage
• No TOAST
• No ctid, xmin, xmax, cmin, cmax
– Compression
• No compression
• Quicklz
• Zlib levels 1–9
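A sketch of a row-oriented append-only table with compression; option names follow the HAWQ/Greenplum storage options listed above, and the table and values are illustrative:

  CREATE TABLE sales_row (
      id     int,
      amount numeric
  )
  WITH (appendonly   = true,
        orientation  = row,       -- row-based storage format
        compresstype = quicklz)   -- alternatives: none, or zlib with compresslevel = 1..9
  DISTRIBUTED BY (id);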
• 72. Storage Formats
• Apache Parquet
– Hybrid row-columnar table storage: the data is split into "row groups" stored in columnar format
– Compression
• No compression
• Snappy
• Gzip levels 1–9
– The "row group" size and page size can be set for each table separately (see the sketch below)
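A sketch of a Parquet table with per-table row group and page sizes; rowgroupsize and pagesize are HAWQ storage options expressed in bytes, and all values here are illustrative:

  CREATE TABLE sales_parquet (
      id     int,
      amount numeric
  )
  WITH (appendonly   = true,
        orientation  = parquet,
        compresstype = snappy,     -- alternatives: none, or gzip with compresslevel = 1..9
        rowgroupsize = 8388608,    -- 8 MB row groups
        pagesize     = 1048576)    -- 1 MB pages
  DISTRIBUTED BY (id);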
• 77. Resource Management
• Two main options:
– Static resource split – HAWQ and YARN do not know about each other
– YARN – HAWQ asks the YARN Resource Manager for query execution resources
• Flexible cluster utilization:
– A small query might run on a subset of the nodes
– A query might have many executors on each cluster node to make it run faster
– You can control the parallelism of each query (see the sketch below)
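A sketch of per-session parallelism control. The GUC names follow the HAWQ 2.x resource manager settings and should be treated as version-dependent assumptions; the values are illustrative:

  -- Cap the total number of virtual segments (executors) one query may use
  SET hawq_rm_nvseg_perquery_limit = 32;
  -- Cap the number of executors per node for one query
  SET hawq_rm_nvseg_perquery_perseg_limit = 4;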
• 84. Resource Management
• A Resource Queue can be set with:
– Maximum number of parallel queries
– CPU usage priority
– Memory usage limits
– CPU core usage limits
– MIN/MAX number of executors across the system
– MIN/MAX number of executors on each node
• Can be set up for a user or a group (see the sketch below)
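A sketch of a resource queue and its assignment to a role. The WITH options follow the HAWQ 2.x hierarchical resource queue syntax (percent-based cluster limits), which differs from classic Greenplum queues; treat the exact option names and the queue/role names as assumptions:

  CREATE RESOURCE QUEUE analyst_queue WITH (
      PARENT = 'pg_root',           -- queues form a hierarchy under pg_root
      ACTIVE_STATEMENTS = 10,       -- maximum number of parallel queries
      MEMORY_LIMIT_CLUSTER = 25%,   -- memory usage limit as a share of the cluster
      CORE_LIMIT_CLUSTER = 25%      -- CPU core usage limit as a share of the cluster
  );
  ALTER ROLE analyst RESOURCE QUEUE analyst_queue;  -- attach a user to the queue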
• 86. External Data
• PXF
– Framework for external data access
– Easy to extend; many public plugins available
– Official plugins: CSV, SequenceFile, Avro, Hive, HBase
– Open source plugins: JSON, Accumulo, Cassandra, JDBC, Redis, Pipe
• HCatalog
– HAWQ can query tables from HCatalog the same way as HAWQ native tables (see the sketch below)
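A sketch of both access paths. The PXF host:port, HDFS path, profile name, and the Hive table are illustrative, and PXF profile names vary across releases:

  -- External table over HDFS files through PXF
  CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
  LOCATION ('pxf://namenode:51200/data/sales?PROFILE=HdfsTextSimple')
  FORMAT 'TEXT' (DELIMITER ',');

  SELECT sum(amount) FROM ext_sales;

  -- Hive table through the HCatalog integration, queried like a native table
  SELECT * FROM hcatalog.default.hive_sales;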
• 87–114. Query Example (animated sequence: one frame per slide over a fixed diagram)
The diagram shows the HAWQ Master (Postmaster with Metadata, Transaction Mgr., Query Parser, Query Optimizer, Query Dispatch, and Resource Mgr.), the HDFS NameNode, the YARN RM, and Servers 1…N, each hosting a local directory, a HAWQ Segment postmaster, and an HDFS DataNode. A query passes through six phases: Plan → Resource → Prepare → Execute → Result → Cleanup.
1. Plan: the query arrives at the master and is parsed and optimized into an operator tree: Scan Bars b, Filter b.city = 'San Francisco', Scan Sells s, Motion Redist(b.name), HashJoin b.name = s.bar, Project s.beer, s.price, Motion Gather (the corresponding SQL is sketched after this slide).
2. Resource: the master asks the YARN RM "I need 5 containers, each with 1 CPU core and 256 MB RAM"; the RM answers "Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers".
3. Prepare: Query Executor (QE) processes are started in the granted containers and the plan is dispatched to them.
4. Execute: the QEs run their slices of the plan, exchanging rows through the Motion operators.
5. Result: the results are gathered on the master and returned to the client.
6. Cleanup: the master tells the YARN RM "Free query resources: Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers", the RM replies "OK", and the QE processes are released.
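The operator tree in the Plan phase corresponds to a query along these lines, reconstructed from the operator names on the slide; the join direction and schema details are assumptions:

  SELECT s.beer, s.price
    FROM Bars b
    JOIN Sells s ON b.name = s.bar    -- HashJoin b.name = s.bar, fed by Motion Redist(b.name)
   WHERE b.city = 'San Francisco';    -- Filter applied on the Bars scan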
• 120. Query Performance
• Data does not hit the disk unless this cannot be avoided
• Data is not buffered on the segments unless this cannot be avoided
• Data is transferred between the nodes over UDP
• HAWQ has a good cost-based query optimizer (see the sketch below)
• The C/C++ implementation is more efficient than the Java implementations of competing solutions
• Query parallelism can be easily tuned
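To check what the optimizer and executors actually do for a query, the Postgres-inherited EXPLAIN ANALYZE can be used; a minimal sketch over the hypothetical Bars/Sells schema above:

  EXPLAIN ANALYZE
  SELECT s.beer, s.price
    FROM Bars b JOIN Sells s ON b.name = s.bar
   WHERE b.city = 'San Francisco';
  -- The output lists the plan nodes, the Motion (inter-node transfer) slices, and per-node timings.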
• 121–128. Competitive Solutions (feature matrix, built up one row per slide; the per-cell ratings were shown graphically and are not present in the transcript)
Columns: Hive, SparkSQL, Impala, HAWQ
Rows: Query Optimizer, ANSI SQL, Built-in Languages, Disk IO, Parallelism, Distributions, Stability, Community
• 133. Roadmap
• AWS and S3 integration
• Mesos integration
• Better Ambari integration
• Native support for the Cloudera, MapR, and IBM Hadoop distributions
• Make it the best SQL-on-Hadoop engine ever!
• 134. Summary
• A modern SQL-on-Hadoop engine
• For structured data processing and analysis
• Combines the best techniques of competing solutions
• Just released as open source
• The community is very young. Join the community and contribute!