SlideShare a Scribd company logo
Emergence of SQL over Hadoop
Sudheesh Narayanan
Chief Architect – Big Data
About Me
Author of
My Expertise
• Hadoop and Ecosystem Components
• Machine Learning
• Text Analytics
• Image Analytics
• Data Science
• Real Time Event Stream Processing
• NoSQL Databases
• Complex Event Processing
Agenda
•
•
•
•
•
•

Why SQL Over Hadoop ?
Technology Landscape
Fundamentals behind SQL over Hadoop
Understand different type of SQL over Hadoop
Architecture Comparisons
Conclusions
SQL has come full Circle!!
• SQL has been ruling since 1970!!
• Hadoop came…But little traction…
• Facebook open-sourced HIVE in 2008.. Hadoop takes
the next leap in adoption
• RDBMS and MPP Vendors brought Hadoop Connectors
• Niche players used SQL engine to run Distributed
Query on Hadoop
• In 2012 Cloudera Impala sets the trend for Real time
Query over Hadoop
• Facebook open sourced Presto in 2013!!
SQL OVER HADOOP IS REALLY CROWDED!!
Which one is better!!
HIVE  First SQL over Hadoop!!
HQL
(Hive Query Language)

HIVE
Query Engine

Name Node

Storage Formats

Compressions
Metastore

Schema on Read
Mid-Query Fault Tolerance

Map-Reduce Pipelines
Hadoop

Map Reduce Latency

Job Tracker/
Resource
Manager

Processing
Logic(MR)

Processing
Logic(MR)

Processing
Logic(MR)

Processing
Logic(MR)

Data
Blocks

Data
Blocks

Data
Blocks

Data
Blocks

Node1

Node 2

Node 3

Node…
The Fundamentals!!
Processing
Logic

App Server

App Server

Data Transfer
Data

Network Switch

1.
2.
3.
4.
5.

DB Server
Query Engine

Network Latency
Storage Layer
Scalability
File Formats and Compressions
ANSI SQL Compliance

Storage Switch
Storage Array
Disk1

Disk2

Disk3

Source: http://hortonworks.com/labs/stinger/
So Lets Understand different types
of SQL Over Hadoop!!
Type 1MapReduce Batch
Map Reduce Latency still exist

1
2
3

HQL
(Hive Query Language)

4

HIVE
Query Engine

File Format Support
Improved Query Optimizer
Vectorized Query Engine

Metastore

Map-Reduce
Pipelines

IBM BigSQL

Hadoop
Node 1

Node 2

Node 3

Stinger Improved Original HIVE Performance by 35%
Type 2:- Pull Data Out of HDFS to Query Engine
RDBMS Vendors supporting Hadoop as External
Tables
1. Oracle Hadoop Connector
2. DB2 Hadoop Connector
3. Microsoft PDW Connector

SQL

Database Server
Leverage Database Query Engine

Query Engine

Pull Data from HDFS

Hadoop
Data Node

No Data Local Processing
Full ANSI SQL Compliance

Data Node

Data Node

Poor Response Time (Limited to Low Volumes)
Type 3:- Pull Data Out of HDFS to Parallel Query Engine
Leverage Specialized Query Engine

No Data Local Processing

SQL

Full ANSI SQL Compliance
Better Response Time due to Parallel processing

Polybase

Query Node is separate from Data Node!!
Type 4:- MPP Database using HDFS as Data store
Leverage MPP Query Framework
Data Local Processing but streaming pipeline
SQL

ANSI SQL Compliance

Example

Example

Response Time is good

Example
Greenplum over HDFS

Data is moved out of HDFS to MPP Engine
Type 5:- RDBMS Locally on a HDFS Node
Wrapper for access Hadoop data locally on each node
Data Local Processing
Limited ANSI SQL Compliance
SQL

Response Time is better than HIVE

Example

Example

Metadata is replicated

Still File Formats and Compression support expected

Query is pushed down to the local DB Engine on Each Node
Type 6:- Distributed Native SQL Query on HDFS
Distributed SQL Engine
Data Local Processing with streaming Pipeline
Different File Format and Compressions
Limited ANSI SQL support
Fast Response Time and Highly Scalable
Summary
The 6 Types of SQL over Hadoop!!
Batch Map Reduce
RDBMS Connector to HDFS as External Tables
Parallel Query Engine pull data out of HDFS
MPP Database using HDFS as storage
RDBMS Store Locally on HDFS Node
Distributed Query Engine
What should you look for when you choose SQL over Hadoop!!
Standard ANSI SQL Compliance

Push Down Distributed Data Local Processing
Support Variety of File Formats including Compressions
Optimized Query Engine

JDBC/ODBC Connectivity
Linear Scalability
Low Latency Query and Cost

More Related Content

What's hot

Geek Sync | Successfully Migrating Existing Databases to Azure SQL Database
Geek Sync | Successfully Migrating Existing Databases to Azure SQL DatabaseGeek Sync | Successfully Migrating Existing Databases to Azure SQL Database
Geek Sync | Successfully Migrating Existing Databases to Azure SQL Database
IDERA Software
 
How & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit DublinHow & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit Dublin
Amazon Web Services
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
Dremio Corporation
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017
Karanjeet Singh
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
DataWorks Summit
 
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
Michael Stack
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
HBaseCon
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
Sky Yin
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
Spark Summit
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
Wes McKinney
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
Norberto Leite
 
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
Michael Stack
 
Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and Couchbase
Cecile Le Pape
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
asya999
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
Dremio Corporation
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Qubole
 
SharePoint Databases: What you need to know (201609)
SharePoint Databases: What you need to know (201609)SharePoint Databases: What you need to know (201609)
SharePoint Databases: What you need to know (201609)
Alan Eardley
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 

What's hot (19)

Geek Sync | Successfully Migrating Existing Databases to Azure SQL Database
Geek Sync | Successfully Migrating Existing Databases to Azure SQL DatabaseGeek Sync | Successfully Migrating Existing Databases to Azure SQL Database
Geek Sync | Successfully Migrating Existing Databases to Azure SQL Database
 
How & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit DublinHow & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit Dublin
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
 
Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and Couchbase
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
 
SharePoint Databases: What you need to know (201609)
SharePoint Databases: What you need to know (201609)SharePoint Databases: What you need to know (201609)
SharePoint Databases: What you need to know (201609)
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 

Similar to Sql over hadoop ver 3

Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
Shravan (Sean) Pabba
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
tcloudcomputing-tw
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Hadoop
HadoopHadoop
Hadoop
avnishagr
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Roorkee College of Engineering, Roorkee
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
sonukumar379092
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
arslanhaneef
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
Nicolas Morales
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
chariorienit
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
Gwen (Chen) Shapira
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
VMware Tanzu
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
bddmoscow
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
BIG DATA Session 6
BIG DATA Session 6BIG DATA Session 6
BIG DATA Session 6
Infinity Tech Solutions
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
Adaryl "Bob" Wakefield, MBA
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete information
bhargavi804095
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
Geoff Hendrey
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
Venneladonthireddy1
 

Similar to Sql over hadoop ver 3 (20)

Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Hadoop
HadoopHadoop
Hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
BIG DATA Session 6
BIG DATA Session 6BIG DATA Session 6
BIG DATA Session 6
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
Apache drill
Apache drillApache drill
Apache drill
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete information
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 

Recently uploaded

What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
ScyllaDB
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
ScyllaDB
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
ScyllaDB
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
jackson110191
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
ScyllaDB
 
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
uuuot
 
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
kantakumariji156
 
K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024
The Digital Insurer
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 

Recently uploaded (20)

What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
 
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
 
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
 
K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024K2G - Insurtech Innovation EMEA Award 2024
K2G - Insurtech Innovation EMEA Award 2024
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 

Sql over hadoop ver 3

  • 1. Emergence of SQL over Hadoop Sudheesh Narayanan Chief Architect – Big Data
  • 2. About Me Author of My Expertise • Hadoop and Ecosystem Components • Machine Learning • Text Analytics • Image Analytics • Data Science • Real Time Event Stream Processing • NoSQL Databases • Complex Event Processing
  • 3. Agenda • • • • • • Why SQL Over Hadoop ? Technology Landscape Fundamentals behind SQL over Hadoop Understand different type of SQL over Hadoop Architecture Comparisons Conclusions
  • 4. SQL has come full Circle!! • SQL has been ruling since 1970!! • Hadoop came…But little traction… • Facebook open-sourced HIVE in 2008.. Hadoop takes the next leap in adoption • RDBMS and MPP Vendors brought Hadoop Connectors • Niche players used SQL engine to run Distributed Query on Hadoop • In 2012 Cloudera Impala sets the trend for Real time Query over Hadoop • Facebook open sourced Presto in 2013!!
  • 5. SQL OVER HADOOP IS REALLY CROWDED!! Which one is better!!
  • 6. HIVE  First SQL over Hadoop!! HQL (Hive Query Language) HIVE Query Engine Name Node Storage Formats Compressions Metastore Schema on Read Mid-Query Fault Tolerance Map-Reduce Pipelines Hadoop Map Reduce Latency Job Tracker/ Resource Manager Processing Logic(MR) Processing Logic(MR) Processing Logic(MR) Processing Logic(MR) Data Blocks Data Blocks Data Blocks Data Blocks Node1 Node 2 Node 3 Node…
  • 7. The Fundamentals!! Processing Logic App Server App Server Data Transfer Data Network Switch 1. 2. 3. 4. 5. DB Server Query Engine Network Latency Storage Layer Scalability File Formats and Compressions ANSI SQL Compliance Storage Switch Storage Array Disk1 Disk2 Disk3 Source: http://hortonworks.com/labs/stinger/
  • 8. So Lets Understand different types of SQL Over Hadoop!!
  • 9. Type 1MapReduce Batch Map Reduce Latency still exist 1 2 3 HQL (Hive Query Language) 4 HIVE Query Engine File Format Support Improved Query Optimizer Vectorized Query Engine Metastore Map-Reduce Pipelines IBM BigSQL Hadoop Node 1 Node 2 Node 3 Stinger Improved Original HIVE Performance by 35%
  • 10. Type 2:- Pull Data Out of HDFS to Query Engine RDBMS Vendors supporting Hadoop as External Tables 1. Oracle Hadoop Connector 2. DB2 Hadoop Connector 3. Microsoft PDW Connector SQL Database Server Leverage Database Query Engine Query Engine Pull Data from HDFS Hadoop Data Node No Data Local Processing Full ANSI SQL Compliance Data Node Data Node Poor Response Time (Limited to Low Volumes)
  • 11. Type 3:- Pull Data Out of HDFS to Parallel Query Engine Leverage Specialized Query Engine No Data Local Processing SQL Full ANSI SQL Compliance Better Response Time due to Parallel processing Polybase Query Node is separate from Data Node!!
  • 12. Type 4:- MPP Database using HDFS as Data store Leverage MPP Query Framework Data Local Processing but streaming pipeline SQL ANSI SQL Compliance Example Example Response Time is good Example Greenplum over HDFS Data is moved out of HDFS to MPP Engine
  • 13. Type 5:- RDBMS Locally on a HDFS Node Wrapper for access Hadoop data locally on each node Data Local Processing Limited ANSI SQL Compliance SQL Response Time is better than HIVE Example Example Metadata is replicated Still File Formats and Compression support expected Query is pushed down to the local DB Engine on Each Node
  • 14. Type 6:- Distributed Native SQL Query on HDFS Distributed SQL Engine Data Local Processing with streaming Pipeline Different File Format and Compressions Limited ANSI SQL support Fast Response Time and Highly Scalable
  • 15. Summary The 6 Types of SQL over Hadoop!! Batch Map Reduce RDBMS Connector to HDFS as External Tables Parallel Query Engine pull data out of HDFS MPP Database using HDFS as storage RDBMS Store Locally on HDFS Node Distributed Query Engine
  • 16. What should you look for when you choose SQL over Hadoop!! Standard ANSI SQL Compliance Push Down Distributed Data Local Processing Support Variety of File Formats including Compressions Optimized Query Engine JDBC/ODBC Connectivity Linear Scalability Low Latency Query and Cost