Jethro for tableau webinar (11 15)
Webinar Topics
• Who is Jethro?
• Tableau & Big Data: Extract vs. Live Connect
• Big Data Platforms: Hadoop vs. EDW Appliances
• Two DB architectures: Full-scan vs. Index Access
• Live Demo: Tableau over Impala / Redshift / Jethro
• What is Jethro for Tableau and how it accelerates Tableau’s performance
• Q&A
About Us
• What does Jethro do?
– SQL engine optimized for accelerating
BI on big data
• How does it work?
– Combines Columnar SQL DB design
with full-indexing technology
• Where is it?
– In dev since 2012; GA: mid 2015
– Download & free eval
• When to use it?
– BI Spinner Syndrome (BSS)
• Partnerships
– BI and Hadoop vendors
• Speaker
– Eli Singer, CEO JethroData
– esinger@jethrodata.com
– 917.509.6111
• Experience
– Long-time DBA
– Over 20 years of leading Tech startups
• Where to find us
– Jethrodata.com
– @JethroData
Tableau and Big Data: Extract (In-Mem)
Tableau
Extract
EDW / Hadoop
• Typical Tableau usage is based
on extracting selective data from
remote sources
• Extracted data is then
dynamically loaded into Tableau
memory for interactive analysis
• Limitations: Performance
degradation and scale (typically
~200M rows)
Tableau and Big Data: Live Connect (In-DB)
Tableau
EDW / Hadoop
• Tableau issues SQL queries
to the target DB for every
user interaction
• DB retrieves requested
data and returns to Tableau
• Limitation: DB
performance is significantly
slower than in-mem speed
Live
Connect
Big Data Platforms: Hadoop Vs. EDW Appliances
10x-100x Data
1/10 HW $cost
Open Platform
Analytics: ETL, Predictive, Reporting, BI
SQL enables the change of data platform while keeping the analytic apps intact
The Hadoop Trade-Off: Scale & Cost Vs. Performance
SQL-on-Hadoop
ETL Predictive Reporting

BI
Too SLOW in Hadoop
It’s unrealistic to expect the same performance when data is much
larger and highly optimized hardware is replaced with commodity boxes.
SQL-on-Hadoop – MPP / Full Scan Architecture
Architecture:
MPP / Full-Scan (All SQL-on-Hadoop)
Query:
List books by author “Stephen King”
Process:
Each librarian is assigned a rack, pulls
each book, checks whether the author is
“Stephen King”, and if so, gets the book title
Result:
Too slow, costly, unscalable.
Unsuitable for BI
A Library Analogy:
Billions of books, Thousands of racks
SQL-on-Hadoop – Index-Access Architecture
Architecture:
Index Access (Only Jethro)
Query:
List books by author “Stephen King”
Process:
Access the Author index, find the entry for
“Stephen King”, get its list of books, and fetch
only those books
Result:
Fast, minimal resources, scalable
Optimal for BI
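The library analogy can be sketched in a few lines of Python. This is a toy illustration only, not Jethro's implementation: the book list, the function names and the `rows_read` counters are all made up for the example. It contrasts checking every book (full scan) with looking up one entry in an author index and fetching only the listed books.

```python
from collections import defaultdict

# Toy "library": (title, author) pairs. The data is invented for illustration.
BOOKS = [
    ("Carrie", "Stephen King"),
    ("Emma", "Jane Austen"),
    ("It", "Stephen King"),
    ("Dune", "Frank Herbert"),
    ("Misery", "Stephen King"),
]


def full_scan(books, author):
    """Full-scan approach: pull every book and check its author."""
    rows_read = 0
    titles = []
    for title, book_author in books:
        rows_read += 1
        if book_author == author:
            titles.append(title)
    return titles, rows_read


def build_author_index(books):
    """Build an inverted list: author -> positions of that author's books."""
    index = defaultdict(list)
    for position, (_, book_author) in enumerate(books):
        index[book_author].append(position)
    return index


def index_access(books, index, author):
    """Index approach: look up the author entry, then fetch only those books."""
    positions = index.get(author, [])
    titles = [books[p][0] for p in positions]
    return titles, len(positions)


AUTHOR_INDEX = build_author_index(BOOKS)
scan_titles, scan_reads = full_scan(BOOKS, "Stephen King")
index_titles, index_reads = index_access(BOOKS, AUTHOR_INDEX, "Stephen King")
print(scan_titles, scan_reads)    # same titles, every book read
print(index_titles, index_reads)  # same titles, only matching books read
```

Both functions return the same titles. The difference is how many books each one touches: the scan reads all of them, the index lookup reads only the matches.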
SQL on Hadoop – Competitive Landscape
• Hive
• Impala
• Presto
• SparkSQL
• Drill
• Pivotal/HAWQ
• IBM/Big SQL
• Actian
• Teradata/SQL-H
• …
• Jethro
Full-Scan Based Solutions
Reads all rows. Every Time.
Index Based Solution
Reads ONLY needed rows.
Use-Case Comparison:
Full-Scan: Optimal for Predictive, reporting
Index: Optimal for Interactive BI
LIVE Benchmark: BI on Hadoop (and Redshift)
Hardware – AWS
• Hadoop: CDH 5.4
• 6 nodes: m1.xlarge, r3.xlarge
• Jethro: r3.8xlarge
• Point browser at: tableau.jethrodata.com
– UID/PWD: demo / demo
• Choose workbook: “Jethro”, “Impala”, “Redshift”
• BI Dashboard: choose year, category or any other filter to drill-down
• Data
– Based on TPC-DS benchmark
– 1TB raw data (400GB fact)
– Fact table: ~2.9B rows
– Dimensions: 7
Hardware
         | Data format            | Hadoop cluster / Compute cluster    | Total RAM, CPU  | AWS $ per hr.
Jethro   | Jethro indexes (250GB) | 3x m1.xlarge / 2x r3.4xlarge (spot) | 289GB, 44 cores | $0.80
Impala   | Parquet (160GB)        | 8x r3.2xlarge, 1x r3.xlarge         | 510GB, 68 cores | $5.95
Redshift | Redshift (229GB)       | 8x dc1.large                        | 120GB, 16 cores | $2.00
What Is Jethro for Tableau?
Tableau
EDW / Hadoop / Cloud / Local FS / NAS
Extract
• An indexing & caching server
• Relevant data is extracted from EDW
/ Hadoop into Jethro. No size
limitation
• Jethro then fully indexes the data
(every column!)
• Jethro’s column and index files are
stored back in Hadoop (or other
storage system)
• Tableau uses Live Connect to send
Jethro SQL queries (ODBC)
• Jethro uses indexes to speed up
queries and return results to Tableau
[Diagram labels: Extract; 2. Store; Live Connect]
Selecting Data for Jethro Acceleration
• Select only Tableau “worthy” datasets
– Not ALL data in Hadoop should have Jethro
• Use any ETL tool to extract from source
– Jethro receives data in a CSV/delimited format
– Extracted data can be temporarily stored in a file or
“piped” live to Jethro
• After initial creation, incremental loads are supported
– As frequently as every few minutes
• Jethro stores its version of the dataset back in HDFS
– Can also use local filesystem, network storage or cloud storage
• Load is fast
– ~1B rows/hour
– Data is highly compressed: 1TB -> 400GB data + indexes
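As a rough worked example of the load figures above: the arithmetic below assumes the ~1B rows/hour rate would apply to the ~2.9B-row benchmark fact table mentioned earlier (the slides do not state this), and takes 1TB as 1000GB. The constant names are hypothetical.

```python
# Figures quoted on the slides; how they combine is an assumption.
ROWS_PER_HOUR = 1_000_000_000   # "~1B rows/hour" load rate
FACT_ROWS = 2_900_000_000       # "~2.9B rows" benchmark fact table
RAW_GB = 1000                   # "1TB" raw data, taken as 1000GB
STORED_GB = 400                 # "400GB data + indexes"

# Estimated time for an initial load of the fact table at the quoted rate.
load_hours = FACT_ROWS / ROWS_PER_HOUR

# Ratio of raw size to stored size (data plus indexes).
compression_ratio = RAW_GB / STORED_GB

print(f"estimated load time: ~{load_hours:.1f} hours")
print(f"raw-to-stored ratio: ~{compression_ratio:.1f}x")
```

Actual load time would depend on hardware, column count and data types, none of which the slides specify.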
EDW / Hadoop
Extract
Index-Access – How it works
[Diagram: five Data Nodes, and Jethro with two Query Nodes]
1. Index Access
2. Read data only for required rows
Performance and resources based on the size of the working-set
Storage
- HDFS
- Cloud (S3, EFS)
- NAS/SAN
- Local FS
Tableau
SELECT day, sum(sales) FROM t1 WHERE prod=‘abc’ GROUP BY day
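The two steps above can be sketched for this query as follows. This is a minimal illustration under stated assumptions, not Jethro's engine: the table `T1`, its contents and the helper names are invented, and the index is a plain in-memory dictionary. Step 1 looks up `prod = 'abc'` in an index; step 2 reads `day` and `sales` only for the rows the index returned.

```python
from collections import defaultdict

# Hypothetical columnar table t1: one list per column, aligned by row number.
T1 = {
    "day":   ["mon", "mon", "tue", "tue", "wed"],
    "prod":  ["abc", "xyz", "abc", "abc", "xyz"],
    "sales": [10, 7, 5, 20, 3],
}


def build_index(column):
    """Inverted list for one column: value -> row numbers holding that value."""
    index = defaultdict(list)
    for row, value in enumerate(column):
        index[value].append(row)
    return index


PROD_INDEX = build_index(T1["prod"])


def sales_by_day(table, prod_index, prod):
    """SELECT day, sum(sales) FROM t1 WHERE prod = <prod> GROUP BY day."""
    # Step 1: index access. Find the matching rows without touching the data.
    rows = prod_index.get(prod, [])
    # Step 2: read the day and sales columns only for those rows.
    totals = defaultdict(int)
    for row in rows:
        totals[table["day"][row]] += table["sales"][row]
    return dict(totals), len(rows)


result, rows_read = sales_by_day(T1, PROD_INDEX, "abc")
print(result, rows_read)
```

The work done in step 2 grows with the number of matching rows (the working set), not with the size of the table.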
Jethro Indexes – Superior Technology
Patent Pending: http://www.google.com/patents/WO2013001535A3?cl=en
• Complete
– Every column is indexed
• Simple
– Inverted-list indexes map each column
value to a list of rows
• Fast to read
– Direct access to a value entry
– No need to scan entire index, or load
index to memory
• Scalable
– Distributed, highly hierarchical
compressed bitmaps
Appendable Index Structure for
Fast Incremental Loads
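A minimal sketch of a per-column bitmap index that supports appends is shown below. It is an illustration only: the slides describe distributed, hierarchical, compressed bitmaps, whereas this toy version uses one flat, uncompressed Python integer per value, and the class and column names are hypothetical. It shows two ideas from the slide: new rows are added without rewriting existing entries, and filters on several columns combine with a bitwise AND.

```python
class BitmapIndex:
    """Maps each column value to a bitmap (a Python int) of matching rows."""

    def __init__(self):
        self.bitmaps = {}
        self.row_count = 0

    def append(self, values):
        """Index newly loaded values; earlier rows keep their bit positions."""
        for value in values:
            bit = 1 << self.row_count
            self.bitmaps[value] = self.bitmaps.get(value, 0) | bit
            self.row_count += 1

    def rows(self, value):
        """Direct access to one value entry; unknown values match no rows."""
        return self.bitmaps.get(value, 0)


def bitmap_to_rows(bitmap):
    """Expand a bitmap into a sorted list of row numbers."""
    rows = []
    position = 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows


# Initial load of four rows, one index per column.
year_index = BitmapIndex()
category_index = BitmapIndex()
year_index.append([2014, 2015, 2015, 2014])
category_index.append(["books", "books", "toys", "toys"])

# Incremental load of two more rows: only new bits are set.
year_index.append([2015, 2015])
category_index.append(["books", "toys"])

# WHERE year = 2015 AND category = 'books'
matching = bitmap_to_rows(year_index.rows(2015) & category_index.rows("books"))
print(matching)
```

Each additional filter is one more AND, which can only shrink the set of rows that have to be fetched.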
Adaptive Optimization: Active Cache of Query Results
• Reuse of intermediate/final query results
– Repeat queries return immediately
• Addresses wide top-of-the-funnel queries
– Exploration starts with queries with no/few
filters
– Those queries are likely to be repeated in
dashboard scenarios
• Transparently adapts to incremental loads
– Execution on delta data + merge saved results
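The "delta + merge" idea can be sketched as follows. This is a simplified illustration with assumptions: the table is an append-only Python list of rows, only a grouped sum is cached, and the class and method names are invented. A repeated query returns from the saved result without scanning any rows; after an incremental load, only the new rows are processed and merged into the saved totals.

```python
from collections import Counter


class SumByKeyCache:
    """Caches grouped sums and refreshes them from newly appended rows only."""

    def __init__(self):
        # (group column, value column) -> (rows already covered, saved totals)
        self.saved = {}

    def query(self, rows, group_col, value_col):
        """Return (totals, rows_scanned) for SUM(value_col) GROUP BY group_col."""
        key = (group_col, value_col)
        covered, totals = self.saved.get(key, (0, Counter()))
        delta = rows[covered:]        # rows loaded since the last execution
        merged = Counter(totals)      # start from the saved result
        for row in delta:
            merged[row[group_col]] += row[value_col]
        self.saved[key] = (len(rows), merged)
        return dict(merged), len(delta)


table = [
    {"day": "mon", "sales": 10},
    {"day": "tue", "sales": 5},
]
cache = SumByKeyCache()

first = cache.query(table, "day", "sales")    # cold: scans both rows
repeat = cache.query(table, "day", "sales")   # repeat: scans nothing

table.append({"day": "mon", "sales": 4})      # incremental load
after_load = cache.query(table, "day", "sales")  # scans only the new row

print(first, repeat, after_load)
```

This merge works because sums combine additively. Aggregates such as distinct counts would need a different saved representation.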
[Charts: Index Performance, Cache Performance, Index + Cache. Each plots query speed (slow to fast) against query selectivity (few to more).]
Summary: Why Is Index Access Optimal for BI?
1. Use of indexes eliminates need to read unnecessary data
2. The deeper you go, the faster it gets: as users drill down and add
more filters, queries run faster
3. Unlimited flexibility: users can aggregate and filter by any columns
they choose with no performance penalty
4. Concurrent users accessing dashboards generate repeatable queries
that result in high cache efficiency
5. Shields BI workload from other analytics overwhelming the cluster
Ready to Try Jethro?
1. Register: jethrodata.com/download-jethro-for-tableau
2. Schedule a 45min POC review with Jethro SA (free!)
3. One time setup
- Download and Install Jethro on a server / VM
- Start services, configure instance
4. Extract & Load data
5. Use Tableau
- Install ODBC driver
- Point Tableau data source at Jethro
That’s It!
Q&A
Thank You!

Jethro for tableau webinar (11 15)

  • 2. Webinar Topics • Who is Jethro? • Tableau & Big Data: Extract vs. Live Connect • Big Data Platforms: Hadoop vs. EDW Appliances • Two DB architectures: Full-scan vs. Index Access • Live Demo: Tableau over Impala / Redshift / Jethro • What is Jethro for Tableau and how it accelerates Tableau’s performance • Q&A
  • 3. About Us • What does Jethro do? – SQL engine optimized for accelerating BI on big data • How it works? – Combines Columnar SQL DB design with full-indexing technology • Where is it? – In dev since 2012; GA: mid 2015 – Download & free eval • When to use it? – BI Spinner Syndrome (BSS) • Partnerships – BI and Hadoop vendors • Speaker – Eli Singer, CEO JethroData – esinger@jethrodata.com – 917.509.6111 • Experience – Long-time DBA – Over 20 years of leading Tech startups • Where to find us – Jethrodata.com – @JethroData
  • 4. Tableau and Big Data: Extract (In-Mem) Tableau Extract EDW / Hadoop • Typical Tableau usage is based on extracting selective data from remote sources • Extracted data is then dynamically loaded into Tableau memory for interactive analysis • Limitations: Performance degradation and scale (typically ~200M rows)
  • 5. Tableau and Big Data: Live Connect (In-DB) Tableau EDW / Hadoop • Tableau issues SQL queries to the target DB for every user interaction • DB retrieves requested data and returns to Tableau • Limitation: DB performance is significantly slower than in-mem speed Live Connect
  • 6. Big Data Platforms: Hadoop Vs. EDW Appliances 10x-100x Data 1/10 HW $cost Open Platform Analytics: ETL, Predictive, Reporting, BI SQL enables the change of data platform while keeping the analytic apps intact
  • 7. The Hadoop Trade-Off: Scale & Cost Vs. Performance SQL-on-Hadoop: ETL, Predictive, Reporting. BI: Too SLOW in Hadoop. It’s unrealistic to expect the same performance when data is much larger and highly optimized hardware is replaced with commodity boxes.
  • 8. SQL-on-Hadoop – MPP / Full Scan Architecture A Library Analogy: Billions of books, Thousands of racks. Architecture: MPP / Full-Scan (All SQL-on-Hadoop). Query: List books by author “Stephen King”. Process: Each librarian is assigned a rack; they then pull each book and check whether the author is “Stephen King”; if so, they get the book title. Result: Too slow, costly, unscalable. Unsuitable for BI.
  • 9. SQL-on-Hadoop – Index-Access Architecture Architecture: Index Access (Only Jethro). Query: List books by author “Stephen King”. Process: Access the Author index, find the entry for “Stephen King”, get its list of books, and fetch only these books. Result: Fast, minimal resources, scalable. Optimal for BI.
  • 10. SQL on Hadoop – Competitive Landscape Full-Scan Based Solutions (read all rows, every time): • Hive • Impala • Presto • SparkSQL • Drill • Pivotal/HAWQ • IBM/Big SQL • Actian • Teradata/SQL-H • … Index Based Solution (reads ONLY needed rows): • Jethro Use-Case Comparison: Full-Scan: Optimal for predictive, reporting. Index: Optimal for interactive BI.
  • 11. LIVE Benchmark: BI on Hadoop (and Redshift) Hardware – AWS • Hadoop: CDH 5.4 • 6 nodes: m1.xlarge, r3.xlarge • Jethro: r3.8xlarge • Point browser at: tableau.jethrodata.com – UID/PWD: demo / demo • Choose workbook: “Jethro”, “Impala”, “Redshift” • BI Dashboard: choose year, category or any other filter to drill-down • Data – Based on TPC-DS benchmark – 1TB raw data (400GB fact) – Fact table: ~2.9B rows – Dimensions: 7
    Hardware comparison:
    – Jethro: data format: Jethro indexes (250GB); Hadoop cluster: 3x m1.xlarge; compute cluster: 2x r3.4xlarge (spot); total: 289GB RAM, 44 cores; AWS $ per hr.: $0.80
    – Impala: data format: Parquet (160GB); Hadoop cluster: 8x r3.2xlarge; compute cluster: 1x r3.xlarge; total: 510GB RAM, 68 cores; AWS $ per hr.: $5.95
    – Redshift: data format: Redshift (229GB); cluster: 8x dc1.large; total: 120GB RAM, 16 cores; AWS $ per hr.: $2.00
  • 12. What Is Jethro for Tableau? • An indexing & caching server • Relevant data is extracted from EDW / Hadoop into Jethro. No size limitation • Jethro then fully indexes the data (every column!) • Jethro’s column and index files are stored back in Hadoop (or other storage system) • Tableau uses Live Connect to send Jethro SQL queries (ODBC) • Jethro uses indexes to speed up queries and return results to Tableau [Diagram labels: Tableau; Live Connect; Extract; Store; EDW / Hadoop / Cloud / Local FS / NAS]
  • 13. Selecting Data for Jethro Acceleration • Select only Tableau “worthy” datasets – Not ALL data in Hadoop should have Jethro • Use any ETL tool to extract from source – Jethro receives data in a CSV/delimited format – Extracted data can be temporarily stored in a file or “piped” live to Jethro • After initial creation, incremental loads are supported – As frequently as every few minutes • Jethro stores its version of the dataset back in HDFS – Can also use local filesystem, network storage or cloud storage • Load is fast – ~1B rows/hour – Data is highly compressed: 1TB -> 400GB data + indexes
  • 14. Index-Access – How it works 1. Index access 2. Read data only for required rows. Performance and resources are based on the size of the working set. Storage: HDFS, Cloud (S3, EFS), NAS/SAN, Local FS. Example query from Tableau: SELECT day, sum(sales) FROM t1 WHERE prod='abc' GROUP BY day [Diagram labels: Tableau; Jethro; Query Nodes; Data Nodes]
  • 15. Jethro Indexes – Superior Technology Patent Pending: http://www.google.com/patents/WO2013001535A3?cl=en • Complete – Every column is indexed • Simple – Inverted-list indexes map each column value to a list of rows • Fast to read – Direct access to a value entry – No need to scan entire index, or load index to memory • Scalable – Distributed, highly hierarchical compressed bitmaps • Appendable index structure for fast incremental loads
  • 16. Adaptive Optimization: Active Cache of Query Results • Reuse of intermediate/final query results – Repeat queries return immediately • Addresses wide top-of-the-funnel queries – Exploration starts with queries with no/few filters – Those queries are likely to be repeated in dashboard scenarios • Transparently adapts to incremental loads – Execution on delta data + merge saved results [Charts: query speed (fast to slow) vs. query selectivity (few to more) for three cases: Index Performance, Cache Performance, Index + Cache]
  • 17. Summary: Why Is Index Access Optimal for BI? 1. Use of indexes eliminates the need to read unnecessary data 2. The deeper you go, the faster it gets: as users drill down and add more filters, queries run faster 3. Unlimited flexibility: users can aggregate and filter by any columns they choose with no performance penalty 4. Concurrent users accessing dashboards generate repeatable queries that result in high cache efficiency 5. Shields BI workload from other analytics overwhelming the cluster
  • 18. Ready to Try Jethro? 1. Register: jethrodata.com/download-jethro-for-tableau 2. Schedule a 45min POC review with Jethro SA (free!) 3. One time setup - Download and Install Jethro on a server / VM - Start services, configure instance 4. Extract & Load data 5. Use Tableau - Install ODBC driver - Point Tableau data source at Jethro That’s It!
  • 19. Q&A
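
The library analogy from slides 8 and 9 can be sketched in a few lines of Python. This is only a toy illustration, not Jethro's implementation: the BOOKS catalog and the function names full_scan, build_author_index and index_access are hypothetical, and the index is a plain in-memory inverted list rather than the compressed bitmaps described on slide 15. It counts how many rows each approach has to read to answer "list books by author Stephen King".

```python
from collections import defaultdict

# Hypothetical toy catalog: (title, author) rows standing in for a large table.
BOOKS = [
    ("Carrie", "Stephen King"),
    ("Emma", "Jane Austen"),
    ("It", "Stephen King"),
    ("Persuasion", "Jane Austen"),
    ("Dune", "Frank Herbert"),
]


def full_scan(rows, author):
    """Full-scan approach: pull every book and check its author."""
    titles, rows_read = [], 0
    for title, row_author in rows:
        rows_read += 1
        if row_author == author:
            titles.append(title)
    return titles, rows_read


def build_author_index(rows):
    """Inverted-list index: each author value maps to a list of row numbers."""
    index = defaultdict(list)
    for row_id, (_, author) in enumerate(rows):
        index[author].append(row_id)
    return index


def index_access(rows, index, author):
    """Index approach: look up the author entry, then fetch only those rows."""
    row_ids = index.get(author, [])
    titles = [rows[row_id][0] for row_id in row_ids]
    return titles, len(row_ids)


AUTHOR_INDEX = build_author_index(BOOKS)
scan_titles, scan_reads = full_scan(BOOKS, "Stephen King")
index_titles, index_reads = index_access(BOOKS, AUTHOR_INDEX, "Stephen King")
print(scan_titles, scan_reads)
print(index_titles, index_reads)
```

Both calls return the same titles, but the scan reads every row while the index lookup reads only the matching ones, which is the point of the "performance based on the size of the working set" line on slide 14.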
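
The result cache on slide 16 ("execution on delta data + merge saved results") can be illustrated with a similar sketch, using the shape of the slide 14 query (SELECT day, sum(sales) FROM t1 WHERE prod='abc' GROUP BY day). Everything here is an assumption made for illustration: the class CachedDailySales, its methods, and the rows_scanned counter are hypothetical, and for brevity the sketch scans the delta rows directly instead of going through an index.

```python
class CachedDailySales:
    """Toy model of a query-result cache that survives incremental loads."""

    def __init__(self):
        self.rows = []         # (day, prod, sales) tuples, append-only
        self.cache = {}        # prod -> (number of rows covered, {day: total})
        self.rows_scanned = 0  # work counter, for illustration only

    def load(self, new_rows):
        """Incremental load: append new rows; cached results stay valid."""
        self.rows.extend(new_rows)

    def query(self, prod):
        """Sum of sales per day for one product, reusing any saved result."""
        covered, saved = self.cache.get(prod, (0, {}))
        totals = dict(saved)  # copy so the saved result is not mutated
        # Execute only on the delta: rows loaded after the cached result.
        for day, row_prod, sales in self.rows[covered:]:
            self.rows_scanned += 1
            if row_prod == prod:
                totals[day] = totals.get(day, 0) + sales
        # Merge is implicit: totals now covers saved result plus delta.
        self.cache[prod] = (len(self.rows), totals)
        return totals


store = CachedDailySales()
store.load([("mon", "abc", 10), ("mon", "xyz", 5), ("tue", "abc", 7)])
first = store.query("abc")    # computed from all three rows
repeat = store.query("abc")   # served from the cache, no rows scanned
store.load([("tue", "abc", 3)])
after_load = store.query("abc")  # only the one new row is scanned
print(first, repeat, after_load, store.rows_scanned)
```

A repeated query does no new work, and a query after an incremental load touches only the newly loaded rows, which matches the behavior the slide describes for dashboard scenarios with many repeated queries.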