SlideShare a Scribd company logo
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tony Gibbs, Big Data Solutions Architect
Data Warehousing with Amazon
Redshift
Deep Dive Agenda
• Amazon Redshift history and development
• Cluster architecture
• Concepts and terminology
• Node components
• Storage deep dive
• Design considerations
• Parallelism deep dive
• Open Q&A
Amazon Redshift History &
Development
Columnar
MPP
OLAP
AWS IAMAmazon
VPC
Amazon SWF
Amazon S3 AWS KMS Amazon
Route 53
Amazon
CloudWatch
Amazon
EC2
February 2013
October 2016
> 100 Significant Patches
> 130 Significant Features
Amazon Redshift Cluster
Architecture
Amazon Redshift Cluster Architecture
Massively parallel, shared nothing
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
S3 / EMR / DynamoDB / SSH
JDBC/ODBC
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Leader
Node
Compute & Leader Node
Components
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
Leader Node
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
• Input: Optimized plan
• Output: >=1 C++
functions
• Compiler
• Task Scheduler
• WLM
• Admission
• Scheduling
• PostgreSQL Catalog Tables
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
Leader Node
• Query execution processes
• Backup & restore processes
• Replication processes
• Local Storage
• Disks
• Slices
• Tables
• Columns
• Blocks
• Superblocks
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
Leader Node
• Query execution processes
• Backup & restore processes
• Replication processes
• Local Storage
• Disks
• Slices
• Tables
• Columns
• Blocks
• Superblocks
Concepts and Terminology
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with row storage:
– Need to read everything
– Unnecessary I/O
aid loc dt
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with columnar storage:
– Only scan blocks for relevant
column
aid loc dt
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Columns grow and shrink independently
• Effective compression ratios due to like
data
• Reduces storage requirements
• Reduces I/O
aid loc dt
CREATE TABLE loft_deep_dive (
aid INT ENCODE LZO
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
• In-memory block metadata
• Contains per-block MIN and MAX value
• Effectively prunes blocks which cannot
contain data for a given query
• Eliminates unnecessary I/O
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'
MIN: 01-JUNE-2013
MAX: 20-JUNE-2013
MIN: 08-JUNE-2013
MAX: 30-JUNE-2013
MIN: 12-JUNE-2013
MAX: 20-JUNE-2013
MIN: 02-JUNE-2013
MAX: 25-JUNE-2013
Unsorted Table
MIN: 01-JUNE-2013
MAX: 06-JUNE-2013
MIN: 07-JUNE-2013
MAX: 12-JUNE-2013
MIN: 13-JUNE-2013
MAX: 18-JUNE-2013
MIN: 19-JUNE-2013
MAX: 24-JUNE-2013
Sorted By Date
Zone Maps
Terminology and Concepts: Data Sorting
• Goals:
• Physically order rows of table data based on certain column(s)
• Optimize effectiveness of zone maps
• Enable MERGE JOIN operations
• Impact:
• Enables rrscans to prune blocks by leveraging zone maps
• Overall reduction in block I/O
• Achieved with the table property SORTKEY defined over one or more
columns
• Optimal SORTKEY is dependent on:
• Query patterns
• Data profile
• Business requirements
Terminology and Concepts: Slices
A slice can be thought of like a “virtual compute node”
• Unit of data partitioning
• Parallel query processing
Facts about slices:
• Each compute node has either 2, 16, or 32 slices
• Table rows are distributed to slices
• A slice processes only its own data
Data Distribution
• Distribution style is a table property which dictates how that table’s data is
distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
KEY
ALL
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
EVEN
Data Distribution: Example
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE (EVEN|KEY|ALL);
CN1
Slice 0 Slice 1
CN2
Slice 2 Slice 3
Table: loft_deep_dive
User
Columns
System
Columns
aid loc dt ins del row
Data Distribution: EVEN Example
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE EVEN;
CN1
Slice 0 Slice 1
CN2
Slice 2 Slice 3
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0 Rows: 0 Rows: 0 Rows: 0
(3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
Rows: 1 Rows: 1 Rows: 1 Rows: 1
Data Distribution: KEY Example #1
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (loc);
CN1
Slice 0 Slice 1
CN2
Slice 2 Slice 3
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 2 Rows: 0 Rows: 0
(3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
Rows: 0Rows: 1
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 2Rows: 0Rows: 1
Data Distribution: KEY Example #2
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (aid);
CN1
Slice 0 Slice 1
CN2
Slice 2 Slice 3
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0 Rows: 0 Rows: 0 Rows: 0
(3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
Rows: 1 Rows: 1 Rows: 1 Rows: 1
Data Distribution: ALL Example
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE ALL;
CN1
Slice 0 Slice 1
CN2
Slice 2 Slice 3
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Rows: 0 Rows: 0
(3 User Columns + 3 System Columns) x (2 slice) = 12 Blocks (12MB)
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3
Table: loft_deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3
Terminology and Concepts: Data Distribution
KEY
• The key creates an even distribution of data
• Joins are performed between large fact/dimension tables
• Optimizing merge joins and group by
ALL
• Small and medium size dimension tables (< 2-3M)
EVEN
• When key cannot produce an even distribution
Storage Deep Dive
Storage Deep Dive: Disks
Amazon Redshift utilizes locally attached storage
devices
• Compute nodes have 2.5-3x the advertised storage
capacity
1, 3, 8, or 24 disks depending on node type
Each disk is split into two partitions
• Local data storage, accessed by local CN
• Mirrored data, accessed by remote CN
Partitions are raw devices
• Local storage devices are ephemeral in nature
• Tolerant to multiple disk failures on a single node
Storage Deep Dive: Blocks
Column data is persisted to 1MB immutable blocks
Each block contains in-memory metadata:
• Zone Maps (MIN/MAX value)
• Location of previous/next block
• Blocks are individually compressed with 1 of 10 encodings
A full block contains between 16 and 8.4 million values
Storage Deep Dive: Columns
Column: Logical structure accessible via SQL
Physical structure is a doubly linked list of blocks
These blockchains exist on each slice for each column
All sorted & unsorted blockchains compose a column
Column properties include:
• Distribution Key
• Sort Key
• Compression Encoding
Columns shrink and grow independently, 1 block at a time
Three system columns per table-per slice for MVCC
Block Properties: Design Considerations
• Small writes:
• Batch processing system, optimized for processing massive
amounts of data
• 1MB size + immutable blocks means that we clone blocks on write
so as not to introduce fragmentation
• Small write (~1-10 rows) has similar cost to a larger write (~100 K
rows)
• UPDATE and DELETE:
• Immutable blocks means that we only logically delete rows on
UPDATE or DELETE
• Must VACUUM or DEEP COPY to remove ghost rows from table
Column Properties: Design Considerations
• Compression:
• COPY automatically analyzes and compresses data when loading into empty
tables
• ANALYZE COMPRESSION checks existing tables and proposes optimal
compression algorithms for each column
• Changing column encoding requires a table rebuild
• DISTKEY and SORTKEY significantly influence performance (orders of magnitude)
• Distribution Keys:
• A poor DISTKEY can introduce data skew and an unbalanced workload
• A query completes only as fast as the slowest slice completes
• Sort Keys:
• A sortkey is only effective as the data profile allows it to be
• Selectivity needs to be considered
Parallelism Deep Dive
Storage Deep Dive: Slices
Each compute node has either 2, 16, or 32 slices
A slice can be thought of like a “virtual compute node”
• Unit of data partitioning
• Parallel query processing
Facts about slices:
• Table rows are distributed to slices
• A slice processes only its own data
• Within a compute node all slices read from and write to all
disks
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
• Input: Optimized plan
• Output: >=1 C++
functions
• Compiler
• Task Scheduler
• WLM
• Admission
• Scheduling
• PostgreSQL Catalog
Tables
• Amazon Redshift
System Tables (STV)
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
128GB RAM
16TB disk
16 cores
Compute Node
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
• Input: Optimized plan
• Output: >=1 C++
functions
• Compiler
• Task Scheduler
• WLM
• Admission
• Scheduling
• PostgreSQL Catalog Tables
• Amazon Redshift System
Tables (STV)
Query Execution Terminology
Step: An individual operation needed during query execution.
Steps need to be combined to allow compute nodes to perform a
join. Examples: scan, sort, hash, aggr
Segment: A combination of several steps that can be done by a
single process. The smallest compilation unit executable by a
slice. Segments within a stream run in parallel.
Stream: A collection of combined segments which output to the
next stream or SQL client.
Visualizing Streams, Segments, and Steps
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Time
Query Lifecycle
client
JDBC ODBC
Leader Node
Parser
Query Planner
Code Generator
Final Computations
Generate code
for all segments
of one stream
Explain Plans
Compute Node
Receive Compiled
Code
Run the Compiled
Code
Return results to
Leader
Compute Node
Receive Compiled
Code
Run the Compiled Code
Return results to
Leader
Return results to client
Segments in a stream
are executed
concurrently. Each
step in a segment is
executed serially.
Query Execution Deep Dive: Leader Node
1.The leader node receives the query and parses the SQL.
2.The parser produces a logical representation of the original query.
3.This query tree is input into the query optimizer (volt).
4.Volt rewrites the query to maximize its efficiency. Sometimes a single query will
be rewritten as several dependent statements in the background.
5.The rewritten query is sent to the planner which generates >= 1 query plans for
the execution with the best estimated performance.
6.The query plan is sent to the execution engine, where it’s translated into steps,
segments, and streams.
7.This translated plan is sent to the code generator, which generates a C++
function for each segment.
8.This generated C++ is compiled with gcc to a .o file and distributed to the
compute nodes.
Query Execution Deep Dive: Compute Nodes
• Slices execute query segments in parallel
• Executable segments are created for one stream at a
time in sequence
• When the compute nodes are done, they return the
query results to the leader node for final processing
• Leader node merges data into a single result set and
addresses any needed sorting or aggregation
• Leader node then returns the results to the client
Visualizing Streams, Segments, and Steps
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Time
Query Execution
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Time
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Slices
0
1
2
3
Parallelism considerations with Redshift slices
DS2.8XL Compute Node
Ingestion Throughput:
• Each slice’s query processors can load one file at a time:
• Streaming decompression
• Parse
• Distribute
• Write
Realizing only partial node usage as 6.25% of slices are active
0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
Design considerations for Redshift slices
Use at least as many input files
as there are slices in the cluster
With 16 input files, all slices are
working so you maximize
throughput
COPY continues to scale linearly
as you add nodes 16 Input Files
DS2.8XL Compute Node
0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
Open source tools
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs
Admin scripts
Collection of utilities for running diagnostics on your cluster
Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established
schema with data already loaded
Thank you!
Q&A
tonygibb@amazon.com
aws.amazon.com/activate

More Related Content

What's hot

Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Jim Mlodgenski
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
Amazon Web Services
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
MIJIN AN
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
elliando dias
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Amazon Web Services
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
DataWorks Summit
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
Amazon Web Services
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
Amazon Web Services
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
Databricks
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
Amazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Amazon Web Services
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
Amazon Web Services
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
larsgeorge
 

What's hot (20)

Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 

Viewers also liked

Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
Carlos del Cacho
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
Amazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Amazon Web Services
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Amazon Web Services
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
Amazon Web Services
 
RedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksRedShift-Performance turning in few clicks
RedShift-Performance turning in few clicks
Sadagopan Iyengar
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
Amazon Web Services Korea
 
Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Amazon Web Services
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the Cloud
Amazon Web Services
 
Getting Started on AWS
Getting Started on AWS Getting Started on AWS
Getting Started on AWS
Amazon Web Services
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECS
Amazon Web Services
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
Amazon Web Services
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
Amazon Web Services
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
Amazon Web Services
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWS
Amazon Web Services
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
Amazon Web Services
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
Amazon Web Services
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
Amazon Web Services
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing Zone
Amazon Web Services
 
Fast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisFast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for Redis
Amazon Web Services
 

Viewers also liked (20)

Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
RedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksRedShift-Performance turning in few clicks
RedShift-Performance turning in few clicks
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the Cloud
 
Getting Started on AWS
Getting Started on AWS Getting Started on AWS
Getting Started on AWS
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECS
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWS
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing Zone
 
Fast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisFast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for Redis
 

Similar to Data Warehousing with Amazon Redshift

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
Amazon Web Services
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Web Services
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Amazon Web Services
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon Redshift
Amazon Web Services
 
Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftData Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Amazon Web Services
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
Cobus Bernard
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Amazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
Amazon Web Services
 
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
Amazon Web Services
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
Amazon Web Services
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf
3operatordcslipiPeng
 

Similar to Data Warehousing with Amazon Redshift (20)

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon Redshift
 
Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftData Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
 
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
Priyanka Aash
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc
 
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
ZachWylie3
 
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partesExchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
jorgelebrato
 
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptxFIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Alliance
 
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Alliance
 
UiPath Community Day Amsterdam: Code, Collaborate, Connect
UiPath Community Day Amsterdam: Code, Collaborate, ConnectUiPath Community Day Amsterdam: Code, Collaborate, Connect
UiPath Community Day Amsterdam: Code, Collaborate, Connect
UiPathCommunity
 
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
Fwdays
 
Indian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for StartupsIndian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for Startups
AMol NAik
 
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Cracking AI Black Box - Strategies for Customer-centric Enterprise ExcellenceCracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Quentin Reul
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
Bhajan Mehta
 
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptxFIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Alliance
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Zilliz
 
How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...
DianaGray10
 
It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...
Zilliz
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
Low Hong Chuan
 
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
Michael Price
 
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
Nohoax Kanont
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
Zilliz
 
FIDO Munich Seminar FIDO Automotive Apps.pptx
FIDO Munich Seminar FIDO Automotive Apps.pptxFIDO Munich Seminar FIDO Automotive Apps.pptx
FIDO Munich Seminar FIDO Automotive Apps.pptx
FIDO Alliance
 

Recently uploaded (20)

Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
 
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
 
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partesExchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
 
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptxFIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
 
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
 
UiPath Community Day Amsterdam: Code, Collaborate, Connect
UiPath Community Day Amsterdam: Code, Collaborate, ConnectUiPath Community Day Amsterdam: Code, Collaborate, Connect
UiPath Community Day Amsterdam: Code, Collaborate, Connect
 
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
 
Indian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for StartupsIndian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for Startups
 
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Cracking AI Black Box - Strategies for Customer-centric Enterprise ExcellenceCracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
 
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptxFIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptx
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
 
How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...
 
It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
 
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
 
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
 
FIDO Munich Seminar FIDO Automotive Apps.pptx
FIDO Munich Seminar FIDO Automotive Apps.pptxFIDO Munich Seminar FIDO Automotive Apps.pptx
FIDO Munich Seminar FIDO Automotive Apps.pptx
 

Data Warehousing with Amazon Redshift

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tony Gibbs, Big Data Solutions Architect Data Warehousing with Amazon Redshift
  • 2. Deep Dive Agenda • Amazon Redshift history and development • Cluster architecture • Concepts and terminology • Node components • Storage deep dive • Design considerations • Parallelism deep dive • Open Q&A
  • 3. Amazon Redshift History & Development
  • 4. Columnar MPP OLAP AWS IAMAmazon VPC Amazon SWF Amazon S3 AWS KMS Amazon Route 53 Amazon CloudWatch Amazon EC2
  • 5. February 2013 October 2016 > 100 Significant Patches > 130 Significant Features
  • 7. Amazon Redshift Cluster Architecture Massively parallel, shared nothing Leader node • SQL endpoint • Stores metadata • Coordinates parallel SQL processing Compute nodes • Local, columnar storage • Executes queries in parallel • Load, backup, restore 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores S3 / EMR / DynamoDB / SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 8. Compute & Leader Node Components
  • 9. 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node
  • 10. 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Parser & Rewriter • Planner & Optimizer • Code Generator • Input: Optimized plan • Output: >=1 C++ functions • Compiler • Task Scheduler • WLM • Admission • Scheduling • PostgreSQL Catalog Tables
  • 11. 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Query execution processes • Backup & restore processes • Replication processes • Local Storage • Disks • Slices • Tables • Columns • Blocks • Superblocks
  • 12. 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Query execution processes • Backup & restore processes • Replication processes • Local Storage • Disks • Slices • Tables • Columns • Blocks • Superblocks
  • 14. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Accessing dt with row storage: – Need to read everything – Unnecessary I/O aid loc dt CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date );
  • 15. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Accessing dt with columnar storage: – Only scan blocks for relevant column aid loc dt CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date );
  • 16. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Columns grow and shrink independently • Effective compression ratios due to like data • Reduces storage requirements • Reduces I/O aid loc dt CREATE TABLE loft_deep_dive ( aid INT ENCODE LZO ,loc CHAR(3) ENCODE BYTEDICT ,dt DATE ENCODE RUNLENGTH );
  • 17. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 aid loc dt CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); • In-memory block metadata • Contains per-block MIN and MAX value • Effectively prunes blocks which cannot contain data for a given query • Eliminates unnecessary I/O
  • 18. SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013' MIN: 01-JUNE-2013 MAX: 20-JUNE-2013 MIN: 08-JUNE-2013 MAX: 30-JUNE-2013 MIN: 12-JUNE-2013 MAX: 20-JUNE-2013 MIN: 02-JUNE-2013 MAX: 25-JUNE-2013 Unsorted Table MIN: 01-JUNE-2013 MAX: 06-JUNE-2013 MIN: 07-JUNE-2013 MAX: 12-JUNE-2013 MIN: 13-JUNE-2013 MAX: 18-JUNE-2013 MIN: 19-JUNE-2013 MAX: 24-JUNE-2013 Sorted By Date Zone Maps
  • 19. Terminology and Concepts: Data Sorting • Goals: • Physically order rows of table data based on certain column(s) • Optimize effectiveness of zone maps • Enable MERGE JOIN operations • Impact: • Enables rrscans to prune blocks by leveraging zone maps • Overall reduction in block I/O • Achieved with the table property SORTKEY defined over one or more columns • Optimal SORTKEY is dependent on: • Query patterns • Data profile • Business requirements
  • 20. Terminology and Concepts: Slices A slice can be thought of like a “virtual compute node” • Unit of data partitioning • Parallel query processing Facts about slices: • Each compute node has either 2, 16, or 32 slices • Table rows are distributed to slices • A slice processes only its own data
  • 21. Data Distribution • Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster: • KEY: Value is hashed, same value goes to same location (slice) • ALL: Full table data goes to first slice of every node • EVEN: Round robin • Goals: • Distribute data evenly for parallel processing • Minimize data movement during query processing KEY ALL Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 EVEN
  • 22. Data Distribution: Example CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE (EVEN|KEY|ALL); CN1 Slice 0 Slice 1 CN2 Slice 2 Slice 3 Table: loft_deep_dive User Columns System Columns aid loc dt ins del row
  • 23. Data Distribution: EVEN Example CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE EVEN; CN1 Slice 0 Slice 1 CN2 Slice 2 Slice 3 INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Rows: 0 Rows: 0 Rows: 0 Rows: 0 (3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB) Rows: 1 Rows: 1 Rows: 1 Rows: 1
  • 24. Data Distribution: KEY Example #1 CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE KEY DISTKEY (loc); CN1 Slice 0 Slice 1 CN2 Slice 2 Slice 3 INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Rows: 2 Rows: 0 Rows: 0 (3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB) Rows: 0Rows: 1 Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Rows: 2Rows: 0Rows: 1
  • 25. Data Distribution: KEY Example #2 CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE KEY DISTKEY (aid); CN1 Slice 0 Slice 1 CN2 Slice 2 Slice 3 INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Rows: 0 Rows: 0 Rows: 0 Rows: 0 (3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB) Rows: 1 Rows: 1 Rows: 1 Rows: 1
  • 26. Data Distribution: ALL Example CREATE TABLE loft_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE ALL; CN1 Slice 0 Slice 1 CN2 Slice 2 Slice 3 INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Rows: 0 Rows: 0 (3 User Columns + 3 System Columns) x (2 slice) = 12 Blocks (12MB) Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3 Table: loft_deep_dive User Columns System Columns aid loc dt ins del row Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3
  • 27. Terminology and Concepts: Data Distribution KEY • The key creates an even distribution of data • Joins are performed between large fact/dimension tables • Optimizing merge joins and group by ALL • Small and medium size dimension tables (< 2-3M) EVEN • When key cannot produce an even distribution
  • 29. Storage Deep Dive: Disks Amazon Redshift utilizes locally attached storage devices • Compute nodes have 2.5-3x the advertised storage capacity 1, 3, 8, or 24 disks depending on node type Each disk is split into two partitions • Local data storage, accessed by local CN • Mirrored data, accessed by remote CN Partitions are raw devices • Local storage devices are ephemeral in nature • Tolerant to multiple disk failures on a single node
  • 30. Storage Deep Dive: Blocks Column data is persisted to 1MB immutable blocks Each block contains in-memory metadata: • Zone Maps (MIN/MAX value) • Location of previous/next block • Blocks are individually compressed with 1 of 10 encodings A full block contains between 16 and 8.4 million values
  • 31. Storage Deep Dive: Columns Column: Logical structure accessible via SQL Physical structure is a doubly linked list of blocks These blockchains exist on each slice for each column All sorted & unsorted blockchains compose a column Column properties include: • Distribution Key • Sort Key • Compression Encoding Columns shrink and grow independently, 1 block at a time Three system columns per table-per slice for MVCC
  • 32. Block Properties: Design Considerations • Small writes: • Batch processing system, optimized for processing massive amounts of data • 1MB size + immutable blocks means that we clone blocks on write so as not to introduce fragmentation • Small write (~1-10 rows) has similar cost to a larger write (~100 K rows) • UPDATE and DELETE: • Immutable blocks means that we only logically delete rows on UPDATE or DELETE • Must VACUUM or DEEP COPY to remove ghost rows from table
  • 33. Column Properties: Design Considerations • Compression: • COPY automatically analyzes and compresses data when loading into empty tables • ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column • Changing column encoding requires a table rebuild • DISTKEY and SORTKEY significantly influence performance (orders of magnitude) • Distribution Keys: • A poor DISTKEY can introduce data skew and an unbalanced workload • A query completes only as fast as the slowest slice completes • Sort Keys: • A sortkey is only effective as the data profile allows it to be • Selectivity needs to be considered
  • 35. Storage Deep Dive: Slices Each compute node has either 2, 16, or 32 slices A slice can be thought of like a “virtual compute node” • Unit of data partitioning • Parallel query processing Facts about slices: • Table rows are distributed to slices • A slice processes only its own data • Within a compute node all slices read from and write to all disks
  • 36. 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Parser & Rewriter • Planner & Optimizer • Code Generator • Input: Optimized plan • Output: >=1 C++ functions • Compiler • Task Scheduler • WLM • Admission • Scheduling • PostgreSQL Catalog Tables • Amazon Redshift System Tables (STV)
  • 37. 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Parser & Rewriter • Planner & Optimizer • Code Generator • Input: Optimized plan • Output: >=1 C++ functions • Compiler • Task Scheduler • WLM • Admission • Scheduling • PostgreSQL Catalog Tables • Amazon Redshift System Tables (STV)
  • 38. Query Execution Terminology Step: An individual operation needed during query execution. Steps need to be combined to allow compute nodes to perform a join. Examples: scan, sort, hash, aggr Segment: A combination of several steps that can be done by a single process. The smallest compilation unit executable by a slice. Segments within a stream run in parallel. Stream: A collection of combined segments which output to the next stream or SQL client.
  • 39. Visualizing Streams, Segments, and Steps Stream 0 Segment 0 Step 0 Step 1 Step 2 Segment 1 Step 0 Step 1 Step 2 Step 3 Step 4 Segment 2 Step 0 Step 1 Step 2 Step 3 Segment 3 Step 0 Step 1 Step 2 Step 3 Step 4 Step 5 Stream 1 Segment 4 Step 0 Step 1 Step 2 Step 3 Segment 5 Step 0 Step 1 Step 2 Segment 6 Step 0 Step 1 Step 2 Step 3 Step 4 Stream 2 Segment 7 Step 0 Step 1 Segment 8 Step 0 Step 1 Time
  • 40. Query Lifecycle client JDBC ODBC Leader Node Parser Query Planner Code Generator Final Computations Generate code for all segments of one stream Explain Plans Compute Node Receive Compiled Code Run the Compiled Code Return results to Leader Compute Node Receive Compiled Code Run the Compiled Code Return results to Leader Return results to client Segments in a stream are executed concurrently. Each step in a segment is executed serially.
  • 41. Query Execution Deep Dive: Leader Node 1.The leader node receives the query and parses the SQL. 2.The parser produces a logical representation of the original query. 3.This query tree is input into the query optimizer (volt). 4.Volt rewrites the query to maximize its efficiency. Sometimes a single query will be rewritten as several dependent statements in the background. 5.The rewritten query is sent to the planner which generates >= 1 query plans for the execution with the best estimated performance. 6.The query plan is sent to the execution engine, where it’s translated into steps, segments, and streams. 7.This translated plan is sent to the code generator, which generates a C++ function for each segment. 8.This generated C++ is compiled with gcc to a .o file and distributed to the compute nodes.
  • 42. Query Execution Deep Dive: Compute Nodes • Slices execute query segments in parallel • Executable segments are created for one stream at a time in sequence • When the compute nodes are done, they return the query results to the leader node for final processing • Leader node merges data into a single result set and addresses any needed sorting or aggregation • Leader node then returns the results to the client
  • 43. Visualizing Streams, Segments, and Steps Stream 0 Segment 0 Step 0 Step 1 Step 2 Segment 1 Step 0 Step 1 Step 2 Step 3 Step 4 Segment 2 Step 0 Step 1 Step 2 Step 3 Segment 3 Step 0 Step 1 Step 2 Step 3 Step 4 Step 5 Stream 1 Segment 4 Step 0 Step 1 Step 2 Step 3 Segment 5 Step 0 Step 1 Step 2 Segment 6 Step 0 Step 1 Step 2 Step 3 Step 4 Stream 2 Segment 7 Step 0 Step 1 Segment 8 Step 0 Step 1 Time
  • 44. Query Execution Stream 0 Segment 0 Step 0 Step 1 Step 2 Segment 1 Step 0 Step 1 Step 2 Step 3 Step 4 Segment 2 Step 0 Step 1 Step 2 Step 3 Segment 3 Step 0 Step 1 Step 2 Step 3 Step 4 Step 5 Stream 1 Segment 4 Step 0 Step 1 Step 2 Step 3 Segment 5 Step 0 Step 1 Step 2 Segment 6 Step 0 Step 1 Step 2 Step 3 Step 4 Stream 2 Segment 7 Step 0 Step 1 Segment 8 Step 0 Step 1 Time Stream 0 Segment 0 Step 0 Step 1 Step 2 Segment 1 Step 0 Step 1 Step 2 Step 3 Step 4 Segment 2 Step 0 Step 1 Step 2 Step 3 Segment 3 Step 0 Step 1 Step 2 Step 3 Step 4 Step 5 Stream 1 Segment 4 Step 0 Step 1 Step 2 Step 3 Segment 5 Step 0 Step 1 Step 2 Segment 6 Step 0 Step 1 Step 2 Step 3 Step 4 Stream 2 Segment 7 Step 0 Step 1 Segment 8 Step 0 Step 1 Stream 0 Segment 0 Step 0 Step 1 Step 2 Segment 1 Step 0 Step 1 Step 2 Step 3 Step 4 Segment 2 Step 0 Step 1 Step 2 Step 3 Segment 3 Step 0 Step 1 Step 2 Step 3 Step 4 Step 5 Stream 1 Segment 4 Step 0 Step 1 Step 2 Step 3 Segment 5 Step 0 Step 1 Step 2 Segment 6 Step 0 Step 1 Step 2 Step 3 Step 4 Stream 2 Segment 7 Step 0 Step 1 Segment 8 Step 0 Step 1 Stream 0 Segment 0 Step 0 Step 1 Step 2 Segment 1 Step 0 Step 1 Step 2 Step 3 Step 4 Segment 2 Step 0 Step 1 Step 2 Step 3 Segment 3 Step 0 Step 1 Step 2 Step 3 Step 4 Step 5 Stream 1 Segment 4 Step 0 Step 1 Step 2 Step 3 Segment 5 Step 0 Step 1 Step 2 Segment 6 Step 0 Step 1 Step 2 Step 3 Step 4 Stream 2 Segment 7 Step 0 Step 1 Segment 8 Step 0 Step 1 Slices 0 1 2 3
  • 45. Parallelism considerations with Redshift slices DS2.8XL Compute Node Ingestion Throughput: • Each slice’s query processors can load one file at a time: • Streaming decompression • Parse • Distribute • Write Realizing only partial node usage as 6.25% of slices are active 0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
  • 46. Design considerations for Redshift slices Use at least as many input files as there are slices in the cluster With 16 input files, all slices are working so you maximize throughput COPY continues to scale linearly as you add nodes 16 Input Files DS2.8XL Compute Node 0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
  • 47. Open source tools https://github.com/awslabs/amazon-redshift-utils https://github.com/awslabs/amazon-redshift-monitoring https://github.com/awslabs/amazon-redshift-udfs Admin scripts Collection of utilities for running diagnostics on your cluster Admin views Collection of utilities for managing your cluster, generating schema DDL, etc. ColumnEncodingUtility Gives you the ability to apply optimal column encoding to an established schema with data already loaded