Deep Dive on Amazon Redshift
Storage Subsystem and Query Life Cycle
Eric Ferreira, Principal Database Engineer, Amazon Redshift
Mar 2017
Deep Dive Overview
• Amazon Redshift History and Development
• Cluster Architecture
• Concepts and Terminology
• Storage Deep Dive
• Design Considerations
• Query Life Cycle
• New & Upcoming Features
• Open Q&A
Amazon Redshift History & Development
[Diagram labels: PostgreSQL, Amazon EC2, Amazon S3, AWS KMS, Amazon Route 53, Amazon Redshift]
February 2013
February 2017
> 100 Significant Patches
> 140 Significant Features
Amazon Redshift Cluster Architecture
Redshift Cluster Architecture
• Massively parallel, shared nothing
• Leader node
– SQL endpoint
– Stores metadata
– Coordinates parallel SQL processing
• Compute nodes
– Local, columnar storage
– Executes queries in parallel
– Load, backup, restore
[Diagram labels: SQL Clients/BI Tools; Leader Node; Compute Nodes (16 cores, 16TB disk each); 10 GigE; S3 / EMR / DynamoDB / SSH]
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
• Input: Optimized plan
• Output: >=1 C++ functions
• Compiler
• Task Scheduler
• Admission
• Scheduling
• PostgreSQL Catalog Tables
Compute Node
• Query execution processes
• Backup & restore processes
• Replication processes
• Local Storage
• Disks
• Slices
• Tables
• Columns
• Blocks
Concepts and Terminology
Designed for I/O Reduction
• Columnar storage
• Data compression
• Zone maps
CREATE TABLE loft_deep_dive (
 aid INT --audience_id
 ,loc CHAR(3) --location
 ,dt DATE --date
);
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with row storage:
– Need to read everything
– Unnecessary I/O
• Accessing dt with columnar storage:
– Only scan blocks for relevant column
• Columns grow and shrink independently
• Effective compression ratios due to like data
• Reduces storage requirements
• Reduces I/O
• In-memory block metadata
• Contains per-block MIN and MAX value
• Effectively prunes blocks which cannot contain data for a given query
• Eliminates unnecessary I/O
Zone Maps
Unsorted Table
MIN: 01-JUNE-2013, MAX: 20-JUNE-2013
MIN: 08-JUNE-2013, MAX: 30-JUNE-2013
MIN: 12-JUNE-2013, MAX: 20-JUNE-2013
MIN: 02-JUNE-2013, MAX: 25-JUNE-2013
Sorted By Date
MIN: 01-JUNE-2013, MAX: 06-JUNE-2013
MIN: 07-JUNE-2013, MAX: 12-JUNE-2013
MIN: 13-JUNE-2013, MAX: 18-JUNE-2013
MIN: 19-JUNE-2013, MAX: 24-JUNE-2013
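The pruning effect of zone maps can be sketched in a few lines of Python. This is an illustration only, not Redshift's implementation: `Block` and `blocks_to_read` are hypothetical names, and the MIN/MAX values are the ones shown in the zone map figure. A block has to be read only if the predicate value falls inside its MIN/MAX range.

```python
from datetime import date
from typing import List, NamedTuple


class Block(NamedTuple):
    """Hypothetical stand-in for one block's zone map entry."""
    min_dt: date
    max_dt: date


def blocks_to_read(zone_maps: List[Block], wanted: date) -> List[int]:
    """Return the indexes of blocks whose [MIN, MAX] range could contain `wanted`."""
    return [i for i, b in enumerate(zone_maps) if b.min_dt <= wanted <= b.max_dt]


# MIN/MAX values copied from the figure (all June 2013).
unsorted_table = [
    Block(date(2013, 6, 1), date(2013, 6, 20)),
    Block(date(2013, 6, 8), date(2013, 6, 30)),
    Block(date(2013, 6, 12), date(2013, 6, 20)),
    Block(date(2013, 6, 2), date(2013, 6, 25)),
]
sorted_by_date = [
    Block(date(2013, 6, 1), date(2013, 6, 6)),
    Block(date(2013, 6, 7), date(2013, 6, 12)),
    Block(date(2013, 6, 13), date(2013, 6, 18)),
    Block(date(2013, 6, 19), date(2013, 6, 24)),
]

wanted = date(2013, 6, 9)  # e.g. a predicate such as dt = '2013-06-09'
print(blocks_to_read(unsorted_table, wanted))  # [0, 1, 3]
print(blocks_to_read(sorted_by_date, wanted))  # [1]
```

With the unsorted layout three of the four blocks overlap the requested date; with the sorted layout only one does, which is the I/O reduction the slide describes.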
Terminology and Concepts: Data Sorting
• Goals:
• Physically order rows of table data based on certain column(s)
• Optimize effectiveness of zone maps
• Enable MERGE JOIN operations
• Impact:
• Enables range-restricted scans (rrscans) to prune blocks by leveraging zone maps
• Overall reduction in block I/O
• Achieved with the table property SORTKEY defined over one or more columns
• Optimal SORTKEY is dependent on:
• Query patterns
• Data profile
• Business requirements
Terminology and Concepts: Slices
• A slice can be thought of as a “virtual compute node”
– Unit of data partitioning
– Parallel query processing
• Facts about slices:
– Each compute node has either 2, 16, or 32 slices
– Table rows are distributed to slices
– A slice processes only its own data
Data Distribution
• Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
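A minimal Python sketch of the three distribution styles, under stated assumptions: the function names are hypothetical, slices are numbered from 0 with consecutive slices belonging to the same node, and `zlib.crc32` is only a stand-in hash (the deck does not say which hash function Redshift uses).

```python
import zlib
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str, str]  # (aid, loc, dt)

ROWS: List[Row] = [
    (1, "SFO", "2016-09-01"),
    (2, "JFK", "2016-09-14"),
    (3, "SFO", "2017-04-01"),
    (4, "JFK", "2017-05-14"),
]


def distribute_even(rows: Sequence[Row], n_slices: int) -> Dict[int, List[Row]]:
    """EVEN: assign rows to slices round robin."""
    slices: Dict[int, List[Row]] = {s: [] for s in range(n_slices)}
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices


def distribute_key(rows: Sequence[Row], n_slices: int, key_index: int) -> Dict[int, List[Row]]:
    """KEY: hash the key value, so equal values always land on the same slice.

    crc32 is an assumed stand-in, not Redshift's actual hash function.
    """
    slices: Dict[int, List[Row]] = {s: [] for s in range(n_slices)}
    for row in rows:
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        slices[digest % n_slices].append(row)
    return slices


def distribute_all(rows: Sequence[Row], n_nodes: int, slices_per_node: int) -> Dict[int, List[Row]]:
    """ALL: place a full copy of the table on the first slice of every node."""
    slices: Dict[int, List[Row]] = {s: [] for s in range(n_nodes * slices_per_node)}
    for node in range(n_nodes):
        slices[node * slices_per_node] = list(rows)
    return slices


even = distribute_even(ROWS, 4)
by_loc = distribute_key(ROWS, 4, key_index=1)
everywhere = distribute_all(ROWS, n_nodes=2, slices_per_node=2)
print({s: len(r) for s, r in even.items()})        # {0: 1, 1: 1, 2: 1, 3: 1}
print({s: len(r) for s, r in everywhere.items()})  # {0: 4, 1: 0, 2: 4, 3: 0}
```

Which slice a given key lands on depends on the hash, so the sketch makes no claim about the KEY placement beyond "same value, same slice".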
[Diagram, repeated once per distribution style: Node 1 with Slice 1 and Slice 2; Node 2 with Slice 3 and Slice 4]
Data Distribution: Example
CREATE TABLE loft_deep_dive (
 aid INT --audience_id
 ,loc CHAR(3) --location
 ,dt DATE --date
);
Slices: Slice 0, Slice 1, Slice 2, Slice 3
Table: loft_deep_dive
User Columns: aid, loc, dt
System Columns: ins, del, row
Data Distribution: EVEN Example
CREATE TABLE loft_deep_dive (
 aid INT --audience_id
 ,loc CHAR(3) --location
 ,dt DATE --date
) DISTSTYLE EVEN;
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns: aid, loc, dt
System Columns: ins, del, row
Rows per slice after INSERT: Slice 0: 1, Slice 1: 1, Slice 2: 1, Slice 3: 1
(3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
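The block arithmetic on this slide generalizes to a one-line formula. The sketch below assumes, as the slide does, that every column (user or system) occupies at least one 1MB block on each slice that holds rows; `min_table_blocks` is a hypothetical helper name.

```python
BLOCK_SIZE_MB = 1    # each block is 1MB, per the deck
SYSTEM_COLUMNS = 3   # ins, del, row


def min_table_blocks(user_columns: int, populated_slices: int) -> int:
    """Minimum number of blocks for a table with rows on `populated_slices` slices."""
    return (user_columns + SYSTEM_COLUMNS) * populated_slices


# Four rows spread over four slices: 24 blocks, i.e. 24MB.
print(min_table_blocks(3, 4) * BLOCK_SIZE_MB)  # 24
# The same four rows landing on only two slices: 12 blocks, i.e. 12MB.
print(min_table_blocks(3, 2) * BLOCK_SIZE_MB)  # 12
```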
Data Distribution: KEY Example #1
CREATE TABLE loft_deep_dive (
 aid INT --audience_id
 ,loc CHAR(3) --location
 ,dt DATE --date
) DISTSTYLE KEY DISTKEY (loc);
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns: aid, loc, dt
System Columns: ins, del, row
Rows per slice after INSERT: 2 rows on each of 2 slices, 0 rows on the other 2 slices
(3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
Data Distribution: KEY Example #2
CREATE TABLE loft_deep_dive (
 aid INT --audience_id
 ,loc CHAR(3) --location
 ,dt DATE --date
) DISTSTYLE KEY DISTKEY (aid);
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns: aid, loc, dt
System Columns: ins, del, row
Rows per slice after INSERT: Slice 0: 1, Slice 1: 1, Slice 2: 1, Slice 3: 1
(3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
Data Distribution: ALL Example
CREATE TABLE loft_deep_dive (
 aid INT --audience_id
 ,loc CHAR(3) --location
 ,dt DATE --date
) DISTSTYLE ALL;
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: loft_deep_dive
User Columns: aid, loc, dt
System Columns: ins, del, row
Rows per slice after INSERT: 4 rows on the first slice of each of the 2 nodes, 0 rows on the other slices
(3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
Terminology and Concepts: Data Distribution
• KEY
– The key creates an even distribution of data
– Joins are performed between large fact/dimension tables
– Optimizing merge joins and group by
• ALL
– Small and medium size dimension tables (< 2-3M)
• EVEN
– When key cannot produce an even distribution
Storage Deep Dive
Storage Deep Dive: Disks
• Redshift utilizes locally attached storage devices
• Compute nodes have 2.5-3x the advertised storage capacity
• 1, 3, 8, or 24 disks depending on node type
• Each disk is split into two partitions
– Local data storage, accessed by local CN
– Mirrored data, accessed by remote CN
• Partitions are raw devices
– Local storage devices are ephemeral in nature
– Tolerant to multiple disk failures on a single node
Storage Deep Dive: Blocks
• Column data is persisted to 1MB immutable blocks
• Each block contains in-memory metadata:
– Zone Maps (MIN/MAX value)
– Location of previous/next block
• Blocks are individually compressed with 1 of 10 encodings
• A full block contains between 16 and 8.4 million values
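One way to see where the two bounds could come from is the short calculation below. The inputs are assumptions for illustration (the slide gives only the bounds themselves): a 1MB block is taken as 1,048,576 bytes, the densest possible value as one bit, and the widest single value as a 65,535-byte VARCHAR, which is the maximum VARCHAR length in Redshift.

```python
BLOCK_BYTES = 1024 * 1024    # assumed: 1MB block = 1,048,576 bytes
MAX_VARCHAR_BYTES = 65535    # assumed widest single value (VARCHAR maximum)

# Upper bound: every value shrinks to a single bit.
max_values_per_block = BLOCK_BYTES * 8
# Lower bound: every value is as wide as the widest VARCHAR.
min_values_per_block = BLOCK_BYTES // MAX_VARCHAR_BYTES

print(max_values_per_block)  # 8388608, roughly 8.4 million
print(min_values_per_block)  # 16
```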
Storage Deep Dive: Columns
• Column: Logical structure accessible via SQL
• Physical structure is a doubly linked list of blocks
• These blockchains exist on each slice for each column
• All sorted & unsorted blockchains compose a column
• Column properties include:
– Distribution Key
– Sort Key
– Compression Encoding
• Columns shrink and grow independently, 1 block at a time
• Three system columns per table, per slice, for MVCC
Block Properties: Design Considerations
• Small writes:
• Batch processing system, optimized for processing massive amounts of data
• 1MB size + immutable blocks means that we clone blocks on write so as not to introduce fragmentation
• Small write (~1-10 rows) has similar cost to a larger write (~100 K rows)
• Immutable blocks means that we only logically delete rows on UPDATE or DELETE
• Must VACUUM or DEEP COPY to remove ghost rows from table
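The logical-delete behavior can be illustrated with a toy model. This is a sketch under loud assumptions, not Redshift's storage engine: `ToyTable` and `StoredRow` are hypothetical names, a Python list stands in for the block chain, and a boolean flag stands in for the delete marker. The point is only that UPDATE and DELETE leave ghost rows behind until a vacuum-style rewrite removes them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StoredRow:
    values: Tuple
    deleted: bool = False  # toy stand-in for a logical delete marker


@dataclass
class ToyTable:
    rows: List[StoredRow] = field(default_factory=list)

    def insert(self, values: Tuple) -> None:
        self.rows.append(StoredRow(values))

    def delete(self, predicate: Callable[[Tuple], bool]) -> int:
        """Mark matching rows as deleted; their storage is not reclaimed."""
        hit = [r for r in self.rows if not r.deleted and predicate(r.values)]
        for r in hit:
            r.deleted = True
        return len(hit)

    def update(self, predicate: Callable[[Tuple], bool],
               change: Callable[[Tuple], Tuple]) -> int:
        """UPDATE = logically delete the old version and append the new one."""
        hit = [r for r in self.rows if not r.deleted and predicate(r.values)]
        for r in hit:
            r.deleted = True
            self.rows.append(StoredRow(change(r.values)))
        return len(hit)

    def visible(self) -> List[Tuple]:
        return [r.values for r in self.rows if not r.deleted]

    def ghost_rows(self) -> int:
        return sum(1 for r in self.rows if r.deleted)

    def vacuum(self) -> None:
        """Rewrite the table without ghost rows (what VACUUM or a deep copy achieves)."""
        self.rows = [r for r in self.rows if not r.deleted]


table = ToyTable()
for row in [(1, "SFO"), (2, "JFK"), (3, "SFO"), (4, "JFK")]:
    table.insert(row)
table.update(lambda v: v[0] == 1, lambda v: (v[0], "LAX"))
table.delete(lambda v: v[0] == 2)
print(len(table.rows), table.ghost_rows())  # 5 2
table.vacuum()
print(len(table.rows), table.ghost_rows())  # 3 0
```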
Column Properties: Design Considerations
• Compression:
• COPY automatically analyzes and compresses data when loading into empty tables
• ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column
• Changing column encoding requires a table rebuild
• DISTKEY and SORTKEY significantly influence performance (orders of magnitude)
• Distribution Keys:
• A poor DISTKEY can introduce data skew and an unbalanced workload
• A query completes only as fast as the slowest slice completes
• Sort Keys:
• A sortkey is only as effective as the data profile allows it to be
• Selectivity needs to be considered
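The effect of a poor DISTKEY can be quantified with a simple skew ratio. The sketch below is illustrative: `skew_ratio` is a hypothetical helper, the row counts are made up, and it assumes that a slice's work is roughly proportional to the number of rows it holds.

```python
from typing import Sequence


def skew_ratio(rows_per_slice: Sequence[int]) -> float:
    """Rows on the fullest slice divided by the mean rows per slice (1.0 = even)."""
    if not rows_per_slice or sum(rows_per_slice) == 0:
        raise ValueError("need at least one slice holding rows")
    mean = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / mean


balanced = [250, 250, 250, 250]  # made-up row counts, 1000 rows in total
skewed = [700, 100, 100, 100]    # same total, most rows on one slice

print(skew_ratio(balanced))  # 1.0
print(skew_ratio(skewed))    # 2.8
```

Because the query finishes only when the slowest slice finishes, the skewed layout in this example would take roughly 2.8 times as long as the balanced one under the proportional-work assumption.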
Parallelism Deep Dive
Storage Deep Dive: Slices
• Each compute node has either 2, 16, or 32 slices
• A slice can be thought of as a “virtual compute node”
– Unit of data partitioning
– Parallel query processing
• Facts about slices:
– Table rows are distributed to slices
– A slice processes only its own data
– Within a compute node all slices read from and write to all disks
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
• Input: Optimized plan
• Output: >=1 C++ functions
• Compiler
• Task Scheduler
• Admission
• Scheduling
• PostgreSQL Catalog Tables
• Redshift System Tables (STV)
Query Execution Terminology
• Step: An individual operation needed during query execution. Steps need to be combined to allow compute nodes to perform a join. Examples: scan, sort, hash, aggr
• Segment: A combination of several steps that can be done by a single process. The smallest compilation unit executable by a slice. Segments within a stream run in parallel.
• Stream: A collection of combined segments which output to the next stream or SQL client.
Visualizing Streams, Segments, and Steps
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Leader Node
• Query Planner
• Explain Plans
• Code Generator: Generate code for all segments of one stream
• Final Computations
• Return results to client
Compute Node (each)
• Receive Compiled Code
• Run the Compiled Code
• Return results to Leader
Segments in a stream are executed concurrently. Each step in a segment is executed serially.
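The execution order described here (streams one after another, segments within a stream concurrently, steps within a segment serially) can be mimicked with a thread pool. This is a toy model, not how Redshift runs compiled segments: `PLAN`, `run_segment`, and `run_query` are hypothetical names, threads stand in for slice processes, and the step counts are taken from the streams/segments/steps figure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# stream id -> {segment id -> number of steps}, as drawn in the figure.
PLAN: Dict[int, Dict[int, int]] = {
    0: {0: 3, 1: 5, 2: 4, 3: 6},
    1: {4: 4, 5: 3, 6: 5},
    2: {7: 2, 8: 2},
}


def run_segment(segment_id: int, n_steps: int) -> List[str]:
    """Steps inside one segment run serially, in order."""
    log = []
    for step in range(n_steps):
        log.append(f"segment {segment_id} step {step}")
    return log


def run_query(plan: Dict[int, Dict[int, int]]) -> List[List[List[str]]]:
    """Run streams one at a time; run the segments of each stream concurrently."""
    results = []
    for stream_id in sorted(plan):
        segments = plan[stream_id]
        # Leaving the `with` block waits for every segment of this stream,
        # so the next stream starts only after the current one has finished.
        with ThreadPoolExecutor(max_workers=len(segments)) as pool:
            futures = [pool.submit(run_segment, seg, n)
                       for seg, n in sorted(segments.items())]
            results.append([f.result() for f in futures])
    return results


executed = run_query(PLAN)
print([len(stream) for stream in executed])  # [4, 3, 2]
```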
Query Lifecycle
Query Execution Deep Dive: Leader Node
1. The leader node receives the query and parses the SQL.
2. The parser produces a logical representation of the original query.
3. This query tree is input into the query optimizer (volt).
4. Volt rewrites the query to maximize its efficiency. Sometimes a single query will be rewritten as several dependent statements in the background.
5. The rewritten query is sent to the planner, which generates one or more query plans and selects the one with the best estimated performance for execution.
6. The query plan is sent to the execution engine, where it’s translated into steps, segments, and streams.
7. This translated plan is sent to the code generator, which generates a C++ function for each segment.
8. This generated C++ is compiled with gcc to a .o file and distributed to the compute nodes.
Query Execution Deep Dive: Compute Nodes
• Slices execute the query segments in parallel.
• Executable segments are created for one stream at a time. When the segments of that stream are complete, the engine generates the segments for the next stream.
• When the compute nodes are done, they return the query results to the leader node for final processing.
• The leader node merges the data into a single result set and addresses any needed sorting or aggregation.
• The leader node then returns the results to the client.
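The leader node's final merge can be pictured with two small helpers. This is a sketch under assumptions, not Redshift code: the function names are hypothetical, the per-slice partial results are made up, and the inputs to the sorted merge are assumed to be already sorted by each slice.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable, List


def merge_partial_counts(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-slice partial COUNT(*) results into one final aggregate."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # adds the counts key by key
    return dict(total)


def merge_sorted_runs(runs: Iterable[List[int]]) -> List[int]:
    """Merge per-slice results that are already sorted into one sorted result set."""
    return list(heapq.merge(*runs))


# Made-up partial results for a query like: SELECT loc, COUNT(*) ... GROUP BY loc
per_slice_counts = [{"SFO": 1}, {"JFK": 1}, {"SFO": 1}, {"JFK": 1}]
print(merge_partial_counts(per_slice_counts))  # {'SFO': 2, 'JFK': 2}

# Made-up sorted aid values returned by two slices for an ORDER BY aid query.
print(merge_sorted_runs([[1, 3], [2, 4]]))  # [1, 2, 3, 4]
```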
New & Upcoming Features
Recently Released Features
• Support for timestamp with time zone: new TIMESTAMPTZ data type to input complete timestamp values that include the date, the time of day, and a time zone.
E.g.: 30 Nov 07:37:16 2016 PST
Multi-byte Object Names
• Support for multi-byte (UTF-8) characters for tables, columns, and other database object names
User Connection Limits
• You can now set a limit on the number of database connections a user is permitted to have open concurrently
Automatic Data Compression for CTAS
• All newly created tables will leverage default encoding
Coming Soon: Short Query Bias
Amazon Redshift Workload Management
[Diagram labels: BI tools, SQL clients, Analytics tools; Queue 1, Queue 2; 4 Slots, 2 Slots]
Short queries go to the head of the queue.
Coming Soon: Power Start
[Diagram labels: BI tools, SQL clients, Analytics tools; Amazon Redshift Cluster with a leader node and three compute nodes]
All queries receive a power start. Shorter queries benefit the most.
Coming Soon: Query Monitoring Rules
• Monitor and control cluster … consumed by a …
• Get notified, abort and reprioritize long-running / bad …
• … templates for common use cases
Coming Soon: IAM Authentication
[Diagram labels: BI tools, SQL clients, Analytics tools; Client; AWS; Active Directory; IAM; Identity providers; Single Sign-On; Amazon Redshift; User groups; Individual user]
New Redshift drivers. Grab the ticket (userid) and get a SAML …
Coming Soon: Lots More …
Automatic and Incremental Background VACUUM
• Reclaims space and sorts when Redshift clusters are idle
• Vacuum is initiated when performance can be enhanced
• Improves ETL and query performance
Automatic Compression for New Tables
• All newly created tables will leverage default encoding
• Provides higher compression rates
New Functions
• Approximate Percentile
• Admin scripts
Collection of utilities for running diagnostics on your cluster
• Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
• ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established schema with data already loaded
• Amazon Redshift Engineering’s Advanced Table Design Playbook
Thank you!

More Related Content

What's hot

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Amazon Web Services
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
Carlos del Cacho
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
Redshift overview
Redshift overviewRedshift overview
Redshift overview
Amazon Web Services LATAM
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
Amazon Web Services
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
DataStax Academy
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
Amazon Web Services
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
Amazon Web Services
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
Amazon Web Services
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Amazon Web Services
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs

What's hot (20)

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Redshift overview
Redshift overviewRedshift overview
Redshift overview
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs

Viewers also liked

Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Amazon Web Services
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
Best Practices running SQL Server on AWS
Best Practices running SQL Server on AWSBest Practices running SQL Server on AWS
Best Practices running SQL Server on AWS
Amazon Web Services
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial Databases
Amazon Web Services
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with Caching
Amazon Web Services
What’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQLWhat’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQL
Amazon Web Services
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Amazon Web Services
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
Amazon Web Services
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
Amazon Web Services
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
Amazon Web Services
Hive: A Cloud Story
Hive: A Cloud StoryHive: A Cloud Story
Hive: A Cloud Story
Amazon Web Services
Everything You Need for a Viral Game, Except the Game
Everything You Need for a Viral Game, Except the GameEverything You Need for a Viral Game, Except the Game
Everything You Need for a Viral Game, Except the Game
Amazon Web Services
Telenor Connexion
Telenor Connexion Telenor Connexion
Telenor Connexion
Amazon Web Services
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
Amazon Web Services
Partnering with AWS
Partnering with AWSPartnering with AWS
Partnering with AWS
Amazon Web Services
The Benefits of Cloud Computing
The Benefits of Cloud ComputingThe Benefits of Cloud Computing
The Benefits of Cloud Computing
Amazon Web Services
Opening Keynote
Opening Keynote Opening Keynote
Opening Keynote
Amazon Web Services
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Amazon Web Services
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS Lambda
Amazon Web Services
Security Best Practices
Security Best PracticesSecurity Best Practices
Security Best Practices
Amazon Web Services

Viewers also liked (20)

Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices running SQL Server on AWS
Best Practices running SQL Server on AWSBest Practices running SQL Server on AWS
Best Practices running SQL Server on AWS
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial Databases
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with Caching
What’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQLWhat’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQL
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
Hive: A Cloud Story
Hive: A Cloud StoryHive: A Cloud Story
Hive: A Cloud Story
Everything You Need for a Viral Game, Except the Game
Everything You Need for a Viral Game, Except the GameEverything You Need for a Viral Game, Except the Game
Everything You Need for a Viral Game, Except the Game
Telenor Connexion
Telenor Connexion Telenor Connexion
Telenor Connexion
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
Partnering with AWS
Partnering with AWSPartnering with AWS
Partnering with AWS
The Benefits of Cloud Computing
The Benefits of Cloud ComputingThe Benefits of Cloud Computing
The Benefits of Cloud Computing
Opening Keynote
Opening Keynote Opening Keynote
Opening Keynote
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS Lambda
Security Best Practices
Security Best PracticesSecurity Best Practices
Security Best Practices


Deep Dive on Amazon Redshift

  • 1. Deep Dive on Amazon Redshift: Storage Subsystem and Query Life Cycle. Eric Ferreira, Principal Database Engineer, Amazon Redshift. Mar 2017
  • 2. Deep Dive Overview
      • Amazon Redshift History and Development
      • Cluster Architecture
      • Concepts and Terminology
      • Storage Deep Dive
      • Design Considerations
      • Query Life Cycle
      • New & Upcoming Features
      • Open Q&A
  • 3. Amazon Redshift History & Development
  • 4. [Diagram labels: Columnar, MPP, OLAP; AWS IAM, Amazon VPC, Amazon SWF, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, Amazon EC2; PostgreSQL; Amazon Redshift]
  • 5. February 2013 to February 2017: > 100 significant patches, > 140 significant features
  • 6. Amazon Redshift Cluster Architecture
  • 7. Redshift Cluster Architecture
      • Massively parallel, shared nothing
      • Leader node
        – SQL endpoint
        – Stores metadata
        – Coordinates parallel SQL processing
      • Compute nodes
        – Local, columnar storage
        – Executes queries in parallel
        – Load, backup, restore
      [Diagram labels: SQL Clients/BI Tools; JDBC/ODBC; Leader Node; three Compute Nodes; each node 128GB RAM, 16TB disk, 16 cores; 10 GigE (HPC); Ingestion, Backup, Restore; S3 / EMR / DynamoDB / SSH]
  • 8. [Diagram labels: Leader Node; three Compute Nodes; each node 128GB RAM, 16TB disk, 16 cores]
  • 9. [Diagram labels: Leader Node; three Compute Nodes; each node 128GB RAM, 16TB disk, 16 cores]
      • Parser & Rewriter
      • Planner & Optimizer
      • Code Generator
        – Input: Optimized plan
        – Output: >=1 C++ functions
      • Compiler
      • Task Scheduler
      • WLM
        – Admission
        – Scheduling
      • PostgreSQL Catalog Tables
  • 10. [Diagram labels: Leader Node; three Compute Nodes; each node 128GB RAM, 16TB disk, 16 cores]
      • Query execution processes
      • Backup & restore processes
      • Replication processes
      • Local Storage
        – Disks
        – Slices
        – Tables
        – Columns
        – Blocks
  • 13. Designed for I/O Reduction
      • Columnar storage
      • Data compression
      • Zone maps
      Example data:
        aid  loc  dt
        1    SFO  2016-09-01
        2    JFK  2016-09-14
        3    SFO  2017-04-01
        4    JFK  2017-05-14
      • Accessing dt with row storage:
        – Need to read everything
        – Unnecessary I/O
      CREATE TABLE loft_deep_dive (
        aid INT      -- audience_id
       ,loc CHAR(3)  -- location
       ,dt  DATE     -- date
      );
  • 14. Designed for I/O Reduction
      • Columnar storage
      • Data compression
      • Zone maps
      Example data:
        aid  loc  dt
        1    SFO  2016-09-01
        2    JFK  2016-09-14
        3    SFO  2017-04-01
        4    JFK  2017-05-14
      • Accessing dt with columnar storage:
        – Only scan blocks for relevant column
      CREATE TABLE loft_deep_dive (
        aid INT      -- audience_id
       ,loc CHAR(3)  -- location
       ,dt  DATE     -- date
      );
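The contrast between slides 13 and 14 can be sketched with a toy model. This is only an illustration under simplifying assumptions: it counts values touched rather than 1 MB blocks, and the function names are hypothetical, not part of Redshift.

```python
# Toy model (not Redshift internals): count how many stored values must be
# read to answer a query that only needs the dt column.
COLUMNS = ("aid", "loc", "dt")
rows = [
    (1, "SFO", "2016-09-01"),
    (2, "JFK", "2016-09-14"),
    (3, "SFO", "2017-04-01"),
    (4, "JFK", "2017-05-14"),
]


def values_read_row_store(rows, wanted):
    # Row storage keeps each row's values together, so reading any column
    # drags every value of every row through I/O, whatever `wanted` is.
    return sum(len(row) for row in rows)


def values_read_column_store(rows, wanted):
    # Columnar storage keeps each column separately, so only the wanted
    # columns are read.
    columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}
    return sum(len(columns[name]) for name in wanted)


row_cost = values_read_row_store(rows, ["dt"])
col_cost = values_read_column_store(rows, ["dt"])
print(row_cost, col_cost)
```

With the four example rows, the row layout touches all twelve values while the columnar layout touches only the four dt values.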
  • 15. Designed for I/O Reduction
      • Columnar storage
      • Data compression
      • Zone maps
      Example data:
        aid  loc  dt
        1    SFO  2016-09-01
        2    JFK  2016-09-14
        3    SFO  2017-04-01
        4    JFK  2017-05-14
      • Columns grow and shrink independently
      • Effective compression ratios due to like data
      • Reduces storage requirements
      • Reduces I/O
      CREATE TABLE loft_deep_dive (
        aid INT     ENCODE LZO
       ,loc CHAR(3) ENCODE BYTEDICT
       ,dt  DATE    ENCODE RUNLENGTH
      );
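Why "like data" compresses well can be sketched with a minimal run-length encoder. This shows the general technique only, not Redshift's on-disk RUNLENGTH format, and the repeated dt values below are hypothetical sample data rather than rows from the slides.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical dt column in which neighbouring rows share a date.
dt_column = ["2016-09-01"] * 3 + ["2016-09-14"] * 2 + ["2017-04-01"]
encoded = run_length_encode(dt_column)
print(encoded)
```

Six stored values shrink to three pairs here; the longer the runs in a column, the better the ratio, which is one reason a column of similar values compresses better than interleaved row data.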
  • 16. Designed for I/O Reduction
      • Columnar storage
      • Data compression
      • Zone maps
      Example data:
        aid  loc  dt
        1    SFO  2016-09-01
        2    JFK  2016-09-14
        3    SFO  2017-04-01
        4    JFK  2017-05-14
      CREATE TABLE loft_deep_dive (
        aid INT      -- audience_id
       ,loc CHAR(3)  -- location
       ,dt  DATE     -- date
      );
      • In-memory block metadata
      • Contains per-block MIN and MAX value
      • Effectively prunes blocks which cannot contain data for a given query
      • Eliminates unnecessary I/O
  • 17. Zone Maps
      SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'
      Unsorted table, per-block zone map:
        MIN           MAX
        01-JUNE-2013  20-JUNE-2013
        08-JUNE-2013  30-JUNE-2013
        12-JUNE-2013  20-JUNE-2013
        02-JUNE-2013  25-JUNE-2013
      Sorted by date, per-block zone map:
        MIN           MAX
        01-JUNE-2013  06-JUNE-2013
        07-JUNE-2013  12-JUNE-2013
        13-JUNE-2013  18-JUNE-2013
        19-JUNE-2013  24-JUNE-2013
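The pruning on slide 17 can be traced by hand or with a short sketch. The MIN/MAX pairs are taken from the slide; the helper names are hypothetical, and the check is a simplified stand-in for whatever Redshift does internally.

```python
from datetime import date


def june(day):
    """Shorthand for a date in June 2013, the month used on the slide."""
    return date(2013, 6, day)


def blocks_to_scan(zone_map, target):
    """Return indexes of blocks whose [MIN, MAX] range could contain target."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= target <= hi]


# Per-block (MIN, MAX) pairs copied from the slide.
unsorted_blocks = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
]
sorted_blocks = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
]

target = june(9)  # WHERE DATE = '09-JUNE-2013'
unsorted_hits = blocks_to_scan(unsorted_blocks, target)
sorted_hits = blocks_to_scan(sorted_blocks, target)
print(len(unsorted_hits), len(sorted_hits))
```

For the unsorted table, three of the four blocks have ranges that include 9 June and must be read; once the data is sorted by date, only one block qualifies.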
  • 18. Terminology and Concepts: Data Sorting
      • Goals:
        – Physically order rows of table data based on certain column(s)
        – Optimize effectiveness of zone maps
        – Enable MERGE JOIN operations
      • Impact:
        – Enables rrscans to prune blocks by leveraging zone maps
        – Overall reduction in block I/O
      • Achieved with the table property SORTKEY defined over one or more columns
      • Optimal SORTKEY is dependent on:
        – Query patterns
        – Data profile
        – Business requirements
  • 19. Terminology and Concepts: Slices
      • A slice can be thought of like a “virtual compute node”
        – Unit of data partitioning
        – Parallel query processing
      • Facts about slices:
        – Each compute node has either 2, 16, or 32 slices
        – Table rows are distributed to slices
        – A slice processes only its own data
  • 20. Data Distribution
      • Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster:
        – KEY: Value is hashed, same value goes to same location (slice)
        – ALL: Full table data goes to first slice of every node
        – EVEN: Round robin
      • Goals:
        – Distribute data evenly for parallel processing
        – Minimize data movement during query processing
      [Diagram: one panel each for KEY, ALL, and EVEN, each showing Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
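The three distribution styles on slide 20 can be simulated in a few lines. This is a sketch under stated assumptions: the two-node, four-slice layout mirrors the later example slides, `zlib.crc32` stands in for Redshift's actual (unspecified here) hash function, and all function names are hypothetical.

```python
import zlib
from itertools import cycle

# Assumed layout: two compute nodes with two slices each.
NODES = {"CN1": [0, 1], "CN2": [2, 3]}
ALL_SLICES = [s for slices in NODES.values() for s in slices]

rows = [
    (1, "SFO", "2016-09-01"),
    (2, "JFK", "2016-09-14"),
    (3, "SFO", "2017-04-01"),
    (4, "JFK", "2017-05-14"),
]


def distribute_even(rows):
    """EVEN: hand rows to slices round robin."""
    placement = {s: [] for s in ALL_SLICES}
    for row, s in zip(rows, cycle(ALL_SLICES)):
        placement[s].append(row)
    return placement


def distribute_key(rows, key_index):
    """KEY: hash the key column so equal values land on the same slice."""
    placement = {s: [] for s in ALL_SLICES}
    for row in rows:
        # crc32 is a stand-in hash, not the function Redshift uses.
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        placement[ALL_SLICES[digest % len(ALL_SLICES)]].append(row)
    return placement


def distribute_all(rows):
    """ALL: put a full copy on the first slice of every node."""
    placement = {s: [] for s in ALL_SLICES}
    for slices in NODES.values():
        placement[slices[0]] = list(rows)
    return placement


for name, placement in [
    ("EVEN", distribute_even(rows)),
    ("KEY(loc)", distribute_key(rows, 1)),
    ("ALL", distribute_all(rows)),
]:
    print(name, {s: len(r) for s, r in placement.items()})
```

With only two distinct loc values, KEY distribution on loc can populate at most two slices, which is the skew the deck's first KEY example illustrates; a key with many distinct values, such as aid, is more likely to spread rows across the slices.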
  • 21. Data Distribution: Example
          CREATE TABLE loft_deep_dive (
            aid INT      --audience_id
           ,loc CHAR(3)  --location
           ,dt  DATE     --date
          ) DISTSTYLE (EVEN|KEY|ALL);
      • Cluster: CN1 (Slice 0, Slice 1), CN2 (Slice 2, Slice 3)
      • Table: loft_deep_dive
          – User columns: aid, loc, dt
          – System columns: ins, del, row
  • 22. Data Distribution: EVEN Example
          CREATE TABLE loft_deep_dive (
            aid INT      --audience_id
           ,loc CHAR(3)  --location
           ,dt  DATE     --date
          ) DISTSTYLE EVEN;

          INSERT INTO loft_deep_dive VALUES
           (1, 'SFO', '2016-09-01'),
           (2, 'JFK', '2016-09-14'),
           (3, 'SFO', '2017-04-01'),
           (4, 'JFK', '2017-05-14');
      • Cluster: CN1 (Slice 0, Slice 1), CN2 (Slice 2, Slice 3)
      • Every slice holds the table's user columns (aid, loc, dt) and system columns (ins, del, row)
      • Rows per slice after the INSERT: 1, 1, 1, 1
      • (3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
  • 23. Data Distribution: KEY Example #1
          CREATE TABLE loft_deep_dive (
            aid INT      --audience_id
           ,loc CHAR(3)  --location
           ,dt  DATE     --date
          ) DISTSTYLE KEY DISTKEY (loc);

          INSERT INTO loft_deep_dive VALUES
           (1, 'SFO', '2016-09-01'),
           (2, 'JFK', '2016-09-14'),
           (3, 'SFO', '2017-04-01'),
           (4, 'JFK', '2017-05-14');
      • Cluster: CN1 (Slice 0, Slice 1), CN2 (Slice 2, Slice 3)
      • Two slices hold the table's user columns (aid, loc, dt) and system columns (ins, del, row), with 2 rows each; the other two slices hold 0 rows
      • (3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
  • 24. Data Distribution: KEY Example #2
          CREATE TABLE loft_deep_dive (
            aid INT      --audience_id
           ,loc CHAR(3)  --location
           ,dt  DATE     --date
          ) DISTSTYLE KEY DISTKEY (aid);

          INSERT INTO loft_deep_dive VALUES
           (1, 'SFO', '2016-09-01'),
           (2, 'JFK', '2016-09-14'),
           (3, 'SFO', '2017-04-01'),
           (4, 'JFK', '2017-05-14');
      • Cluster: CN1 (Slice 0, Slice 1), CN2 (Slice 2, Slice 3)
      • Every slice holds the table's user columns (aid, loc, dt) and system columns (ins, del, row)
      • Rows per slice after the INSERT: 1, 1, 1, 1
      • (3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
  • 25. Data Distribution: ALL Example
          CREATE TABLE loft_deep_dive (
            aid INT      --audience_id
           ,loc CHAR(3)  --location
           ,dt  DATE     --date
          ) DISTSTYLE ALL;

          INSERT INTO loft_deep_dive VALUES
           (1, 'SFO', '2016-09-01'),
           (2, 'JFK', '2016-09-14'),
           (3, 'SFO', '2017-04-01'),
           (4, 'JFK', '2017-05-14');
      • Cluster: CN1 (Slice 0, Slice 1), CN2 (Slice 2, Slice 3)
      • Two slices hold the table's user columns (aid, loc, dt) and system columns (ins, del, row), with all 4 rows each; the other two slices hold 0 rows
      • (3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
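The block counts quoted in the EVEN, KEY, and ALL examples all follow the same arithmetic: one 1MB block per column (user plus system) on every slice that holds at least one row. A minimal sketch, with `min_blocks` as an illustrative helper name:

```python
BLOCK_SIZE_MB = 1
USER_COLUMNS = 3    # aid, loc, dt
SYSTEM_COLUMNS = 3  # ins, del, row

def min_blocks(slices_with_rows):
    """One block per column (user + system) on every slice holding rows."""
    return (USER_COLUMNS + SYSTEM_COLUMNS) * slices_with_rows

# Number of slices holding rows in each example from the slides.
footprint = {
    "EVEN": min_blocks(4),
    "KEY (loc)": min_blocks(2),
    "KEY (aid)": min_blocks(4),
    "ALL": min_blocks(2),
}
for style, blocks in footprint.items():
    print(f"{style}: {blocks} blocks ({blocks * BLOCK_SIZE_MB}MB)")
```

The four-row table therefore occupies 12MB to 24MB, which is why tiny tables have a noticeable minimum footprint.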
  • 26. Terminology and Concepts: Data Distribution
      • KEY
          – The key creates an even distribution of data
          – Joins are performed between large fact/dimension tables
          – Optimizing merge joins and group by
      • ALL
          – Small and medium size dimension tables (< 2-3M)
      • EVEN
          – When key cannot produce an even distribution
  • 28. Storage Deep Dive: Disks
      • Redshift utilizes locally attached storage devices
      • Compute nodes have 2.5-3x the advertised storage capacity
      • 1, 3, 8, or 24 disks depending on node type
      • Each disk is split into two partitions
          – Local data storage, accessed by local CN
          – Mirrored data, accessed by remote CN
      • Partitions are raw devices
          – Local storage devices are ephemeral in nature
          – Tolerant to multiple disk failures on a single node
  • 29. Storage Deep Dive: Blocks
      • Column data is persisted to 1MB immutable blocks
      • Each block contains in-memory metadata:
          – Zone Maps (MIN/MAX value)
          – Location of previous/next block
      • Blocks are individually compressed with 1 of 10 encodings
      • A full block contains between 16 and 8.4 million values
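One plausible reading of the "16 to 8.4 million values" range is simple division of the 1MB block size. The slide does not state how the bounds are derived, so both endpoints below are assumptions: the lower bound assumes maximum-width 65,535-byte VARCHAR values, and the upper bound assumes values compressed to a single bit each.

```python
BLOCK_BYTES = 1024 * 1024  # 1MB block, as stated on the slide

# Assumption: the widest possible value is a 65,535-byte VARCHAR,
# and the narrowest is a value encoded in a single bit.
MAX_VALUE_BYTES = 65535

min_values = BLOCK_BYTES // MAX_VALUE_BYTES  # widest values per block
max_values = BLOCK_BYTES * 8                 # one bit per value

print(min_values, max_values)  # 16 8388608
```

Under those assumptions the arithmetic reproduces both figures on the slide (8,388,608 is about 8.4 million).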
  • 30. Storage Deep Dive: Columns
      • Column: Logical structure accessible via SQL
      • Physical structure is a doubly linked list of blocks
      • These block chains exist on each slice for each column
      • All sorted & unsorted block chains compose a column
      • Column properties include:
          – Distribution Key
          – Sort Key
          – Compression Encoding
      • Columns shrink and grow independently, 1 block at a time
      • Three system columns per table, per slice, for MVCC
  • 31. Block Properties: Design Considerations
      • Small writes:
          – Batch processing system, optimized for processing massive amounts of data
          – 1MB size + immutable blocks means that we clone blocks on write so as not to introduce fragmentation
          – Small write (~1-10 rows) has similar cost to a larger write (~100 K rows)
      • UPDATE and DELETE:
          – Immutable blocks means that we only logically delete rows on UPDATE or DELETE
          – Must VACUUM or DEEP COPY to remove ghost rows from table
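The logical-delete behavior can be illustrated with a toy row store. This is a sketch, not Redshift's storage engine: the `Row` and `Table` classes, and the `insert_xid` / `delete_xid` fields (loosely modeled on the ins and del system columns shown earlier), are hypothetical, and the real VACUUM also re-sorts data, which is ignored here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    values: tuple
    insert_xid: int                   # transaction that inserted the row
    delete_xid: Optional[int] = None  # set on DELETE/UPDATE; row becomes a ghost

class Table:
    def __init__(self):
        self.rows = []

    def insert(self, values, xid):
        self.rows.append(Row(values, xid))

    def delete(self, predicate, xid):
        # Logical delete: mark the row, leave it in storage.
        for row in self.rows:
            if row.delete_xid is None and predicate(row.values):
                row.delete_xid = xid

    def update(self, predicate, new_values, xid):
        # UPDATE = logical delete of the old version + insert of the new one.
        matched = [r for r in self.rows
                   if r.delete_xid is None and predicate(r.values)]
        for row in matched:
            row.delete_xid = xid
            self.insert(new_values(row.values), xid)

    def visible(self):
        return [r.values for r in self.rows if r.delete_xid is None]

    def vacuum(self):
        # Physically drop ghost rows and reclaim their space.
        self.rows = [r for r in self.rows if r.delete_xid is None]

t = Table()
t.insert((1, "SFO"), xid=100)
t.insert((2, "JFK"), xid=100)
t.update(lambda v: v[0] == 1, lambda v: (v[0], "SEA"), xid=101)
t.delete(lambda v: v[0] == 2, xid=102)
stored_before = len(t.rows)  # ghosts still occupy storage
t.vacuum()
stored_after = len(t.rows)
```

Before the vacuum the table stores three rows although only one is visible; afterwards only the live row remains.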
  • 32. Column Properties: Design Considerations
      • Compression:
          – COPY automatically analyzes and compresses data when loading into empty tables
          – ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column
          – Changing column encoding requires a table rebuild
      • DISTKEY and SORTKEY significantly influence performance (orders of magnitude)
      • Distribution Keys:
          – A poor DISTKEY can introduce data skew and an unbalanced workload
          – A query completes only as fast as the slowest slice completes
      • Sort Keys:
          – A sortkey is only as effective as the data profile allows it to be
          – Selectivity needs to be considered
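The cost of a poor DISTKEY can be estimated with a back-of-the-envelope model. The sketch below assumes per-slice work is proportional to row count, which is a simplification; the row counts and function names are made up for illustration.

```python
def skew_ratio(rows_per_slice):
    """Largest slice relative to the mean; 1.0 means perfectly even."""
    mean = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / mean

def relative_runtime(rows_per_slice):
    """Assume scan time is proportional to rows; the slowest (largest)
    slice determines when the query finishes."""
    return max(rows_per_slice)

balanced = [250, 250, 250, 250]  # hypothetical even distribution
skewed = [700, 100, 100, 100]    # hypothetical skewed DISTKEY

print(skew_ratio(balanced), skew_ratio(skewed))
print(relative_runtime(skewed) / relative_runtime(balanced))
```

Both layouts hold the same 1,000 rows, but under this model the skewed one takes 2.8 times as long because three slices sit idle while the fourth finishes.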
  • 34. Storage Deep Dive: Slices
      • Each compute node has either 2, 16, or 32 slices
      • A slice can be thought of like a “virtual compute node”
          – Unit of data partitioning
          – Parallel query processing
      • Facts about slices:
          – Table rows are distributed to slices
          – A slice processes only its own data
          – Within a compute node all slices read from and write to all disks
  • 35. Leader Node
      [Diagram: a leader node and compute nodes, each labeled 128GB RAM, 16TB disk, 16 cores]
      • Parser & Rewriter
      • Planner & Optimizer
      • Code Generator
          – Input: Optimized plan
          – Output: >=1 C++ functions
      • Compiler
      • Task Scheduler
      • WLM
          – Admission
          – Scheduling
      • PostgreSQL Catalog Tables
      • Redshift System Tables (STV)
  • 37. Query Execution Terminology
      • Step: An individual operation needed during query execution. Steps need to be combined to allow compute nodes to perform a join. Examples: scan, sort, hash, aggr
      • Segment: A combination of several steps that can be done by a single process. The smallest compilation unit executable by a slice. Segments within a stream run in parallel.
      • Stream: A collection of combined segments which output to the next stream or SQL client.
  • 38. Visualizing Streams, Segments, and Steps
      [Diagram: streams laid out left to right along a time axis]
      • Stream 0
          – Segment 0: Steps 0-2
          – Segment 1: Steps 0-4
          – Segment 2: Steps 0-3
          – Segment 3: Steps 0-5
      • Stream 1
          – Segment 4: Steps 0-3
          – Segment 5: Steps 0-2
          – Segment 6: Steps 0-4
      • Stream 2
          – Segment 7: Steps 0-1
          – Segment 8: Steps 0-1
  • 39. Query Lifecycle
      • Client: JDBC / ODBC
      • Leader Node: Parser -> Query Planner (Explain Plans) -> Code Generator (generate code for all segments of one stream) -> Final Computations -> Return results to client
      • Compute Nodes: Receive Compiled Code -> Run the Compiled Code -> Return results to Leader
      • Segments in a stream are executed concurrently. Each step in a segment is executed serially.
  • 40. Query Execution Deep Dive: Leader Node
      1. The leader node receives the query and parses the SQL.
      2. The parser produces a logical representation of the original query.
      3. This query tree is input into the query optimizer (volt).
      4. Volt rewrites the query to maximize its efficiency. Sometimes a single query will be rewritten as several dependent statements in the background.
      5. The rewritten query is sent to the planner, which generates one or more query plans for the execution with the best estimated performance.
      6. The query plan is sent to the execution engine, where it’s translated into steps, segments, and streams.
      7. This translated plan is sent to the code generator, which generates a C++ function for each segment.
      8. This generated C++ is compiled with gcc to a .o file and distributed to the compute nodes.
  • 41. Query Execution Deep Dive: Compute Nodes
      • Slices execute the query segments in parallel.
      • Executable segments are created for one stream at a time. When the segments of that stream are complete, the engine generates the segments for the next stream.
      • When the compute nodes are done, they return the query results to the leader node for final processing.
      • The leader node merges the data into a single result set and addresses any needed sorting or aggregation.
      • The leader node then returns the results to the client.
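The execution rules stated in these slides (steps in a segment run serially, segments in a stream run concurrently, streams run one after another) imply a simple timing model. The sketch below uses made-up step durations and a made-up plan shape, and it ignores compilation and data transfer costs.

```python
# stream -> segment -> list of step durations (arbitrary units, made up)
PLAN = {
    0: {0: [1, 2, 1], 1: [1, 1, 1, 1, 1]},
    1: {2: [2, 2], 3: [1, 1, 1]},
    2: {4: [1, 1]},
}

def segment_time(steps):
    # Steps within a segment run serially.
    return sum(steps)

def stream_time(segments):
    # Segments within a stream run concurrently, so the slowest one dominates.
    return max(segment_time(s) for s in segments.values())

def query_time(plan):
    # Streams run one after another.
    return sum(stream_time(plan[k]) for k in sorted(plan))

total = query_time(PLAN)
print(total)  # 5 + 4 + 2 = 11
```

Under this model, shortening a segment only helps if it is the slowest segment in its stream.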
  • 43. Query Execution
      [Diagram: the stream/segment/step timeline (Stream 0 with Segments 0-3, Stream 1 with Segments 4-6, Stream 2 with Segments 7-8) repeated once per slice for Slices 0, 1, 2, and 3, along a shared time axis]
  • 44. New & Upcoming Features
  • 45. Recently Released Features
      • New Data Type: TIMESTAMPTZ
          – Support for timestamp with time zone: new TIMESTAMPTZ data type to input complete timestamp values that include the date, the time of day, and a time zone, e.g. 30 Nov 07:37:16 2016 PST
      • Multi-byte Object Names
          – Support for multi-byte (UTF-8) characters for tables, columns, and other database object names
      • User Connection Limits
          – You can now set a limit on the number of database connections a user is permitted to have open concurrently
      • Automatic Data Compression for CTAS
          – All newly created tables will leverage default encoding
  • 46. Coming Soon: Short Query Bias
      • Short queries go to the head of the queue
      [Diagram: Amazon Redshift Workload Management. Client applications (BI tools, SQL clients, analytics tools) submit queries that are shown as Waiting or Running in Queue 1 and Queue 2, labeled 4 Slots and 2 Slots]
  • 47. Coming Soon: Power Start
      • All queries receive a power start. Shorter queries benefit the most.
      [Diagram: client applications (BI tools, SQL clients, analytics tools) connected to an Amazon Redshift cluster with a leader node and three compute nodes]
  • 48. Coming Soon: Query Monitoring Rules
      • Monitor and control cluster resources consumed by a query
      • Get notified, abort and reprioritize long-running / bad queries
      • Pre-defined templates for common use cases
  • 49. Coming Soon: IAM Authentication
      • New Redshift ODBC/JDBC drivers. Grab the ticket (userid) and get a SAML assertion.
      [Diagram: client applications (BI tools, SQL clients, analytics tools) connect over ODBC/JDBC to Amazon Redshift; single sign-on through identity providers (ADFS, Corporate Active Directory) and IAM, for user groups and individual users]
  • 50. Coming Soon: Lots More …
      • Automatic and Incremental Background VACUUM
          – Reclaims space and sorts when Redshift clusters are idle
          – Vacuum is initiated when performance can be enhanced
          – Improves ETL and query performance
      • Automatic Compression for New Tables
          – All newly created tables will leverage default encoding
          – Provides higher compression rates
      • New Functions
          – Approximate Percentile
  • 51. Resources
      • Admin scripts: Collection of utilities for running diagnostics on your cluster
      • Admin views: Collection of utilities for managing your cluster, generating schema DDL, etc.
      • ColumnEncodingUtility: Gives you the ability to apply optimal column encoding to an established schema with data already loaded
      • Amazon Redshift Engineering’s Advanced Table Design Playbook: design-playbook-preamble-prerequisites-and-prioritization/