Amazon Redshift: Good practices for performance optimization
What is Amazon Redshift?
• Petabyte-scale columnar database
• Fast response time
• ~10 times faster than a typical relational database
• Cost-effective (around $1,000 per TB per year)
When do customers use Amazon Redshift?
Replacing traditional DWH
• Reduce cost by extending the DW instead of adding HW
• Migrate from an existing DWH system
• Respond faster to the business: provision in minutes
Analyzing Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
Providing services and SaaS
• Add analytics functionality to applications
• Scale DW capacity as demand grows
• Reduce SW and HW costs by an order of magnitude
Amazon Redshift architecture
Redshift reduces IO
Column storage
• With row storage you do unnecessary IO
• With columnar storage you only read the data you need
Data compression
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
Zone maps
• Track the minimum and maximum value for each block
• Skip over blocks that do not contain relevant data
Direct-attached storage
• > 2 GB/sec scan rate
• Optimized for data processing
• High disk density
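To make the zone map bullets above concrete, here is a minimal Python sketch of min/max block skipping. It is a toy model, not Redshift's implementation: the block size, the helper names `build_zone_map` and `scan_equal`, and the sample data are all assumptions made for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical rows per block; real blocks hold far more


def build_zone_map(values: List[int]) -> List[Tuple[int, int, List[int]]]:
    """Split a column into blocks and record (min, max, rows) for each block."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        rows = values[start:start + BLOCK_SIZE]
        blocks.append((min(rows), max(rows), rows))
    return blocks


def scan_equal(zone_map, target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for lo, hi, rows in zone_map:
        if target < lo or target > hi:
            continue  # min/max rules this block out, so it is never read
        blocks_read += 1
        matches.extend(v for v in rows if v == target)
    return matches, blocks_read


sorted_col = list(range(1, 17))  # sorted data gives each block a tight range
shuffled_col = [9, 1, 16, 4, 12, 2, 15, 5, 10, 3, 14, 6, 11, 7, 13, 8]

sorted_hits, sorted_reads = scan_equal(build_zone_map(sorted_col), 7)
shuffled_hits, shuffled_reads = scan_equal(build_zone_map(shuffled_col), 7)
print("sorted column  : blocks read =", sorted_reads)
print("shuffled column: blocks read =", shuffled_reads)
```

In this sketch the sorted column lets the scan skip every block but one, while the shuffled column has wide min/max ranges in every block and nothing can be skipped. This is the same reason sort order matters for the scanning issues discussed later in the deck.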
Redshift Recap
Main possible issues with Redshift performance
• Incorrect column encoding
• Skewed table data
• Queries not benefiting from sort keys (excessive scanning)
• Tables without statistics or which need vacuum
• Tables with very large VARCHAR columns
• Queries waiting on queue slots
• Queries that are disk-based: incorrect sizing, GROUP BY distinct values, UNION, hash joins with DISTINCT values
• Commit queue waits
• Inefficient data loads
• Inefficient use of temporary tables
• Large nested loop JOINs
• Inappropriate join cardinality
Incorrect column encoding
Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large
performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from
this best practice, you can use the v_extended_table_info view from the Amazon Redshift Utils GitHub repository.
If you find that you have tables without optimal column encoding, then use
the Amazon Redshift Column Encoding Utility from the Utils repository to
apply encoding. This command line utility uses the ANALYZE COMPRESSION
command on each table. If encoding is required, it generates a SQL script
which creates a new table with the correct encoding, copies all the data
into the new table, and then transactionally renames the new table to the
old name while retaining the original data.
•Raw Encoding
•Byte-Dictionary Encoding
•Delta Encoding
•LZO Encoding
•Mostly Encoding
•Runlength Encoding
•Text255 and Text32k Encodings
•Zstandard Encoding
Skewed table data
If skew is a problem, you typically see that node performance is uneven on the cluster. Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster.
Consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:
CREATE TABLE my_test_table DISTKEY (<column name>) AS
SELECT <column name> FROM <table name>;
You can use the AWS Schema Conversion Tool (SCT) to set the proper distribution key and style during the migration process.
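The effect of a low-cardinality distribution key can be sketched with a small Python simulation. Everything here is an assumption for illustration: the slice count, the `region_id` and `order_id` columns, and the use of `key % num_slices` as a stand-in for the real hash function that assigns rows to slices.

```python
from collections import Counter
from typing import Iterable, List

NUM_SLICES = 4  # hypothetical slice count for the whole cluster


def rows_per_slice(keys: Iterable[int], num_slices: int = NUM_SLICES) -> List[int]:
    """Count rows per slice; `key % num_slices` stands in for the real hash."""
    counts = Counter(key % num_slices for key in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(per_slice: List[int]) -> float:
    """Largest slice divided by the average slice; 1.0 means perfectly even."""
    return max(per_slice) / (sum(per_slice) / len(per_slice))


# 1,000 hypothetical rows of (region_id, order_id); 90% of rows share region 1.
rows = [(1 if i % 10 else 2, i) for i in range(1000)]

by_region = rows_per_slice(region for region, _ in rows)
by_order = rows_per_slice(order for _, order in rows)

print("region_id:", by_region, "skew", skew_ratio(by_region))
print("order_id :", by_order, "skew", skew_ratio(by_order))
```

With only two distinct values, `region_id` leaves two slices empty and puts 900 rows on one slice, so that slice does most of the work. The high-cardinality, uniformly distributed `order_id` spreads the rows evenly.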
Deal with tables with very large VARCHAR columns
During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance.
Use the following query to generate a list of tables that should have their maximum column widths reviewed:
SELECT database, schema || '.' || "table" AS "table", max_varchar
FROM svv_table_info
WHERE max_varchar > 150
ORDER BY 2;
Identify which columns in those tables are wide VARCHAR columns, then determine the true maximum width of each one:
SELECT max(len(rtrim(column_name))) FROM table_name;
If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries that include a wide column holding JSON fragments. If end users query these large columns but don’t actually execute JSON functions against them, consider moving them into another table that contains only the primary key column of the original table and the JSON column.
Queries not benefiting from sort keys
Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases, but which
does not incur a storage cost as with other platforms (for more information, see Choosing Sort Keys). A sort key should
be created on those columns which are most commonly used in WHERE clauses. If you have a known query pattern,
then COMPOUND sort keys give the best performance; if end users query different columns equally, then use an
INTERLEAVED sort key. If using compound sort keys, review your queries to ensure that their WHERE clauses specify the
sort columns in the same order they were defined in the compound key.
To determine which tables don’t have sort keys, run the following query against the v_extended_table_info view from
the Amazon Redshift Utils repository:
SELECT * FROM admin.v_extended_table_info WHERE sortkey IS null;
If a table does not have a sort key, queries against it can suffer from excessive scanning.
Queries waiting on queue slots (ICT)
Amazon Redshift runs queries using a queuing system known as
workload management (WLM). You can define up to 8 queues
to separate workloads from each other, and set the
concurrency on each queue to meet your overall throughput
requirements.
In some cases, the queue to which a user or query has been
assigned is completely busy and a user’s query must wait for a
slot to be open. During this time, the system is not executing
the query at all, which is a sign that you may need to increase
concurrency.
First, you need to determine if any queries are queuing, using
the queuing_queries.sql admin script. Review the maximum
concurrency that your cluster has needed in the past with
wlm_apex.sql, down to an hour-by-hour historical analysis with
wlm_apex_hourly.sql. Keep in mind that while increasing
concurrency allows more queries to run, each query will get a
smaller share of the memory allocated to its queue.
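The trade-off between concurrency and memory per query is simple arithmetic. The sketch below assumes a queue splits its memory evenly between its slots; the `memory_per_slot_mb` helper and the 40,000 MB figure are hypothetical and chosen only for illustration.

```python
def memory_per_slot_mb(queue_memory_mb: float, concurrency: int) -> float:
    """Memory available to one query slot when a queue splits memory evenly."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return queue_memory_mb / concurrency


QUEUE_MEMORY_MB = 40_000  # hypothetical memory assigned to one queue

for slots in (5, 10, 20):
    share = memory_per_slot_mb(QUEUE_MEMORY_MB, slots)
    print(f"{slots:>2} slots -> {share:,.0f} MB per query")
```

Doubling the slot count halves the memory each query gets, which is why raising concurrency to reduce queuing can push queries toward disk-based execution.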
Queries that are disk-based (ICT)
If a query isn’t able to execute completely in memory, it may need to use disk-based temporary storage for parts of its plan. The additional disk I/O slows down the query; this can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation).
To determine if any queries have been writing to disk, use the following query:
SELECT q.query, trim(q.cat_text)
FROM (
    SELECT query,
           replace(listagg(text, ' ') WITHIN GROUP (ORDER BY sequence), '\\n', ' ') AS cat_text
    FROM stl_querytext
    WHERE userid > 1
    GROUP BY query) q
JOIN (
    SELECT DISTINCT query
    FROM svl_query_summary
    WHERE is_diskbased = 't'
      AND (label LIKE 'hash%' OR label LIKE 'sort%' OR label LIKE 'aggr%')
      AND userid > 1) qs
ON qs.query = q.query;
Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries needing to spill to disk to complete.
Commit queue waits (ICT)
Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively
high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.
If you are committing too often on your database, you will start to see waits on the commit queue increase, which can
be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for
queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that
are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.
One of the worst practices is to insert data into Amazon Redshift row by row. Use the COPY command or an ETL tool compatible with Amazon Redshift instead.
Inefficient use of Temporary Tables
Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single
session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created
using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. The CREATE TABLE
statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS
commands use the input data to determine column names, sizes and data types, and use default storage properties.
These default storage properties may cause issues if not carefully considered. Amazon Redshift’s default table structure is
to use EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you
are using the SELECT…INTO syntax you cannot set the column encoding or distribution and sort keys. If you use the CREATE
TABLE AS (CTAS) syntax, you can specify a distribution style and sort keys, and Redshift will automatically apply LZO
encoding for everything other than sort keys, booleans, reals and doubles. If you consider this automatic encoding sub-
optimal, and require further control, use the CREATE TABLE syntax rather than CTAS.
If you are creating temporary tables, it is highly recommended that you convert all SELECT…INTO syntax to use the CREATE
statement. This ensures that your temporary tables have column encodings and are distributed in a fashion that is
sympathetic to the other entities that are part of the workflow.
Tables without statistics
• Amazon Redshift, like other databases, requires statistics about tables and the
composition of data blocks being stored in order to make good decisions when
planning a query (for more information, see Analyzing Tables). Without good
statistics, the optimiser may make suboptimal choices about the order in which
to access tables, or how to join datasets together.
• The ANALYZE Command History topic in the Amazon Redshift Developer Guide
supplies queries to help you address missing or stale statistics, and you can also
simply run the missing_table_stats.sql admin script to determine which tables
are missing stats, or the statement below to determine tables that have stale
statistics:
SELECT database, schema || '.' || "table" AS "table", stats_off FROM svv_table_info
WHERE stats_off > 5 ORDER BY 2;
Tables which need VACUUM
In Amazon Redshift, data blocks are immutable. When
rows are DELETED or UPDATED, they are simply logically
deleted (flagged for deletion) but not physically
removed from disk. Updates result in a new block being
written with new data appended. Both of these
operations cause the previous version of the row to
continue consuming disk space and continue being
scanned when a query scans the table. As a result, table
storage space is increased and performance degraded
due to otherwise avoidable disk I/O during scans. A
VACUUM command recovers the space from deleted
rows and restores the sort order.
You can use the perf_alert.sql admin script to identify
tables that have had alerts about scanning a large
number of deleted rows raised in the last seven days.
To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility,
Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually
need reorganization.
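The logical-delete behavior described above can be modeled in a few lines of Python. This is a toy model of the idea only, not of Redshift's storage engine; the `Row`, `update` and `vacuum` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    key: int
    deleted: bool = False  # logical delete flag; the row stays in storage


def update(table: List[Row], old_key: int, new_key: int) -> None:
    """Model an UPDATE: flag the old version and append the new one."""
    for row in table:
        if row.key == old_key and not row.deleted:
            row.deleted = True
    table.append(Row(new_key))


def vacuum(table: List[Row]) -> List[Row]:
    """Model VACUUM: drop flagged rows and restore the sort order."""
    return sorted((r for r in table if not r.deleted), key=lambda r: r.key)


table = [Row(k) for k in (1, 2, 3, 4)]
update(table, 2, 9)  # old row 2 is flagged, row 9 is appended
update(table, 3, 0)  # old row 3 is flagged, row 0 is appended out of order

rows_scanned_before = len(table)  # a scan still reads the flagged rows
table = vacuum(table)
rows_scanned_after = len(table)
print(rows_scanned_before, rows_scanned_after, [r.key for r in table])
```

Before the vacuum step a scan touches six rows, two of which are dead, and the live rows are no longer in sort order. Afterwards only the four live rows remain, sorted by key.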
Analyze the performance of the queries and address issues
• Your best friends in a Redshift world are:
• ANALYZE: for refreshing out-of-date statistics
• SVL_QUERY_SUMMARY view: for a summary of the useful data about the behavior of your queries and a high-level overview of the cluster
• Query alerts: for receiving important information about deviations in the behavior of your queries
• SVL_QUERY_REPORT view: for collecting details about the health and performance of your queries
Set up and fine-tune monitoring and maintenance procedures
• In addition to your preferred monitoring methods and the many partner solutions, there are some helpful tools:
• Amazon Redshift Column Encoding Utility
• Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster.
• Run the missing_table_stats.sql admin script to determine which tables are missing stats.
• Use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days.
• Use top_queries.sql to determine the top running queries.
• Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql.
• View the commit stats with the commit_stats.sql admin script.
Eliminate Nested Loops
• Caused by a SQL query whose join condition requires a “brute force” comparison between two large tables
• Quite easy to spot
• Look for “Nested Loop” in query plans or in STL_ALERT_EVENT_LOG
Join conditions that typically produce a nested loop:
ON a.date > b.date
ON a.text LIKE b.text
ON a.x = b.x OR a.y = b.y
• Rewrite an inequality JOIN condition as a window function
• Run the nested loop against a small table instead of between two large tables
• Alternatively, run the nested loop join once and persist the results in a separate relational table
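As an illustration of rewriting an inequality join such as `ON a.date > b.date` as a window function, the sketch below runs both forms against an in-memory SQLite database (window functions need SQLite 3.25 or later). SQLite is used only so the example is runnable anywhere; the `payments` table and its columns are hypothetical, and the equivalence shown assumes that `paid_on` values are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, paid_on TEXT, amount INTEGER);
    INSERT INTO payments VALUES
        (1, '2024-01-01', 10),
        (2, '2024-01-02', 20),
        (3, '2024-01-03', 5),
        (4, '2024-01-04', 7);
""")

# Inequality self-join: every row is compared with every earlier row.
join_rows = conn.execute("""
    SELECT a.id, COALESCE(SUM(b.amount), 0) AS earlier_total
    FROM payments a
    LEFT JOIN payments b ON a.paid_on > b.paid_on
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Window function: one ordered pass over the table, no join at all.
window_rows = conn.execute("""
    SELECT id,
           COALESCE(SUM(amount) OVER (
               ORDER BY paid_on
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS earlier_total
    FROM payments
    ORDER BY id
""").fetchall()

print(join_rows)
print(window_rows)
```

Both queries return the total of all earlier payments for each row. The join form compares every pair of rows, while the window form needs a single sorted pass, which is the property that removes the nested loop.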
Inappropriate Join Cardinality
• Query “fan-out”
• Look for a high number of rows generated from joins (higher than the sum of rows of all scanned tables) in the query plan or execution metrics
• Carefully review the join logic
• Use SVL_QUERY_SUMMARY to detect it
Example of a join that fans out:
FROM house
JOIN rooms ON rooms.house_id = house.id
JOIN residents ON residents.house_id = house.id
• If possible, break large fan-out queries into several smaller queries
• Use derived tables to transform one-to-many joins into one-to-one joins
• Try out some advanced techniques like 1 and 2
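The house, rooms and residents join above can be reproduced with an in-memory SQLite database to show the fan-out and the derived-table fix. SQLite stands in for Redshift here only so the example is runnable; the row counts and the `room_count` and `resident_count` aliases are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE house (id INTEGER PRIMARY KEY);
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, house_id INTEGER);
    CREATE TABLE residents (id INTEGER PRIMARY KEY, house_id INTEGER);
    INSERT INTO house VALUES (1);
    INSERT INTO rooms (house_id) VALUES (1), (1), (1);
    INSERT INTO residents (house_id) VALUES (1), (1);
""")

# Joining both child tables directly pairs every room with every resident.
fan_out = conn.execute("""
    SELECT house.id, COUNT(rooms.id), COUNT(residents.id)
    FROM house
    JOIN rooms ON rooms.house_id = house.id
    JOIN residents ON residents.house_id = house.id
    GROUP BY house.id
""").fetchall()

# Pre-aggregating each child in a derived table makes both joins one-to-one.
one_to_one = conn.execute("""
    SELECT house.id, r.room_count, p.resident_count
    FROM house
    JOIN (SELECT house_id, COUNT(*) AS room_count
          FROM rooms GROUP BY house_id) r ON r.house_id = house.id
    JOIN (SELECT house_id, COUNT(*) AS resident_count
          FROM residents GROUP BY house_id) p ON p.house_id = house.id
""").fetchall()

print(fan_out)     # counts inflated by the fan-out
print(one_to_one)  # true counts per house
```

One house with three rooms and two residents produces six joined rows, so both counts come out as 6. The derived-table version returns the real counts of 3 and 2 and never materializes the multiplied rows.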
Suboptimal WHERE clause
If your WHERE clause causes excessive table scans, you might see a SCAN
step in the segment with the highest maxtime value in
SVL_QUERY_SUMMARY. For more information, see Using the
SVL_QUERY_SUMMARY View.
To fix this issue, add a WHERE clause to the query based on the primary sort
column of the largest table. This approach will help minimize scanning time.
For more information, see Amazon Redshift Best Practices for Designing
Tables.
Insufficiently Restrictive Predicate
If your query has an insufficiently restrictive predicate, you might see a
SCAN step in the segment with the highest maxtime value in
SVL_QUERY_SUMMARY that has a very high rows value compared to the
rows value in the final RETURN step in the query. For more information, see
Using the SVL_QUERY_SUMMARY View.
To fix this issue, try adding a predicate to the query or making the existing
predicate more restrictive to narrow the output.
Very Large Result Set
If your query returns a very large result set, consider rewriting the query to
use UNLOAD to write the results to Amazon S3.
This approach will improve the performance of the RETURN step by taking
advantage of parallel processing. For more information on checking for a
very large result set, see Using the SVL_QUERY_SUMMARY View.
Large SELECT List
If your query has an unusually large SELECT list, you might see a bytes value
that is high relative to the rows value for any step (in comparison to other
steps) in SVL_QUERY_SUMMARY. This high bytes value can be an indicator
that you are selecting a lot of columns. For more information, see Using the
SVL_QUERY_SUMMARY View.
To fix this issue, review the columns you are selecting and see if any can be
removed.
Working with SVL_QUERY_SUMMARY
Select your query ID
SELECT query, elapsed, substring FROM svl_qlog ORDER BY query DESC limit 5;
Collect query data
SELECT * FROM svl_query_summary WHERE query = MyQueryID ORDER BY stm, seg, step;
For analysis please refer to: https://docs.aws.amazon.com/redshift/latest/dg/using-SVL-Query-Summary.html
Some additional materials can be found here
• Best practices for designing queries
• Best practices for designing tables
• Top 10 performance tuning techniques
• Troubleshooting queries
Amazon Redshift Developer Guide
https://docs.aws.amazon.com/redshift/latest/dg/redshift-dg.pdf

AWS Germany
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECS
AWS Germany
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the Union
AWS Germany
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
AWS Germany
 

More from AWS Germany (20)

Analytics Web Day | From Theory to Practice: Big Data Stories from the Field
Analytics Web Day | From Theory to Practice: Big Data Stories from the FieldAnalytics Web Day | From Theory to Practice: Big Data Stories from the Field
Analytics Web Day | From Theory to Practice: Big Data Stories from the Field
 
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
 
Modern Applications Web Day | Impress Your Friends with Your First Serverless...
Modern Applications Web Day | Impress Your Friends with Your First Serverless...Modern Applications Web Day | Impress Your Friends with Your First Serverless...
Modern Applications Web Day | Impress Your Friends with Your First Serverless...
 
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
 
Modern Applications Web Day | Container Workloads on AWS
Modern Applications Web Day | Container Workloads on AWSModern Applications Web Day | Container Workloads on AWS
Modern Applications Web Day | Container Workloads on AWS
 
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
Modern Applications Web Day | Continuous Delivery to Amazon EKS with SpinnakerModern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
 
Building Smart Home skills for Alexa
Building Smart Home skills for AlexaBuilding Smart Home skills for Alexa
Building Smart Home skills for Alexa
 
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureHotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
 
AWS Programme für Nonprofits
AWS Programme für NonprofitsAWS Programme für Nonprofits
AWS Programme für Nonprofits
 
Microservices and Data Design
Microservices and Data DesignMicroservices and Data Design
Microservices and Data Design
 
Serverless vs. Developers – the real crash
Serverless vs. Developers – the real crashServerless vs. Developers – the real crash
Serverless vs. Developers – the real crash
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
 
Secret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultSecret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s Vault
 
EKS Workshop
 EKS Workshop EKS Workshop
EKS Workshop
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECS
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the Union
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
 

Recently uploaded

CHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptx
CHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptxCHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptx
CHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptx
izzah863829
 
June 17, 2024, Meet Mack Monday Zoom Meeting
June 17, 2024, Meet Mack Monday Zoom MeetingJune 17, 2024, Meet Mack Monday Zoom Meeting
June 17, 2024, Meet Mack Monday Zoom Meeting
JohnMackNewtown
 
DAY 10 D Revelation 07-21-2024 PPT.pptx
DAY 10  D Revelation 07-21-2024 PPT.pptxDAY 10  D Revelation 07-21-2024 PPT.pptx
DAY 10 D Revelation 07-21-2024 PPT.pptx
FamilyWorshipCenterD
 
Pass AWS Certified Developer Associate with new exam dumps 2024
Pass AWS Certified Developer Associate  with new exam dumps 2024Pass AWS Certified Developer Associate  with new exam dumps 2024
Pass AWS Certified Developer Associate with new exam dumps 2024
SkillCertProExams
 
2024-07-21 Transformed 08 (shared slides).pptx
2024-07-21 Transformed 08 (shared slides).pptx2024-07-21 Transformed 08 (shared slides).pptx
2024-07-21 Transformed 08 (shared slides).pptx
Dale Wells
 
Curtin Cert degree offer diploma
Curtin Cert degree offer diploma Curtin Cert degree offer diploma
Curtin Cert degree offer diploma
popecap
 
Integrated and localized Approach in Development Communication.pptx
Integrated and localized Approach in Development Communication.pptxIntegrated and localized Approach in Development Communication.pptx
Integrated and localized Approach in Development Communication.pptx
Sayan Bachaspati
 
Cal Girls Bani Park Jaipur | | Girls Call Free Drop Service
Cal Girls Bani Park Jaipur | | Girls Call Free Drop ServiceCal Girls Bani Park Jaipur | | Girls Call Free Drop Service
Cal Girls Bani Park Jaipur | | Girls Call Free Drop Service
Deepikakumari457585
 
Fertilizer production by indorama fertilizer co.pptx
Fertilizer production by indorama fertilizer co.pptxFertilizer production by indorama fertilizer co.pptx
Fertilizer production by indorama fertilizer co.pptx
JohnMatthew62
 
Girls Call Raja Park Jaipur | 08445551418 | Free Drop Service
Girls Call Raja Park Jaipur | 08445551418 | Free Drop ServiceGirls Call Raja Park Jaipur | 08445551418 | Free Drop Service
Girls Call Raja Park Jaipur | 08445551418 | Free Drop Service
yadhnajanni
 
Large language model for public services
Large language model for public servicesLarge language model for public services
Large language model for public services
Mohamed Elharty
 
Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...
Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...
Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...
pradeepkumar66952#S007
 
Cal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With Hotels
Cal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With HotelsCal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With Hotels
Cal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With Hotels
chanchalrani3534
 
Trapbone Routing Plan created by Marcus Davis Jr
Trapbone Routing Plan created by Marcus Davis JrTrapbone Routing Plan created by Marcus Davis Jr
Trapbone Routing Plan created by Marcus Davis Jr
MarcusDavisJr1
 
Communication Skills........Let's Learn
Communication Skills........Let's Learn Communication Skills........Let's Learn
Communication Skills........Let's Learn
pdtrainernayab
 
Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...
Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...
Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...
mohankumar66951#S0007
 
DCS for presenation ahah phd gaming zone for
DCS for presenation ahah phd gaming zone forDCS for presenation ahah phd gaming zone for
DCS for presenation ahah phd gaming zone for
abhishekaiimsonian
 
Toast To TGIS- Newsletter June 2024 TGIS.pdf
Toast To TGIS- Newsletter June 2024 TGIS.pdfToast To TGIS- Newsletter June 2024 TGIS.pdf
Toast To TGIS- Newsletter June 2024 TGIS.pdf
toastmasterstgis
 
Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...
Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...
Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...
nagunakhan
 
NAAC REFORMS IN ACCREDITATION 2024.pptx
NAAC REFORMS IN ACCREDITATION  2024.pptxNAAC REFORMS IN ACCREDITATION  2024.pptx
NAAC REFORMS IN ACCREDITATION 2024.pptx
VeluSureshKumar
 

Recently uploaded (20)

CHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptx
CHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptxCHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptx
CHEMICAL INDUSTRY IN MALAYSIA-CIMAH.pptx
 
June 17, 2024, Meet Mack Monday Zoom Meeting
June 17, 2024, Meet Mack Monday Zoom MeetingJune 17, 2024, Meet Mack Monday Zoom Meeting
June 17, 2024, Meet Mack Monday Zoom Meeting
 
DAY 10 D Revelation 07-21-2024 PPT.pptx
DAY 10  D Revelation 07-21-2024 PPT.pptxDAY 10  D Revelation 07-21-2024 PPT.pptx
DAY 10 D Revelation 07-21-2024 PPT.pptx
 
Pass AWS Certified Developer Associate with new exam dumps 2024
Pass AWS Certified Developer Associate  with new exam dumps 2024Pass AWS Certified Developer Associate  with new exam dumps 2024
Pass AWS Certified Developer Associate with new exam dumps 2024
 
2024-07-21 Transformed 08 (shared slides).pptx
2024-07-21 Transformed 08 (shared slides).pptx2024-07-21 Transformed 08 (shared slides).pptx
2024-07-21 Transformed 08 (shared slides).pptx
 
Curtin Cert degree offer diploma
Curtin Cert degree offer diploma Curtin Cert degree offer diploma
Curtin Cert degree offer diploma
 
Integrated and localized Approach in Development Communication.pptx
Integrated and localized Approach in Development Communication.pptxIntegrated and localized Approach in Development Communication.pptx
Integrated and localized Approach in Development Communication.pptx
 
Cal Girls Bani Park Jaipur | | Girls Call Free Drop Service
Cal Girls Bani Park Jaipur | | Girls Call Free Drop ServiceCal Girls Bani Park Jaipur | | Girls Call Free Drop Service
Cal Girls Bani Park Jaipur | | Girls Call Free Drop Service
 
Fertilizer production by indorama fertilizer co.pptx
Fertilizer production by indorama fertilizer co.pptxFertilizer production by indorama fertilizer co.pptx
Fertilizer production by indorama fertilizer co.pptx
 
Girls Call Raja Park Jaipur | 08445551418 | Free Drop Service
Girls Call Raja Park Jaipur | 08445551418 | Free Drop ServiceGirls Call Raja Park Jaipur | 08445551418 | Free Drop Service
Girls Call Raja Park Jaipur | 08445551418 | Free Drop Service
 
Large language model for public services
Large language model for public servicesLarge language model for public services
Large language model for public services
 
Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...
Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...
Cal Girls Hotel Highway King Jaipur | 8445551418 | Top Class High Profile Bea...
 
Cal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With Hotels
Cal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With HotelsCal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With Hotels
Cal Girls Shyam Nagar Jaipur | 8445551418 | Sweet Girls Call With Hotels
 
Trapbone Routing Plan created by Marcus Davis Jr
Trapbone Routing Plan created by Marcus Davis JrTrapbone Routing Plan created by Marcus Davis Jr
Trapbone Routing Plan created by Marcus Davis Jr
 
Communication Skills........Let's Learn
Communication Skills........Let's Learn Communication Skills........Let's Learn
Communication Skills........Let's Learn
 
Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...
Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...
Cal Girls Holiday Inn Jaipur City Centre | 8445551418 | Girls Call With Sweet...
 
DCS for presenation ahah phd gaming zone for
DCS for presenation ahah phd gaming zone forDCS for presenation ahah phd gaming zone for
DCS for presenation ahah phd gaming zone for
 
Toast To TGIS- Newsletter June 2024 TGIS.pdf
Toast To TGIS- Newsletter June 2024 TGIS.pdfToast To TGIS- Newsletter June 2024 TGIS.pdf
Toast To TGIS- Newsletter June 2024 TGIS.pdf
 
Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...
Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...
Cal Girls Gopalpura Bypass Rd Jaipur | 8445551418 | Top Class High Profile Be...
 
NAAC REFORMS IN ACCREDITATION 2024.pptx
NAAC REFORMS IN ACCREDITATION  2024.pptxNAAC REFORMS IN ACCREDITATION  2024.pptx
NAAC REFORMS IN ACCREDITATION 2024.pptx
 

How to Fine-Tune Performance Using Amazon Redshift

  • 1. Amazon Redshift: Good practices for performance optimization
  • 2. What is Amazon Redshift? • Petabyte-scale columnar database • Fast response time • About 10 times faster than a typical relational database • Cost-effective (around $1,000 per TB per year)
  • 3. When do customers use Amazon Redshift? Replacing a traditional DWH: • Reduce cost by extending the DW instead of adding HW • Migrate from an existing DWH system • Respond faster to the business: provision in minutes. Analyzing Big Data: • Improve performance by an order of magnitude • Make more data available for analysis • Access business data via standard reporting tools. Providing services and SaaS: • Add analytics functionality to applications • Scale DW capacity as demand grows • Reduce SW and HW costs by an order of magnitude
  • 5. Redshift reduces IO. Column storage: • With row storage you do unnecessary IO • With columnar storage you only read the data you need. Data compression: • COPY compresses automatically • You can analyze and override • More performance, less cost. Zone maps: • Track the minimum and maximum value for each block • Skip over blocks that do not contain relevant data. Direct-attached storage: • > 2 GB/sec scan rate • Optimized for data processing • High disk density
  • 7. Main possible issues with Redshift performance • Incorrect column encoding • Skewed table data • Queries not benefiting from sort keys (excessive scanning) • Tables without statistics or which need vacuum • Tables with very large VARCHAR columns • Queries waiting on queue slots • Queries that are disk-based: incorrect sizing, GROUP BY distinct values, UNION, hash joins with DISTINCT values • Commit queue waits • Inefficient data loads • Inefficient use of temporary tables • Large nested loop JOINs • Inappropriate join cardinality
  • 8. Incorrect column encoding. Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from this best practice, you can use the v_extended_table_info view from the Amazon Redshift Utils GitHub repository. If you find that you have tables without optimal column encoding, then use the Amazon Redshift Column Encoding Utility from the Utils repository to apply encoding. This command line utility uses the ANALYZE COMPRESSION command on each table. If encoding is required, it generates a SQL script which creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data. Available encodings: • Raw Encoding • Byte-Dictionary Encoding • Delta Encoding • LZO Encoding • Mostly Encoding • Runlength Encoding • Text255 and Text32k Encodings • Zstandard Encoding
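As a rough illustration of the check described on this slide, the sketch below lists the largest tables together with their encoding flag and then asks Redshift for encoding recommendations on one of them. The table name `my_schema.my_table` is a placeholder, not something from the slides. Note that ANALYZE COMPRESSION takes an exclusive table lock while it runs, so it is probably best tried outside peak hours.

```sql
-- List the largest tables together with the "encoded" flag from SVV_TABLE_INFO.
SELECT "schema" || '.' || "table" AS "table",
       encoded,
       size AS size_in_1mb_blocks
FROM svv_table_info
ORDER BY size DESC
LIMIT 20;

-- Ask Redshift which encoding it would suggest for each column.
-- my_schema.my_table is a hypothetical name; replace it with a real table.
ANALYZE COMPRESSION my_schema.my_table;
```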
  • 9. Skewed table data. If skew is a problem, you typically see that node performance is uneven on the cluster. Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster. Consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS: CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT <column name> FROM <table name>; You can use the AWS Schema Conversion Tool (SCT) to set the proper distribution key and style during the migration process.
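A minimal sketch of how skew might be inspected without the admin script, assuming the standard SVV_TABLE_INFO and STV_TBL_PERM system tables. The first query ranks tables by `skew_rows` (rows in the fullest slice divided by rows in the emptiest slice); the second shows the per-slice row counts of the candidate table created by the CTAS statement above (`my_test_table` is the name used on the slide).

```sql
-- Tables with the most uneven row distribution across slices.
SELECT "schema" || '.' || "table" AS "table",
       diststyle,
       tbl_rows,
       skew_rows
FROM svv_table_info
WHERE skew_rows IS NOT NULL
ORDER BY skew_rows DESC
LIMIT 20;

-- Per-slice row counts for the candidate table built with CTAS.
-- A roughly flat profile suggests the column is a reasonable DISTKEY.
SELECT slice,
       SUM(rows) AS rows_on_slice
FROM stv_tbl_perm
WHERE TRIM(name) = 'my_test_table'
GROUP BY slice
ORDER BY slice;
```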
  • 10. Deal with tables with very large VARCHAR columns. During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance. Use the following query to generate a list of tables that should have their maximum column widths reviewed: SELECT database, schema || '.' || "table" AS "table", max_varchar FROM svv_table_info WHERE max_varchar > 150 ORDER BY 2; Identify which columns in those tables are wide VARCHAR columns and then determine the true maximum width for each wide column: SELECT max(len(rtrim(column_name))) FROM table_name; If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries which include the JSON fragment column. If end users query these large columns but don't actually execute JSON functions against them, consider moving them into another table that only contains the primary key column of the original table and the JSON column.
  • 11. Queries not benefiting from sort keys. Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases, but which does not incur a storage cost as with other platforms (for more information, see Choosing Sort Keys). A sort key should be created on those columns which are most commonly used in WHERE clauses. If you have a known query pattern, then COMPOUND sort keys give the best performance; if end users query different columns equally, then use an INTERLEAVED sort key. If using compound sort keys, review your queries to ensure that their WHERE clauses specify the sort columns in the same order they were defined in the compound key. To determine which tables don't have sort keys, run the following query against the v_extended_table_info view from the Amazon Redshift Utils repository: SELECT * FROM admin.v_extended_table_info WHERE sortkey IS NULL; A table without a sort key can lead to excessive scanning.
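To make the compound sort key advice concrete, here is a small sketch. The table `sales_events` and its columns are invented for illustration. The query filters on the leading sort column first, which is the pattern the slide recommends so that block-level min/max metadata can be used to skip data.

```sql
-- Hypothetical fact table sorted by date first, then region.
CREATE TABLE sales_events (
    event_date  DATE        NOT NULL,
    region      VARCHAR(16) NOT NULL,
    customer_id BIGINT,
    amount      DECIMAL(12,2)
)
COMPOUND SORTKEY (event_date, region);

-- The WHERE clause restricts the leading sort column (event_date)
-- and then the second one (region), matching the key definition order.
SELECT region,
       SUM(amount) AS total_amount
FROM sales_events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
  AND region = 'EU'
GROUP BY region;
```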
  • 12. Queries waiting on queue slots (ICT). Amazon Redshift runs queries using a queuing system known as workload management (WLM). You can define up to 8 queues to separate workloads from each other, and set the concurrency on each queue to meet your overall throughput requirements. In some cases, the queue to which a user or query has been assigned is completely busy and a user's query must wait for a slot to open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency. First, determine whether any queries are queuing, using the queuing_queries.sql admin script. Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. Keep in mind that while increasing concurrency allows more queries to run, each query will get a smaller share of the memory allocated to its queue.
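If the admin scripts are not at hand, a similar picture can be sketched directly from the STL_WLM_QUERY log, which records queue and execution time in microseconds. The one-day window and the `service_class > 5` filter are assumptions: the filter is meant to exclude system queues, and queue numbering differs between manual and automatic WLM.

```sql
-- Time spent waiting in queue versus executing, per WLM service class,
-- over the last day.
SELECT service_class,
       COUNT(*)                          AS queries,
       SUM(total_queue_time) / 1000000.0 AS queue_seconds,
       SUM(total_exec_time)  / 1000000.0 AS exec_seconds
FROM stl_wlm_query
WHERE service_class > 5
  AND queue_start_time > DATEADD(day, -1, GETDATE())
GROUP BY service_class
ORDER BY queue_seconds DESC;
```

A queue whose `queue_seconds` is large relative to `exec_seconds` is a candidate for the concurrency review the slide describes.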
  • 13. Queries that are disk-based (ICT). If a query isn't able to completely execute in memory, it may need to use disk-based temporary storage for parts of an explain plan. The additional disk I/O slows down the query; this can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation). To determine if any queries have been writing to disk, use the following query: SELECT q.query, trim(q.cat_text) FROM ( SELECT query, replace( listagg(text,' ') WITHIN GROUP (ORDER BY sequence), '\\n', ' ') AS cat_text FROM stl_querytext WHERE userid>1 GROUP BY query) q JOIN ( SELECT distinct query FROM svl_query_summary WHERE is_diskbased='t' AND (LABEL LIKE 'hash%' OR LABEL LIKE 'sort%' OR LABEL LIKE 'aggr%') AND userid > 1) qs ON qs.query = q.query; Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue so that queries no longer need to spill to disk to complete.
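One possible way to give a single heavy statement more memory, without changing the queue definition, is the `wlm_query_slot_count` session parameter under manual WLM. The sketch below is an assumption about how this slide's advice could be applied, not something the slides prescribe; the slot count of 3 and the table names are illustrative only.

```sql
-- Claim three slots (and their memory) from the current queue for this session.
-- The value must not exceed the queue's configured concurrency.
SET wlm_query_slot_count TO 3;

-- Run the memory-hungry statement; big_fact and big_fact_summary are hypothetical.
INSERT INTO big_fact_summary
SELECT customer_id, COUNT(DISTINCT order_id) AS orders
FROM big_fact
GROUP BY customer_id;

-- Return to the default so later statements do not block other users' slots.
SET wlm_query_slot_count TO 1;
```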
  • 14. Commit queue waits (ICT). Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue. If you are committing too often on your database, you will start to see waits on the commit queue increase, which can be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads. One of the worst practices is to insert data into Amazon Redshift row by row. Use the COPY command or an ETL tool compatible with Amazon Redshift instead.
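A minimal sketch of the bulk-load pattern recommended above. The bucket, prefix, IAM role ARN, and table names are placeholders. The point is that one COPY loads many files in parallel and commits once, whereas row-by-row INSERTs each pay the commit cost.

```sql
-- Bulk load gzip-compressed CSV files from S3 in a single statement.
COPY my_schema.events
FROM 's3://my-bucket/events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
GZIP;

-- When several statements belong together, wrap them in one transaction
-- so that only a single COMMIT is issued.
BEGIN;
DELETE FROM my_schema.events_daily WHERE event_date = '2024-01-31';
INSERT INTO my_schema.events_daily
SELECT event_date, COUNT(*) AS event_count
FROM my_schema.events
WHERE event_date = '2024-01-31'
GROUP BY event_date;
COMMIT;
```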
  • 15. Inefficient use of Temporary Tables. Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and CREATE [TEMPORARY] TABLE AS (CTAS) commands use the input data to determine column names, sizes and data types, and use default storage properties. These default storage properties may cause issues if not carefully considered. Amazon Redshift's default table structure is to use EVEN distribution with no column encoding. This is a suboptimal data structure for many types of queries, and if you are using the SELECT … INTO syntax you cannot set the column encoding or distribution and sort keys. If you use the CTAS syntax, you can specify a distribution style and sort keys, and Redshift will automatically apply LZO encoding for everything other than sort keys, booleans, reals and doubles. If you consider this automatic encoding suboptimal, and require further control, use the CREATE TABLE syntax rather than CTAS. If you are creating temporary tables, it is highly recommended that you convert all SELECT … INTO syntax to use the CREATE statement. This ensures that your temporary tables have column encodings and are distributed in a fashion that is sympathetic to the other entities that are part of the workflow.
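The conversion suggested on this slide might look like the sketch below. The `orders` source table, the column list, and the chosen encodings are assumptions for illustration; the pattern is simply an explicit CREATE TEMPORARY TABLE followed by INSERT … SELECT in place of SELECT … INTO.

```sql
-- Instead of: SELECT ... INTO #stage_orders FROM orders WHERE ...
-- declare the temporary table explicitly with encodings and keys.
CREATE TEMPORARY TABLE stage_orders (
    order_id    BIGINT      ENCODE zstd,
    customer_id BIGINT      ENCODE zstd,
    status      VARCHAR(20) ENCODE lzo,
    order_date  DATE        ENCODE raw   -- sort key column left unencoded
)
DISTKEY (customer_id)   -- assumed to match the key of the tables it will join to
SORTKEY (order_date);

INSERT INTO stage_orders
SELECT order_id, customer_id, status, order_date
FROM orders
WHERE order_date >= '2024-01-01';

-- Give the planner statistics for the freshly loaded temporary table.
ANALYZE stage_orders;
```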
  • 16. Tables without statistics • Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). Without good statistics, the optimiser may make suboptimal choices about the order in which to access tables, or how to join datasets together. • The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics, and you can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics: SELECT database, schema || '.' || "table" AS "table", stats_off FROM svv_table_info WHERE stats_off > 5 ORDER BY 2;
  • 17. Tables which need VACUUM. In Amazon Redshift, data blocks are immutable. When rows are DELETED or UPDATED, they are simply logically deleted (flagged for deletion) but not physically removed from disk. Updates result in a new block being written with new data appended. Both of these operations cause the previous version of the row to continue consuming disk space and continue being scanned when a query scans the table. As a result, table storage space is increased and performance is degraded due to otherwise avoidable disk I/O during scans. A VACUUM command recovers the space from deleted rows and restores the sort order. You can use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days. To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually need reorganization.
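For a manual version of what the Analyze & Vacuum Schema utility automates, something like the following could be used. The thresholds (10 percent unsorted, 5 percent stale statistics) and the table name are assumptions to adapt. VACUUM cannot run inside a transaction block, so it is issued as a standalone statement.

```sql
-- Candidate tables: a large unsorted region or stale statistics.
SELECT "schema" || '.' || "table" AS "table",
       unsorted,
       stats_off,
       tbl_rows
FROM svv_table_info
WHERE unsorted > 10
   OR stats_off > 5
ORDER BY unsorted DESC NULLS LAST;

-- Reclaim space from deleted rows and restore sort order, then refresh statistics.
-- my_schema.my_table is a hypothetical name taken from the list above.
VACUUM FULL my_schema.my_table;
ANALYZE my_schema.my_table;
```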
  • 18. Analyze the performance of the queries and address issues • Your best friends in a Redshift world are: • ANALYZE, for refreshing out-of-date statistics • The SVL_QUERY_SUMMARY view, for a summary of useful data about the behavior of your queries and a high-level overview of the cluster • Query alerts, for receiving important information about deviations in the behavior of your queries • The SVL_QUERY_REPORT view, for collecting details about the health and performance of your queries
  • 19. Set up and fine-tune monitoring and maintenance procedures • In addition to your preferred monitoring methods and the many partner solutions, there are some helpful tools: • Amazon Redshift Column Encoding Utility • Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster. • Run the missing_table_stats.sql admin script to determine which tables are missing stats. • Use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days. • Use top_queries.sql to determine the top running queries. • Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. • View the commit stats with the commit_stats.sql admin script.
  • 20. Eliminate Nested Loops • Caused by a SQL query specifying a join condition that requires a "brute force" approach between two large tables, for example: ON a.date > b.date; ON a.text LIKE b.text; ON a.x = b.x OR a.y = b.y • Quite easy to spot: look for "Nested Loop" in the query plan or in STL_ALERT_EVENT_LOG • Rewrite an inequality JOIN condition as a window function • Use a small nested loop instead of joining two large tables • Consider running the nested loop join once and persisting the results in a separate relational table
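A sketch of how nested loops might be found and, in one common case, avoided. The first query reads the alert log named on the slide. The second shows a rewrite that is not from the slides: an OR of two equality conditions expressed as a UNION of two equi-joins, each of which can use a hash join. Tables `a` and `b` and their `id` columns are hypothetical, and the rewrite is only equivalent when the selected columns identify each (a, b) pair uniquely.

```sql
-- Recent queries for which the planner raised a nested loop alert.
SELECT query,
       TRIM(event)    AS event,
       TRIM(solution) AS solution,
       event_time
FROM stl_alert_event_log
WHERE event LIKE 'Nested Loop%'
ORDER BY event_time DESC
LIMIT 20;

-- Original join condition: ON a.x = b.x OR a.y = b.y
-- Rewritten as two equi-joins; UNION removes pairs matched by both branches.
SELECT a.id AS a_id, b.id AS b_id
FROM a
JOIN b ON a.x = b.x
UNION
SELECT a.id AS a_id, b.id AS b_id
FROM a
JOIN b ON a.y = b.y;
```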
  • 21. Inappropriate Join Cardinality • Query "fan-out" • Look for a high number of rows generated from joins (higher than the sum of rows of all scanned tables) in the query plan or execution metrics • Carefully review the join logic • Use SVL_QUERY_SUMMARY to detect it. Example of a fan-out join: FROM house JOIN rooms ON rooms.house_id = house.id JOIN residents ON residents.house_id = house.id • If possible, break large fan-out queries into several smaller queries • Use derived tables to transform one-to-many joins into one-to-one joins • Try out some advanced techniques like 1 and 2
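The derived-table suggestion can be illustrated with the house example from the slide. Joining `rooms` and `residents` directly to `house` yields rooms × residents rows per house. Aggregating each child table first gives at most one row per `house_id`, so both joins become one-to-one. The count columns are an assumption about what the query needs; the table and key names come from the slide.

```sql
-- Each derived table returns one row per house, so the joins cannot fan out.
SELECT h.id,
       r.room_count,
       p.resident_count
FROM house h
JOIN (
    SELECT house_id, COUNT(*) AS room_count
    FROM rooms
    GROUP BY house_id
) r ON r.house_id = h.id
JOIN (
    SELECT house_id, COUNT(*) AS resident_count
    FROM residents
    GROUP BY house_id
) p ON p.house_id = h.id;
```

Inner joins are kept to mirror the slide's fragment; houses without rooms or residents would need LEFT JOINs instead.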
  • 22. Suboptimal WHERE clause If your WHERE clause causes excessive table scans, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY. For more information, see Using the SVL_QUERY_SUMMARY View. To fix this issue, add a WHERE clause to the query based on the primary sort column of the largest table. This approach will help minimize scanning time. For more information, see Amazon Redshift Best Practices for Designing Tables.
  • 23. Insufficiently Restrictive Predicate If your query has an insufficiently restrictive predicate, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY that has a very high rows value compared to the rows value in the final RETURN step in the query. For more information, see Using the SVL_QUERY_SUMMARY View. To fix this issue, try adding a predicate to the query or making the existing predicate more restrictive to narrow the output.
  • 24. Very Large Result Set If your query returns a very large result set, consider rewriting the query to use UNLOAD to write the results to Amazon S3. This approach will improve the performance of the RETURN step by taking advantage of parallel processing. For more information on checking for a very large result set, see Using the SVL_QUERY_SUMMARY View.
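A minimal UNLOAD sketch for the case described above. The query, S3 prefix, and IAM role ARN are placeholders. Single quotes inside the quoted SELECT are doubled, and with PARALLEL ON (the default) each slice writes its own set of files under the prefix.

```sql
-- Write the result set to S3 in parallel instead of returning it to the client.
UNLOAD ('SELECT order_id, customer_id, amount
         FROM my_schema.orders
         WHERE order_date >= ''2024-01-01''')
TO 's3://my-bucket/exports/orders_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
PARALLEL ON;
```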
  • 25. Large SELECT List If your query has an unusually large SELECT list, you might see a bytes value that is high relative to the rows value for any step (in comparison to other steps) in SVL_QUERY_SUMMARY. This high bytes value can be an indicator that you are selecting a lot of columns. For more information, see Using the SVL_QUERY_SUMMARY View. To fix this issue, review the columns you are selecting and see if any can be removed.
  • 26. Working with SVL_QUERY_SUMMARY. Select your query ID: SELECT query, elapsed, substring FROM svl_qlog ORDER BY query DESC limit 5; Collect query data: SELECT * FROM svl_query_summary WHERE query = MyQueryID ORDER BY stm, seg, step; For analysis please refer to: https://docs.aws.amazon.com/redshift/latest/dg/using-SVL-Query-Summary.html
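When reading the summary for the symptoms covered on the preceding slides (scan-heavy segments, wide SELECT lists, disk-based steps), it may help to select only the relevant columns rather than `SELECT *`. The query ID 12345 is a placeholder for the ID obtained from svl_qlog.

```sql
-- Per-step view of one query: long-running SCAN steps (maxtime), a high
-- bytes-to-rows ratio (wide SELECT list), and steps that spilled to disk.
SELECT stm,
       seg,
       step,
       label,
       rows,
       bytes,
       maxtime,
       is_diskbased,
       workmem
FROM svl_query_summary
WHERE query = 12345
ORDER BY stm, seg, step;
```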
  • 27. Some additional materials can be found in the Amazon Redshift Developer Guide (https://docs.aws.amazon.com/redshift/latest/dg/redshift-dg.pdf): • Best practices for designing queries • Best practices for designing tables • Top 10 performance tuning techniques • Troubleshooting queries