Amazon Redshift: Introduction and Hands-on Lab
김상필
Solutions Architect, Amazon Web Services
May 30, 2017
13:00 – 15:00
Agenda
• Redshift overview, table design, and data loading considerations (30 min)
• Qwiklab Lab 1 - Advanced Amazon Redshift: Data Loading (45 min)
• Qwiklab Lab 2 - Advanced Amazon Redshift: Table Layout and Schema Design (45 min)
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
Amazon Redshift: a lot faster, a lot simpler, a lot cheaper
Amazon Redshift Architecture
Leader Node
Simple SQL end point
Stores metadata
Optimizes query plan
Coordinates query execution
Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads, backups, restores, resizes
Start at just $0.25/hour, grow to 2 PB (compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS1/DS2: HDD; scale from 2 TB to 2 PB
(Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion/Backup; Backup; Restore)
How Amazon Redshift Is Built for Speed
Dramatically less I/O
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
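Zone map example: values stored in each block, with the block's min and max
Block values | Min | Max
---------------------------------+-----+-----
10, 13, 14, 26, …, 100, 245, 324 | 10 | 324
375, 393, 417, …, 512, 549, 623 | 375 | 623
637, 712, 809, …, 834, 921, 959 | 637 | 959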
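To illustrate how a zone map cuts I/O, here is a minimal Python sketch (not Redshift code). It assumes only what the figure shows, namely a minimum and maximum value kept per block; the function name `blocks_to_scan` is hypothetical.

```python
# Per-block (min, max) values taken from the zone map figure.
ZONE_MAP = [(10, 324), (375, 623), (637, 959)]


def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        index
        for index, (block_min, block_max) in enumerate(zone_map)
        if block_max >= low and block_min <= high
    ]


if __name__ == "__main__":
    # A predicate like "value BETWEEN 400 AND 500" can only match block 1,
    # so the other two blocks never have to be read.
    print(blocks_to_scan(ZONE_MAP, 400, 500))
```

A range that falls between two blocks (for example 330 to 370) matches no block at all, so nothing is read from disk for it.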
How Amazon Redshift Is Built for Speed
Parallel and Distributed
Query
Load
Export
Backup
Restore
Resize
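ID | Name
---+--------------
1 | John Smith
2 | Jane Jones
3 | Peter Black
4 | Pat Partridge
5 | Sarah Cyan
6 | Brian Snail
(Diagram: the same rows spread across the cluster, in the order 1 John Smith, 4 Pat Partridge, 2 Jane Jones, 5 Sarah Cyan, 3 Peter Black, 6 Brian Snail)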
How Amazon Redshift Is Built for Speed
Distribution Keys
Amazon Redshift Security
Petabyte-Scale Data Warehousing Service
Amazon Redshift Table and Schema Design
Amazon Redshift supports existing data models
Star, Snowflake
Choosing appropriate data types
Redshift performance is about efficient I/O
Don’t make columns wider than necessary, e.g.:
• Avoid BIGINT for country identifier
• Avoid CHAR(MAX) for country names
• Oversizing VARCHAR columns impacts loading and runtime performance
Use appropriate types
• Use TIMESTAMP or DATE instead of CHAR
• Use CHAR instead of VARCHAR when appropriate
Multibyte Characters
• VARCHAR data type supports UTF-8 multibyte characters up to a maximum of four bytes
• The CHAR data type does not support multibyte characters
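Because VARCHAR lengths in Redshift are counted in bytes rather than characters, multibyte text needs more room than its character count suggests. The following Python sketch shows the arithmetic; the helper names `fits_varchar` and `fits_char` are hypothetical and only mimic the rules stated above.

```python
def fits_varchar(value: str, declared_bytes: int) -> bool:
    """True if the UTF-8 encoding of value fits in VARCHAR(declared_bytes)."""
    return len(value.encode("utf-8")) <= declared_bytes


def fits_char(value: str, declared_bytes: int) -> bool:
    """CHAR holds single-byte characters only, so non-ASCII text is rejected."""
    return value.isascii() and len(value) <= declared_bytes


if __name__ == "__main__":
    city = "서울"  # two characters, three bytes each in UTF-8
    print(len(city), len(city.encode("utf-8")))
    print(fits_varchar(city, 2), fits_varchar(city, 6))
    print(fits_char("Seoul", 5), fits_char(city, 10))
```

In practice this means sizing a VARCHAR column by the longest expected byte length of the data, not by the number of characters.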
Architecture and schema design considerations
• Redshift is a distributed system:
– A compute node contains slices (one per core)
– A slice contains data
• Queries run on all slices in parallel: optimal query throughput can be achieved when data is evenly spread across slices
Table distribution styles
(Diagram: each style shown on two nodes with two slices each, Node 1 with Slice 1 and Slice 2, Node 2 with Slice 3 and Slice 4)
• Distribution Key: same key to same location
• All: all data on every node
• Even: round-robin distribution
Choosing a distribution style
Choose a distribution style of KEY for
• Large data tables, like a FACT table in a star schema
• Large or rapidly changing tables used in joins or aggregations
• Improved performance even if the key is not used as a join column
Choose a distribution style of ALL for tables that
• Have slowly changing data
• Are of reasonable size (i.e., a few million rows, not hundreds of millions)
• Have no common distribution key for frequent joins
• Typical use case: joined dimension table without a common distribution key
Choose a distribution style of EVEN for tables that are not joined and have no aggregate queries
Distributing data with distribution keys
(Diagram, repeated across six slides: sample records placed on Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4))
Sample records:
• cloudfront: uri=/games/g1.exe, user_id=1234, …
• cloudfront: uri=/imgs/ad1.png, user_id=2345, …
• cloudfront: uri=/games/g10.exe, user_id=4312, …
• cloudfront: uri=/img/ad_5.img, user_id=1234, …
• user_profile: user_id=1234, name=janet, …
• user_profile: user_id=2345, name=bill, …
• user_profile: user_id=4312, name=fred, …
• user_profile: user_id=6789, name=fred, …
• order_line: order_line_id=25693, …
Distribution keys determine which data resides on which slices
• Records with the same distribution key for a table are on the same slice
Distribution keys help with data locality for join evaluation
• Records from other tables with the same distribution key value are also on the same slice
Poor key choices lead to uneven distribution of records
• (Diagram: the four slices hold 2M, 5M, 1M, and 4M records)
Unevenly distributed data causes processing imbalances!
Evenly distributed data improves query performance
• (Diagram: each of the four slices holds 2M records)
Choosing a distribution key
Goal
• Distribute data evenly across nodes
• Minimize data movement: co-located joins and aggregates
Best Practice
• Use the joined columns of the largest commonly joined tables as the key (example: fact table and large dimension table)
• Consider using a GROUP BY column as the key
• Never use a distribution key that causes severe data skew
• Choose a key with high cardinality, i.e. a large number of discrete values
Avoid
• Using a column that serves as an equality filter as your distribution key (it concentrates processing on one node)
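The effect of key cardinality on skew can be tried out with a small Python simulation. This is a sketch under stated assumptions: Redshift's real hash function is not described in these slides, so SHA-256 stands in for it, and `slice_for`, `rows_per_slice`, and `skew_ratio` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def slice_for(key, num_slices):
    """Map a distribution-key value to a slice (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def rows_per_slice(keys, num_slices):
    """Count how many rows land on each slice for the given key values."""
    counts = Counter(slice_for(key, num_slices) for key in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(counts):
    """Largest slice divided by the average slice; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# High-cardinality key (100,000 distinct values, like a user_id).
high_card = rows_per_slice(range(100_000), 4)
# Low-cardinality key (only two distinct values, like a state code).
low_card = rows_per_slice(["WA", "OR"] * 50_000, 4)

if __name__ == "__main__":
    print(high_card, round(skew_ratio(high_card), 3))
    print(low_card, round(skew_ratio(low_card), 3))
```

With only two distinct key values, at most two of the four slices can ever receive rows, which is the uneven picture the earlier slides warn about; the high-cardinality key spreads rows almost evenly.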
Sorting data
The sort key helps Redshift minimize I/O
• For example, a table sorted on timestamp and queried on a date range will skip all blocks not in the query range
In the slices (on disk), the data is sorted by the sort key
• If no sort key exists, Redshift uses the data insertion order
Choose a sort key that is frequently used in your queries
• Primarily as a query predicate (date, identifier, …)
• Optionally choose a column frequently used for aggregates
• Optionally choose the same column as the distribution key for the most efficient joins (merge join)
Don’t use too many columns per table as sort keys
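Why sorting reduces I/O can be seen by comparing how many blocks a range predicate touches when rows are stored in insertion order versus sorted order. The Python sketch below is a toy model, assuming fixed-size blocks of 100 values with a min/max kept per block; the block size and helper names are illustrative, not Redshift internals.

```python
def zone_map(values, block_size):
    """Split values into fixed-size blocks and record each block's (min, max)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_read(values, block_size, low, high):
    """Count blocks whose min/max range overlaps the predicate [low, high]."""
    return sum(
        1
        for block_min, block_max in zone_map(values, block_size)
        if block_max >= low and block_min <= high
    )


# 1,000 rows arriving in a scrambled order: 37 is coprime with 1000,
# so this is a permutation of 0..999.
insertion_order = [(i * 37) % 1000 for i in range(1000)]
sorted_order = sorted(insertion_order)

unsorted_blocks = blocks_read(insertion_order, 100, 200, 299)
sorted_blocks = blocks_read(sorted_order, 100, 200, 299)

if __name__ == "__main__":
    print(unsorted_blocks, sorted_blocks)
```

In this toy data set the unsorted layout has to read every block, while the sorted layout reads only the one block that holds the requested range.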
Example: Distribution and Sort Keys
-- Total products sold in Washington in January 2015
SELECT SUM(S.Price * S.Quantity)
FROM SALES S
JOIN CATEGORY C ON C.ProductId = S.ProductId
JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId
WHERE C.CategoryId = 'Produce' AND F.State = 'WA'
AND S.Date BETWEEN '1/1/2015' AND '1/31/2015';
Dist key (S) = ProductID
Dist key (C) = ProductID
Dist key (F) = FranchiseID
Sort key (S) = Date
Optimizing the database for queries
Make sure your columns are compressed appropriately
Co-locate frequently joined tables using distribution keys or distribution style ALL to avoid data transfers between nodes
For joined tables, consider using sort keys on the joined columns, allowing fast merge joins
Compression allows you to denormalize without penalizing storage, simplifying queries and limiting joins
Vacuum and Analyze regularly
Amazon Redshift Data Loading
Petabyte-Scale Data Warehousing Service
Data loading process
Data Source → Extraction → Transformation → Loading → Target (Amazon Redshift)
Focus
Amazon Redshift data loading overview
(Diagram: data moving from a corporate data center into the AWS Cloud)
• Corporate data center: logs / files, source DBs
• Transfer paths: VPN connection, AWS Direct Connect, S3 Multipart Upload, AWS Import/Export, EC2 or on-premises (using SSH)
• AWS Cloud: Amazon DynamoDB, Amazon S3, Amazon Elastic MapReduce, Amazon RDS, Amazon Redshift, Amazon Glacier
• Other services shown: Database Migration Service, Kinesis, AWS Lambda, AWS Data Pipeline
• Axis label: data volume
Uploading files to Amazon S3
(Diagram: Client.txt in the corporate data center is split into Client.txt.1, Client.txt.2, Client.txt.3, and Client.txt.4, uploaded to the S3 bucket mydata, and loaded into Amazon Redshift within one Region)
• Ensure that your data resides in the same Region as your Redshift clusters
• Split the data into multiple files to facilitate parallel processing
• Optionally, you can encrypt your data using Amazon S3 Server-Side or Client-Side Encryption
• Files should be individually compressed using GZIP or LZOP
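The split-and-compress step can be sketched in Python as below. This is an illustration, not an AWS tool: the function name `split_and_gzip` and the `<name>.<n>.gz` output naming are assumptions, and the whole source file is read into memory, which is only reasonable for small inputs.

```python
import gzip
from pathlib import Path


def split_and_gzip(src: Path, out_dir: Path, parts: int) -> list:
    """Split src by lines into `parts` gzip files named <src.name>.<n>.gz."""
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for n in range(parts):
        target = out_dir / f"{src.name}.{n + 1}.gz"
        with gzip.open(target, "wt", encoding="utf-8") as handle:
            # Every parts-th line goes to this file, keeping sizes roughly even.
            handle.writelines(lines[n::parts])
        outputs.append(target)
    return outputs
```

Calling `split_and_gzip(Path("Client.txt"), Path("out"), 4)` would produce four individually compressed files that can then be uploaded under a common S3 prefix.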
Loading data from Amazon S3
Preparing Input Data Files
Uploading files to Amazon S3
Using COPY to load data from Amazon S3
Input data using delimiters
1|Customer#000000001|j5JsirBM9P|MOROCCO 0|MOROCCO|AFRICA|25-989-741-2988|BUILDING
2|Customer#000000002|487LW1dovn6Q4dMVym|JORDAN 1|JORDAN|MIDDLE EAST|23-768-687-3665|AUTOMOBILE
3|Customer#000000003|fkRGN8n|ARGENTINA7|ARGENTINA|AMERICA|11-719-748-3364|AUTOMOBILE
4|Customer#000000004|4u58h f|EGYPT 4|EGYPT|MIDDLE EAST|14-128-190-5944|MACHINERY
Example of pipe (‘|’) delimited file
CREATE TABLE customer (
c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(25) not null,
c_city varchar(10) not null,
c_nation varchar(15) not null,
c_region varchar(12) not null,
c_phone varchar(15) not null,
c_mktsegment varchar(10) not null
);
COPY customer FROM 's3://mydata/client.txt'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
DELIMITER '|';
Fixed-width input data
1 RFK 900 Columbus MOROCCO MOROCCO AFRICA 25-989-741-2988 BUILDING
2 JFK 800 Washington JORDAN JORDAN MIDDLE EAST 23-768-687-3665 AUTOMOBILE
3 LBJ 700 Foxborough ARGENTINA ARGENTINA AMERICA 11-719-748-3364 AUTOMOBILE
4 GWB 600 Kansas EGYPT EGYPT MIDDLE EAST 14-128-190-5944 MACHINERY
CREATE TABLE customer (
c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(25) not null,
c_city varchar(10) not null,
c_nation varchar(15) not null,
c_region varchar(12) not null,
c_phone varchar(15) not null,
c_mktsegment varchar(10) not null
);
COPY customer FROM 's3://mydata/client.txt'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
FIXEDWIDTH '0:3,1:25,2:25,3:10,4:15,5:12,6:15,7:10';
Client.txt
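To show how a `label:width` specification carves up a fixed-width record, here is a small Python sketch. It only mimics the idea client-side; the helper names `parse_fixedwidth_spec` and `slice_record` are hypothetical, and trimming of padding is an assumption of this example.

```python
def parse_fixedwidth_spec(spec):
    """Turn 'label:width,label:width,...' into a list of (label, width) pairs."""
    pairs = []
    for item in spec.split(","):
        label, width = item.strip().split(":")
        pairs.append((label, int(width)))
    return pairs


def slice_record(line, spec):
    """Cut one fixed-width line into fields, in spec order, trimming padding."""
    fields = []
    position = 0
    for _label, width in parse_fixedwidth_spec(spec):
        fields.append(line[position:position + width].strip())
        position += width
    return fields


if __name__ == "__main__":
    spec = "0:3,1:25,2:25,3:10,4:15,5:12,6:15,7:10"
    # Total record width implied by the specification.
    print(sum(width for _label, width in parse_fixedwidth_spec(spec)))
```

Each record in the input file must therefore be padded so that every column starts at the offset implied by the widths before it.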
Input data in JSON format
COPY uses a jsonpaths text file to parse JSON data
JSONPath expressions specify the path to JSON name elements
Each JSONPath expression corresponds to a column in the Amazon Redshift target table
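To make the path-to-column mapping concrete, the Python sketch below resolves bracket-notation JSONPath expressions against one record, using the VENUE record and jsonpaths file shown in this section. It is a client-side illustration only, handles just the `['name']` and `[index]` steps that appear here, and the helper names `resolve` and `to_row` are hypothetical.

```python
import json
import re

# Matches either ['name'] or [index] steps in a bracket-notation path.
_STEP = re.compile(r"\['([^']+)'\]|\[(\d+)\]")


def resolve(record, path):
    """Follow a bracket-notation JSONPath such as $['location'][0]."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    value = record
    for name, index in _STEP.findall(path[1:]):
        value = value[name] if name else value[int(index)]
    return value


def to_row(record, jsonpaths):
    """Produce one value per JSONPath expression, in column order."""
    return [resolve(record, path) for path in jsonpaths["jsonpaths"]]


JSONPATHS = json.loads(
    """{ "jsonpaths": [ "$['id']", "$['name']", "$['location'][0]",
    "$['location'][1]", "$['seats']" ] }"""
)
RECORD = json.loads(
    """{ "id": 15, "name": "Gillette Stadium",
    "location": [ "Foxborough", "MA" ], "seats": 68756 }"""
)

if __name__ == "__main__":
    print(to_row(RECORD, JSONPATHS))
```

The order of expressions in the jsonpaths array, not the order of keys in the JSON object, decides which value lands in which column.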
Suppose you want to load the VENUE table with the following content
{ "id": 15, "name": "Gillette Stadium", "location": [ "Foxborough", "MA" ],
"seats": 68756 } { "id": 15, "name": "McAfee Coliseum", "location": [
"Oakland", "MA" ], "seats": 63026 }
You would use the following jsonpaths file to parse the JSON data.
{ "jsonpaths": [ "$['id']", "$['name']", "$['location'][0]",
"$['location'][1]", "$['seats']" ] }
Splitting data files
(Diagram: files Client.txt.1 to Client.txt.4 in the S3 bucket mydata feed two XL compute nodes, Node 0 and Node 1, each with Slice 0 and Slice 1)
COPY customer FROM 's3://mydata/client.txt'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
DELIMITER '|';
Use multiple input files to maximize throughput
• Use the COPY command
• Each slice can load one file at a time
• A single input file means only one slice is ingesting data
• Instead of 100 MB/s, you’re only getting 6.25 MB/s
• You need at least as many input files as you have slices
• With 16 input files, all slices are working, so you maximize throughput
• Get 100 MB/s per node; scale linearly as you add nodes
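The throughput figures quoted above follow from simple arithmetic, sketched here in Python. The model is an assumption inferred from the slide numbers (a node's 100 MB/s is shared equally by 16 slices, and each slice loads at most one file at a time); the function name `load_throughput_mb_s` is hypothetical.

```python
def load_throughput_mb_s(node_rate_mb_s, slices_per_node, nodes, input_files):
    """Estimate COPY throughput when each slice loads at most one file at a time."""
    per_slice = node_rate_mb_s / slices_per_node
    busy_slices = min(input_files, slices_per_node * nodes)
    return per_slice * busy_slices


if __name__ == "__main__":
    print(load_throughput_mb_s(100, 16, 1, 1))    # one file keeps one slice busy
    print(load_throughput_mb_s(100, 16, 1, 16))   # one file per slice
    print(load_throughput_mb_s(100, 16, 2, 32))   # two nodes, one file per slice
```

Under this model, adding files beyond the number of slices does not raise throughput further, while adding nodes does, provided the file count keeps up.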
Loading data using manifest files
Use a manifest to load all required files
Supply a JSON-formatted text file that lists the files to be loaded
Can load files from different buckets or with different prefixes
{
"entries": [
{"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
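A manifest like the one above can be generated rather than hand-written. The Python sketch below builds the same `entries` structure from a list of S3 URLs; the function name `build_manifest` is hypothetical, and uploading the result to S3 is left out.

```python
import json


def build_manifest(urls, mandatory=True):
    """Return manifest JSON text with one entry per S3 URL."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    print(build_manifest([
        "s3://mybucket-alpha/2013-10-04-custdata",
        "s3://mybucket-beta/2013-10-04-custdata",
    ]))
```

Generating the manifest from the list of files that were actually uploaded helps ensure that a load covers exactly the intended files.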
AWS Database Migration Service (AWS DMS)
Supports both homogeneous and heterogeneous data replication.
Supported database sources include:
(1) Oracle, (2) SQL Server, (3) MySQL, (4) Amazon Aurora, (5) PostgreSQL, and
(6) ODBC. All sources are supported on-premises, in EC2, and RDS.
Supported database targets include:
(1) Amazon Aurora, (2) Oracle, (3) SQL Server, (4) MySQL, (5) PostgreSQL, and
(6) Amazon Redshift. All Oracle, SQL Server, MySQL and Postgres targets are
supported on-premises, in EC2 and RDS.
AWS DMS: online migration
Keep your apps running during the migration
(Diagram: application users and the customer premises connect to AWS Database Migration Service in AWS over the Internet or a VPN)
• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let AWS Database Migration Service create tables, load data, and keep them in sync
• Switch applications over to the target at your convenience
Q&A

More Related Content

What's hot

AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
Amazon Web Services
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
Amazon Web Services
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - Toronto
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍
AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍
AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍
Amazon Web Services Korea
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Amazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
Amazon Web Services
 
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Amazon Web Services
 
New Database Migration Services & RDS Updates
New Database Migration Services & RDS UpdatesNew Database Migration Services & RDS Updates
New Database Migration Services & RDS Updates
Amazon Web Services
 
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...
Amazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
Amazon Web Services
 
AWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell NashAWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell Nash
Amazon Web Services Korea
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
Amazon Web Services
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
Amazon Web Services
 
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...
Amazon Web Services
 
AWS Database Migration Service
AWS Database Migration ServiceAWS Database Migration Service
AWS Database Migration Service
techugo
 
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
Amazon Web Services
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
Amazon Web Services
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
Amazon Web Services
 

What's hot (20)

AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - Toronto
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍
AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍
AWS 마이그레이션 서비스 - 김일호 :: 2015 리인벤트 리캡 게이밍
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
 
New Database Migration Services & RDS Updates
New Database Migration Services & RDS UpdatesNew Database Migration Services & RDS Updates
New Database Migration Services & RDS Updates
 
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
 
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
 
AWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell NashAWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell Nash
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
 
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...
 
AWS Database Migration Service
AWS Database Migration ServiceAWS Database Migration Service
AWS Database Migration Service
 
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
 

Similar to 2017 AWS DB Day | Amazon Redshift 소개 및 실습

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Amazon Web Services
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
SnapLogic
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
Amazon Web Services
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift
Data Warehousing in the Era of Big Data: Intro to Amazon RedshiftData Warehousing in the Era of Big Data: Intro to Amazon Redshift
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift
Amazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
Amazon Web Services
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Trivadis
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon Redshift
Amazon Web Services LATAM
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Amazon Web Services
 

Similar to 2017 AWS DB Day | Amazon Redshift 소개 및 실습 (20)

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
2017 AWS DB Day | Amazon Redshift: Introduction and Hands-on Lab

  • 1. Amazon Redshift: Introduction and Hands-on Lab. 김상필, Solutions Architect, Amazon Web Services. May 30, 2017, 13:00 – 15:00
  • 2. Agenda • Redshift overview, table design, and data loading considerations (30 min) • Qwiklab Lab 1 - Advanced Amazon Redshift: Data Loading (45 min) • Qwiklab Lab 2 - Advanced Amazon Redshift: Table Layout and Schema Design (45 min)
  • 3. Amazon Redshift Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
  • 4. Relational data warehouse. Massively parallel; petabyte scale. Fully managed. HDD and SSD platforms. $1,000/TB/year; starts at $0.25/hour. Amazon Redshift: a lot faster, a lot simpler, a lot cheaper.
  • 5. Amazon Redshift architecture. Leader node: simple SQL endpoint; stores metadata; optimizes query plan; coordinates query execution. Compute nodes: local columnar storage; parallel/distributed execution of all queries, loads, backups, restores, resizes. Start at just $0.25/hour, grow to 2 PB (compressed). DC1: SSD; scale from 160 GB to 326 TB. DS1/DS2: HDD; scale from 2 TB to 2 PB. Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion/Backup; Backup; Restore.
  • 6. How Amazon Redshift is built for speed: dramatically less I/O. Column storage; data compression; zone maps; direct-attached storage; large data block sizes.
      analyze compression listing;
       Table   | Column         | Encoding
      ---------+----------------+----------
       listing | listid         | delta
       listing | sellerid       | delta32k
       listing | eventid        | delta32k
       listing | dateid         | bytedict
       listing | numtickets     | bytedict
       listing | priceperticket | delta32k
       listing | totalprice     | mostly32
       listing | listtime       | raw
      Zone map illustration (values in each block, with the block's minimum and maximum):
       10 | 13 | 14 | 26 | … | 100 | 245 | 324    min 10, max 324
       375 | 393 | 417 | … | 512 | 549 | 623      min 375, max 623
       637 | 712 | 809 | … | 834 | 921 | 959      min 637, max 959
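The zone-map idea on slide 6 can be sketched in a few lines of Python. This is an illustrative model only, not Redshift's internal implementation: the three (min, max) pairs are the block boundaries shown on the slide, and the name `blocks_to_scan` is hypothetical.

```python
# Zone map sketch: per-block (min, max) metadata lets a scan skip whole blocks.
# The three pairs come from the slide; the function and its name are illustrative.
ZONE_MAP = [(10, 324), (375, 623), (637, 959)]

def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        index
        for index, (block_min, block_max) in enumerate(zone_map)
        if block_max >= low and block_min <= high
    ]

print(blocks_to_scan(ZONE_MAP, 400, 500))  # range inside the second block
print(blocks_to_scan(ZONE_MAP, 300, 400))  # range straddling two blocks
print(blocks_to_scan(ZONE_MAP, 325, 370))  # range that falls between blocks
```

Only the blocks returned need to be read from disk; a predicate that falls between two blocks reads nothing at all.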
  • 7. How Amazon Redshift is built for speed: parallel and distributed. Query; load; export; backup; restore; resize.
  • 8. How Amazon Redshift is built for speed: distribution keys.
      ID | Name
      1  | John Smith
      2  | Jane Jones
      3  | Peter Black
      4  | Pat Partridge
      5  | Sarah Cyan
      6  | Brian Snail
      Rows after distribution, in three groups: (1 John Smith, 4 Pat Partridge); (2 Jane Jones, 5 Sarah Cyan); (3 Peter Black, 6 Brian Snail).
  • 9. Amazon Redshift Security Petabyte-Scale Data Warehousing Service Amazon Redshift Table and Schema Design
  • 10. Amazon Redshift supports existing data models: star and snowflake.
  • 11. Choosing appropriate data types. Redshift performance is about efficient I/O. Don’t make columns wider than necessary, e.g.: • Avoid BIGINT for a country identifier • Avoid CHAR(MAX) for country names • Oversizing VARCHAR impacts loading and runtime performance. Use appropriate types: • Use TIMESTAMP or DATE instead of CHAR • Use CHAR instead of VARCHAR when appropriate. Multibyte characters: • The VARCHAR data type supports UTF-8 multibyte characters up to a maximum of four bytes • The CHAR data type does not support multibyte characters
  • 12. Architecture and schema design considerations • Redshift is a distributed system: – A compute node contains slices (one per core) – A slice contains data • Queries run on all slices in parallel: optimal query throughput can be achieved when data is evenly spread across slices
  • 13. Table distribution styles, each shown on Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4). Distribution key: same key to same location. All: all data on every node. Even: round-robin distribution.
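The three distribution styles on slide 13 can be modelled roughly as follows. This is a minimal sketch under stated assumptions: MD5 stands in for Redshift's internal hash, the two-node, four-slice layout mirrors the slide's diagram, and every function name and the sample `rows` are hypothetical.

```python
import hashlib
from collections import defaultdict

NODES = 2
SLICES = 4  # two slices per node, as drawn on the slide

def slice_for(value):
    """Illustrative hash placement (MD5 here; Redshift's own hash is internal)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SLICES

def distribute_key(rows, key):
    """KEY: rows with the same value in the key column land on the same slice."""
    placement = defaultdict(list)
    for row in rows:
        placement[slice_for(row[key])].append(row)
    return dict(placement)

def distribute_even(rows):
    """EVEN: rows are dealt round-robin across slices, ignoring their content."""
    placement = defaultdict(list)
    for position, row in enumerate(rows):
        placement[position % SLICES].append(row)
    return dict(placement)

def distribute_all(rows):
    """ALL: every node keeps a full copy of the table."""
    return {node: list(rows) for node in range(NODES)}

# Hypothetical sample rows; user_id is the candidate distribution key.
rows = [
    {"user_id": uid, "uri": f"/page/{i}"}
    for i, uid in enumerate([1234, 2345, 4312, 1234, 6789, 1234, 2345, 4312])
]

for style, placement in [
    ("KEY", distribute_key(rows, "user_id")),
    ("EVEN", distribute_even(rows)),
    ("ALL", distribute_all(rows)),
]:
    print(style, {unit: len(held) for unit, held in sorted(placement.items())})
```

KEY trades evenness for locality, EVEN does the opposite, and ALL pays storage on every node to avoid moving the table at join time.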
  • 14. Choosing a distribution style. Choose a distribution style of KEY for • Large data tables, like a FACT table in a star schema • Large or rapidly changing tables used in joins or aggregations • Improved performance even if the key is not used in a join column. Choose a distribution style of ALL for tables that • Have slowly changing data • Are of reasonable size (i.e., a few million but not hundreds of millions of rows) • Have no common distribution key for frequent joins • Typical use case: a joined dimension table without a common distribution key. Choose a distribution style of EVEN for tables that are not joined and have no aggregate queries
  • 15. Distributing data with distribution keys. Diagram: Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) holding these records: cloudfront uri=/games/g1.exe user_id=1234 …; user_profile user_id=1234 name=janet …; user_profile user_id=6789 name=fred …; cloudfront uri=/imgs/ad1.png user_id=2345 …; user_profile user_id=2345 name=bill …; cloudfront uri=/games/g10.exe user_id=4312 …; user_profile user_id=4312 name=fred …; order_line order_line_id=25693 …; cloudfront uri=/img/ad_5.img user_id=1234 …
  • 16. Distributing data with distribution keys. Distribution keys determine which data resides on which slices. Records with the same distribution key for a table are on the same slice. Diagram: Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) holding these records: user_profile user_id=1234 name=janet …; user_profile user_id=6789 name=fred …; cloudfront uri=/imgs/ad1.png user_id=2345 …; user_profile user_id=2345 name=bill …; cloudfront uri=/games/g10.exe user_id=4312 …; user_profile user_id=4312 name=fred …; order_line order_line_id=25693 …; cloudfront uri=/games/g1.exe user_id=1234 …; cloudfront uri=/img/ad_5.img user_id=1234 …
  • 17. Distributing data with distribution keys. Distribution keys help with data locality for join evaluation. Records with the same distribution key for a table are on the same slice. Records from other tables with the same distribution key value are also on the same slice. Diagram: Node 1 (Slice 1, Slice 2): cloudfront uri=/games/g1.exe user_id=1234 …; user_profile user_id=1234 name=janet …; cloudfront uri=/imgs/ad1.png user_id=2345 …; user_profile user_id=2345 name=bill …; order_line order_line_id=25693 …; cloudfront uri=/img/ad_5.img user_id=1234 … Node 2 (Slice 3, Slice 4): user_profile user_id=6789 name=fred …; cloudfront uri=/games/g10.exe user_id=4312 …; user_profile user_id=4312 name=fred …
  • 18. Distributing data with distribution keys. Poor key choices lead to uneven distribution of records. Diagram: Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) holding cloudfront records (uri=/games/g1.exe user_id=1234 …; uri=/imgs/ad1.png user_id=2345 …; uri=/games/g10.exe user_id=4312 …; uri=/img/ad_5.img user_id=1234 …). Record counts shown: 2M, 5M, 1M, 4M.
  • 19. Distributing data with distribution keys. Unevenly distributed data causes processing imbalances! Diagram: the same four slices and cloudfront records as on slide 18. Record counts shown: 2M, 5M, 1M, 4M.
  • 20. Distributing data with distribution keys. Evenly distributed data improves query performance. Diagram: the same four slices and cloudfront records as on slide 18. Record counts shown: 2M, 2M, 2M, 2M.
  • 21. Choosing a distribution key. Goal • Distribute data evenly across nodes • Minimize data movement: co-located joins and aggregates. Best practice • Use the joined columns of the largest commonly joined tables as the key (example: fact table and large dimension table) • Consider using a GROUP BY column as the key • Never use a distribution key that causes severe data skew • Choose a key with high cardinality, i.e. a large number of discrete values. Avoid • Using a column that serves as an equality filter as your distribution key (it concentrates processing on one node)
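One way to reason about the skew warnings on slides 18 to 21 is to estimate rows per slice for a candidate key before committing to it. The sketch below is an assumption-laden model: MD5 stands in for Redshift's internal hash, `skew_ratio` (largest slice divided by the mean) is an illustrative metric rather than a Redshift system column, and the `user_ids` and `states` columns are made up.

```python
import hashlib
from collections import Counter

def slice_for(value, num_slices):
    """Illustrative hash placement (MD5 here; Redshift's own hash is internal)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices

def rows_per_slice(values, num_slices):
    """Count how many rows each slice would receive for this key column."""
    counts = Counter(slice_for(value, num_slices) for value in values)
    return [counts.get(slice_id, 0) for slice_id in range(num_slices)]

def skew_ratio(counts):
    """Largest slice divided by the mean slice size; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))

# Hypothetical candidate key columns.
user_ids = list(range(20_000))                 # high cardinality
states = ["WA"] * 18_000 + ["OR"] * 2_000      # low cardinality, dominated by one value

for name, column in [("user_id", user_ids), ("state", states)]:
    counts = rows_per_slice(column, 4)
    print(name, counts, round(skew_ratio(counts), 2))

# The per-slice counts from slide 18 (in millions) and slide 20, for comparison.
print(round(skew_ratio([2, 5, 1, 4]), 2), skew_ratio([2, 2, 2, 2]))
```

The low-cardinality column can occupy at most two of the four slices, which is the imbalance the slides warn about; the high-cardinality column spreads almost evenly.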
  • 22. Sorting data. The sort key helps Redshift minimize I/O • For example, a table sorted on timestamp and queried on a date range will skip all blocks not in the query range. In the slices (on disk), the data is sorted by the sort key • If no sort key exists, Redshift uses the data insertion order. Choose a sort key that is frequently used in your queries • Primarily as a query predicate (date, identifier, …) • Optionally choose a column frequently used for aggregates • Optionally choose the same column as the distribution key for the most efficient joins (merge join). Don’t use too many columns per table as sort keys
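The effect of a sort key on block skipping can be illustrated by comparing the same column stored in insertion order and stored sorted. This is a toy model, not Redshift's storage engine: blocks hold 100 rows instead of 1 MB, and the `days` column and function names are hypothetical.

```python
import random

BLOCK_ROWS = 100  # illustrative block size in rows

def build_zone_map(values):
    """Split values into fixed-size blocks and record each block's (min, max)."""
    blocks = [values[i:i + BLOCK_ROWS] for i in range(0, len(values), BLOCK_ROWS)]
    return [(min(block), max(block)) for block in blocks]

def blocks_read(zone_map, low, high):
    """Count blocks whose range overlaps the predicate [low, high]."""
    return sum(1 for block_min, block_max in zone_map
               if block_max >= low and block_min <= high)

random.seed(0)
days = [random.randint(1, 365) for _ in range(10_000)]  # hypothetical day-of-year column

insertion_order_map = build_zone_map(days)      # no sort key: data stays as inserted
sorted_map = build_zone_map(sorted(days))       # sort key on the predicate column

# Predicate: day BETWEEN 1 AND 31
print("insertion order:", blocks_read(insertion_order_map, 1, 31), "blocks")
print("sorted:         ", blocks_read(sorted_map, 1, 31), "blocks")
```

With random insertion order nearly every block contains at least one matching day, so almost nothing can be skipped; after sorting, the matching rows sit in a handful of adjacent blocks.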
  • 23. Example: distribution and sort keys.
      -- Total products sold in Washington in January 2015
      SELECT SUM(S.Price * S.Quantity)
      FROM SALES S
      JOIN CATEGORY C ON C.ProductId = S.ProductId
      JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId
      WHERE C.CategoryId = 'Produce'
        AND F.State = 'WA'
        AND S.Date BETWEEN '1/1/2015' AND '1/31/2015';
      Dist key (S) = ProductID; Dist key (C) = ProductID; Dist key (F) = FranchiseID; Sort key (S) = Date
  • 24. Optimizing the database for queries. Make sure your columns are compressed appropriately. Co-locate frequently joined tables using distribution keys or distribution style ALL to avoid data transfers between nodes. For joined tables, consider using sort keys on the joined columns, allowing fast merge joins. Compression allows you to de-normalize without penalizing storage, simplifying queries and limiting joins. Vacuum and Analyze regularly
  • 25. Amazon Redshift Data Loading. Petabyte-Scale Data Warehousing Service
  • 26. Data loading process. Data source; extraction; transformation; loading; Amazon Redshift (target). Diagram label: Focus.
  • 27. Amazon Redshift data loading overview. Diagram labels: AWS Cloud; corporate data center; Amazon DynamoDB; Amazon S3; data volume; Amazon Elastic MapReduce; Amazon RDS; Amazon Redshift; Amazon Glacier; logs / files; source DBs; VPN connection; AWS Direct Connect; S3 multipart upload; AWS Import/Export; EC2 or on-prem (using SSH); Database Migration Service; Kinesis; AWS Lambda; AWS Data Pipeline
  • 28. Uploading files to Amazon S3. Ensure that your data resides in the same Region as your Redshift clusters. Split the data into multiple files to facilitate parallel processing. Optionally, you can encrypt your data using Amazon S3 server-side or client-side encryption. Files should be individually compressed using GZIP or LZOP. Diagram labels: corporate data center; Client.txt; Client.txt.1, Client.txt.2, Client.txt.3, Client.txt.4; Region; mydata; Amazon Redshift
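The split-and-compress step on slide 28 could be scripted along these lines before uploading. This is a sketch, not an AWS tool: `split_and_gzip` and `read_gzip_lines` are hypothetical helpers, the `.gz` suffix on the part names is an assumption, and lines are dealt round-robin so that no row is cut in half.

```python
import gzip
import os
import tempfile

def split_and_gzip(src_path, out_dir, parts):
    """Deal the lines of src_path round-robin into `parts` gzip files.

    Output names follow the slide's Client.txt.1 ... pattern plus a .gz
    suffix (the suffix is an assumption of this sketch).
    """
    base = os.path.basename(src_path)
    paths = [os.path.join(out_dir, f"{base}.{n}.gz") for n in range(1, parts + 1)]
    outputs = [gzip.open(path, "wt", encoding="utf-8") for path in paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            for position, line in enumerate(src):
                outputs[position % parts].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths

def read_gzip_lines(path):
    """Read a gzip text file back as a list of lines (used here for checking)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read().splitlines()

# Demo on a small temporary file with made-up rows.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "Client.txt")
    with open(source, "w", encoding="utf-8") as handle:
        for i in range(1, 11):
            handle.write(f"{i}|Customer#{i:09d}\n")
    for path in split_and_gzip(source, workdir, parts=4):
        print(os.path.basename(path), len(read_gzip_lines(path)), "rows")
```

Each part is compressed on its own, which matches the slide's advice that files be individually compressed; the part count would normally be chosen as a multiple of the cluster's slice count.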
  • 29. Loading data from Amazon S3. Preparing input data files; uploading files to Amazon S3; using COPY to load data from Amazon S3
  • 30. Input data using delimiters. Example of a pipe ('|') delimited file:
      1|Customer#000000001|j5JsirBM9P|MOROCCO 0|MOROCCO|AFRICA|25-989-741-2988|BUILDING
      2|Customer#000000002|487LW1dovn6Q4dMVym|JORDAN 1|JORDAN|MIDDLE EAST|23-768-687-3665|AUTOMOBILE
      3|Customer#000000003|fkRGN8n|ARGENTINA7|ARGENTINA|AMERICA|11-719-748-3364|AUTOMOBILE
      4|Customer#000000004|4u58h f|EGYPT 4|EGYPT|MIDDLE EAST|14-128-190-5944|MACHINERY
      CREATE TABLE customer (
        c_custkey     integer     not null,
        c_name        varchar(25) not null,
        c_address     varchar(25) not null,
        c_city        varchar(10) not null,
        c_nation      varchar(15) not null,
        c_region      varchar(12) not null,
        c_phone       varchar(15) not null,
        c_mktsegment  varchar(10) not null
      );
      -- Load the pipe-delimited file from S3 into the customer table
      COPY customer FROM 's3://mydata/client.txt'
      CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
      DELIMITER '|';
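Before running COPY on a delimited file, it can help to check locally that each row has the expected number of fields and fits the column widths from the CREATE TABLE on slide 30. The sketch below is one possible pre-load check, not part of Redshift: `find_problems` is a hypothetical helper, and the widths are compared in bytes because Redshift VARCHAR lengths are byte lengths.

```python
import csv
import io

COLUMNS = ["c_custkey", "c_name", "c_address", "c_city",
           "c_nation", "c_region", "c_phone", "c_mktsegment"]

# VARCHAR limits copied from the slide's CREATE TABLE statement.
VARCHAR_BYTES = {"c_name": 25, "c_address": 25, "c_city": 10, "c_nation": 15,
                 "c_region": 12, "c_phone": 15, "c_mktsegment": 10}

# First two rows of the slide's sample file.
SAMPLE = (
    "1|Customer#000000001|j5JsirBM9P|MOROCCO 0|MOROCCO|AFRICA|25-989-741-2988|BUILDING\n"
    "2|Customer#000000002|487LW1dovn6Q4dMVym|JORDAN 1|JORDAN|MIDDLE EAST|23-768-687-3665|AUTOMOBILE\n"
)

def find_problems(text):
    """Return (line_number, message) pairs for rows that would not fit the table."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) != len(COLUMNS):
            problems.append((line_no, f"expected {len(COLUMNS)} fields, found {len(fields)}"))
            continue
        row = dict(zip(COLUMNS, fields))
        if not row["c_custkey"].isdigit():
            problems.append((line_no, "c_custkey is not an integer"))
        for column, limit in VARCHAR_BYTES.items():
            if len(row[column].encode("utf-8")) > limit:
                problems.append((line_no, f"{column} exceeds {limit} bytes"))
    return problems

print(find_problems(SAMPLE))
print(find_problems("1|only|three\n"))
```

Catching width and field-count problems locally is cheaper than discovering them as load errors after the files are already in S3.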
  • 31. Input data using fixed-width format. Client.txt:
      1 RFK 900 Columbus MOROCCO MOROCCO AFRICA 25-989-741-2988 BUILDING
      2 JFK 800 Washington JORDAN JORDAN MIDDLE EAST 23-768-687-3665 AUTOMOBILE
      3 LBJ 700 Foxborough ARGENTINA ARGENTINA AMERICA 11-719-748-3364 AUTOMOBILE
      4 GWB 600 Kansas EGYPT EGYPT MIDDLE EAST 14-128-190-5944 MACHINERY
      CREATE TABLE customer (
        c_custkey     integer     not null,
        c_name        varchar(25) not null,
        c_address     varchar(25) not null,
        c_city        varchar(10) not null,
        c_nation      varchar(15) not null,
        c_region      varchar(12) not null,
        c_phone       varchar(15) not null,
        c_mktsegment  varchar(10) not null
      );
      -- Each label:width pair gives the width of one column in the file
      COPY customer FROM 's3://mydata/client.txt'
      CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
      FIXEDWIDTH '0:3, 1:25, 2:25, 3:10, 4:15, 5:12, 6:15, 7:10';
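The `label:width` specification on slide 31 can be read as a list of consecutive column widths. The sketch below shows how a line would be cut under that reading. The helper names are hypothetical, and because the transcript has lost the original column padding, the split of the first sample row into `RFK` (name) and `900 Columbus` (address) is an assumption.

```python
SPEC = "0:3, 1:25, 2:25, 3:10, 4:15, 5:12, 6:15, 7:10"  # from the slide's COPY command

def parse_spec(spec):
    """Turn 'label:width, label:width, ...' into a list of (label, width) pairs."""
    pairs = []
    for item in spec.split(","):
        label, width = item.strip().split(":")
        pairs.append((label.strip(), int(width)))
    return pairs

def split_fixed(line, spec):
    """Cut a fixed-width line into fields and strip the right-hand padding."""
    fields, position = {}, 0
    for label, width in parse_spec(spec):
        fields[label] = line[position:position + width].rstrip()
        position += width
    return fields

# Rebuild a padded version of the first sample row (field boundaries assumed).
values = ["1", "RFK", "900 Columbus", "MOROCCO", "MOROCCO",
          "AFRICA", "25-989-741-2988", "BUILDING"]
line = "".join(value.ljust(width) for value, (_, width) in zip(values, parse_spec(SPEC)))

print(len(line))
print(split_fixed(line, SPEC))
```

The total line length is simply the sum of the widths, so a file whose lines are shorter or longer than that sum is a quick sign that the specification and the data disagree.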
  • 32. Input data in JSON format
    COPY uses a jsonpaths text file to parse JSON data.
    JSONPath expressions specify the path to JSON name elements.
    Each JSONPath expression corresponds to a column in the Amazon Redshift target table.
    Suppose you want to load the VENUE table with the following content:
      { "id": 15, "name": "Gillette Stadium", "location": [ "Foxborough", "MA" ], "seats": 68756 }
      { "id": 15, "name": "McAfee Coliseum", "location": [ "Oakland", "MA" ], "seats": 63026 }
    You would use the following jsonpaths file to parse the JSON data:
      {
        "jsonpaths": [
          "$['id']",
          "$['name']",
          "$['location'][0]",
          "$['location'][1]",
          "$['seats']"
        ]
      }
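The mapping from JSONPath expressions to column positions can be illustrated with a short Python sketch. It handles only the bracket notation used on this slide (`['name']` and `[0]`); the functions `extract` and `to_row` are hypothetical helpers written for this example and are not part of Redshift or any library.

```python
import json
import re

JSONPATHS = {
    "jsonpaths": [
        "$['id']",
        "$['name']",
        "$['location'][0]",
        "$['location'][1]",
        "$['seats']",
    ]
}

# Matches either ['key'] or [index]; exactly one group is non-empty per step.
_STEP = re.compile(r"\['([^']+)'\]|\[(\d+)\]")


def extract(record, path: str):
    """Follow one bracket-notation JSONPath expression into a parsed record."""
    if not path.startswith("$"):
        raise ValueError(f"unsupported path: {path}")
    value = record
    for key, index in _STEP.findall(path[1:]):
        value = value[key] if key else value[int(index)]
    return value


def to_row(record, jsonpaths) -> list:
    """Produce one value per expression, in target-column order."""
    return [extract(record, p) for p in jsonpaths["jsonpaths"]]


record = json.loads(
    '{"id": 15, "name": "Gillette Stadium", '
    '"location": ["Foxborough", "MA"], "seats": 68756}'
)
row = to_row(record, JSONPATHS)
```

Here `row` holds five values, one for each expression, in the order in which the target table's columns would receive them.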
  • 33. Splitting data files
    Diagram labels: 2 XL Compute Nodes (Node 0 and Node 1), each with Slice 0 and Slice 1; files Client.txt.1, Client.txt.2, Client.txt.3, Client.txt.4 in the mydata bucket.
    COPY customer FROM 's3://mydata/client.txt'
    CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
    DELIMITER '|';
  • 34. Use the COPY command: use multiple input files to maximize throughput. Each slice can load one file at a time. A single input file means only one slice is ingesting data. Instead of 100 MB/s, you're only getting 6.25 MB/s.
  • 35. Use the COPY command: use multiple input files to maximize throughput. You need at least as many input files as you have slices. With 16 input files, all slices are working, so you maximize throughput. Get 100 MB/s per node; scale linearly as you add nodes.
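The figures on these two slides follow from simple arithmetic: 100 MB/s per node divided across 16 slices gives 6.25 MB/s per slice. The Python sketch below encodes that model. It is a deliberately simplified assumption (evenly sized files, one file per slice at a time, no other bottleneck), and the function name and parameters are made up for this illustration.

```python
def copy_throughput_mb_s(input_files: int, nodes: int,
                         slices_per_node: int = 16,
                         per_node_mb_s: float = 100.0) -> float:
    """Estimate load throughput when each slice ingests one file at a time."""
    total_slices = nodes * slices_per_node
    # Slices without a file to read stay idle.
    busy_slices = min(input_files, total_slices)
    per_slice_mb_s = per_node_mb_s / slices_per_node
    return busy_slices * per_slice_mb_s


single_file = copy_throughput_mb_s(input_files=1, nodes=1)    # one busy slice
all_slices = copy_throughput_mb_s(input_files=16, nodes=1)    # every slice busy
two_nodes = copy_throughput_mb_s(input_files=32, nodes=2)     # linear scaling
```

Under these assumptions, one file on one node yields 6.25 MB/s, 16 files yield 100 MB/s, and 32 files on two nodes yield 200 MB/s.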
  • 36. Loading data using manifest files
    Use a manifest to load all required files.
    Supply a JSON-formatted text file that lists the files to be loaded.
    Can load files from different buckets or with different prefixes.
      {
        "entries": [
          {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
          {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
          {"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
          {"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
        ]
      }
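A manifest in the shape shown on this slide is easy to generate rather than write by hand. The following Python sketch builds one from a list of S3 URLs; `build_manifest` is a hypothetical helper, and the URLs are the sample ones from the slide.

```python
import json


def build_manifest(urls: list[str], mandatory: bool = True) -> str:
    """Return manifest JSON with one entry per S3 object URL."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


manifest_text = build_manifest([
    "s3://mybucket-alpha/2013-10-04-custdata",
    "s3://mybucket-alpha/2013-10-05-custdata",
    "s3://mybucket-beta/2013-10-04-custdata",
    "s3://mybucket-beta/2013-10-05-custdata",
])
```

The resulting text would be uploaded to S3, and COPY would then be pointed at the manifest object with the MANIFEST option instead of at a data file prefix.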
  • 37. AWS Database Migration Service (AWS DMS) Supports both homogeneous and heterogeneous data replication. Supported database sources include: (1) Oracle, (2) SQL Server, (3) MySQL, (4) Amazon Aurora, (5) PostgreSQL, and (6) ODBC. All sources are supported on-premises, in EC2, and in RDS. Supported database targets include: (1) Amazon Aurora, (2) Oracle, (3) SQL Server, (4) MySQL, (5) PostgreSQL, and (6) Amazon Redshift. All Oracle, SQL Server, MySQL, and PostgreSQL targets are supported on-premises, in EC2, and in RDS. Keep your apps running during the migration.
  • 38. AWS DMS: online migration (AWS Database Migration Service)
    Diagram labels: Customer Premises, Application Users, AWS, Internet, VPN.
    Start a replication instance.
    Connect to source and target databases.
    Select tables, schemas, or databases.
    Let AWS Database Migration Service create tables, load data, and keep them in sync.
    Switch applications over to the target at your convenience.
  • 39. Q&A