Analytics and Processing
agenda overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Data Collection and Storage
1:45 PM Real-time Event Processing
3:00 PM Analytics (incl. Machine Learning)
4:30 PM Open Q&A Roundtable
Primitive patterns: Collect, Store, Process, Analyze
[Diagram labels: Data Collection and Storage; Data Processing; Event Processing; Data Analysis; EMR; Redshift; Machine Learning]
Process and Analyze
• Hadoop
 Ad-hoc exploration of unstructured datasets
 Batch processing on large datasets
• Data Warehouses
 Analysis via Visualization tools
 Interactive querying of structured data
• Machine learning
 Predictions for what will happen
 Smart applications
Hadoop and Data Warehouses
[Diagram labels: Databases; Files; Media; Cloud; ETL; Data warehouse; Data Marts; Reports; Hadoop; Ad-hoc Exploration]
Amazon EMR
Elastic MapReduce
Why Amazon EMR?
Easy to Use
Launch a cluster in minutes
Low Cost
Pay an hourly rate
Elastic
Easily add or remove capacity
Reliable
Spend less time monitoring
Secure
Manage firewalls
Flexible
Control the cluster
Choose your instance types
Try different configurations to find your optimal architecture
CPU: c3 family, cc1.4xlarge, cc2.8xlarge
Memory: m2 family, r3 family
Disk/IO: d2 family, i2 family
General: m1 family, m3 family
Workloads: Batch process, Machine learning, Spark and interactive, Large HDFS
Resizable clusters
Easy to add and remove compute capacity on your cluster
Match compute demands with cluster sizing.
Easy to use Spot Instances
• On-demand for core nodes: standard Amazon EC2 pricing for on-demand capacity. Meet SLA at predictable cost.
• Spot Instances for task nodes: up to 90% off Amazon EC2 on-demand pricing. Exceed SLA at lower cost.
Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR
clusters with no data loss
• Point multiple Amazon EMR clusters
at same data in Amazon S3
[Diagram labels: EMR; EMR; Amazon S3]
EMRFS makes it easier to leverage S3
• Better performance and error handling options
• Transparent to applications – Use “s3://”
• Consistent view
 For consistent list and read-after-write for new puts
• Support for Amazon S3 server-side and client-side
encryption
• Faster listing using EMRFS metadata
EMRFS - S3 client-side encryption
[Diagram labels: EMRFS enabled for Amazon S3 client-side encryption; Amazon S3 encryption clients; Key vendor (AWS KMS or your custom key vendor); Amazon S3 (client-side encrypted objects)]
Amazon S3 EMRFS metadata
in Amazon DynamoDB
• List and read-after-write consistency
• Faster list operations
Fast listing of S3 objects using EMRFS metadata
Number of objects | Without Consistent Views | With Consistent Views
------------------+--------------------------+----------------------
1,000,000 | 147.72 | 29.70
100,000 | 12.70 | 3.69
*Tested using a single node cluster with a m3.xlarge instance.
Optimize to leverage HDFS
• Iterative workloads
 If you’re processing the same dataset more than once
• Disk I/O intensive workloads
Persist data on Amazon S3 and use S3DistCp to copy
to HDFS for processing
Pattern #1: Batch processing
GBs of logs pushed
to Amazon S3 hourly
Daily Amazon EMR
cluster using Hive to
process data
Input and output
stored in Amazon S3
Load subset into
Redshift DW
Pattern #2: Online data-store
Data pushed to
Amazon S3
Daily Amazon EMR cluster
Extract, Transform, and Load
(ETL) data into database
24/7 Amazon EMR cluster
running HBase holds last 2
years’ worth of data
Front-end service uses
HBase cluster to power
dashboard with high
concurrency
Pattern #3: Interactive query
TBs of logs sent
daily
Logs stored in S3
Transient EMR
clusters
Hive Metastore
File formats
• Row oriented
 Text files
 Sequence files (Writable object)
 Avro data files (described by schema)
• Columnar format
 Optimized Row Columnar (ORC)
 Parquet
Logical Table
Row oriented
Column oriented
Choosing the right file format
• Processing and query tools
 Hive, Impala, and Presto.
• Evolution of schema
 Avro for schema and Parquet for storage.
• File format “splittability”
 Avoid JSON/XML Files. Use them as records.
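One way to read "use them as records" is to store one JSON document per line, so a file can be cut at any newline. The sketch below shows that conversion for a file that holds a single top-level JSON array. The function name and the sample data are hypothetical and only serve the illustration.

```python
import json


def json_array_to_records(json_text):
    """Turn one top-level JSON array into newline-delimited JSON records.

    Each output line is a complete JSON document, so a reader can start
    at any line boundary without parsing the whole file first.
    """
    items = json.loads(json_text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without indent never emits a raw newline, so one item = one line.
    return "".join(json.dumps(item, sort_keys=True) + "\n" for item in items)


# Hypothetical sample input: a whole-file JSON array of log entries.
sample = '[{"id": 1, "msg": "start"}, {"id": 2, "msg": "stop"}]'
records = json_array_to_records(sample)
lines = records.splitlines()
```

Each entry of `lines` parses on its own with `json.loads`, which is the property a splitting reader relies on.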
Choosing the right compression
• Time sensitive: faster compressions are a better choice
• Large amount of data: use space-efficient compressions
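Algorithm | Splittable? | Compression Ratio | Compress + Decompress Speed
---------------+-------------+-------------------+-----------------------------
Gzip (DEFLATE) | No | High | Medium
bzip2 | Yes | Very high | Slow
LZO | Yes | Low | Fast
Snappy | No | Low | Very fast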
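The ratio and speed trade-off in the table can be measured on your own data. The sketch below covers only the two codecs that ship with the Python standard library (gzip and bzip2); LZO and Snappy need third-party packages and are left out. The sample log line is made up, and the timings will vary by machine.

```python
import bz2
import gzip
import time


def compression_report(data):
    """Return {codec: (compression ratio, round-trip seconds)} for gzip and bzip2."""
    report = {}
    for name, module in (("gzip", gzip), ("bzip2", bz2)):
        started = time.perf_counter()
        compressed = module.compress(data)
        restored = module.decompress(compressed)
        elapsed = time.perf_counter() - started
        if restored != data:
            raise RuntimeError(name + " round trip did not restore the input")
        report[name] = (len(data) / len(compressed), elapsed)
    return report


# Hypothetical, highly repetitive log data; real logs compress less well.
sample = b"2015-06-01T10:00:00 GET /index.html 200\n" * 5000
report = compression_report(sample)
```

Running the same function on a representative slice of real input gives a quick basis for choosing between a faster codec and a more space-efficient one.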
Dealing with small files
• Reduce HDFS block size (e.g., 1 MB [default is 128 MB])
 --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,dfs.block.size=1048576"
• Better: use S3DistCp to combine smaller files together
 S3DistCp takes a pattern and target path to combine smaller input files
into larger ones
 Supply a target size and compression codec
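The S3DistCp options described above can be assembled as an argument list for a cluster step. This is a minimal sketch: the helper name, bucket paths, and grouping pattern are hypothetical, and how the list is submitted (console, CLI, or SDK) is left out. The option names `--src`, `--dest`, `--groupBy`, `--targetSize`, and `--outputCodec` are S3DistCp options; `--targetSize` is given in MiB.

```python
def s3distcp_args(src, dest, group_by, target_size_mib, output_codec):
    """Build the S3DistCp arguments for combining small files.

    group_by is a regular expression; files whose captured group matches
    are concatenated into the same output file.
    """
    if target_size_mib <= 0:
        raise ValueError("target size must be positive")
    return [
        "--src", src,
        "--dest", dest,
        "--groupBy", group_by,
        "--targetSize", str(target_size_mib),
        "--outputCodec", output_codec,
    ]


# Hypothetical example: merge hourly log files into roughly 128 MiB files per day.
args = s3distcp_args(
    src="s3://example-log-bucket/raw/",
    dest="hdfs:///logs/combined/",
    group_by=r".*(\d{4}-\d{2}-\d{2}).*",
    target_size_mib=128,
    output_codec="lzo",
)
```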
DEMO: Log Processing using Amazon EMR
• Aggregating small files using s3distcp
• Defining Hive tables with data on Amazon S3
• Transforming dataset using Batch processing
• Interactive querying using Presto and Spark-Sql
[Diagram labels: Amazon S3 Log Bucket; Amazon EMR; Processed and structured log data]
Amazon Redshift
Amazon Redshift Architecture
• Leader Node
 SQL endpoint
 Stores metadata
 Coordinates query execution
• Compute Nodes
 Local, columnar storage
 Execute queries in parallel
 Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB or SSH
• Two hardware platforms
 Optimized for data processing
 DW1: HDD; scale from 2TB to 1.6PB
 DW2: SSD; scale from 160GB to 256TB
[Diagram labels: 10 GigE (HPC); Ingestion; Backup; Restore; JDBC/ODBC]
Amazon Redshift Node Types
• Optimized for I/O intensive workloads
• High disk density
• On demand at $0.85/hour
• As low as $1,000/TB/Year
• Scale from 2TB to 1.6PB
DW1.XL: 16 GB RAM, 2 Cores
3 Spindles, 2 TB compressed storage
DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles
16 TB compressed, 2 GB/sec scan rate
• High performance at smaller storage size
• High compute and memory density
• On demand at $0.25/hour
• As low as $5,500/TB/Year
• Scale from 160GB to 256TB
DW2.L *New*: 16 GB RAM, 2 Cores,
160 GB compressed SSD storage
DW2.8XL *New*: 256 GB RAM, 32 Cores,
2.56 TB of compressed SSD storage
Amazon Redshift dramatically reduces I/O
Column storage
Data compression
Zone maps
Direct-attached storage
• With row storage you do
unnecessary I/O
• To get total amount, you have
to read everything
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
Amazon Redshift dramatically reduces I/O: column storage
With column storage, you only
read the data you need
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
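The difference between the two layouts can be counted directly on the four-row table from the slide. The sketch below is only an in-memory illustration of the idea, not a model of how Redshift stores blocks on disk. It totals the Amount column and reports how many values each layout had to touch.

```python
# The table from the slide, stored row by row.
rows = [
    {"ID": 123, "Age": 20, "State": "CA", "Amount": 500},
    {"ID": 345, "Age": 25, "State": "WA", "Amount": 250},
    {"ID": 678, "Age": 40, "State": "FL", "Amount": 125},
    {"ID": 957, "Age": 37, "State": "WA", "Amount": 375},
]

# The same data stored column by column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(rows):
    """Row layout: every field of every row is read to reach Amount."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row["Amount"]
    return total, values_read


def total_from_columns(columns):
    """Column layout: only the Amount column is read."""
    amounts = columns["Amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_from_rows(rows)        # (1250, 16)
col_total, col_reads = total_from_columns(columns)  # (1250, 4)
```

Both layouts return the same total, but the column layout touches 4 values instead of 16.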
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O: data compression
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
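To give a feel for why an encoding such as delta saves space, the sketch below stores the first value of a column and then only the difference from the previous value. It illustrates the general technique and is not Redshift's actual on-disk format; the sample IDs are made up.

```python
def delta_encode(values):
    """Keep the first value, then each value's difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for previous, current in zip(values, values[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values, running = [], 0
    for delta in deltas:
        running += delta
        values.append(running)
    return values


# Hypothetical, mostly sequential IDs: large numbers, small differences.
list_ids = [1000, 1001, 1002, 1005, 1006]
encoded = delta_encode(list_ids)   # [1000, 1, 1, 3, 1]
decoded = delta_decode(encoded)
```

After the first entry, every stored number is small, which is why near-sequential columns suit this kind of encoding.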
Amazon Redshift dramatically reduces I/O: zone maps
• Track the minimum and
maximum value for each block
• Skip over blocks that don’t
contain relevant data
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
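Block skipping can be sketched with the three blocks from the slide. Only the values visible on the slide are used (the elided ones are omitted), and the function names are hypothetical. A range filter first consults the per-block minimum and maximum, then scans only the blocks that can contain a match.

```python
# Visible values from the slide's three blocks (elided values omitted).
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# Zone map: (minimum, maximum) per block.
zone_map = [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, low, high):
    """Indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        index
        for index, (block_min, block_max) in enumerate(zone_map)
        if block_max >= low and block_min <= high
    ]


def range_query(blocks, zone_map, low, high):
    """Return matching values, reading only the blocks the zone map allows."""
    hits = []
    for index in blocks_to_scan(zone_map, low, high):
        hits.extend(value for value in blocks[index] if low <= value <= high)
    return hits


scanned = blocks_to_scan(zone_map, 400, 600)       # [1]
matches = range_query(blocks, zone_map, 400, 600)  # [417, 512, 549]
```

A filter for values between 400 and 600 reads one block out of three, and a filter for 330 to 370 reads none.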
Amazon Redshift dramatically reduces I/O: direct-attached storage
• Use local storage for
performance
• Maximize scan rates
• Automatic replication
and continuous backup
• HDD & SSD platforms
Amazon Redshift parallelizes and
distributes everything
Query
Load
Backup/Restore
Resize
Amazon Redshift parallelizes and distributes everything: Load
• Load in parallel from Amazon S3 or
DynamoDB or any SSH connection
• Data automatically distributed and
sorted according to DDL
• Scales linearly with the number of
nodes in the cluster
Amazon Redshift parallelizes and distributes everything: Backup/Restore
• Backups to Amazon S3 are automatic,
continuous and incremental
• Configurable system snapshot retention
period. Take user snapshots on-demand
• Cross region backups for disaster recovery
• Streaming restores enable you to resume
querying faster
Amazon Redshift parallelizes and distributes everything: Resize
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
Amazon Redshift parallelizes and distributes everything: Resize (continued)
• Automatic SQL endpoint
switchover via DNS
• Decommission the source cluster
• Simple operation via Console or API
Amazon Redshift works with your
existing analysis tools
JDBC/ODBC
Connect using drivers
from PostgreSQL.org
Amazon Redshift
Custom ODBC and JDBC Drivers
• Up to 35% higher performance than open source drivers
• Supported by Informatica, Microstrategy, Pentaho, Qlik,
SAS, Tableau
• Will continue to support PostgreSQL open source drivers
• Download drivers from console
User Defined Functions
• We’re enabling User Defined Functions (UDFs) so
you can add your own
 Scalar and Aggregate Functions supported
• You’ll be able to write UDFs using Python 2.7
 Syntax is largely identical to PostgreSQL UDF Syntax
 System and network calls within UDFs are prohibited
• Comes with Pandas, NumPy, and SciPy pre-installed
 You’ll also be able to import your own libraries for even more flexibility
Scalar UDF example – URL parsing
Rather than writing complex regular expressions, you can import
standard Python URL parsing libraries and use them in your SQL
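The Python side of such a function might look like the sketch below. The function name `f_hostname` is hypothetical, and the SQL wrapper that registers it as a UDF is not shown. The slide targets Python 2.7, where the parser lives in the `urlparse` module; this version uses Python 3's `urllib.parse` so it can be run locally.

```python
from urllib.parse import urlparse


def f_hostname(url):
    """Return the host part of a URL, or None when there is none."""
    if url is None:
        return None  # treat a SQL NULL input as NULL output
    return urlparse(url).hostname


host = f_hostname("http://www.example.com:8080/path/page.html?ref=home")
# host is "www.example.com"
```

The standard parser handles ports, query strings, and missing schemes without a hand-written pattern.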
Interleaved Multi Column Sort
• Currently support Compound Sort Keys
 Optimized for applications that filter data by one leading column
• Adding support for Interleaved Sort Keys
 Optimized for filtering data by up to eight columns
 No storage overhead unlike an index
 Lower maintenance penalty compared to indexes
Compound Sort Keys Illustrated
• Records in Redshift are
stored in blocks.
• For this illustration, let’s
assume that four records fill
a block
• Records with a given cust_id
are all in one block
• However, records with a
given prod_id are spread
across four blocks
[Figure: table with columns cust_id, prod_id, other columns, divided into blocks and sorted by a compound key]
Grid of [cust_id, prod_id] pairs, one row per cust_id and one column per prod_id:
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
  1 2 3 4
Interleaved Sort Keys Illustrated
• Records with a given
cust_id are spread
across two blocks
• Records with a given
prod_id are also spread
across two blocks
• Data is sorted in equal
measures for both keys
[Figure: table with columns cust_id, prod_id, other columns, divided into blocks and sorted by an interleaved key]
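The block counts quoted for the two sort styles can be reproduced with 16 records and four records per block, as in the illustration. The compound case sorts by (cust_id, prod_id). For the interleaved case, this sketch assumes a Z-order (bit-interleaving) sort as a stand-in; the slides do not say how Redshift orders interleaved keys internally, so treat that choice as an assumption.

```python
from itertools import product

BLOCK_SIZE = 4  # the illustration assumes four records fill a block


def to_blocks(records, key):
    """Sort records by key and cut the result into fixed-size blocks."""
    ordered = sorted(records, key=key)
    return [ordered[i:i + BLOCK_SIZE] for i in range(0, len(ordered), BLOCK_SIZE)]


def blocks_touched(blocks, column, value):
    """Count blocks holding at least one record with record[column] == value."""
    return sum(1 for block in blocks if any(rec[column] == value for rec in block))


def interleave_bits(a, b, bits=2):
    """Z-order value: alternate the bits of a and b, a in the higher position."""
    z = 0
    for i in range(bits):
        z |= ((a >> i) & 1) << (2 * i + 1)
        z |= ((b >> i) & 1) << (2 * i)
    return z


# All (cust_id, prod_id) pairs for ids 1..4.
records = list(product(range(1, 5), range(1, 5)))

compound = to_blocks(records, key=lambda r: (r[0], r[1]))
interleaved = to_blocks(records, key=lambda r: interleave_bits(r[0] - 1, r[1] - 1))

compound_counts = (blocks_touched(compound, 0, 1), blocks_touched(compound, 1, 1))
interleaved_counts = (blocks_touched(interleaved, 0, 1), blocks_touched(interleaved, 1, 1))
# compound_counts == (1, 4); interleaved_counts == (2, 2)
```

With the compound key, a cust_id filter reads one block and a prod_id filter reads four; with the interleaved ordering, either filter reads two.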
How to use the feature
• New keyword ‘INTERLEAVED’ when defining sort keys
 Existing syntax will still work and behavior is unchanged
 You can choose up to 8 columns to include and can query with any or
all of them
• No change needed to queries
• Benefits are significant
[ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
[Diagram labels: SELECT INTO OUTFILE; s3cmd; COPY; Staging; Prod; SQL; bcp; SQL Server]
Redshift Use Case
Operational Reporting with Redshift
[Diagram labels: Amazon S3 Log Bucket; Amazon EMR; Processed and structured log data; Amazon Redshift; Operational Reports]
Thank you
Questions?

More Related Content

What's hot

Migrating to the Cloud
Migrating to the CloudMigrating to the Cloud
Migrating to the Cloud
Amazon Web Services
 
Overview of AWS by Andy Jassy - SVP, AWS
Overview of AWS by Andy Jassy - SVP, AWSOverview of AWS by Andy Jassy - SVP, AWS
Overview of AWS by Andy Jassy - SVP, AWS
Amazon Web Services
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
Amazon Web Services
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
Amazon Web Services
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
Amazon Web Services
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWS
Amazon Web Services
 
AWS Architecting In The Cloud
AWS Architecting In The CloudAWS Architecting In The Cloud
AWS Architecting In The Cloud
Amazon Web Services
 
AWS EC2
AWS EC2AWS EC2
AWS EC2
Mahesh Raj
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
Amazon Web Services
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
Chris Taylor
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Aws ppt
Aws pptAws ppt
Aws ppt
RamyaG50
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
Amazon Web Services
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Amazon Web Services
 
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Amazon Web Services
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
Amazon Web Services
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
Amazon Web Services
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Sudhir Tonse
 

What's hot (20)

Migrating to the Cloud
Migrating to the CloudMigrating to the Cloud
Migrating to the Cloud
 
Overview of AWS by Andy Jassy - SVP, AWS
Overview of AWS by Andy Jassy - SVP, AWSOverview of AWS by Andy Jassy - SVP, AWS
Overview of AWS by Andy Jassy - SVP, AWS
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWS
 
AWS Architecting In The Cloud
AWS Architecting In The CloudAWS Architecting In The Cloud
AWS Architecting In The Cloud
 
AWS EC2
AWS EC2AWS EC2
AWS EC2
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Aws ppt
Aws pptAws ppt
Aws ppt
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
 
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash Workshop
 

Viewers also liked

(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
Amazon Web Services
 
DPACC Acceleration Progress and Demonstration
DPACC Acceleration Progress and DemonstrationDPACC Acceleration Progress and Demonstration
DPACC Acceleration Progress and Demonstration
OPNFV
 
Analytics & Reporting for Amazon Cloud Logs
Analytics & Reporting for Amazon Cloud LogsAnalytics & Reporting for Amazon Cloud Logs
Analytics & Reporting for Amazon Cloud Logs
Cloudlytics
 
World's best AWS Cloud Log Analytics & Management Tool
World's best AWS Cloud Log Analytics & Management ToolWorld's best AWS Cloud Log Analytics & Management Tool
World's best AWS Cloud Log Analytics & Management Tool
Cloudlytics
 
(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014
(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014
(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014
Amazon Web Services
 
Data Analytics on AWS
Data Analytics on AWSData Analytics on AWS
Data Analytics on AWS
Danilo Poccia
 
GDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDK
GDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDKGDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDK
GDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDK
Nate Wiger
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace Webinar
Amazon Web Services
 
AWS_Architecture_e-commerce
AWS_Architecture_e-commerceAWS_Architecture_e-commerce
AWS_Architecture_e-commerce
SEONGTAEK OH
 
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
Amazon Web Services
 
AWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution ShowcaseAWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution Showcase
Amazon Web Services
 
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Amazon Web Services
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
Amazon Web Services
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Amazon Web Services
 
One Click Enterprise IoT Services - March 2017 AWS Online Tech Talks
One Click Enterprise IoT Services - March 2017 AWS Online Tech TalksOne Click Enterprise IoT Services - March 2017 AWS Online Tech Talks
One Click Enterprise IoT Services - March 2017 AWS Online Tech Talks
Amazon Web Services
 
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
Amazon Web Services
 
AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...
AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...
AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...
Amazon Web Services
 
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Amazon Web Services
 
Amazon Web Services Customer Case Study, 9flats.com
Amazon Web Services Customer Case Study, 9flats.comAmazon Web Services Customer Case Study, 9flats.com
Amazon Web Services Customer Case Study, 9flats.com
Amazon Web Services
 
Deploy, Manage & Scale Your Apps with Elastic Beanstalk
Deploy, Manage & Scale Your Apps with Elastic BeanstalkDeploy, Manage & Scale Your Apps with Elastic Beanstalk
Deploy, Manage & Scale Your Apps with Elastic Beanstalk
Amazon Web Services
 

Viewers also liked (20)

(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
 
DPACC Acceleration Progress and Demonstration
DPACC Acceleration Progress and DemonstrationDPACC Acceleration Progress and Demonstration
DPACC Acceleration Progress and Demonstration
 
Analytics & Reporting for Amazon Cloud Logs
Analytics & Reporting for Amazon Cloud LogsAnalytics & Reporting for Amazon Cloud Logs
Analytics & Reporting for Amazon Cloud Logs
 
World's best AWS Cloud Log Analytics & Management Tool
World's best AWS Cloud Log Analytics & Management ToolWorld's best AWS Cloud Log Analytics & Management Tool
World's best AWS Cloud Log Analytics & Management Tool
 
(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014
(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014
(MBL303) Get Deeper Insights Using Amazon Mobile Analytics | AWS re:Invent 2014
 
Data Analytics on AWS
Data Analytics on AWSData Analytics on AWS
Data Analytics on AWS
 
GDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDK
GDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDKGDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDK
GDC 2015 - Game Analytics with AWS Redshift, Kinesis, and the Mobile SDK
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace Webinar
 
AWS_Architecture_e-commerce
AWS_Architecture_e-commerceAWS_Architecture_e-commerce
AWS_Architecture_e-commerce
 
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
 
AWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution ShowcaseAWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution Showcase
 
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
 
One Click Enterprise IoT Services - March 2017 AWS Online Tech Talks
One Click Enterprise IoT Services - March 2017 AWS Online Tech TalksOne Click Enterprise IoT Services - March 2017 AWS Online Tech Talks
One Click Enterprise IoT Services - March 2017 AWS Online Tech Talks
 
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
 
AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...
AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...
AWS re:Invent 2016: Reduce Your Blast Radius by Using Multiple AWS Accounts P...
 
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
 
Amazon Web Services Customer Case Study, 9flats.com
Amazon Web Services Customer Case Study, 9flats.comAmazon Web Services Customer Case Study, 9flats.com
Amazon Web Services Customer Case Study, 9flats.com
 
Deploy, Manage & Scale Your Apps with Elastic Beanstalk
Deploy, Manage & Scale Your Apps with Elastic BeanstalkDeploy, Manage & Scale Your Apps with Elastic Beanstalk
Deploy, Manage & Scale Your Apps with Elastic Beanstalk
 

Similar to AWS Analytics

Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
Amazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Analytics

  • 2. agenda overview: 10:00 AM Registration; 10:30 AM Introduction to Big Data @ AWS; 12:00 PM Lunch + Registration for Technical Sessions; 12:30 PM Data Collection and Storage; 1:45 PM Real-time Event Processing; 3:00 PM Analytics (incl. Machine Learning); 4:30 PM Open Q&A Roundtable
  • 3. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis primitive patterns EMR Redshift Machine Learning
  • 4. Process and Analyze • Hadoop  Ad-hoc exploration of un-structured datasets  Batch Processing on Large datasets • Data Warehouses  Analysis via Visualization tools  Interactive querying of structured data • Machine learning  Predictions for what will happen  Smart applications
  • 5. Hadoop and Data Warehouses Databases Files Data warehouse Data Marts Reports Hadoop Ad-hoc Exploration Media Cloud ETL
  • 7. Why Amazon EMR? Easy to Use Launch a cluster in minutes Low Cost Pay an hourly rate Elastic Easily add or remove capacity Reliable Spend less time monitoring Secure Manage firewalls Flexible Control the cluster
  • 8. Try different configurations to find your optimal architecture CPU c3 family cc1.4xlarge cc2.8xlarge Memory m2 family r3 family Disk/IO d2 family i2 family General m1 family m3 family Choose your instance types: Batch process, Machine learning, Spark and interactive, Large HDFS
  • 9. Easy to add and remove compute capacity on your cluster Match compute demands with cluster sizing. Resizable clusters
  • 10. Spot Instances for task nodes Up to 90% off Amazon EC2 on-demand pricing On-demand for core nodes Standard Amazon EC2 pricing for on-demand capacity Easy to use Spot Instances Meet SLA at predictable cost Exceed SLA at lower cost
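The cost trade-off on slide 10 can be sketched with simple arithmetic. The hourly rate, node counts, run times, and the assumption that doubling the nodes halves the run time are all illustrative, not AWS prices or measured results; the slide itself only says "up to 90% off".

```python
def cluster_cost(core_nodes, task_nodes, hours, on_demand_rate, spot_discount):
    """Total cost with core nodes billed on-demand and task nodes on Spot."""
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    core_cost = core_nodes * hours * on_demand_rate
    task_cost = task_nodes * hours * spot_rate
    return core_cost + task_cost


# Hypothetical job: 10 on-demand core nodes finish in 10 hours.
baseline = cluster_cost(core_nodes=10, task_nodes=0, hours=10,
                        on_demand_rate=0.50, spot_discount=0.0)

# Same job with 10 extra Spot task nodes, assumed to halve the run time.
boosted = cluster_cost(core_nodes=10, task_nodes=10, hours=5,
                       on_demand_rate=0.50, spot_discount=0.90)

print("on-demand only: %.2f" % baseline)
print("with Spot task nodes: %.2f" % boosted)
```

Under these assumptions the larger cluster finishes sooner and costs less, which is the "exceed SLA at lower cost" case on the slide.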
  • 11. Amazon S3 as your persistent data store • Separate compute and storage • Resize and shut down Amazon EMR clusters with no data loss • Point multiple Amazon EMR clusters at same data in Amazon S3 EMR EMR Amazon S3
  • 12. EMRFS makes it easier to leverage S3 • Better performance and error handling options • Transparent to applications – Use “s3://” • Consistent view  For consistent list and read-after-write for new puts • Support for Amazon S3 server-side and client-side encryption • Faster listing using EMRFS metadata
  • 13. EMRFS - S3 client-side encryption Amazon S3 Amazon S3 encryption clients EMRFS enabled for Amazon S3 client-side encryption Key vendor (AWS KMS or your custom key vendor) (client-side encrypted objects)
  • 14. Amazon S3 EMRFS metadata in Amazon DynamoDB • List and read-after-write consistency • Faster list operations
    Fast listing of S3 objects using EMRFS metadata*
    Number of objects | Without Consistent Views | With Consistent Views
    1,000,000         | 147.72                   | 29.70
    100,000           | 12.70                    | 3.69
    *Tested using a single node cluster with a m3.xlarge instance.
  • 15. Optimize to leverage HDFS • Iterative workloads  If you’re processing the same dataset more than once • Disk I/O intensive workloads Persist data on Amazon S3 and use S3DistCp to copy to HDFS for processing
  • 16. Pattern #1: Batch processing GBs of logs pushed to Amazon S3 hourly Daily Amazon EMR cluster using Hive to process data Input and output stored in Amazon S3 Load subset into Redshift DW
  • 17. Pattern #2: Online data-store Data pushed to Amazon S3 Daily Amazon EMR cluster Extract, Transform, and Load (ETL) data into database 24/7 Amazon EMR cluster running HBase holds last 2 years’ worth of data Front-end service uses HBase cluster to power dashboard with high concurrency
  • 18. Pattern #3: Interactive query TBs of logs sent daily Logs stored in S3 Transient EMR clusters Hive Metastore
  • 19. File formats • Row oriented  Text files  Sequence files • Writable object  Avro data files • Described by schema • Columnar format  Object Record Columnar (ORC)  Parquet Logical Table Row oriented Column oriented
  • 20. Choosing the right file format • Processing and query tools  Hive, Impala, and Presto. • Evolution of schema  Avro for schema and Parquet for storage. • File format “splittability”  Avoid JSON/XML Files. Use them as records.
  • 21. Choosing the right compression • Time sensitive: faster compressions are a better choice • Large amount of data: use space-efficient compressions
    Algorithm      | Splittable? | Compression Ratio | Compress + Decompress Speed
    Gzip (DEFLATE) | No          | High              | Medium
    bzip2          | Yes         | Very high         | Slow
    LZO            | Yes         | Low               | Fast
    Snappy         | No          | Low               | Very fast
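A rough way to check the ratio and speed trade-off from slide 21 on your own data is sketched below. It covers only gzip and bzip2, because those ship with the Python standard library; LZO and Snappy need third-party packages and are left out. The sample log line is made up, and the numbers will vary by machine and by data.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Time one compress + decompress round trip and report the ratio."""
    start = time.perf_counter()
    packed = compress(data)
    unpacked = decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == data  # the round trip must be lossless
    return {"codec": name, "ratio": len(data) / len(packed), "seconds": elapsed}


# Hypothetical, highly repetitive log data; real logs compress less well.
sample = b"2015-06-01 12:00:00 GET /index.html 200 1234\n" * 20000

results = [
    measure("gzip", gzip.compress, gzip.decompress, sample),
    measure("bzip2", bz2.compress, bz2.decompress, sample),
]
for r in results:
    print("%(codec)-6s ratio=%(ratio).1f seconds=%(seconds).4f" % r)
```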
  • 22. Dealing with small files • Reduce HDFS block size (e.g., 1 MB [default is 128 MB])  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,dfs.block.size=1048576" • Better: use S3DistCp to combine smaller files together  S3DistCp takes a pattern and target path to combine smaller input files into larger ones  Supply a target size and compression codec
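The idea behind combining small files by pattern and target size can be sketched as a planning step. This is a conceptual illustration only, not S3DistCp's implementation: the function name, file names, sizes, and regular expression are all hypothetical.

```python
import re
from collections import defaultdict


def plan_merges(files, pattern, target_size):
    """Group (name, size) pairs by the regex's first capture group, then
    pack each group into batches of at most roughly target_size bytes."""
    groups = defaultdict(list)
    for name, size in files:
        match = re.search(pattern, name)
        if match:  # files that do not match the pattern are skipped
            groups[match.group(1)].append((name, size))

    plan = {}
    for key, members in groups.items():
        batches, current, current_size = [], [], 0
        for name, size in members:
            # Start a new output file once the next input would overflow it.
            if current and current_size + size > target_size:
                batches.append(current)
                current, current_size = [], 0
            current.append(name)
            current_size += size
        if current:
            batches.append(current)
        plan[key] = batches
    return plan


# Hypothetical hourly log files with sizes in MB.
files = [
    ("logs/2015-06-01-00.log", 40),
    ("logs/2015-06-01-01.log", 50),
    ("logs/2015-06-01-02.log", 30),
    ("logs/2015-06-02-00.log", 20),
]
plan = plan_merges(files, r"logs/(\d{4}-\d{2}-\d{2})-\d{2}\.log", target_size=100)
for day, batches in sorted(plan.items()):
    print(day, batches)
```

Each inner list stands for one larger output file; fewer, larger files mean fewer map tasks and less per-file overhead.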
  • 23. DEMO: Log Processing using Amazon EMR • Aggregating small files using s3distcp • Defining Hive tables with data on Amazon S3 • Transforming dataset using Batch processing • Interactive querying using Presto and Spark-Sql Amazon S3 Log Bucket Amazon EMR Processed and structured log data
  • 25. Amazon Redshift Architecture • Leader Node  SQL endpoint  Stores metadata  Coordinates query execution • Compute Nodes  Local, columnar storage  Execute queries in parallel  Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH • Two hardware platforms  Optimized for data processing  DW1: HDD; scale from 2TB to 1.6PB  DW2: SSD; scale from 160GB to 256TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 26. Amazon Redshift Node Types
    DW1 (HDD) • Optimized for I/O intensive workloads • High disk density • On demand at $0.85/hour • As low as $1,000/TB/Year • Scale from 2TB to 1.6PB
      DW1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed storage
      DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed, 2 GB/sec scan rate
    DW2 (SSD) • High performance at smaller storage size • High compute and memory density • On demand at $0.25/hour • As low as $5,500/TB/Year • Scale from 160GB to 256TB
      DW2.L *New*: 16 GB RAM, 2 Cores, 160 GB compressed SSD storage
      DW2.8XL *New*: 256 GB RAM, 32 Cores, 2.56 TB of compressed SSD storage
  • 27. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps Direct-attached storage • With row storage you do unnecessary I/O • To get total amount, you have to read everything
    ID  | Age | State | Amount
    123 | 20  | CA    | 500
    345 | 25  | WA    | 250
    678 | 40  | FL    | 125
    957 | 37  | WA    | 375
  • 28. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps Direct-attached storage With column storage, you only read the data you need
    ID  | Age | State | Amount
    123 | 20  | CA    | 500
    345 | 25  | WA    | 250
    678 | 40  | FL    | 125
    957 | 37  | WA    | 375
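The difference between the two layouts on slides 27 and 28 can be illustrated with the same four rows. This is a toy in-memory model that counts values touched, not a description of how Redshift lays out blocks on disk.

```python
# The four rows from the slide: (ID, Age, State, Amount).
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Row layout: to total Amount, every field of every row is read.
row_values_read = sum(len(row) for row in rows)
total_from_rows = sum(row[3] for row in rows)

# Column layout: each column is stored together, so only Amount is read.
columns = {
    "id": [row[0] for row in rows],
    "age": [row[1] for row in rows],
    "state": [row[2] for row in rows],
    "amount": [row[3] for row in rows],
}
col_values_read = len(columns["amount"])
total_from_columns = sum(columns["amount"])

print("row layout:    total=%d, values read=%d" % (total_from_rows, row_values_read))
print("column layout: total=%d, values read=%d" % (total_from_columns, col_values_read))
```

Both layouts give the same total, but the column layout touches 4 values instead of 16.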
  • 29. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps Direct-attached storage • COPY compresses automatically • You can analyze and override • More performance, less cost
    analyze compression listing;
     Table   | Column         | Encoding
    ---------+----------------+----------
     listing | listid         | delta
     listing | sellerid       | delta32k
     listing | eventid        | delta32k
     listing | dateid         | bytedict
     listing | numtickets     | bytedict
     listing | priceperticket | delta32k
     listing | totalprice     | mostly32
     listing | listtime       | raw
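To give a feel for why an encoding such as delta suits a column like listid, here is a conceptual sketch of delta encoding: keep the first value, then store only the differences between neighbours. The sample IDs are hypothetical and this is not Redshift's on-disk format.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference
    from its predecessor."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by running a cumulative sum."""
    values = []
    running = 0
    for index, item in enumerate(encoded):
        running = item if index == 0 else running + item
        values.append(running)
    return values


# Hypothetical, mostly sequential IDs: the differences are tiny numbers
# that fit in far fewer bytes than the full values.
listids = [100001, 100002, 100003, 100007, 100010]
encoded = delta_encode(listids)
print(encoded)
```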
  • 30. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps Direct-attached storage • Track the minimum and maximum value for each block • Skip over blocks that don’t contain relevant data
    Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324    (min 10, max 324)
    Block 2: 375 | 393 | 417 | … | 512 | 549 | 623      (min 375, max 623)
    Block 3: 637 | 712 | 809 | … | 834 | 921 | 959      (min 637, max 959)
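Block skipping with zone maps can be sketched as follows. The blocks hold only the values visible on slide 30 (the elided ones are left out), and the function is an illustration of the idea rather than Redshift's implementation.

```python
# Values visible on the slide, one list per storage block.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# The zone map keeps just the minimum and maximum of each block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_range(lo, hi):
    """Return values in [lo, hi] and how many blocks had to be read."""
    matches, blocks_read = [], 0
    for (block_min, block_max), block in zip(zone_map, blocks):
        if block_max < lo or block_min > hi:
            continue  # the range cannot overlap this block, so skip it
        blocks_read += 1
        matches.extend(value for value in block if lo <= value <= hi)
    return matches, blocks_read


matches, blocks_read = scan_range(400, 600)
print(matches, "blocks read:", blocks_read)
```

For the range 400 to 600, only the second block overlaps, so two of the three blocks are never read.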
  • 31. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps Direct-attached storage • Use local storage for performance • Maximize scan rates • Automatic replication and continuous backup • HDD & SSD platforms
  • 32. Amazon Redshift parallelizes and distributes everything Query Load Backup/Restore Resize
  • 33. Amazon Redshift parallelizes and distributes everything Query Load Backup/Restore Resize • Load in parallel from Amazon S3 or DynamoDB or any SSH connection • Data automatically distributed and sorted according to DDL • Scales linearly with the number of nodes in the cluster
  • 34. Amazon Redshift parallelizes and distributes everything Query Load Backup/Restore Resize • Backups to Amazon S3 are automatic, continuous and incremental • Configurable system snapshot retention period. Take user snapshots on-demand • Cross region backups for disaster recovery • Streaming restores enable you to resume querying faster
  • 35. Amazon Redshift parallelizes and distributes everything Query Load Backup/Restore Resize • Resize while remaining online • Provision a new cluster in the background • Copy data in parallel from node to node • Only charged for source cluster
  • 36. Amazon Redshift parallelizes and distributes everything Query Load Backup/Restore Resize • Automatic SQL endpoint switchover via DNS • Decommission the source cluster • Simple operation via Console or API
  • 37. Amazon Redshift works with your existing analysis tools JDBC/ODBC Connect using drivers from PostgreSQL.org Amazon Redshift
  • 38. Custom ODBC and JDBC Drivers • Up to 35% higher performance than open source drivers • Supported by Informatica, MicroStrategy, Pentaho, Qlik, SAS, Tableau • Will continue to support PostgreSQL open source drivers • Download drivers from console
  • 39. User Defined Functions • We’re enabling User Defined Functions (UDFs) so you can add your own  Scalar and Aggregate Functions supported • You’ll be able to write UDFs using Python 2.7  Syntax is largely identical to PostgreSQL UDF Syntax  System and network calls within UDFs are prohibited • Comes with Pandas, NumPy, and SciPy pre-installed  You’ll also be able to import your own libraries for even more flexibility
  • 40. Scalar UDF example – URL parsing • Rather than writing complex regular expressions, you can import standard Python URL parsing libraries and use them in your SQL
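The code from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the kind of Python function that could form the body of such a scalar UDF. The function name `f_hostname` is hypothetical, and the SQL wrapper that registers the function is omitted. The sketch is written for Python 3 (`urllib.parse`); under Python 2.7, which the slides name, the equivalent standard module is `urlparse`.

```python
from urllib.parse import urlparse  # Python 2.7 equivalent: from urlparse import urlparse


def f_hostname(url):
    """Return the host part of a URL, or None when there is no URL or no host."""
    if url is None:
        return None  # pass SQL NULL through unchanged
    return urlparse(url).hostname


print(f_hostname("http://www.example.com/products?id=42"))
```

The library handles ports, credentials, and mixed-case hosts, which is what makes it more attractive than a hand-written pattern.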
  • 41. Interleaved Multi Column Sort • Currently support Compound Sort Keys  Optimized for applications that filter data by one leading column • Adding support for Interleaved Sort Keys  Optimized for filtering data by up to eight columns  No storage overhead unlike an index  Lower maintenance penalty compared to indexes
  • 42. Compound Sort Keys Illustrated • Records in Redshift are stored in blocks • For this illustration, let’s assume that four records fill a block • Records with a given cust_id are all in one block • However, records with a given prod_id are spread across four blocks
      [Figure: a 4×4 grid of [cust_id, prod_id] records, with cust_id 1 to 4 on one axis and prod_id 1 to 4 on the other, stored in four blocks. Table columns: cust_id, prod_id, other columns.]
  • 43. Interleaved Sort Keys Illustrated • Records with a given cust_id are spread across two blocks • Records with a given prod_id are also spread across two blocks • Data is sorted in equal measures for both keys
      [Figure: the same 4×4 grid of [cust_id, prod_id] records, with cust_id 1 to 4 on one axis and prod_id 1 to 4 on the other, stored in four blocks. Table columns: cust_id, prod_id, other columns.]
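The block counts on these two slides can be reproduced with a small simulation. This is a sketch under a stated assumption: the interleaved order is modeled here by interleaving the bits of the two keys (a Z-order curve), which gives the same two-block spread as the slide but is not confirmed by the deck as Redshift's actual method. All function names are hypothetical.

```python
from itertools import product

BLOCK_SIZE = 4  # the slides assume four records fill a block

# The sixteen (cust_id, prod_id) records from the 4x4 grid.
records = list(product(range(1, 5), range(1, 5)))


def interleave_bits(cust_id, prod_id, bits=2):
    """Z-order key: alternate the bits of both columns (an assumed model)."""
    c, p = cust_id - 1, prod_id - 1
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((c >> i) & 1)
        z = (z << 1) | ((p >> i) & 1)
    return z


def to_blocks(sorted_records):
    """Cut the sorted record list into fixed-size blocks."""
    return [sorted_records[i:i + BLOCK_SIZE]
            for i in range(0, len(sorted_records), BLOCK_SIZE)]


def blocks_touched(blocks, column, value):
    """Count the blocks that hold at least one record with the given column value."""
    return sum(1 for block in blocks if any(rec[column] == value for rec in block))


compound = to_blocks(sorted(records))  # sorted by cust_id, then prod_id
interleaved = to_blocks(sorted(records, key=lambda r: interleave_bits(*r)))

for name, blocks in (("compound", compound), ("interleaved", interleaved)):
    print(name,
          "cust_id=1:", blocks_touched(blocks, 0, 1), "block(s);",
          "prod_id=1:", blocks_touched(blocks, 1, 1), "block(s)")
```

With the compound order a cust_id filter reads one block and a prod_id filter reads all four. With the interleaved order either filter reads two.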
  • 44. How to use the feature • New keyword ‘INTERLEAVED’ when defining sort keys  Existing syntax will still work and behavior is unchanged  You can choose up to 8 columns to include and can query with any or all of them • No change needed to queries • Benefits are significant
      Syntax: [ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
  • 46. Operational Reporting with Redshift Amazon S3 Log Bucket Amazon EMR Processed and structured log data Amazon Redshift Operational Reports

Editor's Notes

  1. Six main reasons to use Amazon EMR.
  2. In the next few slides, we’ll talk about data persistence models with Amazon EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data gets stored on Amazon S3. HDFS does not play any role in storing persistent data. As a matter of fact, HDFS is only there for temporary storage. A common thing I hear is that storing data on Amazon S3 instead of HDFS slows a job down a lot because data has to be copied to HDFS or local disk before processing starts. That’s incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams data to mappers without touching the disk. To be completely correct, data does touch HDFS when it has to shuffle from mappers to reducers, but as I mentioned, HDFS acts as the temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  3. And every other feature that comes with Amazon S3, such as SSE and lifecycle policies. And again, keep in mind that using Amazon S3 as the storage layer is the main reason why we can build elastic clusters where nodes get added and removed dynamically without any data loss.
  4. In the next few slides, we’ll talk about data persistence models with EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data gets stored on Amazon S3. HDFS does not play any role in storing persistent data. As a matter of fact, HDFS is only there for temporary storage. A common thing I hear is that storing data on Amazon S3 instead of HDFS slows a job down a lot because data has to be copied to HDFS or local disk before processing starts. That’s incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams data to mappers without touching the disk. To be completely correct, data does touch HDFS when it has to shuffle from mappers to reducers, but as I mentioned, HDFS acts as the temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  5. In the next few slides, we’ll talk about data persistence models with EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data gets stored on Amazon S3. HDFS does not play any role in storing persistent data. As a matter of fact, HDFS is only there for temporary storage. A common thing I hear is that storing data on Amazon S3 instead of HDFS slows a job down a lot because data has to be copied to HDFS or local disk before processing starts. That’s incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams data to mappers without touching the disk. To be completely correct, data does touch HDFS when it has to shuffle from mappers to reducers, but as I mentioned, HDFS acts as the temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  6. EMR example #3: EMR for ETL and query engine for investigations which require all raw data
  7. Give guidance
  8. CloudFront logs arrive out of order.
  9. Read only the data you need
  10. Read only the data you need
  11. Read only the data you need
  12. Read only the data you need
  13. Read only the data you need
  14. Redshift works with the customer’s BI tool of choice through Postgres drivers and a JDBC or ODBC connection. A number of partners shown here have certified integration with Redshift, meaning they have done testing to validate or build Redshift integration and make using Redshift easy from a UI perspective. If customers use tools that are not shown, we can work with the Redshift team on getting them integrated.
  15. So, we started with our MySQL server. But this time we ran SQL statements directly on the server itself to dump the data out to local files. Then, using s3cmd, we copied the flat files into our S3 bucket. Select data from MySQL and use s3cmd to copy these flat files to S3. Use BCP to export data into an EC2 instance, which generates and copies flat files to S3. And then, instead of using EMR, we just run some crazy SQL statements to transform the data into the production version of Redshift. Copy data into a staging schema in Redshift, where it can be transformed via SQL to the final table structure and loaded into the production schema. Use standard tools, like MicroStrategy and Tableau, to provide business views into the data. And then of course we need a good way for business users to look at the data, and that’s where MicroStrategy and Tableau come into play.
  16. CloudFront logs arrive out of order.