This document provides an overview of Amazon Redshift presented by Pavan Pothukuchi and Chris Liu. The agenda includes an introduction to Redshift, its benefits, use cases, and Coursera's experience using Redshift. Key benefits highlighted are that Redshift is fast, inexpensive, fully managed, and secure, and that it innovates quickly. Example use cases from NTT Docomo and Nasdaq are discussed. Chris Liu then discusses Coursera's experience moving from no data warehouse to using Redshift over three years, including their current ecosystem involving Redshift, other AWS services, and business intelligence applications. Lessons learned around thinking in Redshift, communicating with users, surprises, and reflections are also shared.
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large... | Amazon Web Services
This document discusses how Monsanto uses Amazon EFS for large scale geospatial data sets. It provides an overview of EFS and its key features. It then details how Monsanto moved its geospatial data and analytics to the cloud using EFS, including setting up a GeoServer cluster on EFS. It also discusses how Monsanto built a collaborative analytics platform and production environmental classification engine that run analytics at scale on EFS and EMR. The document concludes with recommendations when using EFS and takeaways.
Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features. Finally, learn how to use these best practices to give your entire organization access to analytic insights at scale. A short schema sketch follows the presenter credits below.
Presented by: Alex Sinner, Solutions Architecture PMO, Amazon Web Services
Customer Guest: Luuk Linssen, Product Manager, Bannerconnect
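As a hedged illustration only (not material from the session), here is a minimal sketch of the schema levers the description mentions: distribution keys, DISTSTYLE ALL, and compound vs. interleaved sort keys. All table and column names are made up, with the SQL carried in Python strings.

```python
# Hypothetical Redshift DDL showing common tuning levers: a distribution key
# for co-located joins, a compound sort key for range scans, DISTSTYLE ALL
# for a small dimension table, and an interleaved sort key.
CREATE_FACT = """
CREATE TABLE page_views (
    view_time timestamp,
    user_id   bigint,
    page_id   bigint,
    country   char(2)
)
DISTKEY (user_id)              -- co-locate rows that join on user_id
COMPOUND SORTKEY (view_time);  -- prune blocks on time-range predicates
"""

CREATE_DIM = """
CREATE TABLE pages (
    page_id bigint,
    url     varchar(2048)
)
DISTSTYLE ALL                        -- replicate the small dimension to every node
INTERLEAVED SORTKEY (page_id, url);  -- equal weight to several filter columns
"""
```

A compound sort key favors queries that always filter on the leading column; an interleaved key trades some load and VACUUM cost for balanced performance across several filter columns.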
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307) | Amazon Web Services
Which database is best suited for your use case? Should you choose a relational database or NoSQL or a data warehouse for your workload? Would a managed service like Amazon RDS, Amazon DynamoDB, or Amazon Redshift work better for you, or would it be better to run your own database on Amazon EC2? FanDuel has been running its fantasy sports service on Amazon Web Services (AWS) since 2012. You will learn best practices and insights from FanDuel’s successful migrations from self-managed databases on EC2 to fully-managed database services.
In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We’ll cover how each service might help support your application, how much each service costs, and how to get started.
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra... | Amazon Web Services
The document discusses migrating databases from SQL Server or Oracle to Amazon Aurora using the AWS Database Migration Service (DMS). Key points include:
- DMS can migrate databases with zero downtime by capturing changes during the initial load and then continuously applying them; a minimal sketch of this pattern follows the list.
- The AWS Schema Conversion Tool can help automate schema and code conversion when migrating between database engines. It assesses the source database and provides conversion recommendations.
- Amazon Aurora provides enterprise-level availability and performance at 1/10th the cost of commercial databases. It is optimized for database workloads and is fully managed by AWS.
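As a hedged sketch of the zero-downtime pattern noted above, the boto3 DMS client can create a task that performs a full load and then applies ongoing changes. All ARNs and the table-mapping rule here are placeholders, not values from the document.

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-aurora",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # initial copy, then continuous change capture
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```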
Amazon Aurora adds PostgreSQL compatibility to its cloud-optimized relational database. With PostgreSQL compatibility, customers can now choose to use Amazon's database with the performance and availability of commercial databases and the simplicity and cost-effectiveness of open source databases. Amazon Aurora provides high performance, durability, availability and automatic scaling capabilities for PostgreSQL workloads.
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr... | Amazon Web Services
Learn how to build a scalable, compliance-ready, and automated deployment of the Microsoft “backoffice” servers for 100K users running on AWS. In this session, we show a reference architecture deployment of Exchange, SharePoint, Skype for Business, SQL Server and Active Directory in a single VPC. We discuss the following: (1) how the solution is automated for 100K users, (2) how the solution is enabled for compliance (e.g., FedRAMP, HIPAA, PCI), and (3) how the solution is built from modular 10K user blocks. Attendees should have knowledge of AWS CloudFormation, PowerShell, instance bootstrapping, VPCs, and Amazon Route 53, as well as the relevant Microsoft technologies.
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS... | Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Dive deep into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share early customer experience from the field.
AWS re:Invent 2016 | DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr... | Amazon Web Services
In this session, you will learn the key differences between a relational database management system (RDBMS) and non-relational (NoSQL) databases like Amazon DynamoDB. You will learn about suitable and unsuitable use cases for NoSQL databases. You'll learn strategies for migrating from an RDBMS to DynamoDB through a five-phase, iterative approach. See how Sony migrated an on-premises MySQL database to the cloud with Amazon DynamoDB, and see the results of this migration.
Amazon Web Services (AWS) offers a wide range of database options to fit your application requirements, from fully managed database services that can be launched in minutes with just a few clicks to self-managed databases running on EC2. AWS managed database services include Amazon Relational Database Service (Amazon RDS), with support for six commonly used database engines; Amazon Aurora, a MySQL- and PostgreSQL-compatible relational database; Amazon DynamoDB, a NoSQL database service; and Amazon Redshift, a petabyte-scale data warehouse service. AWS also provides the AWS Database Migration Service, which makes it easy and inexpensive to migrate your databases to the AWS cloud.
In this webinar, we take a closer look at the AWS database offerings and learn how to quickly select, set up, operate, and scale your database in the cloud.
Learning Objectives:
• Gain insights into the AWS database offering and know which to select for your workload.
• Learn how the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) can facilitate and simplify migrating your business critical applications to Amazon Web Services.
• Learn how Amazon DynamoDB Accelerator (DAX) can reduce Amazon DynamoDB response times from milliseconds to microseconds, even at millions of requests per second.
• Hear from our partners like Version1 and Clckwrk who can help you in your journey towards Database freedom.
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR | Amazon Web Services
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premises deployments to Amazon EMR in order to save costs, increase availability, and improve performance. Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of over 15 open-source frameworks in the Apache Hadoop and Spark ecosystems. This session will focus on identifying the components and workflows in your current environment and providing the best practices to migrate these workloads to Amazon EMR. We will explain how to move from HDFS to Amazon S3 as a durable storage layer, and how to lower costs with Amazon EC2 Spot instances and Auto Scaling. Additionally, we will go over common security recommendations and tuning tips to accelerate the time to production. A rough sketch of such a cluster definition follows.
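To make the migration targets concrete, here is a rough, hypothetical boto3 sketch of the pattern the session describes: an EMR cluster that reads from S3 rather than HDFS, with Spot capacity on the core group. Names, sizes, and the release label are illustrative, not from the session.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="etl-on-emr",                  # hypothetical cluster name
    ReleaseLabel="emr-5.12.0",
    Applications=[{"Name": "Spark"}],
    LogUri="s3://my-bucket/emr-logs/",  # S3, not HDFS, as the durable layer
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Instances={
        "KeepJobFlowAliveWhenNoSteps": False,
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m4.large",
             "InstanceCount": 1, "Market": "ON_DEMAND"},
            {"InstanceRole": "CORE", "InstanceType": "m4.xlarge",
             "InstanceCount": 4, "Market": "SPOT"},  # Spot to lower costs
        ],
    },
)
```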
Intended for customers who have (or will have) thousands of instances on AWS, this session is about reducing the complexity of managing costs for these large fleets so they run efficiently. Attendees will learn about common roadblocks that prevent large customers from cost optimizing, tools they can use to efficiently remove those roadblocks, and techniques to monitor their rate of cost optimization. The session will include a case study that will talk in detail about the millions of dollars saved using these techniques. Customers will learn about a range of templates they can use to quickly implement these techniques, and also partners who can help them implement these templates.
Data migration at petabyte scale is now a simple service from AWS. You can easily migrate large volumes of data from on-premises environments to the cloud, quickly get started with the cloud as a backup target, or burst workloads between your on-premises environments and the AWS Cloud. Learn about AWS Snowball, AWS Snowball Edge, AWS Snowmobile and AWS Storage Gateway, and understand which one is the right fit for your requirements. We will go through customer use cases, review the different applications used, and help you cut IT spend and management time on hardware and backup solutions.
This document discusses using AWS for high performance computing to run risk analysis simulations for financial services institutions. It outlines the challenges of limited on-premises capacity and inflexible hardware, and how AWS provides scalable compute resources, a choice of instance types, storage options, and security tools. Example models for credit, market, and other risk run as Compute-as-a-Service on EC2. Estimates show over 1 petaflop of capacity for under $0.025 per core-hour using a mix of Reserved and Spot Instances. AWS delivers flexible, secure infrastructure to meet financial risk management needs.
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks | Amazon Web Services
RDS provides fully managed MySQL, MariaDB, and Aurora database engines. It handles common database tasks to reduce management overhead, letting you focus on your applications. Key features include automatic failover, backups/snapshots, scaling, security, compliance support, and integration across AWS services. Best practices involve leveraging Multi-AZ, read replicas, monitoring, and storage optimization based on workload needs. Migration options include the Database Migration Service and Schema Conversion Tool.
(DAT303) Oracle on AWS and Amazon RDS: Secure, Fast, and Scalable | Amazon Web Services
AWS and Amazon RDS provide advanced features and architectures that enable graceful migration, high performance, elastic scaling, and high availability for Oracle database workloads. Learn best practices for realizing the benefits of the cloud while reducing costs, by running Oracle on AWS in a variety of single- and multi-instance topologies. This session teaches you to take advantage of features unique to AWS and Amazon RDS to free your databases from the confines of the conventional data center.
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is fast, inexpensive, and fully managed. Some key benefits include being 10x faster and cheaper than traditional data warehouses, with high availability and disaster recovery built-in. It is easy to set up and use, and has a large ecosystem of integration and business intelligence tools. Common use cases include analytics on large volumes of mobile, web, IoT and operational data. The presentation provides an overview of Amazon Redshift and how to get started, including provisioning a cluster, data modeling best practices, and loading and querying data.
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv... | Amazon Web Services
If you’re running a MySQL database at scale, there’s a good chance you’re sharding your database deployment. Sharding is a useful way to increase the scale of your deployment, but it has drawbacks like higher costs, high administration overhead, and lower elasticity. It’s harder to grow or shrink a sharded database deployment to match your traffic patterns. In this session, we will discuss and demonstrate how to use AWS Database Migration Service to consolidate multiple MySQL shards into an Amazon Aurora cluster to reduce cost, improve elasticity, and make it easier to manage your database.
Learning Objectives:
Learn how to scale your MySQL database at reduced cost and higher elasticity, by consolidating multiple shards into one Amazon Aurora cluster.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
(1) Amazon Redshift is a fully managed data warehousing service in the cloud that makes it simple and cost-effective to analyze large amounts of data across petabytes of structured and semi-structured data. (2) It provides fast query performance by using massively parallel processing and columnar storage techniques. (3) Customers like NTT Docomo, Nasdaq, and Amazon have been able to analyze petabytes of data faster and at a lower cost using Amazon Redshift compared to their previous on-premises solutions.
Learn how Amazon Redshift, our fully managed data warehouse, can help you quickly and cost-effectively analyze all of your data using your BI tools. This session also introduces the service, which uses MPP, a scale-out architecture, and columnar storage.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series | Amazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
This document provides an overview of Amazon Redshift data warehousing capabilities. It discusses how Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. It describes how to get started with Redshift, provision clusters, model data, load and query data, and monitor performance. It also provides an example of how MakerBot uses Redshift as part of its "Dream Stack" along with other AWS services for analytics.
Amazon Redshift is a fully managed data warehouse service that makes it fast, simple and cost effective to analyze data using SQL and existing business intelligence tools. The document provides an overview of Amazon Redshift and its benefits including speed, low cost, security, scalability and ease of use. It also provides examples of how various companies use Redshift for big data analytics including analyzing social media firehoses, mobile usage and real-time IoT streaming data.
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift | Amazon Web Services
An overview of how Amazon Redshift uses columnar technology, massively parallel processing, and other techniques to deliver fast query performance on petabyte-size datasets.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a specific customer about their use case, taking advantage of fast performance on enormous datasets and economies of scale on the AWS platform.
AWS June Webinar Series - Getting Started: Amazon Redshift | Amazon Web Services
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. In this presentation, you'll get an overview of Amazon Redshift, including how it uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Learn how, with just a few clicks in the AWS Management Console, you can set up a fully functional data warehouse, ready to accept data without learning any new languages, that plugs easily into the business intelligence tools and applications you use today (a programmatic equivalent is sketched after this item). This webinar is ideal for anyone looking to gain deeper insight into their data without the usual challenges of time, cost, and effort. In this webinar, you will learn how to:
• Understand what Amazon Redshift is and how it works
• Create a data warehouse interactively through the AWS Management Console
• Load data into your new Amazon Redshift data warehouse from S3
Who should attend: IT professionals, developers, line-of-business managers
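As a minimal sketch of that console setup done programmatically (identifier, credentials, and sizing are placeholders, not values from the webinar), the same provisioning can be done with boto3:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-west-2")

redshift.create_cluster(
    ClusterIdentifier="examplecluster",  # placeholder name
    NodeType="dc1.large",
    ClusterType="multi-node",
    NumberOfNodes=2,
    DBName="dev",
    MasterUsername="masteruser",
    MasterUserPassword="...",            # placeholder; supply a real password
    PubliclyAccessible=False,
)
```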
Building Analytic Apps for SaaS: “Analytics as a Service” | Amazon Web Services
TIBCO Jaspersoft® for AWS is a business intelligence suite that helps you deliver stunning interactive reports and dashboards inside your app that make it easy for your customers to get answers. Purpose-built for AWS, our reporting and analytics server quickly and easily connects to Amazon Relational Database Service (RDS), Amazon Redshift, and Amazon EMR. It includes ad-hoc reporting, dashboards, data analysis, data visualization, and data blending. In less than 10 minutes, you can be analyzing and reporting on your data. You get a full Cloud BI server starting at less than $1/hour, with no user or data limits and no additional fees.
This webinar deck shows how embeddable analytics with TIBCO Jaspersoft for AWS gives you the power to create the experience your end users demand and how to scale and manage that experience across your customer base with AWS.
This document provides an overview and use cases for Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service from Amazon Web Services. It summarizes Redshift's features including columnar storage, data compression, and massively parallel query processing. It also provides examples of how Redshift is used by companies to reduce costs, improve query performance, and scale their data warehousing needs. Specific use cases and customers of Redshift are highlighted.
Amazon Redshift is a fast, fully managed data warehousing service that allows customers to analyze petabytes of structured data, at one-tenth the cost of traditional data warehousing solutions. It provides massively parallel processing across multiple nodes, columnar data storage for efficient queries, and automatic backups and recovery. Customers have seen up to 100x performance improvements over legacy systems when using Redshift for applications like log and clickstream analytics, business intelligence reporting, and real-time analytics.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Amazon Redshift is a fast, petabyte-scale, fully managed data warehouse that makes it simple and cost-effective to analyze all of your data with your existing business intelligence tools. This session covers best practices and considerations for building a data warehouse and analyzing data with Redshift, along with practical considerations for using Redshift Spectrum, which lets you run complex queries directly against exabytes of data in Amazon S3.
Speaker: 정영준, Solutions Architect, Amazon Web Services
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
Amazon Redshift is a fast, managed, petabyte-scale data warehouse that makes it simpler and more cost-effective to analyze all of your data using the business intelligence tools you already have. Start small, for just $0.25 per hour with no commitments, and scale up to petabytes for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions. Customers typically report 3x compression, which brings their costs down to $333 per uncompressed terabyte per year.
Similar to Getting Started with Amazon Redshift (20)
How to build forecasting services using ML algorithms and deep learn... | Amazon Web Services
Forecasting is an important process for many companies, used in many areas to try to accurately predict the growth and distribution of a product, the resources needed on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we show how to pre-process data that contains a time component and then use an algorithm that produces an accurate forecast from the type of data analyzed.
Big Data for Startups: how to create Big Data applications in Server... mode | Amazon Web Services
The variety and volume of data created every day is accelerating ever faster and represents an unrepeatable opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment accessible only to established companies. But the elasticity of the cloud, and serverless services in particular, allow us to break through these limits.
Let's see, then, how to develop Big Data applications quickly, without worrying about infrastructure, devoting all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. Over that period we learned how changing our approach to application development allowed us to significantly increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances | Amazon Web Services
Container usage keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, delivering average savings of 70% compared to On-Demand Instances. In this session we explore the characteristics of Spot Instances and how easily they can be used on AWS. We also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us the question – how to monetise Open APIs, simplify Fintech integrations and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Lea... services | Amazon Web Services
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on machine learning technologies, we will see how to select the artificial intelligence services offered by AWS and, including through a demo, how to build custom machine learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployments of... | Amazon Web Services
With the traditional approach to IT, implementing DevOps techniques was difficult for many years: they often involved manual activities, occasionally causing application downtime and interrupting users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, ensuring greater system reliability and significant improvements in business continuity.
AWS offers AWS OpsWorks as a configuration management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Learn how to leverage AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows workloads | Amazon Web Services
Want to know the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we discuss options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis powered by artificial intelligence techniques is evolving and improving at a rapid pace. In this webinar we explore the possibilities offered by AWS services for applying the state of the art in computer vision to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event on Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a broad range of AWS services, taking full advantage of the AWS cloud while protecting existing VMware investments.
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, on top of which come performance risks that can be introduced when moving applications out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS | Amazon Web Services
Many companies today build applications with ledger-type functionality, for example to verify the history of credits and debits in banking transactions, or to track products through their supply chain.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB removes the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for giving end users a great user experience. In this session we learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We dig into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data update capabilities.
We also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to the users of its web portal.
Oracle databases and VMware Cloud™ on AWS: myths to debunk | Amazon Web Services
In these slides, AWS and VMware experts present simple, practical tips for easing and simplifying the migration of Oracle workloads and accelerating the transformation to the cloud; they dig into the architecture and show how to take full advantage of VMware Cloud™ on AWS.
1) The document discusses building a minimum viable product (MVP) using Amazon Web Services (AWS).
2) It provides an example of an MVP for an omni-channel messenger platform, built starting in 2017, that connects ecommerce stores to customers via web chat, Facebook Messenger, WhatsApp, and other channels.
3) The founder discusses how they started with an MVP in 2017 with 200 ecommerce stores in Hong Kong and Taiwan, and have since expanded to over 5000 clients across Southeast Asia using AWS for scaling.
This document discusses pitch decks and fundraising materials. It explains that venture capitalists will typically spend only 3 minutes and 44 seconds reviewing a pitch deck. Therefore, the deck needs to tell a compelling story to grab their attention. It also provides tips on tailoring different types of decks for different purposes, such as creating a concise 1-2 page teaser, a presentation deck for pitching in-person, and a more detailed read-only or fundraising deck. The document stresses the importance of including key information like the problem, solution, product, traction, market size, plans, team, and ask.
This document discusses building serverless web applications using AWS services like API Gateway, Lambda, DynamoDB, S3 and Amplify. It provides an overview of each service and how they can work together to create a scalable, secure and cost-effective serverless application stack without having to manage servers or infrastructure. Key services covered include API Gateway for hosting APIs, Lambda for backend logic, DynamoDB for database needs, S3 for static content, and Amplify for frontend hosting and continuous deployment.
This document provides tips for fundraising from startup founders Roland Yau and Sze Lok Chan. It discusses generating competition to create urgency for investors, fundraising in parallel rather than sequentially, having a clear fundraising narrative focused on what you do and why it's compelling, and prioritizing relationships with people over firms. It also notes how the pandemic has changed fundraising, with examples of deals done virtually during this time. The tips emphasize being fully prepared before fundraising and cultivating connections with investors in advance.
AWS_HK_StartupDay_Building Interactive websites while automating for efficien... | Amazon Web Services
This document discusses Amazon's machine learning services for building conversational interfaces and extracting insights from unstructured text and audio. It describes Amazon Lex for creating chatbots, Amazon Comprehend for natural language processing tasks like entity extraction and sentiment analysis, and how they can be used together for applications like intelligent call centers and content analysis. Pre-trained APIs simplify adding machine learning to apps without requiring ML expertise.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies managing Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we present the main characteristics of the service, reference architectures for different workloads, and the few simple steps needed to quickly migrate one or more of your containers.
3. AWS big data portfolio
Collect: Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Import/Export Snowball, AWS Direct Connect, AWS Database Migration Service
Store: Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Aurora
Analyze: Amazon EMR, Amazon EC2, Amazon Redshift, Amazon Machine Learning, Amazon Elasticsearch Service, Amazon CloudSearch, Amazon QuickSight
Plus AWS Data Pipeline for moving data between stages
4. Amazon Redshift
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
A lot faster, a lot simpler, a lot cheaper
5. The Amazon Redshift view of data warehousing
Enterprise: 10x cheaper, easy to provision, higher DBA productivity
Big data: 10x faster, no programming, easily leverage BI tools, Hadoop, machine learning, streaming
SaaS: analysis inline with process flows, pay as you go and grow as you need, managed availability and disaster recovery
6. Forrester Wave™: Enterprise Data Warehouse, Q4 ’15
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
10. Benefit #1: Amazon Redshift is fast
Parallel and distributed: query, load, export, backup, restore, resize
11. Benefit #1: Amazon Redshift is fast
Hardware optimized for I/O intensive workloads, 4 GB/sec/node
Enhanced networking, over 1 million packets/sec/node
Choice of storage type, instance size
Regular cadence of autopatched improvements
12. Benefit #1: Amazon Redshift is fast
New dense storage (HDD) instance type (Jun 2015)
Improved memory 2x, compute 2x, disk throughput 1.5x
Cost: Same as our prior generation!
Performance improvement: 50%
Enhanced I/O and commit improvements (Jan 2016)
Performance improvement: 35%
Memory allocation improvements (May 2016)
Performance improvement: 60%
13. Benefit #2: Amazon Redshift is inexpensive
DS2 (HDD), price per hour for a DS2.XL single node / effective annual price per TB compressed:
On demand: $0.850 / $3,725
1-year reservation: $0.500 / $2,190
3-year reservation: $0.228 / $999
DC1 (SSD), price per hour for a DC1.L single node / effective annual price per TB compressed:
On demand: $0.250 / $13,690
1-year reservation: $0.161 / $8,795
3-year reservation: $0.100 / $5,500
Pricing is simple: number of nodes x price/hour, no charge for the leader node, no upfront costs, pay as you go
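The "effective annual price per TB" column follows directly from the hourly price and per-node storage. A minimal sketch of the arithmetic, assuming 2 TB of compressed storage per DS2.XL node and 0.16 TB per DC1.L node (public instance specs, not figures stated on the slide):

```python
# Effective annual $/TB = hourly price * hours per year / TB per node.
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_price_per_tb(price_per_hour, tb_per_node):
    return price_per_hour * HOURS_PER_YEAR / tb_per_node

print(round(annual_price_per_tb(0.228, 2.0)))   # DS2 3-year reserved -> ~999
print(round(annual_price_per_tb(0.850, 2.0)))   # DS2 on demand       -> ~3,723
print(round(annual_price_per_tb(0.100, 0.16)))  # DC1 3-year reserved -> ~5,475
```

The small differences from the table come from rounding in the published figures.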
14. Benefit #3: Amazon Redshift is fully managed
• Continuous/incremental backups
• Multiple copies within the cluster
• Continuous and incremental backups to Amazon S3
• Continuous and incremental backups across regions
• Streaming restore
(Diagram: the cluster backs up continuously to Amazon S3 in Region 1, and backups are replicated to Amazon S3 in Region 2)
15. Benefit #3: Amazon Redshift is fully managed
• Fault tolerance
• Disk failures
• Node failures
• Network failures
• Availability Zone/region-level disasters
(Diagram: cross-region backups in Amazon S3, Region 1 to Region 2, cover region-level disasters)
16. Benefit #4: Security is built in
• Load encrypted from Amazon S3
• SSL to secure data in transit
• ECDHE perfect forward security
• Amazon VPC for network isolation
• Encryption to secure data at rest
• All blocks on disks and in Amazon S3 encrypted
• Block key, cluster key, master key (AES-256)
• On-premises HSM and AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
(Diagram: JDBC/ODBC clients connect into the customer VPC; an internal VPC handles ingestion, backup, and restore over 10 GigE (HPC) networking. An SSL connection sketch follows.)
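As a minimal sketch of the in-transit side (not code from the deck): Postgres-compatible drivers can require SSL when connecting to the cluster endpoint. Host and credentials below are placeholders.

```python
import psycopg2

conn = psycopg2.connect(
    host="examplecluster.abc123.us-west-2.redshift.amazonaws.com",  # placeholder
    port=5439,          # default Redshift port
    dbname="dev",
    user="masteruser",
    password="...",     # placeholder
    sslmode="require",  # force SSL; verify-ca / verify-full are stricter options
)
```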
17. Benefit #5: We innovate quickly
Well over 125 new features added since launch
Release every two weeks
Automatic patching
Service launch (2/14)
PDX (4/2)
Temp credentials (4/11)
DUB (4/25)
SOC 1/2/3 (5/8)
Unload encrypted files; NRT (6/5)
JDBC fetch size (6/27)
Unload logs (7/5)
SHA1 built-in (7/15)
4-byte UTF-8 (7/18)
Sharing snapshots (7/18)
Statement timeout (7/22)
Timezone, epoch, autoformat (7/25)
WLM timeout/wildcards (8/1)
CRC32 built-in, CSV, restore progress (8/9)
Resource-level IAM (8/9)
PCI (8/22)
UTF-8 substitution (8/29)
JSON, regex, cursors (9/10)
Split_part, audit tables (10/3)
SIN/SYD (10/8)
HSM support (11/11)
Kinesis EMR/HDFS/SSH copy, distributed tables, audit logging/CloudTrail, concurrency, resize perf., approximate count distinct, SNS alerts, cross-region backup (11/13)
Distributed tables, single-node cursor support, maximum connections to 500 (12/13)
EIP support for VPC clusters (12/28)
New query monitoring system tables and diststyle all (1/13)
Redshift on DW2 (SSD) nodes (1/23)
Compression for COPY from SSH, fetch size support for single-node clusters, new system tables with commit stats, row_number(), strtol(), and query termination (2/13)
Resize progress indicator and cluster version (3/21)
Regex_substr, COPY from JSON (3/25)
50 slots, COPY from EMR, ECDHE ciphers (4/22)
3 new regex features, unload to single file, FedRAMP (5/6)
Rename cluster (6/2)
Copy from multiple regions, percentile_cont, percentile_disc (6/30)
Free trial (7/1)
pg_last_unload_count (9/15)
AES-128 S3 encryption (9/29)
UTF-16 support (9/29)
18. Benefit #6: Amazon Redshift is powerful
• Approximate functions
• User-defined functions (see the sketch below)
• Machine learning
• Data science
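As a hedged example of the user-defined functions bullet (Redshift scalar UDFs are written in Python and declared with LANGUAGE plpythonu), here is a hypothetical haversine-distance UDF; the function name and connection details are illustrative.

```python
import psycopg2

UDF_DDL = """
CREATE OR REPLACE FUNCTION f_distance_km (lat1 float, lon1 float,
                                          lat2 float, lon2 float)
RETURNS float
IMMUTABLE
AS $$
    from math import radians, sin, cos, asin, sqrt
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6371.0 * 2 * asin(sqrt(a))
$$ LANGUAGE plpythonu;
"""

conn = psycopg2.connect(host="...", port=5439, dbname="dev",
                        user="masteruser", password="...")  # placeholders
with conn, conn.cursor() as cur:
    cur.execute(UDF_DDL)
```

Once created, the function can be used like any built-in, e.g. SELECT f_distance_km(lat1, lon1, lat2, lon2) FROM trips.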
19. Benefit #7: Amazon Redshift has a large ecosystem
Data integration | Business intelligence | Systems integrators
21. Recent launches, spanning performance, ease of use, security, analytics and functionality, and SOA:
• Dynamic WLM parameters
• Queue hopping for timed-out queries
• Merge rows from staging to production table
• 2x improvement in query throughput
• 10x latency improvement for UNION ALL queries
• Bzip2 format for ingestion
• Table-level restore
• 10x improvement in vacuum performance
• Default access privileges
• Tag-based AWS IAM access
• IAM roles for COPY/UNLOAD
• SAS connector enhancements, implicit conversion of SAS queries to Amazon Redshift
• DMS support from OLTP sources
• Enhanced data ingestion from Kinesis Firehose
• Improved data schema conversion to Amazon ML
23. NTT Docomo: Japan's largest mobile service provider
68 million customers
Tens of TBs per day of data across a mobile network
6 PB of total data (uncompressed)
Data science for marketing operations, logistics, and so on
Greenplum on premises
Scaling challenges
Performance issues
Need same level of security
Need for a hybrid environment

24. NTT Docomo: Japan's largest mobile service provider
125-node DS2.8XL cluster
4,500 vCPUs, 30 TB RAM
2 PB compressed
10x faster analytic queries
50% reduction in time for new BI application deployment
Significantly less operations overhead
(Architecture diagram: data sources -> ETL -> AWS Direct Connect -> client forwarder and loader with state management -> Amazon Redshift and S3, plus a sandbox environment)
25. Nasdaq: powering 100 marketplaces in 50 countries
Orders, quotes, trade executions, market "tick" data from 7 exchanges
7 billion rows/day
Analyze market share, client activity, surveillance, billing, and so on
Microsoft SQL Server on premises
Expensive legacy DW ($1.16M/yr.)
Limited capacity (1 yr. of data online)
Needed lower TCO
Must satisfy multiple security and regulatory requirements
Similar performance

26. Nasdaq: powering 100 marketplaces in 50 countries
23-node DS2.8XL cluster
828 vCPUs, 5 TB RAM
368 TB compressed
2.7 T rows, 900 B derived
8 tables with 100 B rows
7 man-month migration
¼ the cost, 2x storage, room to grow
Faster performance, very secure
31. Outline
• Moving from no data warehouse to the Amazon Redshift ecosystem
• No warehouse: m2.2xlarge read replica – 4 CPUs, 32 GB RAM on Amazon RDS
• First Amazon Redshift cluster: 1 ds1.xl node – 2 CPUs, 16 GB RAM
• The Amazon Redshift ecosystem at Coursera
• Current day: 9 dc1.8xl nodes – 288 CPUs, 2.4 TB RAM
• Learnings from 3 years on Amazon Redshift
• Lessons in communication, surprises, reflections
34. Starting point
• Querying production read replica
• Makeshift libraries providing thin abstraction layer
• 45 minutes to provide aggregate metrics over all classes running on Coursera =(
38. Move in progress
• Risk-free deployment
• "Let's try it out"
• Few clicks to deploy cluster, connect to cluster, resize
• AWS ecosystem integration
• COPY from S3/EMR/SSH
• Unload to S3
• UNLOAD(COPY(data)) == COPY(UNLOAD(data)) == data (see the sketch after this list)
• Minimal administration
• In aggregate, less than 1 full-time employee for administration
• Automation and tooling for monitoring usage and performance
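The UNLOAD/COPY identity above says that data round-trips losslessly between Amazon Redshift and S3. A minimal sketch, with hypothetical table names, bucket path, and role ARN:

-- Export query results to S3 as gzip-compressed, pipe-delimited files
UNLOAD ('SELECT * FROM enrollments')
TO 's3://example-bucket/exports/enrollments_'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
DELIMITER '|' GZIP;

-- Load the same files back; the result matches the original table
COPY enrollments_copy
FROM 's3://example-bucket/exports/enrollments_'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
DELIMITER '|' GZIP;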
43. Amazon Redshift ecosystem at Coursera
• Data flow in and out of Amazon Redshift
• Business insights and reporting
• Data products
• Democratizing data access
44. Business insights and reporting
• Provide directional insights and aggregate metrics
• Aggregate metrics over all classes on Coursera: < 5 seconds
• Goal: insight at the speed of thought
• Results¹: 0.8s median, 28s p95, 120s p99 (a sketch of computing such percentiles follows this list)
• Companywide goal tracking
• Scheduled reports to internal and external stakeholders
• Crucial part of a data-informed culture
¹ Results for ad hoc queries run in the last 90 days
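A sketch of how such latency percentiles can be computed with the percentile_cont launch noted earlier, using the stl_query system table. stl_query is a real Redshift system table, but it only retains a few days of history, so a 90-day window like Coursera's implies the logs were archived first; the 5-day filter here is illustrative:

-- Latency percentiles for recent queries
SELECT DISTINCT
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY DATEDIFF(ms, starttime, endtime)) OVER () AS p50_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY DATEDIFF(ms, starttime, endtime)) OVER () AS p95_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY DATEDIFF(ms, starttime, endtime)) OVER () AS p99_ms
FROM stl_query
WHERE starttime > DATEADD(day, -5, GETDATE());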
46. Data products
• A/B experimentation
• 300 M impression table joined with 1.8 B events table in 12 minutes
• Recommendations model
• Amazon Redshift for relational transformation
• Unload to S3 for model training
• Providing university partners with analytical dashboards and research exports
50. Learnings from 3 years on Amazon Redshift
• Thinking in Amazon Redshift
• Communicating to users
• Surprises
• Reflections
53. Thinking in Amazon Redshift
• Columnar
• SELECT * considered harmful in most cases
• Nodes, slices, blocks
• 1 MB blocks per slice, n slices per node depending on node type
• Sorting and distribution
• Shared-nothing massively parallel processing => data is sorted per slice
• Up to 2 orders of magnitude speedup in JOIN/GROUP BY for merge join vs. hash join (see the sketch after this list)
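A minimal sketch of how sorting and distribution choices enable a merge join; the enrollments and users tables are hypothetical. When both tables are distributed and sorted on the join key (and kept sorted), Redshift can merge co-located, pre-sorted slices instead of building a hash table:

-- Both tables distributed and sorted on the join key,
-- so the planner can choose a merge join over a hash join
CREATE TABLE enrollments (
    user_id   BIGINT,
    course_id BIGINT,
    enrolled  TIMESTAMP
)
DISTKEY (user_id)
SORTKEY (user_id);

CREATE TABLE users (
    user_id BIGINT,
    country VARCHAR(2)
)
DISTKEY (user_id)
SORTKEY (user_id);

-- The plan should show "XN Merge Join" rather than "XN Hash Join"
EXPLAIN
SELECT u.country, COUNT(*)
FROM enrollments e
JOIN users u ON u.user_id = e.user_id
GROUP BY u.country;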
56. Communicating to users
• Prefer the scientific method over gut feel
• Investigate how many rows were materialized with svl_query_report (see the sketch after this list)
• Understand EXPLAIN plan for data distribution, join strategy, predicate order
• SQL style guide for readability
• Leading commas, capitalized SQL keywords, conventions for handling dates/timestamps,
conventions for table names, mapping tables
• Use the right tool for the right task
• Amazon Redshift is not for online traffic serving
• Amazon Redshift is not for stream processing
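A minimal sketch of both diagnostics, reusing the hypothetical tables from the earlier sketch; the query ID 123456 is a placeholder:

-- How many rows did each step of a query actually materialize?
SELECT slice, segment, step, rows, bytes, label
FROM svl_query_report
WHERE query = 123456
ORDER BY segment, step, slice;

-- Inspect the plan for data distribution (DS_DIST_* operators),
-- join strategy, and predicate order
EXPLAIN
SELECT u.country, COUNT(*)
FROM enrollments e
JOIN users u ON u.user_id = e.user_id
GROUP BY u.country;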
59. Surprises
• "Fundamental theorem of Redshift at Coursera"
• Most queries involve full table scans
• 9 nodes x 32 slices/node x 1 block/slice x 1 MB/block => at least 288 MB allocated per column
• Store ~75 M integer values (288 MB / 4 bytes each) while still touching only 1 block per slice
• Features may behave in unexpected ways
• Sort key compression
• Primary and foreign keys
• Features may be unexpectedly expensive (see the sketch after this list)
• COMMIT – Batch work, monitor with stl_commit_stats
• VACUUM – Prefer TRUNCATE, monitor with stl_vacuum, stl_query
• Your mileage may vary
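A minimal sketch of the monitoring queries and the TRUNCATE-over-VACUUM pattern mentioned above. stl_commit_stats and stl_vacuum are real Redshift system tables; the staging_events table and the exact columns selected are assumptions for illustration:

-- Commit overhead: how long commits queued vs. worked
SELECT xid, node, startqueue, startwork, endtime
FROM stl_commit_stats
ORDER BY endtime DESC
LIMIT 20;

-- What recent VACUUM runs actually did
SELECT * FROM stl_vacuum ORDER BY eventtime DESC LIMIT 20;

-- For a staging table that is fully reloaded each run, TRUNCATE
-- reclaims space immediately and avoids an expensive VACUUM
-- (note: TRUNCATE commits the current transaction implicitly)
TRUNCATE staging_events;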
63. Reflections
• Simplicity – Relational model, Postgres 8.0-compliant SQL, things just work. Minimal administration. Minimal tuning.
• Scalability – Scaled the cluster up 5 times in the last 3 years as data volume and usage increased.
• Flexibility – No strict requirements on data modeling; the tuning knobs stay dusty in the majority of cases. Handles both a heavily normalized data model and denormalized clickstream data.
• Extensibility – Standard APIs (JDBC/ODBC/libpq) and integration points.
64. Resources
Pavan Pothukuchi | pavanpo@amazon.com
Chris Liu | cliu@coursera.org
Detail pages
• http://aws.amazon.com/redshift
• https://aws.amazon.com/marketplace/redshift/
Best practices
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html
Related breakout sessions
• Deep Dive on Amazon QuickSight (2:15–3:15 pm)
• Getting Started with Amazon QuickSight (2:15–3:15 pm)
• Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with Minimal Downtime
(4:45–5:45 pm)