Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics with AWS services such as Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Created by: Jason Morris, Solutions Architect
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
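As a rough illustration of that crawl-then-transform flow, here is a minimal boto3 sketch; the crawler name, IAM role, bucket path, and job name are hypothetical placeholders, not details from the original session:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Point a crawler at data on S3; Glue infers the schema and stores
# table definitions in the AWS Glue Data Catalog.
glue.create_crawler(
    Name="sales-crawler",                                 # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueRole",       # placeholder role
    DatabaseName="sales_db",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/sales/"}]},
)
glue.start_crawler(Name="sales-crawler")

# Once the data is cataloged, kick off an ETL job (Glue-generated or
# hand-authored) against the cataloged tables.
glue.start_job_run(JobName="sales-etl-job")               # hypothetical job
```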
Level: Intermediate
Speakers:
Ryan Malecky - Solutions Architect, EdTech, AWS
Rajakumar Sampathkumar - Sr. Technical Account Manager, AWS
Amazon Elastic Compute Cloud (Amazon EC2) provides a broad selection of instance types to accommodate a diverse mix of workloads. In this technical session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This document discusses Amazon Redshift, a fully managed data warehousing service. It provides petabyte-scale data warehousing capabilities with performance up to 3x faster and 80% lower cost than traditional data warehousing solutions. The document outlines use cases, architecture details, pricing and total cost of ownership, security features, integration options and best practices. It also shares customer examples and an ecosystem of partners building solutions on Amazon Redshift.
Amazon Redshift is a fast, inexpensive, fully managed, petabyte-scale data warehouse service in the cloud. Key benefits include being 10x faster and cheaper than traditional data warehouses, with high availability and disaster recovery built in. It is easy to set up and use, and has a large ecosystem of integration and business intelligence tools. Common use cases include analytics on large volumes of mobile, web, IoT, and operational data. The presentation provides an overview of Amazon Redshift and how to get started, including provisioning a cluster, data modeling best practices, and loading and querying data.
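For the provisioning step mentioned above, a minimal boto3 sketch; the cluster identifier, node type, and credentials are placeholders rather than values from the presentation:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Provision a small multi-node cluster; Redshift handles replication,
# backups, and patching behind the scenes.
redshift.create_cluster(
    ClusterIdentifier="analytics-cluster",    # hypothetical
    NodeType="dw2.large",                     # SSD-backed DW2 family from the deck
    NumberOfNodes=2,
    MasterUsername="admin",
    MasterUserPassword="ChangeMe123!",        # placeholder only
    DBName="analytics",
)

# Poll until the cluster is available, then it exposes a SQL endpoint.
waiter = redshift.get_waiter("cluster_available")
waiter.wait(ClusterIdentifier="analytics-cluster")
```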
Amazon Redshift is a managed service that gives you a ready-to-use data warehouse: you worry about loading and using your data, while the infrastructure details, servers, replication, and backups are handled by AWS.
(APP204) NEW LAUNCH: Introduction to AWS Service Catalog | AWS re:Invent 2014 (Amazon Web Services)
Running an IT department in a large organization is not easy. Providing your internal users with access to the latest and greatest technology, so that they can be as efficient and productive as possible, must be balanced with the need to set and maintain corporate standards, collect and disseminate best practices, and provide oversight to avoid runaway spending and technology sprawl. Introducing AWS Service Catalog, a service that allows end users in your organization to easily find and launch products using a personalized portal. You can manage catalogs of standardized offerings and control which users have access to which products, enabling compliance with business policies. Your organization can benefit from increased agility and reduced costs. Attend this session to be one of the first to learn about this new service.
In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
This document describes how Amazon EC2 Spot Instances can help save up to 90% on batch processing costs by running jobs on available instances at lower prices. It explains how batch processing applications can take advantage of Spot Instances for simulations, molecular modeling, video processing, and more, while maintaining reliability through SQS queues and S3 storage.
Amazon EC2 is a cloud computing service that provides virtual computing resources such as servers and storage. It allows users to launch virtual machine instances that can be used to build and host applications. EC2 has grown significantly since its launch in 2006 to include many instance types, operating systems, pricing options, and features to improve performance, security, and scalability. Customers use EC2 for its flexibility, low costs, global accessibility, security, and ability to easily scale resources to meet variable computing needs.
This document provides an introduction to Amazon Aurora, AWS's managed relational database service. It discusses how Aurora was built to provide the speed and availability of commercial databases at the simplicity and cost-effectiveness of open source databases. The document outlines key Aurora features like automatic scaling, continuous backups, replication across Availability Zones, and integration with other AWS services. Customer case studies show how Aurora provides better performance at lower costs than alternative database options. The document also covers migration options and how Aurora offers a simpler, more cost-effective database solution than on-premises or self-managed options.
AWS Compute Evolved Week: High Performance Computing on AWS (Amazon Web Services)
AWS Compute Evolved Week at the San Francisco Loft: High Performance Computing on AWS
High Performance Computing (HPC) has been driving technology advancements for many decades. HPC enables performance-demanding applications and workloads to solve complex problems while dramatically reducing time to solution. With a history of requiring very large data centers, HPC is now on the edge of a paradigm shift. The AWS Cloud will allow customers to have access to near infinite compute and storage resources, without the overhead of running their own data centers. There are a vast number of HPC segments and verticals that are already seeing great success running their workloads on AWS. Life Sciences, Financial Services, Energy & Geo Sciences, as well as Manufacturing are successfully deploying their applications on AWS. In these two sessions we will discuss how AWS can help you run HPC workloads in the cloud.
Speakers: Pierre-Yves Aquilanti - Sr. HPC Specialized Solutions Architect, AWS & Anh Tran - Sr. HPC Specialized Solutions Architect, AWS
by Robbie Wright, Head of Amazon S3 & Amazon Glacier Product Marketing, AWS
Learn from AWS on how we've designed S3 and Glacier to be durable, available, and massively scalable. Hear how customers are using these services to enhance the accessibility and usability of their data. We will also dive into the benefits of object storage, its applications, and some best practices to follow.
For more training on AWS, visit: https://www.qa.com/amazon
AWS Loft | London - Deep Dive: Amazon RDS by Toby Knight, Manager Solutions Architecture, 18 April 2016
Sparklint is a tool for identifying and tuning inefficient Spark jobs across a cluster. It provides live views of application statistics or event-by-event analysis of historical logs for metrics like idle time, core usage, and task locality. The demo shows Sparklint analyzing access logs to count by IP, status, and verb over multiple jobs, tuning configuration settings to improve efficiency and reduce idle time. Future features may include more job detail, auto-tuning, and streaming optimizations. Sparklint is an open source project for contributing to Spark job monitoring and optimization.
This document provides an overview of AWS Lambda and how it enables serverless application development. It discusses how AWS Lambda allows developers to run code without provisioning or managing servers, and how it can automatically scale in response to triggers from various AWS services. It also summarizes some key capabilities like bringing custom code, parallel execution, integration with other AWS services, and event-driven or pull-based invocation models.
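To make the "bring custom code" point concrete, here is a minimal sketch of a Python handler for an S3-triggered function; the event shape is the standard S3 notification, while the function's purpose and names are hypothetical:

```python
import json
import urllib.parse

def lambda_handler(event, context):
    """Invoked by AWS Lambda for each S3 'ObjectCreated' notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"New object: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("ok")}
```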
This document discusses AWS Step Functions, a serverless workflow service for coordinating Lambda functions and other services. It provides examples of how Step Functions allows sequencing functions, selecting functions based on data, retrying functions, handling errors with try/catch/finally, running long-running code over hours, and running functions in parallel. Quotes from customers discuss how Step Functions allows building automated workflows for food delivery operations and product data updates.
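Step Functions workflows are declared in Amazon States Language; as a hedged sketch of the sequencing-and-retry idea, the following registers a two-state machine with boto3 (the Lambda ARNs, role ARN, and state names are invented for illustration):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Two Lambda tasks in sequence, with a retry policy on the first.
definition = {
    "StartAt": "ExtractOrder",
    "States": {
        "ExtractOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",  # placeholder
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 3}],
            "Next": "NotifyCourier",
        },
        "NotifyCourier": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify",   # placeholder
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="order-workflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder
)
```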
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018 (Amazon Web Services)
Building Serverless ETL Pipelines with AWS Glue
In this session we will introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We will also discuss how to build scalable, efficient, and serverless ETL pipelines.
Ben Thurgood, Solutions Architect, Amazon Web Services
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about how their use case takes advantage of fast performance on enormous datasets, leveraging economies of scale on the AWS platform.
Amazon Redshift is a fully managed cloud data warehousing service that makes it simple and cost-effective to analyze petabytes of structured and semi-structured data. It provides fast query performance through massively parallel processing and columnar storage. Customers like NTT Docomo, Nasdaq, and Amazon have analyzed petabytes of data faster and at lower cost with Amazon Redshift than with their previous on-premises solutions.
Data & Analytics - Session 2 - Introducing Amazon Redshift (Amazon Web Services)
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. This presentation will give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Steffen Krause, Technical Evangelist, AWS
Padraic Mulligan, Architect and Lead Developer and Mike McCarthy, CTO, Skillspage
Amazon Elastic MapReduce (Amazon EMR) is a web service that allows you to easily and securely provision and manage your Hadoop clusters. In this talk, we will introduce you to Amazon EMR design patterns, such as using various data stores like Amazon S3, how to take advantage of both transient and active clusters, and how to work with other Amazon EMR architectural patterns. We will dive deep on how to dynamically scale your cluster and address the ways you can fine-tune your cluster. We will discuss bootstrapping Hadoop applications from our partner ecosystem that you can use natively with Amazon EMR. Lastly, we will share best practices on how to keep your Amazon EMR cluster cost-effective.
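One way such a transient cluster with Spot task capacity might be launched is sketched below with boto3; the cluster name, log bucket, instance counts, and bid price are illustrative assumptions:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="nightly-etl",                        # hypothetical
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    LogUri="s3://my-bucket/emr-logs/",         # placeholder bucket
    Instances={
        "InstanceGroups": [
            # On-demand master and core nodes for reliability.
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
             "InstanceCount": 1, "Market": "ON_DEMAND"},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
             "InstanceCount": 2, "Market": "ON_DEMAND"},
            # Spot task nodes add cheap, interruptible capacity.
            {"InstanceRole": "TASK", "InstanceType": "m5.xlarge",
             "InstanceCount": 4, "Market": "SPOT", "BidPrice": "0.10"},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # transient cluster
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```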
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You'll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
This document provides an overview of Amazon Redshift data warehousing capabilities. It discusses how Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. It describes how to get started with Redshift, provision clusters, model data, load and query data, and monitor performance. It also provides an example of how MakerBot uses Redshift as part of its "Dream Stack" along with other AWS services for analytics.
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200) (Amazon Web Services)
Amazon Redshift is a fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using standard SQL and existing BI tools. It provides fast query performance by using columnar storage and data compression, automatic parallelization and distribution of data and queries, continuous backups and automated repairs. Customers use Redshift for analytics, business intelligence, and reporting on large datasets for use cases such as reducing costs, improving performance, and making more data available for analysis.
AWS Webcast - Managing Big Data in the AWS Cloud_20140924 (Amazon Web Services)
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size and transfer rates) and application considerations - for latency, cost and durability. It will also share customer success stories and resources to help you get started.
This document provides an overview and use cases for Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service from Amazon Web Services. It summarizes Redshift's features including columnar storage, data compression, and massively parallel query processing. It also provides examples of how Redshift is used by companies to reduce costs, improve query performance, and scale their data warehousing needs. Specific use cases and customers of Redshift are highlighted.
AWS June Webinar Series - Getting Started: Amazon Redshift (Amazon Web Services)
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. In this presentation, you'll get an overview of Amazon Redshift, including how it uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Learn how, with just a few clicks in the AWS Management Console, you can set up a fully functional data warehouse that is ready to accept data without learning any new languages and that plugs in easily to the business intelligence tools and applications you use today. This webinar is ideal for anyone looking to gain deeper insight into their data without the usual challenges of time, cost, and effort.
In this webinar, you will learn how to:
• Understand what Amazon Redshift is and how it works
• Create a data warehouse interactively through the AWS Management Console
• Load data into your new Amazon Redshift data warehouse from S3
Who should attend: IT professionals, developers, and line-of-business managers.
Antoine Genereux takes us through a detailed overview of the database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
This document provides an overview of Amazon Redshift presented by Pavan Pothukuchi and Chris Liu. The agenda includes an introduction to Redshift, its benefits, use cases, and Coursera's experience using Redshift. Some key benefits highlighted are that Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. Example use cases from NTT Docomo and Nasdaq are discussed. Chris Liu then discusses Coursera's experience moving from no data warehouse to using Redshift over three years, including their current ecosystem involving Redshift, other AWS services, and business intelligence applications. Lessons learned around thinking in Redshift, communicating with users, surprises, and reflections are also shared.
Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection (see the producer sketch below)
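For the Kinesis piece of that list, a minimal producer sketch; the stream name and payload are hypothetical:

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Write one event; the partition key controls shard placement, so a
# high-cardinality key spreads load evenly across shards.
kinesis.put_record(
    StreamName="clickstream",                 # hypothetical stream
    Data=json.dumps({"user": "u123", "action": "page_view"}),
    PartitionKey="u123",
)
```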
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks (Amazon Web Services)
• Get an overview of managed database services available on AWS
• Learn how to combine them for high-performance cost effective architectures
• Learn how to choose between the AWS database services based on your use case
On AWS you can choose from a variety of managed database services that save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We'll explain the fundamentals of Amazon RDS, a managed relational database service in the cloud; Amazon DynamoDB, a fully managed NoSQL database service; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be economical. We will cover how each service might help support your application and how to get started.
Getting Started with Big Data and HPC in the Cloud - August 2015 (Amazon Web Services)
How can you use Big Data to grow your business and discover new opportunities? When organizations effectively capture, analyze, visualize and apply big data insights to their business goals, they differentiate themselves from their competitors and outperform them in terms of operational efficiency and the bottom line. With Amazon Web Services, businesses and researchers can easily fulfill their high performance computing (HPC) requirements with the added benefit of ad-hoc provisioning, pay-as-you-go pricing and faster time-to-results. Join this session to understand how to run HPC applications in AWS cloud, and about different AWS Big Data and Analytics services such as Amazon Elastic MapReduce (Hadoop), Amazon Redshift (Data Warehouse) and Amazon Kinesis (Streaming), when to use them and how they work together.
Getting Started with Managed Database Services on AWS - September 2016 Webina... (Amazon Web Services)
On AWS you can choose from a variety of managed database services that save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We'll explain the fundamentals of Amazon RDS, a managed relational database service in the cloud; Amazon DynamoDB, a fully managed NoSQL database service; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We will cover how each service might help support your application, how much each service costs, and how to get started.
Learning Objectives:
• Overview of managed database services available on AWS
• How to combine them for high-performance cost effective architectures
• Learn how to choose between the AWS database services based on the use case
Who Should Attend:
• IT Managers, DBAs, Enterprise and Solution Architects, DevOps Engineers, and Developers
A quick overview of Redshift and common use cases, followed by tools and links for performance tuning and a look at how Redshift fits into the AWS data services. Includes a list of key new features since the last meetup in September 2016, such as Redshift Spectrum, which lets you run SQL directly on data sitting in Amazon S3, as well as the Redshift ecosystem of data integration, BI, consultancy, and data modelling partners.
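To give a flavor of Redshift Spectrum, here is a hedged sketch that registers an external schema over the AWS Glue Data Catalog and queries S3-resident data through an ordinary SQL connection; the endpoint, credentials, IAM role, database, and table names are all placeholders:

```python
import psycopg2

# Redshift speaks the PostgreSQL wire protocol on port 5439.
conn = psycopg2.connect(
    host="analytics-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439, dbname="analytics", user="admin", password="...",
)
conn.autocommit = True
cur = conn.cursor()

# Map a Data Catalog database to an external schema, then query files
# on S3 as if they were local tables.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'sales_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
""")
cur.execute("SELECT count(*) FROM spectrum.events WHERE year = 2017")
print(cur.fetchone())
```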
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
How to build Forecasting services using ML and deep learning algorithms... (Amazon Web Services)
Forecasting is an important process for many companies, used in many areas to try to accurately predict product growth and distribution, the resources needed on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that has a temporal component and then apply an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to create Big Data applications in Serverless mode... (Amazon Web Services)
The variety and quantity of data created every day is accelerating ever faster and represents a unique opportunity to innovate and create new startups.
Managing large amounts of data can seem complex, however: building large-scale Big Data clusters looks like an investment within reach only of established companies. But the elasticity of the cloud and, in particular, serverless services let us break through these limits.
Let's see, then, how to develop Big Data applications quickly, without worrying about infrastructure, devoting all our resources to developing the ideas behind innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. In that time we learned how changing our approach to application development dramatically increased our agility and release velocity and, ultimately, let us build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances (Amazon Web Services)
Container usage keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, with average savings of 70% compared to On-Demand Instances. In this session we will look at the characteristics of Spot Instances and how easily they can be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. AWS and FinConecta would therefore like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a strategic approach
Q&A
Make your startup's offering unique in the market with Machine Learning services... (Amazon Web Services)
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your own offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of... (Amazon Web Services)
With the traditional approach to IT, implementing DevOps techniques was difficult for many years: they often involved manual activities that occasionally caused application downtime and interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, ensuring greater system reliability and delivering significant business continuity improvements.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances using Chef and Puppet.
Discover how to use AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows workloads (Amazon Web Services)
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session we discuss options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and running Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to spotting fraud or manufacturing defects, image and video analysis built on artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore what AWS services make possible by applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event on Wednesday, October 14 from 12:00 to 13:00 dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting your existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJS (Amazon Web Services)
Many companies today build applications with ledger-style functionality, for example to verify the history of credits and debits in banking transactions or to track the supply-chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB removes the need to build custom, complex systems by providing a fully managed, serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for giving end users an exceptional user experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dig into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data-update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to the users of its web portal.
Oracle databases and VMware Cloud™ on AWS: debunking the myths (Amazon Web Services)
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, along with performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips that ease and streamline the migration of Oracle workloads while accelerating the transformation to the cloud; they dive into the architecture and show how to take full advantage of VMware Cloud™ on AWS.
1) The document discusses building a minimum viable product (MVP) using Amazon Web Services (AWS).
2) It provides an example of an MVP for an omni-channel messenger platform, built starting in 2017, that connects ecommerce stores to customers via web chat, Facebook Messenger, WhatsApp, and other channels.
3) The founder discusses how they started with an MVP in 2017 with 200 ecommerce stores in Hong Kong and Taiwan, and have since expanded to over 5000 clients across Southeast Asia using AWS for scaling.
This document discusses pitch decks and fundraising materials. It explains that venture capitalists will typically spend only 3 minutes and 44 seconds reviewing a pitch deck. Therefore, the deck needs to tell a compelling story to grab their attention. It also provides tips on tailoring different types of decks for different purposes, such as creating a concise 1-2 page teaser, a presentation deck for pitching in-person, and a more detailed read-only or fundraising deck. The document stresses the importance of including key information like the problem, solution, product, traction, market size, plans, team, and ask.
This document discusses building serverless web applications using AWS services like API Gateway, Lambda, DynamoDB, S3 and Amplify. It provides an overview of each service and how they can work together to create a scalable, secure and cost-effective serverless application stack without having to manage servers or infrastructure. Key services covered include API Gateway for hosting APIs, Lambda for backend logic, DynamoDB for database needs, S3 for static content, and Amplify for frontend hosting and continuous deployment.
This document provides tips for fundraising from startup founders Roland Yau and Sze Lok Chan. It discusses generating competition to create urgency for investors, fundraising in parallel rather than sequentially, having a clear fundraising narrative focused on what you do and why it's compelling, and prioritizing relationships with people over firms. It also notes how the pandemic has changed fundraising, with examples of deals done virtually during this time. The tips emphasize being fully prepared before fundraising and cultivating connections with investors in advance.
AWS_HK_StartupDay_Building Interactive websites while automating for efficien... (Amazon Web Services)
This document discusses Amazon's machine learning services for building conversational interfaces and extracting insights from unstructured text and audio. It describes Amazon Lex for creating chatbots, Amazon Comprehend for natural language processing tasks like entity extraction and sentiment analysis, and how they can be used together for applications like intelligent call centers and content analysis. Pre-trained APIs simplify adding machine learning to apps without requiring ML expertise.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies running Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
"Making .NET Application Even Faster", Sergey Teplyakov.pptxFwdays
In this talk we're going to explore performance improvement lifecycle, starting with setting the performance goals, using profilers to figure out the bottle necks, making a fix and validating that the fix works by benchmarking it. The talk will be useful for novice and seasoned .NET developers and architects interested in making their application fast and understanding how things work under the hood.
Keynote : AI & Future Of Offensive SecurityPriyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
Redefining Cybersecurity with AI CapabilitiesPriyanka Aash
In this comprehensive overview of Cisco's latest innovations in cybersecurity, the focus is squarely on resilience and adaptation in the face of evolving threats. The discussion covers the imperative of tackling Mal information, the increasing sophistication of insider attacks, and the expanding attack surfaces in a hybrid work environment. Emphasizing a shift towards integrated platforms over fragmented tools, Cisco introduces its Security Cloud, designed to provide end-to-end visibility and robust protection across user interactions, cloud environments, and breaches. AI emerges as a pivotal tool, from enhancing user experiences to predicting and defending against cyber threats. The blog underscores Cisco's commitment to simplifying security stacks while ensuring efficacy and economic feasibility, making a compelling case for their platform approach in safeguarding digital landscapes.
Discovery Series - Zero to Hero - Task Mining Session 1DianaGray10
This session is focused on providing you with an introduction to task mining. We will go over different types of task mining and provide you with a real-world demo on each type of task mining in detail.
Keynote : Presentation on SASE TechnologyPriyanka Aash
Secure Access Service Edge (SASE) solutions are revolutionizing enterprise networks by integrating SD-WAN with comprehensive security services. Traditionally, enterprises managed multiple point solutions for network and security needs, leading to complexity and resource-intensive operations. SASE, as defined by Gartner, consolidates these functions into a unified cloud-based service, offering SD-WAN capabilities alongside advanced security features like secure web gateways, CASB, and remote browser isolation. This convergence not only simplifies management but also enhances security posture and application performance across global networks and cloud environments. Discover how adopting SASE can streamline operations and fortify your enterprise's digital transformation strategy.
It's your unstructured data: How to get your GenAI app to production (and spe...Zilliz
So you've successfully built a GenAI app POC for your company -- now comes the hard part: bringing it to production. Aparavi addresses the challenges of AI projects while addressing data privacy and PII. Our Service for RAG helps AI developers and data scientists to scale their app to 1000s to millions of users using corporate unstructured data. Aparavi’s AI Data Loader cleans, prepares and then loads only the relevant unstructured data for each AI project/app, enabling you to operationalize the creation of GenAI apps easily and accurately while giving you the time to focus on what you really want to do - building a great AI application with useful and relevant context. All within your environment and never having to share private corporate data with anyone - not even Aparavi.
How UiPath Discovery Suite supports identification of Agentic Process Automat...DianaGray10
📚 Understand the basics of the newly persona-based LLM-powered Agentic Process Automation and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...Fwdays
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem. So, this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the re-invention of Serverless from a DevOps perspective as a Platform Engineering trend with Backstage and the recent Radius project from Microsoft. So now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
The Zaitechno Handheld Raman Spectrometer is a powerful and portable tool for rapid, non-destructive chemical analysis. It utilizes Raman spectroscopy, a technique that analyzes the vibrational fingerprint of molecules to identify their chemical composition. This handheld instrument allows for on-site analysis of materials, making it ideal for a variety of applications, including:
Material identification: Identify unknown materials, minerals, and contaminants.
Quality control: Ensure the quality and consistency of raw materials and finished products.
Pharmaceutical analysis: Verify the identity and purity of pharmaceutical compounds.
Food safety testing: Detect contaminants and adulterants in food products.
Field analysis: Analyze materials in the field, such as during environmental monitoring or forensic investigations.
The Zaitechno Handheld Raman Spectrometer is easy to use and features a user-friendly interface. It is compact and lightweight, making it ideal for field applications. With its rapid analysis capabilities, the Zaitechno Handheld Raman Spectrometer can help you improve efficiency and productivity in your research or quality control workflows.
The History of Embeddings & Multimodal EmbeddingsZilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Snarky Security
How wonderful it is that in our modern age, every bit of our biological data can be digitized, stored, and potentially pilfered by cyber thieves! Isn't it just splendid to think that while scientists are busy pushing the boundaries of biotechnology, hackers could be plotting the next big bio-data heist? This delightful scenario is brought to you by the ever-expanding digital landscape of biology and biotechnology, where the integration of computer science, engineering, and data science transforms our understanding and manipulation of biological systems.
While the fusion of technology and biology offers immense benefits, it also necessitates a careful consideration of the ethical, security, and associated social implications. But let's be honest, in the grand scheme of things, what's a little risk compared to potential scientific achievements? After all, progress in biotechnology waits for no one, and we're just along for the ride in this thrilling, slightly terrifying, adventure.
So, as we continue to navigate this complex landscape, let's not forget the importance of robust data protection measures and collaborative international efforts to safeguard sensitive biological information. After all, what could possibly go wrong?
-------------------------
This document provides a comprehensive analysis of the security implications biological data use. The analysis explores various aspects of biological data security, including the vulnerabilities associated with data access, the potential for misuse by state and non-state actors, and the implications for national and transnational security. Key aspects considered include the impact of technological advancements on data security, the role of international policies in data governance, and the strategies for mitigating risks associated with unauthorized data access.
This view offers valuable insights for security professionals, policymakers, and industry leaders across various sectors, highlighting the importance of robust data protection measures and collaborative international efforts to safeguard sensitive biological information. The analysis serves as a crucial resource for understanding the complex dynamics at the intersection of biotechnology and security, providing actionable recommendations to enhance biosecurity in an digital and interconnected world.
The evolving landscape of biology and biotechnology, significantly influenced by advancements in computer science, engineering, and data science, is reshaping our understanding and manipulation of biological systems. The integration of these disciplines has led to the development of fields such as computational biology and synthetic biology, which utilize computational power and engineering principles to solve complex biological problems and innovate new biotechnological applications. This interdisciplinary approach has not only accelerated research and development but also introduced new capabilities such as gene editing and biomanufact
UiPath Community Day Amsterdam: Code, Collaborate, ConnectUiPathCommunity
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer @UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Finetuning GenAI For Hacking and DefendingPriyanka Aash
Generative AI, particularly through the lens of large language models (LLMs), represents a transformative leap in artificial intelligence. With advancements that have fundamentally altered our approach to AI, understanding and leveraging these technologies is crucial for innovators and practitioners alike. This comprehensive exploration delves into the intricacies of GenAI, from its foundational principles and historical evolution to its practical applications in security and beyond.
Retrieval Augmented Generation Evaluation with RagasZilliz
Retrieval Augmented Generation (RAG) enhances chatbots by incorporating custom data in the prompt. Using large language models (LLMs) as judge has gained prominence in modern RAG systems. This talk will demo Ragas, an open-source automation tool for RAG evaluations. Christy will talk about and demo evaluating a RAG pipeline using Milvus and RAG metrics like context F1-score and answer correctness.
2. Agenda Overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Data Collection and Storage
1:45 PM Real-time Event Processing
3:00 PM Analytics (incl. Machine Learning)
4:30 PM Open Q&A Roundtable
3. Primitive patterns: Collect → Store → Process → Analyze, spanning data collection and storage, event processing, data processing, and data analysis, with EMR, Redshift, and Machine Learning as the core services.
5. Why Amazon EMR?
Easy to use: launch a cluster in minutes
Low cost: pay an hourly rate
Elastic: easily add or remove capacity
Reliable: spend less time monitoring
Secure: manage firewalls
Flexible: control the cluster
7. Choose your instance types, and try different configurations to find your optimal architecture:
CPU: c3 family, cc1.4xlarge, cc2.8xlarge
Memory: m2 family, r3 family
Disk/IO: d2 family, i2 family
General: m1 family, m3 family
Typical workloads: batch processing, machine learning, Spark and interactive analysis, large HDFS
8. Resizable clusters: easy to add or remove compute capacity, matching compute demands with cluster sizing.
9. Easy to use Spot Instances:
Spot Instances for task nodes (up to 90% off Amazon EC2 on-demand pricing): exceed your SLA at lower cost
On-demand for core nodes (standard Amazon EC2 pricing for on-demand capacity): meet your SLA at predictable cost
10. Amazon S3 as your persistent data store:
• Separate compute and storage
• Resize and shut down Amazon EMR clusters with no data loss
• Point multiple Amazon EMR clusters at the same data in Amazon S3
11. EMRFS makes it easier to leverage S3
• Better performance and error handling options
• Transparent to applications: use "s3://"
• Consistent view: consistent list and read-after-write for new puts
• Support for Amazon S3 server-side and client-side encryption
• Faster listing using EMRFS metadata
12. Amazon S3 EMRFS metadata in Amazon DynamoDB
• List and read-after-write consistency
• Faster list operations

Fast listing of S3 objects using EMRFS metadata*:

Number of objects | Without Consistent Views | With Consistent Views
1,000,000         | 147.72                   | 29.70
100,000           | 12.70                    | 3.69

*Tested using a single-node cluster with an m3.xlarge instance.
14. Optimize to leverage HDFS
• Iterative workloads: if you're processing the same dataset more than once
• Disk I/O intensive workloads
Persist data on Amazon S3 and use S3DistCp to copy it to HDFS for processing
15. Pattern #1: Batch processing
GBs of logs pushed to Amazon S3 hourly → daily Amazon EMR cluster using Hive to process the data → input and output stored in Amazon S3 → load a subset into a Redshift DW
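A minimal HiveQL sketch of this pattern, assuming a hypothetical s3://mybucket layout and log schema (table and column names are illustrative, not from the deck):

-- External table over the raw hourly logs already sitting in Amazon S3
CREATE EXTERNAL TABLE raw_logs (
  event_time STRING,
  user_id    STRING,
  url        STRING,
  status     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://mybucket/logs/raw/';

-- Output also lives in S3, so the cluster can be shut down after the
-- daily run with no data loss
CREATE EXTERNAL TABLE daily_summary (
  url  STRING,
  hits BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://mybucket/logs/summary/';

-- Daily aggregation job run by the transient EMR cluster
INSERT OVERWRITE TABLE daily_summary
SELECT url, COUNT(*) AS hits
FROM raw_logs
GROUP BY url;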
16. Pattern #2: Online data-store
Data pushed to Amazon S3 → daily Amazon EMR cluster extracts, transforms, and loads (ETL) data into a database → a 24/7 Amazon EMR cluster running HBase holds the last 2 years' worth of data → a front-end service uses the HBase cluster to power a dashboard with high concurrency
17. Pattern #3: Interactive query
TBs of logs sent daily → logs stored in S3 → transient EMR clusters → Hive Metastore
18. Example: Log Processing using Amazon EMR
• Aggregating small files using s3distcp
• Defining Hive tables with data on Amazon S3
• Interactive querying using Hue
Flow: Amazon S3 log bucket → Amazon EMR → processed and structured log data
19. Example: months of user history analyzed using EMR to find common misspellings (Westen, Wistin, Westan, Whestin) and power automatic spelling corrections.
22. Clickstream Analysis for Amazon.com
• Redshift runs web log analysis for Amazon.com
100-node Redshift cluster
Over one petabyte workload
Largest table: 400 TB
2 TB of data per day
• Understand customer behavior
Who is browsing but not buying
Which products / features are winners
What sequence led to higher customer conversion
23. Redshift Performance Realized
• Scan 15 months of data (2.25 trillion rows): 14 minutes
• Load one day's worth of data (5 billion rows): 10 minutes
• Backfill one month of data (150 billion rows): 9.75 hours
• Pig → Amazon Redshift: 2 days to 1 hour (10B-row join with 700M rows)
• Oracle → Amazon Redshift: 90 hours to 8 hours (reduced the number of SQL statements by a factor of 3)
24. Amazon Redshift Architecture
• Leader node
SQL endpoint (JDBC/ODBC)
Stores metadata
Coordinates query execution
• Compute nodes
Local, columnar storage
Execute queries in parallel
Interconnected via 10 GigE (HPC) networking
Load, backup, restore via Amazon S3 (ingestion, backup, restore); load from Amazon DynamoDB or SSH
• Two hardware platforms, optimized for data processing
DW1: HDD; scale from 2 TB to 2 PB
DW2: SSD; scale from 160 GB to 325 TB
25. Amazon Redshift Node Types
DW1 (HDD):
• Optimized for I/O-intensive workloads
• High disk density
• On demand at $0.85/hour; as low as $1,000/TB/year
• Scale from 2 TB to 1.6 PB
• DW1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed storage
• DW1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed, 2 GB/sec scan rate
DW2 (SSD):
• High performance at smaller storage size
• High compute and memory density
• On demand at $0.25/hour; as low as $5,500/TB/year
• Scale from 160 GB to 256 TB
• DW2.L: 16 GB RAM, 2 cores, 160 GB compressed SSD storage
• DW2.8XL: 256 GB RAM, 32 cores, 2.56 TB of compressed SSD storage
26. Amazon Redshift dramatically reduces I/O
(Column storage | Data compression | Zone maps | Direct-attached storage)
• With row storage you do unnecessary I/O
• To get the total amount, you have to read everything

ID  | Age | State | Amount
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375
27. Amazon Redshift dramatically reduces I/O
(Column storage | Data compression | Zone maps | Direct-attached storage)
• With column storage, you only read the data you need (e.g., only the Amount column)

ID  | Age | State | Amount
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375
28. analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O
(Column storage | Data compression | Zone maps | Direct-attached storage)
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
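For instance, you could pin the encodings suggested by ANALYZE COMPRESSION above explicitly in the DDL. This is a hedged sketch only: the column types are assumptions, since the deck shows only names and encodings.

-- Recreate the table with explicit column encodings instead of relying
-- on COPY's automatic compression analysis (column types assumed)
CREATE TABLE listing (
  listid         INTEGER      ENCODE delta,
  sellerid       INTEGER      ENCODE delta32k,
  eventid        INTEGER      ENCODE delta32k,
  dateid         SMALLINT     ENCODE bytedict,
  numtickets     SMALLINT     ENCODE bytedict,
  priceperticket DECIMAL(8,2) ENCODE delta32k,
  totalprice     DECIMAL(8,2) ENCODE mostly32,
  listtime       TIMESTAMP    ENCODE raw
);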
29. Amazon Redshift dramatically reduces I/O
(Column storage | Data compression | Zone maps | Direct-attached storage)
• Track the minimum and maximum value for each block
• Skip over blocks that don't contain relevant data
Example zone maps: a block holding 10 | 13 | 14 | 26 | … | 100 | 245 | 324 is tagged min 10 / max 324; the next block (375 | 393 | 417 | … | 512 | 549 | 623) min 375 / max 623; the next (637 | 712 | 809 | … | 834 | 921 | 959) min 637 / max 959.
30. Amazon Redshift dramatically reduces I/O
(Column storage | Data compression | Zone maps | Direct-attached storage)
• Use local storage for performance
• Maximize scan rates
• Automatic replication and continuous backup
• HDD & SSD platforms
32. Amazon Redshift parallelizes and distributes everything
(Query | Load | Backup/Restore | Resize)
• Load in parallel from Amazon S3, DynamoDB, or any SSH connection
• Data automatically distributed and sorted according to DDL
• Scales linearly with the number of nodes in the cluster
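For example, a parallel load from an S3 prefix might look like the following sketch (bucket, table, and credentials are placeholders, not from the deck):

-- Every file under the prefix is split across the cluster's slices
-- and loaded in parallel
COPY sales
FROM 's3://mybucket/sales/2015/06/'
CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
DELIMITER '|'
GZIP;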
33. Amazon Redshift parallelizes and distributes everything
(Query | Load | Backup/Restore | Resize)
• Backups to Amazon S3 are automatic, continuous, and incremental
• Configurable system snapshot retention period; take user snapshots on demand
• Cross-region backups for disaster recovery
• Streaming restores enable you to resume querying faster
34. Amazon Redshift parallelizes and distributes everything
(Query | Load | Backup/Restore | Resize)
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for the source cluster
35. Amazon Redshift parallelizes and distributes everything
(Query | Load | Backup/Restore | Resize)
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via Console or API
37. Table Distribution Styles
Each compute node holds slices (e.g., Node 1: slices 1 and 2; Node 2: slices 3 and 4). Rows are distributed across those slices in one of three styles:
• Distribution Key: same key to same location (rows hashed on a chosen column)
• All: all data on every node
• Even: round-robin distribution
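A hedged DDL sketch of the three styles (table and column names are illustrative):

-- KEY: rows with the same customer_id land on the same slice,
-- co-locating them for joins on customer_id
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  total       DECIMAL(12,2)
)
DISTSTYLE KEY DISTKEY (customer_id);

-- ALL: a full copy on every node; good for small dimension tables
CREATE TABLE regions (region_id INT, name VARCHAR(64))
DISTSTYLE ALL;

-- EVEN: round-robin across slices (the default)
CREATE TABLE clicks (click_id BIGINT, url VARCHAR(1024))
DISTSTYLE EVEN;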
38. Sorting Data
• In the slices (on disk), the data is sorted by a sort key
If no sort key exists, Redshift uses the data insertion order
• Choose a sort key that is frequently used in your queries
As a query predicate (date, identifier, …)
As a join parameter (it can also be the hash key)
• The sort key allows Redshift to avoid reading entire blocks based on predicates
For example, a table with a timestamp sort key where only recent data is accessed will skip blocks containing "old" data
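A hedged sketch of that recent-data case (names and schema are illustrative):

-- Timestamp sort key: blocks are stored in event_time order, and the
-- per-block min/max zone maps let Redshift skip blocks of "old" data
CREATE TABLE events (
  event_time TIMESTAMP,
  user_id    BIGINT,
  action     VARCHAR(32)
)
SORTKEY (event_time);

-- Only blocks whose zone map overlaps the last 7 days are read
SELECT action, COUNT(*)
FROM events
WHERE event_time >= DATEADD(day, -7, GETDATE())
GROUP BY action;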
39. Interleaved Multi-Column Sort
• Compound sort keys
Optimized for applications that filter data by one leading column
• Interleaved sort keys (new)
Optimized for filtering data by up to eight columns
No storage overhead, unlike an index
Lower maintenance penalty compared to indexes
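Illustrative DDL for the two options, using the cust_id/prod_id table from the next two slides (schema assumed):

-- Compound: best when queries filter on the leading column (cust_id)
CREATE TABLE sales_compound (
  cust_id INT,
  prod_id INT,
  amount  DECIMAL(10,2)
)
COMPOUND SORTKEY (cust_id, prod_id);

-- Interleaved: equal weight to each key, so filters on either
-- cust_id or prod_id alone can skip blocks
CREATE TABLE sales_interleaved (
  cust_id INT,
  prod_id INT,
  amount  DECIMAL(10,2)
)
INTERLEAVED SORTKEY (cust_id, prod_id);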
40. Compound Sort Keys Illustrated
• Records in Redshift are stored in blocks.
• For this illustration, let's assume that four records fill a block.
• Records with a given cust_id are all in one block.
• However, records with a given prod_id are spread across four blocks.
[Illustration: a 4x4 grid of (cust_id, prod_id) pairs, cust_id 1-4 by prod_id 1-4; with a compound sort key on (cust_id, prod_id), each block holds all four rows for one cust_id, e.g. [1,1] [1,2] [1,3] [1,4].]
41. Interleaved Sort Keys Illustrated
• Records with a given cust_id are spread across two blocks
• Records with a given prod_id are also spread across two blocks
• Data is sorted in equal measures for both keys
[Illustration: the same 4x4 grid of (cust_id, prod_id) pairs; with an interleaved sort key on (cust_id, prod_id), each block mixes key values so any single cust_id or prod_id value touches only two blocks.]
43. Custom ODBC and JDBC Drivers
• Up to 35% higher performance than open source drivers
• Supported by Informatica, MicroStrategy, Pentaho, Qlik, SAS, Tableau, Tibco, and others
• Will continue to support PostgreSQL open source drivers
• Download drivers from the console
44. User Defined Functions
• We're enabling User Defined Functions (UDFs) so you can add your own
Scalar and aggregate functions supported
• You'll be able to write UDFs using Python 2.7
Syntax is largely identical to PostgreSQL UDF syntax
System and network calls within UDFs are prohibited
• Comes with Pandas, NumPy, and SciPy pre-installed
You'll also be able to import your own libraries for even more flexibility
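A minimal scalar UDF sketch in the announced Python syntax (the haversine-distance function itself is illustrative, not from the deck):

-- Scalar Python UDF; runs per row, no system or network calls allowed
CREATE OR REPLACE FUNCTION f_distance_km (lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT)
RETURNS FLOAT
IMMUTABLE
AS $$
  from math import radians, sin, cos, asin, sqrt
  dlat = radians(lat2 - lat1)
  dlon = radians(lon2 - lon1)
  a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
  return 12742 * asin(sqrt(a))  # 12742 km = Earth's diameter
$$ LANGUAGE plpythonu;

-- Call it like any built-in function
SELECT f_distance_km(47.6, -122.3, 40.7, -74.0);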
46. Operational Reporting with Redshift
Flow: Amazon S3 log bucket → Amazon EMR (processed and structured log data) → Amazon Redshift → operational reports
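An operational report at the end of this pipeline can be a plain aggregate query over the structured log table; this is a sketch assuming a hypothetical web_logs schema:

-- Daily error-rate report over EMR-processed logs loaded into Redshift
SELECT TRUNC(event_time) AS day,
       COUNT(*) AS requests,
       SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) AS server_errors
FROM web_logs
GROUP BY TRUNC(event_time)
ORDER BY day;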
47. Amazon Web Services' global customer and partner conference
Learn more and register: reinvent.awsevents.com
October 6-9, 2015 | The Venetian, Las Vegas, NV
Amazon EMR is more than just MapReduce.
Bootstrap actions available on GitHub
In the next few slides, we'll talk about data persistence models with Amazon EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data gets stored on Amazon S3, and HDFS does not play any role in storing it; HDFS is only there for temporary storage. Another common thing I hear is that storing data on Amazon S3 instead of HDFS slows a job down a lot because data has to get copied to HDFS/disk first before processing starts. That's incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams data to the mappers without touching the disk. To be completely accurate, data does touch HDFS when it has to shuffle from mappers to reducers, but as I mentioned, HDFS acts as the temp space and nothing more.
EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
And EMRFS brings every other feature that comes with Amazon S3, such as SSE, lifecycle policies, etc. And again, keep in mind that Amazon S3 as the storage layer is the main reason why we can build elastic clusters where nodes get added and removed dynamically without any data loss.
EMR example #3: EMR as an ETL and query engine for investigations that require all the raw data.
CloudFront logs arrive out of order.
200-node cluster: spin it up daily, shut it down.
Customer examples: Nasdaq (security), HasOffers (loads 60M rows per day in 2-minute intervals), Desk (high-concurrency user-facing portal on a read/write cluster), and Amazon.com/NTT (PB scale). Pinterest saw 50-100x speedups when it moved 300 TB from Hadoop to Redshift; Nokia saw a 50% reduction in costs.
Today we will go over the role of Amazon Redshift in addressing the web log analysis problem for one of the largest online retailers, Amazon.com.
<go over the slide with restated language>
Read only the data you need
Comments on next slide.
Redshift is a distributed system:
A cluster contains a leader node and compute nodes
A compute node contains slices (one per core) that contain data
Data is distributed among slices in 3 ways:
Even – Rows distributed in Round Robin fashion (default)
Key – Rows distributed based on a distribution key (hash of a defined column)
All - Rows distributed to all slices
Queries run on all slices in parallel
Optimal query throughput can be achieved when data is evenly spread across slices
Redshift leverages sorting in storage.
Redshift stores column data in blocks; for the sort key, the data blocks are "marked" with the min and max values of that column, allowing Redshift to skip reading blocks that are not relevant to the current query.
Check that join parameter statement is true (best practices on designing tables)
Redshift works with the customer's BI tool of choice through PostgreSQL drivers over a JDBC or ODBC connection. A number of partners shown here have certified integrations with Redshift, meaning they have done testing to validate and build the integration and make using Redshift easy from a UI perspective. If there are tools customers use that are not shown, we can work with the Redshift team on getting them integrated.
So, we started with our MySQL server, but this time we ran SQL statements directly on the server itself to dump the data out to local files. Then, using s3cmd, we copied the flat files into our S3 bucket.
Select data from MySQL and use s3cmd to copy the resulting flat files to S3, as sketched below.
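A sketch of that export step in MySQL (path, table, and column names are illustrative; the s3cmd upload happens outside SQL):

-- Dump a table to a local flat file in a Redshift-friendly format;
-- the file is then pushed to S3 with:
--   s3cmd put /tmp/orders.csv s3://mybucket/stage/
SELECT order_id, customer_id, total, created_at
FROM orders
INTO OUTFILE '/tmp/orders.csv'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';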
Use BCP to export data into an EC2 instance, which generates and copies flat files to S3.
And then instead of using EMR, we just run some crazy SQL statements to transform the data into the Production version of Redshift.
Copy data into a staging schema in Redshift where it can be transformed via SQL to the final table structure and loaded into the production schema.
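A hedged sketch of that staging-to-production transform (schema and column names are illustrative):

BEGIN;
-- Cast and filter raw staged rows, then append to the production table
INSERT INTO prod.orders (order_id, customer_id, total, created_at)
SELECT order_id, customer_id, total, created_at::TIMESTAMP
FROM staging.orders_raw
WHERE total IS NOT NULL;
-- Clear the staging table for the next batch
DELETE FROM staging.orders_raw;
COMMIT;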
Use standard tools, like MicroStrategy and Tableau, to provide business views into the data.
And then of course we need a good way for business users to look at the data, and that’s where MicroStrategy and Tableau come into play.