Amazon Redshift is a fast, petabyte-scale, fully managed data warehouse that makes it simple and cost-effective to analyze all of your data with your existing business intelligence tools. This session examines the distributed processing architecture that gives Redshift its massively parallel processing capability, and walks through hands-on best practices for integrating and loading data from a variety of data sources and formats.
Speaker: 김상필, Solutions Architect, Amazon Web Services
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
Organizations need to gain insight and knowledge from a growing number of Internet of Things (IoT), application programming interfaces (API), clickstreams, unstructured and log data sources. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world’s largest social platform for online giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform called RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven, data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
This document discusses database migration and replication using AWS services. It provides an overview of AWS Database Migration Service (DMS) and Schema Conversion Tool (SCT) and how they can be used to migrate databases to AWS, replicate data across different databases, and modernize database platforms. Specific migration use cases are presented, along with a demonstration and discussion of pricing and resources available to customers.
Aurora is Amazon's relational database service that is compatible with MySQL and PostgreSQL. It is optimized for fast performance, high availability, and automatic scaling. Aurora provides up to 5x better performance than MySQL and 3x better than PostgreSQL for the same cost. It automatically scales up to 64TB in storage and can support tens of thousands of low-latency transactions. Aurora replicates data across three Availability Zones for high availability and durability with storage replicated six ways. It is easy to use with simple pricing and management capabilities.
This document provides an overview of Amazon Redshift presented by Pavan Pothukuchi and Chris Liu. The agenda includes an introduction to Redshift, its benefits, use cases, and Coursera's experience using Redshift. Some key benefits highlighted are that Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. Example use cases from NTT Docomo and Nasdaq are discussed. Chris Liu then discusses Coursera's experience moving from no data warehouse to using Redshift over three years, including their current ecosystem involving Redshift, other AWS services, and business intelligence applications. Lessons learned around thinking in Redshift, communicating with users, surprises, and reflections are also shared.
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Amazon Web Services
Michael Needham, a senior manager at AWS, presented on accelerating businesses with SAP on AWS. He discussed how AWS provides scalable, cost-effective infrastructure for SAP workloads. Rob Enslin, president of SAP Global Operations, praised how AWS provisioned a 14 TB HANA system in an "unbelievable" time, helping deliver simplicity. AWS offers a variety of compute instances certified for SAP and manages all infrastructure, allowing customers to focus on innovation.
Amazon Web Services (AWS) offers a wide range of database options to fit your application requirements. From database services that are fully managed and that can be launched in minutes with just a few clicks to self-managed databases running on EC2. AWS managed database services include Amazon Relational Database Service (Amazon RDS), with support for six commonly used database engines, Amazon Aurora, a MySQL and PostgreSQL-compatible relational database, Amazon DynamoDB, a NoSQL database service or Amazon Redshift, a petabyte-scale data warehouse service. AWS also provides the AWS Database Migration Service, a service which makes it easy and inexpensive to migrate your databases to AWS cloud.
In this webinar, we take a closer look at the AWS database offerings and learn how to quickly select, set up, operate, and scale your database in the cloud.
Learning Objectives:
• Gain insights into the AWS database offering and know which to select for your workload.
• Learn how the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) can facilitate and simplify migrating your business critical applications to Amazon Web Services.
• Learn how Amazon DynamoDB Accelerator (DAX) can reduce Amazon DynamoDB response times from milliseconds to microseconds, even at millions of requests per second.
• Hear from our partners like Version1 and Clckwrk who can help you in your journey towards Database freedom.
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationAmazon Web Services
Learn about the new AWS Database Migration Service, which helps you migrate databases with minimal downtime from on-premises and Amazon EC2 environments to Amazon RDS, Amazon Redshift, Amazon Aurora and EC2 databases.
This document summarizes Amazon Web Services database migration and replication services. It discusses how the AWS Database Migration Service can migrate databases between on-premises and cloud environments within 10 minutes with no application downtime. It also describes how the AWS Schema Conversion Tool can help migrate databases off Oracle and SQL Server to other database engines like MySQL. Finally, it provides an overview of Amazon RDS managed database services and high availability features.
Migrating Your Databases to AWS: Deep Dive on Amazon RDS and AWS Database Mig...Amazon Web Services
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity, automates time-consuming database administration tasks, and provides you with six familiar database engines to choose from: Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB. In this session, we will take a close look at the capabilities of Amazon RDS and explain how it works. We’ll also discuss the AWS Database Migration Service and AWS Schema Conversion Tool, which help you migrate databases and data warehouses with minimal downtime from on-premises and cloud environments to Amazon RDS and other Amazon services. Gain your freedom from expensive, proprietary databases while providing your applications with the fast performance, scalability, high availability, and compatibility they need.
AWS Speaker: Andrew Kane, Solutions Architect - Amazon Web Services
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...Amazon Web Services
In this session, we provide a peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns with our Redis and Memcached offerings and how customers have used them for in-memory operations to reduce latency and improve application throughput. During this session, we review ElastiCache best practices, design patterns, and anti-patterns.
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)Amazon Web Services
Amazon RDS allows customers to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Amazon RDS provides you with six database engines to choose from, including Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB. In this session, we take a closer look at the capabilities of RDS and all the different options available. We do a deep dive into how RDS works and the best practices to achieve optimal performance, flexibility, and cost savings for your databases.
This document discusses running databases on AWS. It provides an overview of several AWS database services including Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon ElastiCache. It highlights how these services provide scalability, reliability, automation of tasks like backups and patching, and pay-as-you-go pricing. It also discusses how AWS database services eliminate the need to manage hardware, databases, backups, and other complex tasks when compared to operating databases in an on-premises data center.
Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, Time-to-Live (TTL), and more.
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...Amazon Web Services
Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift
GE Power & Water develops advanced technologies to help solve some of the world’s most complex challenges related to water availability and quality. They had amassed billions of rows of data on on-premises databases, but decided to migrate some of their core big data projects to the AWS Cloud. When they decided to transform and store it all in Amazon Redshift, they knew they needed an ETL/ELT tool that could handle this enormous amount of data and safely deliver it to its destination. In this session, Ryan Oates, Enterprise Architect at GE Water, shares his use case, requirements, outcomes and lessons learned. He also shares the details of his solution stack, including Amazon Redshift and Matillion ETL for Amazon Redshift in AWS Marketplace. You learn best practices on Amazon Redshift ETL supporting enterprise analytics and big data requirements, simply and at scale. You learn how to simplify data loading, transformation and orchestration onto Amazon Redshift and how to build out a real data pipeline. Get the insights to deliver your big data project in record time.
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...Amazon Web Services
Learn how to build a scalable, compliance-ready, and automated deployment of the Microsoft “backoffice” servers for 100K users running on AWS. In this session, we show a reference architecture deployment of Exchange, SharePoint, Skype for Business, SQL Server and Active Directory in a single VPC. We discuss the following: (1) how the solution is automated for 100K users, (2) how the solution is enabled for compliance (e.g., FedRAMP, HIPAA, PCI), and (3) how the solution is built from modular 10K user blocks. Attendees should have knowledge of AWS CloudFormation, PowerShell, instance bootstrapping, VPCs, and Amazon Route 53, as well as the relevant Microsoft technologies.
The document discusses Amazon Relational Database Service (Amazon RDS), which allows users to set up and manage relational databases in the cloud. Some key points:
- Amazon RDS provides a managed database service running MySQL, Oracle, SQL Server, PostgreSQL, MariaDB, or Aurora databases. It handles time-consuming administration tasks.
- Databases can be easily launched and scaled up or down as needed. Additional storage, compute power, and throughput can also be provisioned.
- Automated backups provide point-in-time recovery. Multi-AZ deployments provide high availability and durability. Security features include encryption and IAM access control.
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...Amazon Web Services
Easy scalability is a powerful feature of Amazon Aurora. Scalability in its actual definition refers to being able to get larger or smaller depending on the need. Amazon Aurora allows you to easily achieve this by scaling the database instance up or down and adding or removing read replicas. Scaling across regions brings additional resilience to your architectures and could boost your application performance due to geographic proximity. You can perform all of these scaling operations through the Aurora console. You can also automate instance and read scaling using lambda function or scripts based on the usage pattern you define. You can extend the automation by feeding your database usage data from Aurora enhanced monitoring into Machine Learning to provide more sophisticated predictive patterns to drive your automation. In this session we will do a deep dive into how scalability works in Aurora and how to make the best use of it to reduce your cost, increase application performance and architect resilient applications.
You should have good database knowledge and at least some experience with Amazon RDS or Amazon Aurora and should bring your own laptop.
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to Amazon EMR in order to save costs, increase availability, and improve performance. Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of over 15 open-source frameworks in the Apache Hadoop and Spark ecosystems. This session will focus on identifying the components and workflows in your current environment and providing the best practices to migrate these workloads to Amazon EMR. We will explain how to move from HDFS to Amazon S3 as a durable storage layer, and how to lower costs with Amazon EC2 Spot instances and Auto Scaling. Additionally, we will go over common security recommendations and tuning tips to accelerate the time to production.
The document provides an overview of database migration options using AWS Database Migration Service (DMS) and AWS Schema Conversion Tool (SCT). It discusses how DMS can be used to migrate databases across different database platforms with minimal downtime. It also outlines how SCT can be used to convert schemas from commercial databases to open-source databases like PostgreSQL. The document shares customer examples and benefits of using DMS and SCT for heterogeneous, scale-up, and split migrations. It also lists available resources for customers on DMS and SCT.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
In this webinar, we discuss how the secret sauce to your business analytics strategy remains rooted in your approach, methodologies, and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, and tips and tricks on designing the right information architecture, data models and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics with AWS services, such as, Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Created by: Jason Morris, Solutions Architect
This document provides an overview of Amazon Redshift data warehousing capabilities. It discusses how Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. It describes how to get started with Redshift, provision clusters, model data, load and query data, and monitor performance. It also provides an example of how MakerBot uses Redshift as part of its "Dream Stack" along with other AWS services for analytics.
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
by Darin Briskman, Technical Evangelist, AWS
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We’ll learn about AWS Database Migration Service and the AWS Schema Conversion Tool, which were recently enhanced to import data from six common data warehouse platforms. Level: 200
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
This document provides best practices for migrating a data warehouse to Amazon Redshift. It discusses why companies migrate to Redshift due to its scalability, performance and cost advantages. Example migration stories are provided from companies that achieved significant improvements after migrating large datasets from Oracle, Greenplum and SQL on Hadoop to Redshift. The document also outlines the Redshift cluster architecture, data loading best practices including file splitting and column encoding, schema design considerations and available migration tools.
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process.
Data Warehousing in the Era of Big Data: Intro to Amazon RedshiftAmazon Web Services
An overview of how Amazon Redshift uses columnar technology, massively parallel processing, and other techniques to deliver fast query performance on petabyte-size datasets.
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and start analysing your data right away using your existing business intelligence tools.
This is an introduction to Amazon Redshift that covers the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
(1) Amazon Redshift is a fully managed data warehousing service in the cloud that makes it simple and cost-effective to analyze large amounts of data across petabytes of structured and semi-structured data. (2) It provides fast query performance by using massively parallel processing and columnar storage techniques. (3) Customers like NTT Docomo, Nasdaq, and Amazon have been able to analyze petabytes of data faster and at a lower cost using Amazon Redshift compared to their previous on-premises solutions.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You'll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform in order to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
This document summarizes a presentation on Amazon Redshift. Redshift is a fully managed data warehouse service that makes it easy to analyze large amounts of data for less than $1,000 per terabyte per year. The presentation covers how to get started with Redshift, best practices for table design and data loading, using Redshift for analytics, and upgrading and scaling a Redshift data warehouse over time.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We’ll learn about AWS Database Migration Service and the AWS Schema Conversion Tool, which were recently enhanced to import data from six common data warehouse platforms.
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...Amazon Web Services
In this session, you will learn the key differences between a relational database management service (RDBMS) and non-relational (NoSQL) databases like Amazon DynamoDB. You will learn about suitable and unsuitable use cases for NoSQL databases. You'll learn strategies for migrating from an RDBMS to DynamoDB through a 5-phase, iterative approach. See how Sony migrated an on-premises MySQL database to the cloud with Amazon DynamoDB, and see the results of this migration.
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Trivadis
"Modern" data warehouse/data lake architectures are often brimming with layers and services. Such systems can manage and analyze petabytes of data, but that comes at a price (complexity, latency, stability), and not every project ends up happy with this approach.
The talk traces the journey from a technology-infatuated solution to an environment tailored to user needs. It shows the bright and dark sides of massively parallel systems and aims to sharpen awareness for capturing real customer requirements.
Learn how Amazon Redshift, our fully managed data warehouse, can help you quickly and cost-effectively analyze all of your data using your BI tools. The session also gives an introduction to the service, which uses MPP, a scale-out architecture, and columnar storage.
Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, and more.
Similar to 2017 AWS DB Day | Amazon Redshift 소개 및 실습 (20)
This session explains how to back up and restore databases in the cloud. It introduces a range of data protection methods with AWS Backup, from full backup/restore to PITR (Point in Time Recovery) backups, as well as multi-account and multi-Region protection (demo included). It also looks at how quickly data can be recovered and replicated when the Amazon FSx for NetApp ONTAP storage service is used as the data store for a self-managed DB.
When traffic spikes unexpectedly due to events or new product launches, companies often face database overload, service delays, and outages. Aurora auto scaling struggles to react in real time because of provisioning time, which leads to over-provisioning for traffic spikes. To address this, this session introduces a mixed-configuration Amazon Aurora cluster architecture that combines a provisioned Amazon Aurora cluster with Aurora Serverless v2 (ASV2) instances, along with a custom auto scaling solution based on high-resolution metrics.
This session introduces Amazon Aurora Limitless Database, which lets you scale an Amazon Aurora cluster to millions of write transactions per second and manage petabytes of data, scaling relational database workloads in Aurora beyond the limits of a single Aurora writer instance without creating custom application logic or managing multiple databases.
Standard support for Amazon Aurora MySQL-compatible edition version 2 (MySQL 5.7 compatibility) ends on October 31, 2024. If you are therefore considering a major version upgrade of Aurora MySQL, Amazon Blue/Green Deployments is an ideal solution for performing the upgrade without impacting your production environment. In this session, you practice a major version upgrade of Aurora MySQL using Blue/Green Deployments.
Amazon DocumentDB (with MongoDB compatibility) is a fast, reliable, fully managed database service. With Amazon DocumentDB, you can easily set up, operate, and scale MongoDB-compatible databases in the cloud. In this hands-on session, you run the same application code and use the same drivers and tools that you use with MongoDB.
Learning Database Migration Service through customer cases: a tool for database and data migration, consolidation, separation, and analytics - Speaker: ...Amazon Web Services Korea
Database Migration Service (DMS) supports migrating many kinds of databases beyond RDBMSs. Using real customer cases, this session looks at how DMS is used for database migration, consolidation, and separation, and also at the role it plays in data ingest for analytics.
Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...Amazon Web Services Korea
Amazon ElastiCache is a fully managed service compatible with Redis and Memcached that improves the performance of modern applications in real time at optimal cost. This session covers ElastiCache best practices for achieving the best performance and optimizing your service.
Internal Architecture of Amazon Aurora (Level 400) - Speaker: 정달영, APAC RDS Speci...Amazon Web Services Korea
Amazon Aurora is a relational database built for the cloud. Aurora combines the performance and availability of commercial databases with the simplicity and cost-effectiveness of open-source databases. This session is aimed at advanced Aurora users and covers Aurora's internal architecture and performance optimization.
[Keynote] Choosing AWS databases wisely - Speaker: 강민석, Korea Database SA Manager, WWSO, A...Amazon Web Services Korea
For a long time, relational databases were the most widely used and appeared in almost every application. That made choosing a database for an application architecture easier, but it limited the types of applications you could build. A relational database is like a Swiss Army knife: it can do many things, but it is not perfectly suited to any particular task. With the advent of cloud computing, it became possible to build more elastic and scalable applications economically, and what was technically feasible changed. This shift led to the rise of purpose-built databases. Developers no longer have to default to a relational database; they can carefully consider their application's requirements and choose a database that fits them.
Demystify Streaming on AWS - Speaker: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...Amazon Web Services Korea
Real-time analytics is a growing use case among AWS customers. Join this session to learn how streaming data technologies let you analyze data immediately, move data between systems in real time, and get actionable insights faster. It covers common streaming data use cases, the steps to easily enable real-time analytics in your business, and how AWS helps you use AWS streaming data services such as Amazon Kinesis.
Amazon EMR - Enhancements on Cost/Performance, Serverless - Speaker: 김기영, Sr Anal...Amazon Web Services Korea
Amazon EMR provides a managed service that makes it easy to run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtimes for Spark and Presto include optimizations that deliver more than twice the performance of open-source Apache Spark and Presto. Amazon EMR Serverless is a new deployment option for Amazon EMR that lets data engineers and analysts run petabyte-scale data analytics in the cloud easily and cost-effectively. Join this session to explore Amazon EMR and EMR Serverless through concepts, design patterns, and live demos, and see how easy it is to run Spark and Hive workloads and to use Amazon EMR integrations with Amazon EMR Studio and Amazon SageMaker Studio.
Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...Amazon Web Services Korea
Learn more about the new features and capabilities of Amazon OpenSearch, including easily ingesting log and metric data, using the OpenSearch search APIs, and building visualizations with OpenSearch Dashboards. Learn about OpenSearch's observability features for debugging application issues, and how Amazon OpenSearch Service lets you focus on your search or monitoring problems instead of worrying about infrastructure management.
Enabling Agility with Data Governance - Speaker: 김성연, Analytics Specialist, WWSO,...Amazon Web Services Korea
Data governance is the process of managing data throughout its lifecycle to ensure its accuracy and completeness and to make sure the people who need it can access it. Join this session to learn how AWS provides comprehensive data governance across its analytics services, from data preparation and integration to data access, data quality, and metadata management. Learn more about streaming on AWS.
4. Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
Amazon Redshift
a lot faster
a lot simpler
a lot cheaper
5. Amazon Redshift architecture
Leader Node
Simple SQL endpoint
Stores metadata
Optimizes query plan
Coordinates query execution
Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads, backups, restores, resizes
Start at just $0.25/hour, grow to 2 PB (compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS1/DS2: HDD; scale from 2 TB to 2 PB
Ingestion/Backup
Backup
Restore
JDBC/ODBC
10 GigE
(HPC)
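To see how this architecture looks on a running cluster, a minimal sketch (run from any SQL client connected to the leader node's JDBC/ODBC endpoint) is to query the STV_SLICES system view, which maps each slice to its compute node:

-- List the compute node and slice layout of the cluster
SELECT node, slice
FROM stv_slices
ORDER BY node, slice;

The number of rows returned is the total slice count, which becomes relevant later when sizing the number of input files for parallel loads.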
7. What makes Amazon Redshift fast
Parallel and Distributed
Query
Load
Export
Backup
Restore
Resize
8. What makes Amazon Redshift fast: Distribution Keys
[Figure: a table of (ID, Name) rows — 1 John Smith, 2 Jane Jones, 3 Peter Black, 4 Pat Partridge, 5 Sarah Cyan, 6 Brian Snail — is spread across slices, e.g. rows 1 and 4 on one slice, rows 2 and 5 on another, rows 3 and 6 on another]
11. Choosing appropriate data types
Redshift performance is about efficient I/O
Don’t make columns wider than necessary, e.g.:
• Avoid BIGINT for country identifier
• Avoid CHAR(MAX) for country names
• Oversizing VARCHAR impacts loading and runtime performance
Use appropriate types
• Use TIMESTAMP or DATE instead of CHAR
• Use CHAR instead of VARCHAR when appropriate
Multibyte Characters
• VARCHAR data type supports UTF-8 multibyte characters up to a maximum of four bytes
• The CHAR data type does not support multibyte characters
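As a concrete illustration, a hedged DDL sketch (hypothetical table and column names) that applies these guidelines:

CREATE TABLE web_events (
  country_id    SMALLINT     NOT NULL,  -- small integer instead of BIGINT
  country_name  VARCHAR(64)  NOT NULL,  -- right-sized instead of an oversized VARCHAR
  event_code    CHAR(4)      NOT NULL,  -- fixed-length code: CHAR fits better than VARCHAR
  event_time    TIMESTAMP    NOT NULL   -- native TIMESTAMP instead of CHAR
);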
12. Architecture and schema design considerations
• Redshift is a distributed system:
– A compute node contains slices (one per core)
– A slice contains data
• Queries run on all slices in parallel: optimal query throughput can be achieved when data is evenly spread across slices
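One way to check that a loaded table is in fact spread evenly is to count its rows per slice. A sketch, assuming a table named customer, using the STV_TBL_PERM system view:

-- Rows of the customer table stored on each slice
SELECT slice, SUM(rows) AS rows_on_slice
FROM stv_tbl_perm
WHERE name = 'customer'
GROUP BY slice
ORDER BY slice;

Large differences between slices indicate skew and uneven work during queries.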
13. Table distribution styles
Distribution Key: same key to same location (rows with the same key value go to the same slice)
All: all data on every node
Even: round-robin distribution across slices
[Figure: Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) shown for each distribution style]
14. Choosing a distribution style
Choose a distribution style of KEY for
• Large data tables, like a FACT table in a star schema
• Large or rapidly changing tables used in joins or aggregations
• Improved performance even if the key is not used in the join column
Choose a distribution style of ALL for tables that
• Have slowly changing data
• Are of reasonable size (i.e., a few million rows, but not hundreds of millions)
• Have no common distribution key for frequent joins
• Typical use case – a joined dimension table without a common distribution key
Choose a distribution style of EVEN for tables that are not joined and have no aggregate queries
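A hedged DDL sketch (hypothetical star-schema tables) showing how each style is declared:

-- Large fact table: KEY distribution on the most common join column
CREATE TABLE sales (
  product_id   INTEGER NOT NULL,
  franchise_id INTEGER NOT NULL,
  quantity     INTEGER,
  price        DECIMAL(10,2)
)
DISTSTYLE KEY
DISTKEY (product_id);

-- Small, slowly changing dimension table: ALL distribution (copied to every node)
CREATE TABLE category (
  product_id  INTEGER NOT NULL,
  category_id VARCHAR(20)
)
DISTSTYLE ALL;

-- Staging table that is never joined: EVEN (round-robin) distribution
CREATE TABLE raw_events (
  event_id  BIGINT,
  payload   VARCHAR(512)
)
DISTSTYLE EVEN;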
16. Data distribution using Distribution Keys
Distribution Keys determine which data resides on which slices: records with the same distribution key for a table are on the same slice.
[Figure: Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) holding user_profile, cloudfront, and order_line records, with the cloudfront records for user_id=1234 placed on the same slice]
17. Data distribution using Distribution Keys
Distribution Keys help with data locality for join evaluation: records from other tables with the same distribution key value are also on the same slice, alongside the records of the first table that share that key.
[Figure: the same cluster layout, with user_profile and cloudfront records for user_id=1234 and user_id=2345 co-located on the same slices]
18. Data distribution using Distribution Keys
Poor key choices lead to uneven distribution of records…
[Figure: four slices holding 2M, 5M, 1M, and 4M cloudfront records respectively]
19. Data distribution using Distribution Keys
Unevenly distributed data causes processing imbalances!
[Figure: the same four slices holding 2M, 5M, 1M, and 4M records]
20. Data distribution using Distribution Keys
Evenly distributed data improves query performance.
[Figure: the same four slices, each now holding 2M records]
21. Choosing a distribution key
Goal
• Distribute data evenly across nodes
• Minimize data movement: co-located joins & aggregates
Best Practice
• Use the joined columns of the largest commonly joined tables as the key (example: fact table and large dimension table)
• Consider using a GROUP BY column as a key (GROUP BY clause)
• Never use a distribution key that causes severe data skew
• Choose a key with high cardinality; a large number of discrete values
Avoid
• Keys used as an equality filter as your distribution key (concentrates processing on one node)
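After loading, a quick way to verify that the chosen key does not cause severe skew is the SVV_TABLE_INFO system view; a minimal sketch:

-- skew_rows = ratio of rows on the fullest slice to rows on the emptiest slice (closer to 1.0 is better)
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC;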
22. Sorting data
The sort key helps Redshift minimize I/O
• For example, a table sorted on timestamp and queried on a date range will skip all blocks not in the query range
In the slices (on disk), the data is sorted by a sort key
• If no sort key exists, Redshift uses the data insertion order
Choose a sort key that is frequently used in your queries
• Primarily as a query predicate (date, identifier, …)
• Optionally choose a column frequently used for aggregates
• Optionally choose the same column as the distribution key for the most efficient joins (merge join)
Don’t use too many columns per table as sort keys
23. Example – Distribution and Sort Keys
-- Total products sold in Washington in January 2015
SELECT SUM( S.Price * S.Quantity )
FROM SALES S
JOIN CATEGORY C ON C.ProductId = S.ProductId
JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId
WHERE C.CategoryId = 'Produce' AND F.State = 'WA'
AND S.Date BETWEEN '1/1/2015' AND '1/31/2015'
Sort key (S) = Date
Dist key (S) = ProductID
Dist key (C) = ProductID
Dist key (F) = FranchiseID
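A hedged DDL sketch for the SALES table in this example (column types are assumptions), declaring the keys listed above:

CREATE TABLE sales (
  productid   INTEGER NOT NULL,
  franchiseid INTEGER NOT NULL,
  price       DECIMAL(10,2),
  quantity    INTEGER,
  date        DATE
)
DISTKEY (productid)   -- co-locates SALES rows with CATEGORY rows for the join
SORTKEY (date);       -- lets the date-range predicate skip blocks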
24. Optimizing the database for queries
Make sure your columns are compressed appropriately
Co-locate frequently joined tables using distribution keys or distribution ALL to avoid data transfers between nodes
For joined tables, consider using sort keys on the joined columns, allowing fast merge joins
Compression allows you to de-normalize without penalizing storage, simplifying queries and limiting joins
Vacuum and Analyze regularly
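A minimal sketch of the corresponding maintenance commands, assuming a table named sales:

-- Suggest column encodings based on a sample of the data
ANALYZE COMPRESSION sales;

-- Reclaim deleted space and restore sort order, then refresh planner statistics
VACUUM sales;
ANALYZE sales;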
26. The data loading process
[Figure: Data Source → Extraction → Transformation → Loading → Amazon Redshift (Target); the focus of this section is the loading step]
27. Overview of loading data into Amazon Redshift
[Figure: data flows into Amazon Redshift from the AWS Cloud and from a corporate data center]
AWS sources: Amazon DynamoDB, Amazon S3, Amazon Elastic MapReduce, Amazon RDS, Amazon Glacier, Kinesis, AWS Lambda, AWS Data Pipeline, Database Migration Service, and EC2 or on-premises hosts (using SSH)
Corporate data center sources: logs / files, source DBs
Transfer paths: VPN connection, AWS Direct Connect, S3 multipart upload, AWS Import/Export
28. Uploading files to Amazon S3
[Figure: Client.txt in the corporate data center is split into Client.txt.1, Client.txt.2, Client.txt.3, and Client.txt.4 and uploaded to the mydata bucket in the same Region as the Amazon Redshift cluster]
Ensure that your data resides in the same Region as your Redshift clusters
Split the data into multiple files to facilitate parallel processing
Files should be individually compressed using GZIP or LZOP
Optionally, you can encrypt your data using Amazon S3 Server-Side or Client-Side Encryption
29. Loading data from Amazon S3
Preparing Input Data Files
Uploading files to Amazon S3
Using COPY to load data from Amazon S3
30. Input data using delimiters
1|Customer#000000001|j5JsirBM9P|MOROCCO 0|MOROCCO|AFRICA|25-989-741-2988|BUILDING
2|Customer#000000002|487LW1dovn6Q4dMVym|JORDAN 1|JORDAN|MIDDLE EAST|23-768-687-3665|AUTOMOBILE
3|Customer#000000003|fkRGN8n|ARGENTINA7|ARGENTINA|AMERICA|11-719-748-3364|AUTOMOBILE
4|Customer#000000004|4u58h f|EGYPT 4|EGYPT|MIDDLE EAST|14-128-190-5944|MACHINERY
Example of pipe (‘|’) delimited file
CREATE TABLE customer (
c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(25) not null,
c_city varchar(10) not null,
c_nation varchar(15) not null,
c_region varchar(12) not null,
c_phone varchar(15) not null,
c_mktsegment varchar(10) not null
);
COPY customer FROM 's3://mydata/client.txt'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
DELIMITER '|';
31. Input data using fixed-width fields
1 RFK 900 Columbus MOROCCO MOROCCO AFRICA 25-989-741-2988 BUILDING
2 JFK 800 Washington JORDAN JORDAN MIDDLE EAST 23-768-687-3665 AUTOMOBILE
3 LBJ 700 Foxborough ARGENTINA ARGENTINA AMERICA 11-719-748-3364 AUTOMOBILE
4 GWB 600 Kansas EGYPT EGYPT MIDDLE EAST 14-128-190-5944 MACHINERY
CREATE TABLE customer (
c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(25) not null,
c_city varchar(10) not null,
c_nation varchar(15) not null,
c_region varchar(12) not null,
c_phone varchar(15) not null,
c_mktsegment varchar(10) not null
);
COPY customer FROM 's3://mydata/client.txt'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
FIXEDWIDTH '0:3, 1:25, 2:25, 3:10, 4:15, 5:12, 6:15, 7:10';
Client.txt
32. Input data in JSON format
COPY uses a jsonpaths text file to parse JSON data
JSONPath expressions specify the path to JSON name elements
Each JSONPath expression corresponds to a column in the Amazon Redshift target table
Suppose you want to load the VENUE table with the following content:
{ "id": 15, "name": "Gillette Stadium", "location": [ "Foxborough", "MA" ], "seats": 68756 }
{ "id": 15, "name": "McAfee Coliseum", "location": [ "Oakland", "MA" ], "seats": 63026 }
You would use the following jsonpaths file to parse the JSON data:
{ "jsonpaths": [ "$['id']", "$['name']", "$['location'][0]", "$['location'][1]", "$['seats']" ] }
33. Splitting data files
[Figure: Client.txt.1 … Client.txt.4 in the mydata bucket are loaded in parallel onto 2 XL compute nodes (Node 0 and Node 1, each with Slice 0 and Slice 1)]
COPY customer FROM 's3://mydata/client.txt'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
DELIMITER '|';
The S3 path acts as a key prefix, so this single COPY command picks up all four split files.
34. Use multiple input files to maximize throughput
Use the COPY command
Each slice can load one file at a time
A single input file means only one slice is ingesting data
Instead of 100 MB/s, you’re only getting 6.25 MB/s
35. Use multiple input files to maximize throughput
Use the COPY command
You need at least as many input files as you have slices
With 16 input files, all slices are working, so you maximize throughput
Get 100 MB/s per node; scale linearly as you add nodes
36. Loading data using manifest files
Use a manifest to load all required files
Supply a JSON-formatted text file that lists the files to be loaded
Can load files from different buckets or with different prefixes
{
"entries": [
{"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
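A sketch of the matching COPY statement (table and file names are assumptions); the MANIFEST keyword tells COPY that the S3 object is a manifest rather than a data file:

COPY customer FROM 's3://mybucket-alpha/custdata.manifest'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your_secret_key>'
DELIMITER '|'
MANIFEST;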
37. AWS Database Migration Service (AWS DMS)
Supports both homogeneous and heterogeneous data replication.
Supported database sources include: (1) Oracle, (2) SQL Server, (3) MySQL, (4) Amazon Aurora, (5) PostgreSQL, and (6) ODBC. All sources are supported on-premises, in EC2, and RDS.
Supported database targets include: (1) Amazon Aurora, (2) Oracle, (3) SQL Server, (4) MySQL, (5) PostgreSQL, and (6) Amazon Redshift. All Oracle, SQL Server, MySQL and Postgres targets are supported on-premises, in EC2 and RDS.
Keep your apps running during the migration
38. AWS DMS – Online migration
[Figure: application users reach databases in the customer premises and in AWS over the Internet/VPN while AWS Database Migration Service replicates between source and target]
Start a replication instance
Connect to source and target databases
Select tables, schemas, or databases
Let AWS Database Migration Service create tables, load data, and keep them in sync
Switch applications over to the target at your convenience