premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 1
Azure Data
Who are we?
Laurent Leturgez
I'm a data and cloud architect and Spark lover. I worked for many years as an Oracle consultant and expert, and I now work with cloud solutions devoted to solving complex problems with high volumes of data.
Alexandre Bergere
I am an independent Data Analyst & Solution Architect - ☁️ MCSE, Cosmos DB & Delta lover. I developed my skills through various client projects, teaching at the university and personal proofs of concept. I'm also the co-founder of DataRedkite, a product that quickly gives its users good management of their data in Microsoft Azure Data Lake.
Summary
Storage: Relational Databases, NoSQL Databases, Big Data Storage
Compute: Data, Big Data, Streaming
Storage
Relational Databases
• Azure SQL Database: managed relational SQL database as a service
• Azure Database for MariaDB: managed MariaDB database service for app developers
• Azure Database for MySQL: managed MySQL database service for app developers
• Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
Azure SQL Database
27/04/2021 Meetup Azure Lille 8
• Azure SQL
  • SQL Server PaaS service
  • Managed upgrades, patches, backups and monitoring
  • Latest stable version of SQL Server
  • 99.99% availability
  • Deployment model
    • Single Database: the database runs on non-shared resources
    • Elastic Pool: the database runs within a collection of databases that share a set of resources at a predictable price
  • Purchasing model
    • DTU (Database Transaction Unit): https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers-dtu
      • Basic tier
      • Standard tier
      • Premium tier
    • vCore model
    • Serverless
  • Service tier
    • General Purpose (vCore) / Standard (DTU): common workloads
    • Business Critical (vCore) / Premium (DTU): high transaction rate and availability, low-latency I/O
    • Hyperscale (vCore):
      • Up to 100 TB database
      • Rapid scale-up (compute resources)
      • Rapid scale-out (read-only nodes: read workload / hot standby)
• Azure SQL Managed Instance
  • Features
    • PaaS platform for lift and shift at scale
    • Broadest SQL Server engine compatibility (network integration, features, etc.)
    • Preserves all PaaS capabilities (patching, updates, backups, HA, etc.)
  • vCore purchasing model only
  • BYOL available
• SQL Virtual Machine
  • SQL Server deployment on a VM (Linux and Windows)
  • Choice of SQL Server version
    • From 2008 R2
    • Up to 2019
Azure SQL Database
• Managed Instance: instance-scoped model with high compatibility with SQL Server; best for modernisation at scale with low cost and effort (lift & shift)
• Single: standalone managed database for predictable and stable workloads
• Elastic Pool: shared-resources model (multitenant)
Azure Database for PostgreSQL
• PaaS service for PostgreSQL
  • Runs on Windows
  • Single Server
    • v9.5 to 11
    • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/postgresql/concepts-pricing-tiers)
      • Up to 2 for Basic SKU
      • Up to 64 for General Purpose SKU
      • Up to 32 for Memory Optimized SKU
    • Many PostgreSQL extensions available
    • Automated backup (retention up to 35 days)
      • Backup frequency and backup types depend on database size
      • Geo-redundant backup option (General Purpose & Memory Optimized)
• PaaS service for PostgreSQL
  • Hyperscale (Citus)
    • High-performance and analytical workloads beyond 100 GB
    • Hyperscale delivers
      • Horizontal scaling across multiple machines (with sharding)
      • Query parallelization across these servers
      • High performance for analytics
    • Based on server groups
    • Design approach required for table distribution and performance
      • Distributed tables (based on a distribution column)
      • Reference tables (content concentrated into a single shard replicated on every worker node)
      • Local tables (ordinary unsharded tables; well suited to small tables not involved in joins)
    • Automated backup through storage snapshots
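The difference between distributed and reference tables can be sketched in plain Python. This is only an illustration, not Citus code: the shard count, the SHA-256-based `shard_for` helper and the sample `orders` / `countries` tables are all assumptions made for the example, and Citus uses its own hash function and shard placement.

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical value chosen for the example


def shard_for(value, shard_count=SHARD_COUNT):
    """Map a distribution-column value to a shard using a stable hash."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(rows, column, shard_count=SHARD_COUNT):
    """Distributed table: each row goes to exactly one shard, chosen by its distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[column], shard_count)].append(row)
    return shards


def replicate(rows, shard_count=SHARD_COUNT):
    """Reference table: every worker holds a full copy of the rows."""
    return {shard: list(rows) for shard in range(shard_count)}


# Hypothetical sample data.
orders = [{"order_id": i, "customer_id": i % 10} for i in range(100)]
countries = [{"code": "FR", "name": "France"}, {"code": "BE", "name": "Belgium"}]

order_shards = distribute(orders, "customer_id")
country_copies = replicate(countries)

for shard, rows in sorted(order_shards.items()):
    print(f"shard {shard}: {len(rows)} orders")
```

All rows sharing a `customer_id` land on the same shard, which is what makes the choice of distribution column a design decision: queries that filter or join on that column can be answered by a single worker.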
• PaaS service for PostgreSQL
  • Flexible Server (Preview)
    • Automated patching
    • Automatic backups
    • Performance adjustment across three switchable compute tiers: Burstable, General Purpose, Memory Optimized
    • High availability: zone-redundant HA (optional)
Azure Database for MariaDB
• PaaS service for MariaDB
  • Runs on Windows
  • Single Server
    • v10.2 and 10.3
    • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mariadb/concepts-pricing-tiers)
      • Up to 2 for Basic SKU
      • Up to 64 for General Purpose SKU
      • Up to 32 for Memory Optimized SKU
    • Automated backup (retention up to 35 days)
      • Backup frequency and backup types depend on database size
      • Geo-redundant backup option (General Purpose & Memory Optimized)
Azure Database for MySQL
• PaaS service for MySQL
  • Runs on Windows
  • Single Server
    • v5.6, 5.7 and 8.0
    • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mysql/concepts-pricing-tiers)
      • Up to 2 for Basic SKU
      • Up to 64 for General Purpose SKU
      • Up to 32 for Memory Optimized SKU
    • Automated backup (retention up to 35 days)
      • Backup frequency and backup types depend on database size
      • Geo-redundant backup option (General Purpose & Memory Optimized)
• PaaS service for MySQL
  • Flexible Server (Preview)
    • v5.7
    • Automated patching
    • Automatic backups
    • Performance adjustment across three switchable compute tiers: Burstable, General Purpose, Memory Optimized
    • Network isolation
      • Private access through VNet integration
      • Public access
    • High availability: zone-redundant HA (optional)
NoSQL Databases
• Azure Cosmos DB: globally distributed, multi-model database for any scale
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
Azure Cosmos DB
o SQL API
o MongoDB API
o Cassandra API
o Gremlin API
o Table API
Throughput
Cosmic Notes
Big Data Storage
• Storage Account: REST-based object storage for unstructured data
• Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
Storage Account
Azure Storage accounts are the base storage type within Azure. Azure Storage offers a very scalable object store for data objects and file system services in the cloud. It can also provide a messaging store for reliable messaging, or act as a NoSQL store.
Azure groups four of these data services under the name Azure Storage:
o Azure Blobs: a scalable object store for text and binary data
o Azure Files: managed file shares for cloud or on-premises deployments
o Azure Queues: a messaging store for reliable messaging between application components
o Azure Tables: a NoSQL store for schemaless storage of structured data
Storage Account
Type of Storage Account
Storage account type | Services | Redundancy options
General-purpose v2 | Basic storage account type for blobs, files, queues, and tables. Recommended for most scenarios using Azure Storage. | LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS
General-purpose v1 | Legacy account type for blobs, files, queues, and tables. Use general-purpose v2 accounts instead when possible. | LRS, GRS, RA-GRS
BlockBlobStorage | Storage accounts with premium performance characteristics for block blobs and append blobs. Recommended for scenarios with high transaction rates, or scenarios that use smaller objects or require consistently low storage latency. | LRS, ZRS
FileStorage | Files-only storage accounts with premium performance characteristics. Recommended for enterprise or high-performance scale applications. | LRS, ZRS
BlobStorage | Legacy blob-only storage accounts. Use general-purpose v2 accounts instead when possible. | LRS, GRS, RA-GRS
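The account-type table can be encoded as a small lookup to check which account types offer a given redundancy option. This is a sketch that only restates the table from this slide; the helper names `supports` and `account_types_supporting` are made up for the example, and the current Azure documentation remains the reference for up-to-date options.

```python
# Redundancy options per storage account type, copied from the table on this slide.
REDUNDANCY_OPTIONS = {
    "General-purpose v2": {"LRS", "GRS", "RA-GRS", "ZRS", "GZRS", "RA-GZRS"},
    "General-purpose v1": {"LRS", "GRS", "RA-GRS"},
    "BlockBlobStorage": {"LRS", "ZRS"},
    "FileStorage": {"LRS", "ZRS"},
    "BlobStorage": {"LRS", "GRS", "RA-GRS"},
}


def supports(account_type, redundancy):
    """Return True if the account type offers the redundancy option (per the table)."""
    return redundancy in REDUNDANCY_OPTIONS[account_type]


def account_types_supporting(redundancy):
    """List, in alphabetical order, the account types that offer a redundancy option."""
    return sorted(
        name for name, options in REDUNDANCY_OPTIONS.items() if redundancy in options
    )


print(account_types_supporting("ZRS"))
print(supports("BlobStorage", "GZRS"))
```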
Replication Options
Replication Strategy
Azure Datalake Store
Azure Data Lake Storage is a Hadoop-compatible data repository that can store any size or type of data. This storage
service is available as Generation 1 (Gen1) or Generation 2 (Gen2).
Key features of Data Lake Storage:
o Unlimited scalability
o Hadoop compatibility
o Security support for both access control lists (ACLs) & RBAC (for Gen 2 only)
o POSIX compliance
o An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
o Zone-redundant storage
o Geo-redundant storage
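With the ABFS driver, Gen2 data is addressed through URIs of the form `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. The helper below is a minimal sketch that builds such a URI; the function name `abfs_uri` and the account and container names are placeholders invented for the example.

```python
def abfs_uri(account, container, path="", secure=True):
    """Build an ABFS URI for a Data Lake Storage Gen2 path.

    `secure=True` selects the TLS scheme (abfss), otherwise plain abfs is used.
    """
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account and container names.
print(abfs_uri("mydatalake", "raw", "/sales/2021/"))
```

Such a URI is what a Hadoop-compatible engine (for example Spark) would typically be given as a read or write location.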
Choose a storage solution on Azure
Product catalog data
• Data classification: semi-structured, because of the need to extend or modify the schema for new products
• Operations:
  o Customers require a high number of read operations, with the ability to query on many fields within the database.
  o The business requires a high number of write operations to track the constantly changing inventory.
• Latency & throughput: high throughput and low latency
• Transactional support: required
• Recommended service: Azure Cosmos DB
Photos and videos
• Data classification: unstructured
• Operations:
  o Only need to be retrieved by ID.
  o Customers require a high number of read operations with low latency.
  o Creates and updates will be somewhat infrequent and can have higher latency than read operations.
• Latency & throughput: retrievals by ID need to support low latency and high throughput. Creates and updates can have higher latency than read operations.
• Transactional support: not required
• Recommended service: Azure Blob storage
Business data
• Data classification: structured
• Operations: read-only, complex analytical queries across multiple databases
• Latency & throughput: some latency in the results is expected, given the complex nature of the queries
• Transactional support: required
• Recommended service: Azure SQL Database, Azure Database for MariaDB, Azure Database for PostgreSQL, Azure Database for MySQL
Compute
Data Compute
• Data Factory: managed data-integration solution
• Azure Functions: process events with serverless code
Azure Function
Azure Functions is the serverless compute service from Microsoft. Functions are event-driven: each function defines a trigger, which is the exact definition of the event source (for instance, the name of a storage queue).
Use cases:
If you want to... | then...
Build a web API | Implement an endpoint for your web applications using the HTTP trigger
Process file uploads | Run code when a file is uploaded or changed in blob storage
Build a serverless workflow | Chain a series of functions together using Durable Functions
Respond to database changes | Run custom logic when a document is created or updated in Cosmos DB
Run scheduled tasks | Execute code at set times
Create reliable message queue systems | Process message queues using Queue Storage, Service Bus, or Event Hubs
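The trigger idea can be illustrated without any Azure dependency. The sketch below is plain Python and deliberately does not use the Azure Functions SDK: the `trigger` decorator, the `dispatch` function and the trigger names are hypothetical, and only mimic the way a host routes an event to the functions bound to its source.

```python
from collections import defaultdict

# Maps a (trigger kind, event source name) pair to the functions bound to it.
_registry = defaultdict(list)


def trigger(kind, source):
    """Bind a function to an event source, e.g. ("queue", "orders")."""
    def decorator(func):
        _registry[(kind, source)].append(func)
        return func
    return decorator


@trigger("queue", "orders")
def handle_order(message):
    return f"processed order {message}"


@trigger("timer", "nightly")
def cleanup(_payload):
    return "cleanup done"


def dispatch(kind, source, payload=None):
    """Play the role of the host: run every function bound to the event source."""
    return [func(payload) for func in _registry.get((kind, source), [])]


print(dispatch("queue", "orders", "42"))
print(dispatch("timer", "nightly"))
```

In the real service the binding is declared per function and the platform takes care of listening to the source and scaling the host.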
Azure Functions hosting options: Azure Plan
Consumption plan (B1, B2, B3, S1, S2, S3)
Scale automatically and only pay for compute resources when your functions are running. On the Consumption plan, instances of the Functions host are dynamically added and removed based on the number of incoming events.
Premium plan (P1v2, P2v2, P3v3)
While automatically scaling based on demand, use prewarmed workers to run applications with no delay after being idle, run on more powerful instances and connect to VNets.
Azure App Service plan
Run Functions within an App Service plan at regular App Service plan rates. A good fit for long-running operations, as well as when more predictive scaling and costs are required.
Azure Durable Functions
Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools to define stateful, potentially long-running operations, and manages much of the mechanics of reliable communication and state management behind the scenes.
Figures: 3 steps of a workflow executed in sequence; log of events in the course of orchestrator progression
Source: https://medium.com/hackernoon/making-sense-of-azure-durable-functions-645ecb3c1d58
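The two ideas on this slide, a sequence of steps and a log of events, can be sketched with a Python generator. This is a toy model and not the Durable Functions API: the `run` function, the activity names and the history format are assumptions made for the example. It only shows the principle of replaying an orchestrator against its event log so that completed steps are not executed again.

```python
def orchestrator(order_id):
    """Three steps executed in sequence; each yield asks the runtime to run an activity."""
    reserved = yield ("reserve_stock", order_id)
    charged = yield ("charge_payment", reserved)
    shipped = yield ("ship_order", charged)
    return shipped


# Hypothetical activities; real ones would call external systems.
ACTIVITIES = {
    "reserve_stock": lambda value: f"{value}:reserved",
    "charge_payment": lambda value: f"{value}:charged",
    "ship_order": lambda value: f"{value}:shipped",
}


def run(orchestrator_fn, arg, history):
    """Replay the orchestrator against the event log, running only unrecorded activities."""
    generator = orchestrator_fn(arg)
    step = 0
    try:
        request = next(generator)
        while True:
            name, payload = request
            if step < len(history):
                # Replay: reuse the recorded result instead of re-running the activity.
                recorded_name, result = history[step]
                if recorded_name != name:
                    raise RuntimeError("orchestrator is not deterministic")
            else:
                result = ACTIVITIES[name](payload)
                history.append((name, result))
            step += 1
            request = generator.send(result)
    except StopIteration as done:
        return done.value


history = []
result = run(orchestrator, "order-1", history)
print(result)
print(history)
```

Calling `run` again with a partially filled `history` resumes after the recorded steps, which is why orchestrator code has to be deterministic.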
Azure Data Factory
• Serverless data integration service
  • Data Pipeline: logical group of activities
  • Data Flow: data transformation activity
  • Data Copy: data transfer activity
  • SSIS integration
  • Git integration
  • Job scheduling
    • Automatically, through the internal scheduler
    • Manually
      • SDK: .NET, Python
      • REST API
      • PowerShell
  • Integration runtime
    • Compute infrastructure used by ADF to provide data integration
    • Azure: serverless
    • Self-hosted: on-premises or Azure virtual machine (Windows)
    • SSIS
Integration runtime | Activity | Features
Azure | Data Flow; Data Copy; Dispatch Activity (HDI, Databricks, SQL …) | Cloud-to-cloud data transfer/flows
Self-Hosted | Data Flow; Data Copy; Dispatch Activity (HDI, Databricks, SQL …) | On-premises or virtual machine deployment (Windows); on-premises <-> cloud data transfer/flows; when connectors are not available
SSIS | SSIS package execution | Private or public network
Big Data Compute
• Azure Databricks: fast, easy, and collaborative Apache Spark-based analytics platform
• Azure HDInsight: supports the latest open source projects from the Apache Hadoop and Spark ecosystems
• Azure Synapse Analytics: managed enterprise data warehouse and big data analytics service
Azure Databricks
Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers two environments for developing data-intensive applications:
o Azure Databricks Workspace: provides an interactive workspace that enables collaboration between data engineers, data scientists, and machine learning engineers.
o Azure Databricks SQL Analytics: provides an easy-to-use platform for analysts who want to run SQL queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
Azure HDInsight
• Managed Hadoop distribution for Azure
• Based on the Cloudera Hortonworks Hadoop distribution
• Comes in various flavours / shapes (VM shapes and number)
  • Hadoop: general purpose (HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Oozie)
  • Spark
  • Kafka
  • HBase
  • Hive / LLAP (Interactive Query)
  • Storm (stream processing)
  • ML Services with R
• At least one Storage account is mandatory (for libraries and binaries)
• External metastores available for Ambari, Hive and Oozie
• HDInsight architecture
Azure Synapse Analytics
Figure labels: Stream Analytics, Data Factory, Data Lake, Modern Analytics, MPP Datawarehouse
• MPP data warehouse
  • Choice of language (T-SQL, Spark SQL, Python, Scala, .NET)
  • Analytics ready (Analysis Services, Power BI)
  • Data science and AI ready (Azure Machine Learning integration)
• Sample use case: pure Business Intelligence!
• Not for small databases (usually > 1 TB)
• Cost model
  • Synapse provisioned
    • T-SQL pool with DWU (Data Warehouse Units)
    • Storage (geo-redundant option)
  • Synapse serverless
  • Spark pools
  • Synapse pipelines
• Architecture
  • DMS (Data Movement Service)
    • Used for data colocation
  • Key point: data partitioning and data distribution
• Hash-distributed table
• Replicated table
• Round-robin distributed table
• Example
  • Dimension-to-fact table join
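Why the distribution choice matters for a dimension-to-fact join can be shown with a small simulation. This is a plain-Python sketch, not Synapse code: the four distributions (a dedicated SQL pool uses 60), the CRC32 hash and the `sales` / `products` tables are assumptions made for the example. It counts how many fact rows find their dimension row on the same distribution, that is, without any data movement.

```python
import zlib
from itertools import cycle

DISTRIBUTIONS = 4  # hypothetical; a dedicated SQL pool uses 60 distributions


def hash_distribute(rows, key, n=DISTRIBUTIONS):
    """Hash-distributed table: the key value decides the distribution."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return buckets


def round_robin_distribute(rows, n=DISTRIBUTIONS):
    """Round-robin table: rows are dealt out evenly, regardless of content."""
    buckets = [[] for _ in range(n)]
    for bucket, row in zip(cycle(buckets), rows):
        bucket.append(row)
    return buckets


def replicate(rows, n=DISTRIBUTIONS):
    """Replicated table: every distribution holds a full copy."""
    return [list(rows) for _ in range(n)]


def local_join_matches(fact_buckets, dim_buckets, key):
    """Count fact rows whose matching dimension row sits on the same distribution."""
    matches = 0
    for fact_rows, dim_rows in zip(fact_buckets, dim_buckets):
        dim_keys = {row[key] for row in dim_rows}
        matches += sum(1 for row in fact_rows if row[key] in dim_keys)
    return matches


# Hypothetical dimension and fact tables.
products = [{"product_id": p, "name": f"product-{p}"} for p in range(20)]
sales = [{"sale_id": s, "product_id": s % 20} for s in range(1000)]

fact = hash_distribute(sales, "product_id")
local_replicated = local_join_matches(fact, replicate(products), "product_id")
local_hashed = local_join_matches(fact, hash_distribute(products, "product_id"), "product_id")
local_round_robin = local_join_matches(fact, round_robin_distribute(products), "product_id")

print(local_replicated, local_hashed, local_round_robin)
```

With a replicated dimension, or with both tables hashed on the join key, every fact row joins locally; with a round-robin dimension the remaining rows would have to be moved between distributions first, which is the job of the DMS.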
Big Data Streaming
• Azure Stream Analytics: real-time data stream processing from millions of IoT devices
• Azure IoT Hub: connect, monitor and manage billions of IoT assets
• Azure HDInsight & Kafka: real-time data stream with Kafka
• Spark Streaming with Databricks: use Spark Streaming with Databricks
Azure Stream Analytics
UDFs, UDAs, and custom deserializers:
o Azure Stream Analytics supports user-defined functions (UDF) or user-defined aggregates (UDA) in JavaScript for cloud jobs and C# for IoT Edge jobs
Example scenarios:
o Analyze real-time telemetry streams from IoT devices
o Web logs/clickstream analytics
o Geospatial analytics for fleet management and driverless vehicles
o Remote monitoring and predictive maintenance of high-value assets
o Real-time analytics on point-of-sale data for inventory control and anomaly detection
Azure IoT Hub
Azure IoT Hub :
o The cloud gateway that connects IoT devices to gather data and drive business insights and automation.
o The big data streaming service of Azure. It is designed for high throughput data streaming scenarios where customers
may send billions of requests per day.
o Bi-directional communication capabilities
IoT Hub or Event Hubs
IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud while Event Hubs
was designed for big data streaming. Microsoft recommends using Azure IoT Hub to connect IoT devices to Azure.
IoT Capability IoT Hub standard tier IoT Hub basic tier Event Hubs
Device-to-cloud messaging
Protocols: HTTPS, AMQP, AMQP over webSockets
Protocols: MQTT, MQTT over webSockets
Per-device identity
File upload from devices
Device Provisioning Service
Cloud-to-device messaging
Device twin and device management
Device streams (preview)
IoT Edge
Apache Kafka on HDInsight architecture
Azure Databricks
o Apache Spark Streaming is a scalable, fault-tolerant stream processing system that natively supports both batch and streaming workloads.
o Spark Streaming is an extension of the core Spark API.
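Spark Streaming processes a stream as a series of small batches, which is why the same logic can serve batch and streaming workloads. The sketch below shows that idea in plain Python and is not Spark code: the `word_counts` and `micro_batches` helpers and the sample events are invented for the example.

```python
from collections import Counter
from itertools import islice


def word_counts(lines):
    """Batch logic: count words in a collection of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream, batch_size):
    """Cut a (possibly unbounded) stream into small batches."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical events.
events = ["spark streaming", "spark batch", "streaming on azure", "azure databricks", "spark"]

# Batch workload: process everything at once.
batch_result = word_counts(events)

# Streaming workload: apply the same function to each micro-batch and keep a running state.
running = Counter()
for batch in micro_batches(events, batch_size=2):
    running.update(word_counts(batch))

print(running == batch_result)
```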
Data Tools
Azure Data Studio
Azure Data Studio is a cross-platform database tool that you can run on Windows, macOS, and Linux. You'll use it to
connect to SQL Data Warehouse and Azure SQL Database.
Previously released under the preview name SQL Operations Studio, Azure Data Studio offers a modern editor experience
with IntelliSense, code snippets, source control integration, and an integrated terminal. It is engineered with the data
platform user in mind, with built in charting of query result sets and customizable dashboards.
Storage Explorer
Begin by downloading and installing Storage Explorer. You can use Storage Explorer to do several operations against data
in your Azure Storage account and data lake:
o Upload files or folders from your local computer into Azure Storage.
o Download cloud-based data to your local computer.
o Copy or move files and folders around in the storage account.
o Delete data from the storage account.
Visual Studio Code
Visual Studio Code is a lightweight source code editor which runs on your desktop and is available for Windows, macOS
and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for
other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).
Data Migration Tools
Summary
Scenario | Some recommended solutions
Disaster recovery | Azure geo-redundant backups
Read scale | Use read-only replicas to load balance read-only query workloads (preview)
ETL (OLTP to OLAP) | Azure Data Factory, SQL Server Integration Services or Databricks
Migration from on-premises SQL Server to Azure SQL Database | Azure Database Migration Service
Keeping data up to date across several Azure SQL databases or SQL Server databases | Azure SQL Data Sync
Detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database | Data Migration Assistant (DMA)
Resources
Azure charts
https://azurecharts.com/
Sources
A few sources from Microsoft Learn:
o Azure for the Data Engineer
o Store data in Azure
o Work with relational data in Azure
o Large Scale Data Processing with Azure Data Lake Storage Gen2
o Implement a Data Streaming Solution with Azure Streaming Analytics
o Implement a Data Warehouse with Azure SQL Data Warehouse
Fill the form
https://forms.office.com/Pages/ResponsePage.aspx?id=M3s0akU8nUyLePs4Zpn6Tp_2uFsS8cJJsHCSwweCY5JUNVlMMllQNU4yRUVVWjFEOU5GVVc2SVU3Si4u
Your turn!
Next session: Azure Databricks
Fast, easy, and collaborative Apache Spark-based analytics platform
Thank you
Meetup Azure Lille
dataredkite.com
https://premiseo.com/

 
Citrix on Azure
Citrix on AzureCitrix on Azure
Citrix on Azure
 
Azure Stack - Azure in your own Data Center
Azure Stack - Azure in your own Data CenterAzure Stack - Azure in your own Data Center
Azure Stack - Azure in your own Data Center
 
Introduction to Microsoft Azure
Introduction to Microsoft AzureIntroduction to Microsoft Azure
Introduction to Microsoft Azure
 
Architecting Enterprise Applications In The Cloud
Architecting Enterprise Applications In The CloudArchitecting Enterprise Applications In The Cloud
Architecting Enterprise Applications In The Cloud
 
Building Hybrid Cloud Apps with Azure and Azure stack
Building Hybrid Cloud Apps with Azure and Azure stackBuilding Hybrid Cloud Apps with Azure and Azure stack
Building Hybrid Cloud Apps with Azure and Azure stack
 

Similar to 20210427 azure lille_meetup_azure_data_stack

Azure Data services
Azure Data servicesAzure Data services
Azure Data services
Rajesh Kolla
 
Perth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updatesPerth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updates
Nirmal Thewarathanthri
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
Rising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational DatabasesRising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational Databases
Christopher Foot
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
Sergio Zenatti Filho
 
OSS DB on Azure
OSS DB on AzureOSS DB on Azure
OSS DB on Azure
rockplace
 
DBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSDBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWS
EDB
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
giventocode
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
Marco Parenzan
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
Shy Engelberg
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
Raul Chong
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
Francisco González Jiménez
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
Shu-Jeng Hsieh
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
CCG
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
Alessandro Melchiori
 
Clash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureClash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft Azure
Mihail Mateev
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020
Alkin Tezuysal
 
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
rockplace
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
Timothy McAliley
 

Similar to 20210427 azure lille_meetup_azure_data_stack (20)

Azure Data services
Azure Data servicesAzure Data services
Azure Data services
 
Perth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updatesPerth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updates
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Rising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational DatabasesRising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational Databases
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
OSS DB on Azure
OSS DB on AzureOSS DB on Azure
OSS DB on Azure
 
DBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSDBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWS
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Clash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureClash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft Azure
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020
 
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
 

More from Alexandre BERGERE

Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)
Alexandre BERGERE
 
comparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisationcomparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisation
Alexandre BERGERE
 
Azure data stack_2019_08
Azure data stack_2019_08Azure data stack_2019_08
Azure data stack_2019_08
Alexandre BERGERE
 
Big dataclasses 2019_nosql
Big dataclasses 2019_nosqlBig dataclasses 2019_nosql
Big dataclasses 2019_nosql
Alexandre BERGERE
 
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slackIot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
Alexandre BERGERE
 
MongoDB classes 2019
MongoDB classes 2019MongoDB classes 2019
MongoDB classes 2019
Alexandre BERGERE
 

More from Alexandre BERGERE (6)

Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)
 
comparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisationcomparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisation
 
Azure data stack_2019_08
Azure data stack_2019_08Azure data stack_2019_08
Azure data stack_2019_08
 
Big dataclasses 2019_nosql
Big dataclasses 2019_nosqlBig dataclasses 2019_nosql
Big dataclasses 2019_nosql
 
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slackIot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
 
MongoDB classes 2019
MongoDB classes 2019MongoDB classes 2019
MongoDB classes 2019
 

Recently uploaded

DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
Yury Chemerkin
 
Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024
siddu769252
 
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan..."Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
Fwdays
 
Finetuning GenAI For Hacking and Defending
Finetuning GenAI For Hacking and DefendingFinetuning GenAI For Hacking and Defending
Finetuning GenAI For Hacking and Defending
Priyanka Aash
 
Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
Priyanka Aash
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
Bhajan Mehta
 
What's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptxWhat's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptx
Stephanie Beckett
 
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptxFIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Alliance
 
NVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space ExplorationNVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space Exploration
Alison B. Lowndes
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
Stephanie Beckett
 
Scaling Vector Search: How Milvus Handles Billions+
Scaling Vector Search: How Milvus Handles Billions+Scaling Vector Search: How Milvus Handles Billions+
Scaling Vector Search: How Milvus Handles Billions+
Zilliz
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc
 
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
Fwdays
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
Zilliz
 
Top 12 AI Technology Trends For 2024.pdf
Top 12 AI Technology Trends For 2024.pdfTop 12 AI Technology Trends For 2024.pdf
Top 12 AI Technology Trends For 2024.pdf
Marrie Morris
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Zilliz
 
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Alliance
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
DianaGray10
 
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
Nohoax Kanont
 
The Challenge of Interpretability in Generative AI Models.pdf
The Challenge of Interpretability in Generative AI Models.pdfThe Challenge of Interpretability in Generative AI Models.pdf
The Challenge of Interpretability in Generative AI Models.pdf
Sara Kroft
 

Recently uploaded (20)

DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
 
Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024
 
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan..."Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
 
Finetuning GenAI For Hacking and Defending
Finetuning GenAI For Hacking and DefendingFinetuning GenAI For Hacking and Defending
Finetuning GenAI For Hacking and Defending
 
Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
 
What's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptxWhat's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptx
 
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptxFIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptx
 
NVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space ExplorationNVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space Exploration
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
 
Scaling Vector Search: How Milvus Handles Billions+
Scaling Vector Search: How Milvus Handles Billions+Scaling Vector Search: How Milvus Handles Billions+
Scaling Vector Search: How Milvus Handles Billions+
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
 
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
 
Top 12 AI Technology Trends For 2024.pdf
Top 12 AI Technology Trends For 2024.pdfTop 12 AI Technology Trends For 2024.pdf
Top 12 AI Technology Trends For 2024.pdf
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
 
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
 
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
 
The Challenge of Interpretability in Generative AI Models.pdf
The Challenge of Interpretability in Generative AI Models.pdfThe Challenge of Interpretability in Generative AI Models.pdf
The Challenge of Interpretability in Generative AI Models.pdf
 

20210427 azure lille_meetup_azure_data_stack

  • 2. Who are we?
      • Laurent Leturgez: I'm a data and cloud architect and Spark lover. I worked for many years as an Oracle consultant and expert, and I now work with cloud solutions devoted to solving complex problems with high volumes of data.
      • Alexandre Bergere: I am an independent Data Analyst & Solution Architect - ☁️ MCSE, Cosmos DB & Delta lover. I developed my skills through various client projects, teaching at the university and personal proofs of concept. I'm also the co-founder of DataRedkite, a product that quickly gives its users good management of their data in Microsoft Azure Data Lake.
      • Meetup Azure Lille, 26/02/2021 (dataredkite.com, premiseo.com)
  • 3. Summary
      • Storage: Relational Databases, NoSQL Databases, Big Data Storage
      • Compute: Data, Big Data, Streaming
  • 5. Relational Databases
      • Azure SQL Database: managed relational SQL database as a service
      • Azure Database for MariaDB: managed MariaDB database service for app developers
      • Azure Database for MySQL: managed MySQL database service for app developers
      • Azure Database for PostgreSQL: managed Postgres database service for app developers
  • 6. Relational Databases
      • Azure SQL Database: managed relational SQL database as a service
      • Azure Database for MySQL: managed MySQL database service for app developers
      • Azure Database for MariaDB: managed MariaDB database service for app developers
      • Azure Database for PostgreSQL: managed Postgres database service for app developers
  • 7. Relational Databases
      • Azure SQL Database: managed relational SQL database as a service
  • 8. Azure SQL Database
      • Azure SQL
          • SQL Server PaaS service
          • Managed upgrades, patches, backups and monitoring
          • Latest stable version of SQL Server
          • 99.99% availability
      • Deployment model
          • Single Database: the database runs on non-shared resources
          • Elastic Pool: the database runs within a collection of databases that share a set of resources at a predictable price
  • 9. Azure SQL Database
      • Azure SQL
          • Purchasing model
              • DTU (Database Transaction Unit): https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers-dtu
                  • Basic tier
                  • Standard tier
                  • Premium tier
              • vCore model
              • Serverless
          • Service tier
              • General Purpose (vCore) / Standard (DTU): common workloads
              • Business Critical (vCore) / Premium (DTU): high transaction rates and availability, low-latency I/O
              • Hyperscale (vCore):
                  • Up to 100 TB database
                  • Rapid scale-up (compute resources)
                  • Rapid scale-out (read-only nodes: read workload / hot standby)
  • 10. Azure SQL Database
      • Azure SQL Managed Instance
          • Features
              • PaaS platform for lift and shift at scale
              • Broadest SQL Server engine compatibility (network integration, features, etc.)
              • Preserves all PaaS capabilities (patching, updates, backups, HA, etc.)
          • vCore purchasing model only
          • BYOL available
      • SQL Virtual Machine
          • SQL Server deployment on a VM (Linux and Windows)
          • You can choose the SQL Server version, from 2008 R2 up to 2019
  • 11. Azure SQL Database
      • Managed Instance: instance-scoped model with high compatibility with SQL Server. Best for modernisation at scale with low cost and effort (lift & shift)
      • Single: standalone managed database for predictable and stable workloads
      • Elastic Pool: shared resources model (multitenant)
  • 12. Relational Databases
      • Azure SQL Database: managed relational SQL database as a service
      • Azure Database for MySQL: managed MySQL database service for app developers
      • Azure Database for MariaDB: managed MariaDB database service for app developers
      • Azure Database for PostgreSQL: managed Postgres database service for app developers
  • 13. Azure Database for PostgreSQL
      • PaaS service for PostgreSQL
      • Runs on Windows
      • Single Server
          • v9.5 to 11
          • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/postgresql/concepts-pricing-tiers)
              • Up to 2 for Basic SKU
              • Up to 64 for General Purpose SKU
              • Up to 32 for Memory Optimized SKU
          • Many PostgreSQL extensions available
          • Automated backup (retention up to 35 days)
              • Backup frequency and backup types depend on database size
              • Geo-redundant backup option (General Purpose & Memory Optimized)
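The Single Server limits listed on this slide can be captured as a small lookup, which may help when sanity-checking a sizing request before provisioning. This is only a sketch: the figures are copied from the slide, while `check_request` and its parameters are hypothetical names and not part of any Azure SDK.

```python
# Limits as stated on the slide (Single Server); they may change over time.
MAX_VCORES = {"Basic": 2, "General Purpose": 64, "Memory Optimized": 32}
GEO_REDUNDANT_SKUS = {"General Purpose", "Memory Optimized"}
MAX_BACKUP_RETENTION_DAYS = 35


def check_request(sku, vcores, retention_days, geo_redundant=False):
    """Return a list of problems with a sizing request (empty if none). Hypothetical helper."""
    if sku not in MAX_VCORES:
        return [f"unknown SKU: {sku}"]
    problems = []
    if vcores > MAX_VCORES[sku]:
        problems.append(f"{sku} allows at most {MAX_VCORES[sku]} vCores")
    if retention_days > MAX_BACKUP_RETENTION_DAYS:
        problems.append(f"backup retention is limited to {MAX_BACKUP_RETENTION_DAYS} days")
    if geo_redundant and sku not in GEO_REDUNDANT_SKUS:
        problems.append(f"geo-redundant backup is not offered for {sku}")
    return problems


if __name__ == "__main__":
    for problem in check_request("Basic", vcores=4, retention_days=40, geo_redundant=True):
        print(problem)
```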
  • 14. Azure Database for PostgreSQL
      • PaaS service for PostgreSQL
      • Hyperscale (Citus)
          • High-performance and analytical workloads beyond 100 GB
          • Hyperscale delivers
              • Horizontal scaling across multiple machines (with sharding)
              • Query parallelization across these servers
              • High performance for analytics
          • Based on server groups
          • Table distribution and performance require an up-front design approach
              • Distributed tables (based on a distribution column)
              • Reference tables (content concentrated into a single shard replicated on every worker node)
              • Local tables (ordinary unsharded tables, well suited to small tables not involved in joins)
          • Automated backup through storage snapshots
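The difference between distributed and reference tables can be illustrated with a toy model. The sketch below assumes a simple CRC32 hash of the distribution column; it is not how Citus itself assigns shards, and the function names and sample data are made up for illustration.

```python
import zlib
from collections import defaultdict


def shard_for(value, shard_count):
    """Map a distribution-column value to a shard index (illustrative hash, not Citus's own)."""
    return zlib.crc32(str(value).encode("utf-8")) % shard_count


def distribute(rows, column, shard_count):
    """Distributed table: each row goes to exactly one shard, chosen by its distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[column], shard_count)].append(row)
    return dict(shards)


def replicate_reference(rows, worker_count):
    """Reference table: every worker node receives a full copy."""
    return {worker: list(rows) for worker in range(worker_count)}


# Hypothetical sample data.
orders = [
    {"tenant_id": 1, "amount": 10},
    {"tenant_id": 2, "amount": 25},
    {"tenant_id": 1, "amount": 7},
    {"tenant_id": 3, "amount": 12},
]
countries = [{"code": "FR"}, {"code": "BE"}]

placement = distribute(orders, "tenant_id", shard_count=4)
copies = replicate_reference(countries, worker_count=2)
```

Rows sharing a `tenant_id` always land in the same shard, so a join or aggregate on that column can run on one worker, while the small `countries` table is available locally everywhere.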
  • 15. Azure Database for PostgreSQL
      • PaaS service for PostgreSQL
      • Flexible Server (Preview)
          • Automated patching
          • Automatic backups
          • Performance adjustment across three switchable compute tiers: Burstable, General Purpose (GP), Memory Optimized
          • High availability: zone-redundant HA (optional)
  • 16. Relational Databases
      • Azure SQL Database: managed relational SQL database as a service
      • Azure Database for MariaDB: managed MariaDB database service for app developers
      • Azure Database for PostgreSQL: managed Postgres database service for app developers
      • Azure Database for MySQL: managed MySQL database service for app developers
  • 17. Azure Database for MariaDB
      • PaaS service for MariaDB
      • Runs on Windows
      • Single Server
          • v10.2 and 10.3
          • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mariadb/concepts-pricing-tiers)
              • Up to 2 for Basic SKU
              • Up to 64 for General Purpose SKU
              • Up to 32 for Memory Optimized SKU
          • Automated backup (retention up to 35 days)
              • Backup frequency and backup types depend on database size
              • Geo-redundant backup option (General Purpose & Memory Optimized)
  • 18. Relational Databases
      • Azure SQL Database: managed relational SQL database as a service
      • Azure Database for MariaDB: managed MariaDB database service for app developers
      • Azure Database for PostgreSQL: managed Postgres database service for app developers
      • Azure Database for MySQL: managed MySQL database service for app developers
  • 19. Azure Database for MySQL
      • PaaS service for MySQL
      • Runs on Windows
      • Single Server
          • v5.6, 5.7 and 8.0
          • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mysql/concepts-pricing-tiers)
              • Up to 2 for Basic SKU
              • Up to 64 for General Purpose SKU
              • Up to 32 for Memory Optimized SKU
          • Automated backup (retention up to 35 days)
              • Backup frequency and backup types depend on database size
              • Geo-redundant backup option (General Purpose & Memory Optimized)
  • 20. Azure Database for MySQL
      • PaaS service for MySQL: Flexible Server (Preview)
          • v5.7
          • Automated patching
          • Automatic backups
          • Performance adjustment across three switchable compute tiers: Burstable, General Purpose (GP), Memory Optimized
          • Network isolation
              • Private access through VNet integration
              • Public access
  • 21. Azure Database for MySQL
      • PaaS service for MySQL: Flexible Server (Preview)
          • High availability: zone-redundant HA (optional)
  • 22. NoSQL Databases
      • Azure Cosmos DB: globally distributed, multi-model database for any scale
  • 23. Azure Cosmos DB: a globally distributed, massively scalable, multi-model database service
      • SQL API
      • MongoDB API
      • Cassandra API
      • Gremlin API
      • Table API
  • 25. Big Data Storage
      • Storage Account: REST-based object storage for unstructured data
      • Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
  • 26. Big Data Storage
      • Storage Account: REST-based object storage for unstructured data
      • Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
  • 27. Storage Account
      Azure Storage accounts are the base storage type within Azure. Azure Storage offers a very scalable object store for data objects and file system services in the cloud. It can also provide a messaging store for reliable messaging, or it can act as a NoSQL store. Azure groups four of these data services together under the name Azure Storage:
      • Azure Blobs: a scalable object store for text and binary data
      • Azure Files: managed file shares for cloud or on-premises deployments
      • Azure Queues: a messaging store for reliable messaging between application components
      • Azure Tables: a NoSQL store for schemaless storage of structured data
  • 28. Storage Account: types of storage account
      • General-purpose V2
          • Services: basic storage account type for blobs, files, queues, and tables. Recommended for most scenarios using Azure Storage.
          • Redundancy options: LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS
      • General-purpose V1
          • Services: legacy account type for blobs, files, queues, and tables. Use general-purpose v2 accounts instead when possible.
          • Redundancy options: LRS, GRS, RA-GRS
      • BlockBlobStorage
          • Services: storage accounts with premium performance characteristics for block blobs and append blobs. Recommended for scenarios with high transaction rates, or scenarios that use smaller objects or require consistently low storage latency.
          • Redundancy options: LRS, ZRS
      • FileStorage
          • Services: files-only storage accounts with premium performance characteristics. Recommended for enterprise or high-performance scale applications.
          • Redundancy options: LRS, ZRS
      • BlobStorage
          • Services: legacy blob-only storage accounts. Use general-purpose v2 accounts instead when possible.
          • Redundancy options: LRS, GRS, RA-GRS
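One way to use this table is to start from the redundancy option you need and see which account types offer it. The sketch below simply encodes the slide's redundancy column; the `account_types_supporting` helper is a hypothetical name, and the data reflects the slide rather than current Azure documentation.

```python
# Redundancy options per storage account type, as listed on the slide.
REDUNDANCY = {
    "General-purpose V2": {"LRS", "GRS", "RA-GRS", "ZRS", "GZRS", "RA-GZRS"},
    "General-purpose V1": {"LRS", "GRS", "RA-GRS"},
    "BlockBlobStorage": {"LRS", "ZRS"},
    "FileStorage": {"LRS", "ZRS"},
    "BlobStorage": {"LRS", "GRS", "RA-GRS"},
}


def account_types_supporting(option):
    """Return the account types (sorted by name) that list the given redundancy option."""
    return sorted(name for name, options in REDUNDANCY.items() if option in options)


if __name__ == "__main__":
    print(account_types_supporting("ZRS"))
```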
  • 31. Big Data Storage
      • Storage Account: REST-based object storage for unstructured data
      • Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
  • 32. Azure Data Lake Store
      Azure Data Lake Storage is a Hadoop-compatible data repository that can store any size or type of data. This storage service is available as Generation 1 (Gen1) or Generation 2 (Gen2).
      • Key features of Data Lake Storage:
          • Unlimited scalability
          • Hadoop compatibility
          • Security support for both access control lists (ACLs) & RBAC (for Gen2 only)
          • POSIX compliance
          • An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
          • Zone-redundant storage
          • Geo-redundant storage
      • Generations shown: Azure Data Lake Gen1, Azure Data Lake Gen2
  • 33. Choose a storage solution on Azure
    o Product catalog data
      Data classification: semi-structured, because of the need to extend or modify the schema for new products
      Operations: customers require a high number of read operations, with the ability to query on many fields within the database; the business requires a high number of write operations to track the constantly changing inventory
      Latency & throughput: high throughput and low latency
      Transactional support: required
      Recommended service: Azure Cosmos DB
    o Photos and videos
      Data classification: unstructured
      Operations: only need to be retrieved by ID; customers require a high number of read operations with low latency; creates and updates will be somewhat infrequent and can have higher latency than read operations
      Latency & throughput: retrievals by ID need to support low latency and high throughput; creates and updates can have higher latency than read operations
      Transactional support: not required
      Recommended service: Azure Blob storage
    o Business data
      Data classification: structured
      Operations: read-only, complex analytical queries across multiple databases
      Latency & throughput: some latency in the results is expected, given the complex nature of the queries
      Transactional support: required
      Recommended service: Azure SQL Database, Azure Database for MariaDB, Azure Database for PostgreSQL, Azure Database for MySQL
  • 35. Data Compute
    o Data Factory: managed data-integration solution
    o Azure Functions: process events with serverless code
  • 37. Azure Functions
    Azure Functions is the serverless compute service from Microsoft. Functions are event-driven: each function defines a trigger, which is the exact definition of the event source (for instance, the name of a storage queue).
    Use cases (if you want to... then...):
    o Build a web API: implement an endpoint for your web applications using the HTTP trigger
    o Process file uploads: run code when a file is uploaded or changed in blob storage
    o Build a serverless workflow: chain a series of functions together using Durable Functions
    o Respond to database changes: run custom logic when a document is created or updated in Cosmos DB
    o Run scheduled tasks: execute code at set times
    o Create reliable message queue systems: process message queues using Queue Storage, Service Bus, or Event Hubs
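As an illustration of "a trigger is the exact definition of the event source", the sketch below generates a `function.json`-style binding for a storage queue trigger, following the layout used by the classic Python programming model as I understand it. The queue name `incoming-orders`, the binding name `msg`, and the helper `queue_trigger_config` are assumptions made for this example.

```python
import json
from typing import Any, Dict


def queue_trigger_config(queue_name: str,
                         connection: str = "AzureWebJobsStorage") -> Dict[str, Any]:
    """Describe a function whose trigger is a message arriving on a storage queue."""
    return {
        "scriptFile": "__init__.py",
        "bindings": [
            {
                "name": "msg",               # parameter name seen by the function code
                "type": "queueTrigger",      # the kind of event source
                "direction": "in",
                "queueName": queue_name,     # the exact queue that fires the function
                "connection": connection,    # app setting holding the connection string
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(queue_trigger_config("incoming-orders"), indent=2))
```

The other use cases in the list follow the same idea with a different trigger type (HTTP, blob, timer, and so on) in place of the queue trigger.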
  • 38. Azure Functions: hosting options
    o Consumption plan: scale automatically and only pay for compute resources when your functions are running. On the Consumption plan, instances of the Functions host are dynamically added and removed based on the number of incoming events.
    o Premium plan (EP1, EP2, EP3): while automatically scaling based on demand, use prewarmed workers to run applications with no delay after being idle, run on more powerful instances, and connect to VNets.
    o Azure App Service plan (for example B1, B2, B3, S1, S2, S3, P1v2, P2v2, P3v2): run Functions within an App Service plan at regular App Service plan rates. Good fit for long-running operations, as well as when more predictable scaling and costs are required.
  • 39. Azure Functions: Azure Durable Functions
    Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools to define stateful, potentially long-running operations, and manages a lot of mechanics of reliable communication and state management behind the scenes.
    [Figure: 3 steps of a workflow executed in sequence]
    [Figure: log of events in the course of orchestrator progression]
    Source: https://medium.com/hackernoon/making-sense-of-azure-durable-functions-645ecb3c1d58
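The "log of events" idea can be sketched in plain Python: an orchestrator runs three steps in sequence, each completed step is appended to a history, and a later run replays the history instead of executing the steps again. This is a toy model written for illustration, not the Durable Functions API; `orchestrator`, `run`, `ACTIVITIES`, and `CALLS` are hypothetical names.

```python
from typing import Callable, Dict, Generator, List, Tuple

CALLS: List[str] = []  # records which activities were really executed


def make_activity(name: str) -> Callable[[str], str]:
    def activity(value: str) -> str:
        CALLS.append(name)
        return f"{value}->{name}"
    return activity


ACTIVITIES: Dict[str, Callable[[str], str]] = {
    name: make_activity(name) for name in ("step1", "step2", "step3")
}


def orchestrator() -> Generator[Tuple[str, str], str, str]:
    """Three steps in sequence; each yield asks the runner to call one activity."""
    a = yield ("step1", "start")
    b = yield ("step2", a)
    c = yield ("step3", b)
    return c


def run(history: List[Tuple[str, str]]) -> str:
    """Drive the orchestrator, replaying recorded results before doing new work."""
    gen = orchestrator()
    index = 0
    try:
        request = next(gen)
        while True:
            name, arg = request
            if index < len(history):
                result = history[index][1]       # replay: reuse the logged result
            else:
                result = ACTIVITIES[name](arg)   # new work: execute and log it
                history.append((name, result))
            index += 1
            request = gen.send(result)
    except StopIteration as done:
        return done.value


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    print(run(log))   # executes the three activities
    print(run(log))   # same result, rebuilt purely from the log
    print(CALLS)
```

Because the orchestrator's state is rebuilt from the log, the process hosting it can stop between steps without losing progress, which is the property the library manages behind the scenes.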
  • 41. Azure Data Factory
    • Serverless data integration service
    • Data Pipeline: logical group of activities
    • Data Flow: data transformation activity
    • Data Copy: data transfer activity
    • SSIS integration
    • Git integration
  • 42. Azure Data Factory
    • Serverless data integration service
    • Job scheduling
      o Automatically, through the internal scheduler
      o Manually
      o SDK: .NET, Python
      o REST API
      o PowerShell
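For the REST API option, a pipeline run is started with a POST to the factory's `createRun` endpoint on Azure Resource Manager. The sketch below only builds that URL and makes no network call; the path layout and the `2018-06-01` API version reflect my understanding of the Data Factory REST reference and should be checked against it. All resource names, and the helper `create_run_url`, are placeholders.

```python
from urllib.parse import quote

API_VERSION = "2018-06-01"  # assumed API version; verify before use


def create_run_url(subscription_id: str, resource_group: str,
                   factory: str, pipeline: str) -> str:
    """Return the URL to POST to in order to trigger one pipeline run."""
    sub, rg, fac, pipe = (
        quote(part, safe="")
        for part in (subscription_id, resource_group, factory, pipeline)
    )
    return (
        "https://management.azure.com"
        f"/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.DataFactory/factories/{fac}"
        f"/pipelines/{pipe}/createRun?api-version={API_VERSION}"
    )


if __name__ == "__main__":
    # Made-up identifiers used only to show the shape of the URL.
    print(create_run_url("00000000-0000-0000-0000-000000000000",
                         "rg-demo", "adf-demo", "CopySales"))
```

A real call would also need an Azure AD bearer token in the `Authorization` header, which is why the SDKs and PowerShell are often more convenient for manual runs.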
  • 43. Azure Data Factory
    • Serverless data integration service
    • Integration runtime: the compute infrastructure used by ADF to provide data integration
      o Azure: serverless
      o Self-hosted: on-premises or Azure virtual machine (Windows)
      o SSIS
    • Activity features by integration runtime
      o Azure: Data Flow; Data Copy; dispatch activity (HDI, Databricks, SQL …); cloud-to-cloud data transfer/flows
      o Self-hosted: Data Copy; dispatch activity (HDI, Databricks, SQL …); on-premises or virtual machine deployment (Windows); on-premises <-> cloud data transfer/flows; when connectors are not available
      o SSIS: SSIS package execution; private or public network
  • 44. Big Data Compute
    o Azure Databricks: fast, easy, and collaborative Apache Spark-based analytics platform
    o Azure HDInsight: supports the latest open source projects from the Apache Hadoop and Spark ecosystems
    o Azure Synapse Analytics: managed enterprise data warehouse and big data analytics service
  • 47. Azure Databricks
    Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers two environments for developing data-intensive applications:
    o Azure Databricks Workspace: provides an interactive workspace that enables collaboration between data engineers, data scientists, and machine learning engineers.
    o Azure Databricks SQL Analytics: provides an easy-to-use platform for analysts who want to run SQL queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
  • 49. Azure HDInsight
    • Managed Hadoop distribution for Azure
    • Based on the Hortonworks (now Cloudera) Hadoop distribution
    • Comes in various flavours and shapes (VM shapes and number)
      o Hadoop: general purpose (HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Oozie)
      o Spark
      o Kafka
      o HBase
      o Hive / LLAP (Interactive Query)
      o Storm (stream processing)
      o ML Services with R
  • 50. Azure HDInsight
    • At least one storage account is mandatory (for libs and binaries)
    • External metastores are available for Ambari, Hive, and Oozie
    • HDInsight architecture
  • 52. Azure Synapse Analytics
    [Diagram labels: Stream Analytics, Data Factory, Data Lake, Modern Analytics, MPP data warehouse]
  • 53. Azure Synapse Analytics
    o MPP data warehouse
    o Choice of language (T-SQL, Spark SQL, Python, Scala, .NET)
    o Analytics ready (Analysis Services, Power BI)
    o Data science and AI ready (Azure Machine Learning integration)
  • 54. Azure Synapse Analytics
    • Sample use case: pure Business Intelligence!
  • 55. Azure Synapse Analytics
    • Not for small databases (usually > 1 TB)
    • Cost model
      o Synapse provisioned: T-SQL pool billed in DWUs (Data Warehouse Units); storage (geo-redundant option)
      o Synapse serverless
      o Spark pools
      o Synapse pipelines
  • 56. Azure Synapse Analytics
    • Architecture
    • DMS (Data Movement Service): used for data colocation
    • Key point: data partitioning and data distribution
  • 57. Azure Synapse Analytics
    • Hash-distributed table
    • Replicated table
    • Round-robin distributed table
    • Example: dimension to fact table join
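The difference between the distribution styles can be sketched with a few lines of Python. A dedicated SQL pool spreads table rows over 60 distributions; with hash distribution all rows sharing a key land in the same distribution, so a join on that key needs no data movement, whereas round-robin scatters them. The SHA-256 hash below is only a stand-in for the engine's internal hash function, and `hash_distribution`, `place_hash`, and `place_round_robin` are hypothetical names.

```python
import hashlib
from typing import Any, Dict, List, Tuple

DISTRIBUTIONS = 60  # number of distributions in a dedicated SQL pool

Row = Dict[str, Any]


def hash_distribution(key: Any) -> int:
    """Map a distribution-column value to a distribution (stand-in hash)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def place_hash(rows: List[Row], column: str) -> List[Tuple[int, Row]]:
    """Hash-distributed table: placement depends only on the column value."""
    return [(hash_distribution(row[column]), row) for row in rows]


def place_round_robin(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Round-robin table: rows are dealt out evenly, ignoring their content."""
    return [(i % DISTRIBUTIONS, row) for i, row in enumerate(rows)]


def distributions_for(placed: List[Tuple[int, Row]], column: str, value: Any) -> set:
    """Set of distributions holding rows where row[column] == value."""
    return {dist for dist, row in placed if row[column] == value}


if __name__ == "__main__":
    sales = [{"customer_id": i % 10, "amount": i} for i in range(120)]
    hashed = place_hash(sales, "customer_id")
    dealt = place_round_robin(sales)
    print(len(distributions_for(hashed, "customer_id", 0)))  # one distribution
    print(len(distributions_for(dealt, "customer_id", 0)))   # several distributions
```

In the dimension-to-fact join example, hash-distributing the fact table on the join key and replicating the small dimension table keeps every join local, so the Data Movement Service has nothing to shuffle.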
  • 58. Big Data Streaming
    o Azure Stream Analytics: real-time data stream processing from millions of IoT devices
    o Azure IoT Hub: connect, monitor, and manage billions of IoT assets
    o Azure HDInsight & Kafka: real-time data stream with Kafka
    o Spark Streaming with Databricks: use Spark Streaming with Databricks
  • 61. Azure Stream Analytics
    UDFs, UDAs, and custom deserializers:
    o Azure Stream Analytics supports user-defined functions (UDFs) or user-defined aggregates (UDAs) in JavaScript for cloud jobs and in C# for IoT Edge jobs
    Example scenarios:
    o Analyze real-time telemetry streams from IoT devices
    o Web logs/clickstream analytics
    o Geospatial analytics for fleet management and driverless vehicles
    o Remote monitoring and predictive maintenance of high-value assets
    o Real-time analytics on point-of-sale data for inventory control and anomaly detection
  • 63. Azure IoT Hub
    Azure IoT Hub:
    o The cloud gateway that connects IoT devices to gather data and drive business insights and automation.
    o Bi-directional communication capabilities.
    Azure Event Hubs:
    o The big data streaming service of Azure. It is designed for high-throughput data streaming scenarios where customers may send billions of requests per day.
  • 64. IoT Hub or Event Hubs
    IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud, while Event Hubs was designed for big data streaming. Microsoft recommends using Azure IoT Hub to connect IoT devices to Azure.
    IoT capabilities compared across IoT Hub standard tier, IoT Hub basic tier, and Event Hubs:
    o Device-to-cloud messaging
    o Protocols: HTTPS, AMQP, AMQP over WebSockets
    o Protocols: MQTT, MQTT over WebSockets
    o Per-device identity
    o File upload from devices
    o Device Provisioning Service
    o Cloud-to-device messaging
    o Device twin and device management
    o Device streams (preview)
    o IoT Edge
  • 66. Apache Kafka on HDInsight architecture
  • 68. Azure Databricks
    o Apache Spark Streaming is a scalable, fault-tolerant stream processing system that natively supports both batch and streaming workloads.
    o Spark Streaming is an extension of the core Spark API.
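One way to picture "both batch and streaming workloads" is the micro-batch model, where a stream is cut into small batches and the same batch logic is applied to each one. The sketch below is a pure-Python toy that does not use the Spark API; `micro_batches`, `count_words`, and `streaming_word_count` are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut an incoming stream of events into fixed-size micro-batches."""
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def count_words(lines: Iterable[str]) -> Counter:
    """Plain batch logic: word counts over a finite collection of lines."""
    return Counter(word for line in lines for word in line.split())


def streaming_word_count(events: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Reuse the batch logic per micro-batch and keep a running total."""
    state: Counter = Counter()
    for batch in micro_batches(events, batch_size):
        state.update(count_words(batch))
        yield Counter(state)  # snapshot of the totals after this batch


if __name__ == "__main__":
    stream = ["a b", "b c", "a"]
    for snapshot in streaming_word_count(stream, batch_size=2):
        print(dict(snapshot))
```

The final snapshot equals what `count_words` returns on the whole input, which is the sense in which one piece of logic serves both the batch and the streaming case.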
  • 70. Azure Data Studio
    Azure Data Studio is a cross-platform database tool that you can run on Windows, macOS, and Linux. You'll use it to connect to SQL Data Warehouse and Azure SQL Database. Previously released under the preview name SQL Operations Studio, Azure Data Studio offers a modern editor experience with IntelliSense, code snippets, source control integration, and an integrated terminal. It is engineered with the data platform user in mind, with built-in charting of query result sets and customizable dashboards.
  • 71. Storage Explorer
    Begin by downloading and installing Storage Explorer. You can use Storage Explorer to perform several operations against data in your Azure Storage account and data lake:
    o Upload files or folders from your local computer into Azure Storage.
    o Download cloud-based data to your local computer.
    o Copy or move files and folders around in the storage account.
    o Delete data from the storage account.
  • 72. Visual Studio Code
    Visual Studio Code is a lightweight source code editor which runs on your desktop and is available for Windows, macOS, and Linux. It comes with built-in support for JavaScript, TypeScript, and Node.js and has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).
  • 73. Data Migration Tools
  • 74. Summary: scenarios and some recommended solutions
    o Disaster recovery: Azure geo-redundant backups
    o Read scale: use read-only replicas to load balance read-only query workloads (preview)
    o ETL (OLTP to OLAP): Azure Data Factory, SQL Server Integration Services, or Databricks
    o Migration from on-premises SQL Server to Azure SQL Database: Azure Database Migration Service
    o Keeping data up to date across several Azure SQL databases or SQL Server databases: Azure SQL Data Sync
    o Detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database: Data Migration Assistant (DMA)
  • 77. Sources
    A few sources from Microsoft Learn:
    o Azure for the Data Engineer
    o Store data in Azure
    o Work with relational data in Azure
    o Large Scale Data Processing with Azure Data Lake Storage Gen2
    o Implement a Data Streaming Solution with Azure Streaming Analytics
    o Implement a Data Warehouse with Azure SQL Data Warehouse
  • 79. Next Session: Azure Databricks
    o Azure Databricks: fast, easy, and collaborative Apache Spark-based analytics platform
  • 80. Thank you
    Meetup Azure Lille
    dataredkite.com
    https://premiseo.com/