SQL Server 2019 Big Data Clusters
Rozalina Zaharieva & Dimitar Zahariev
SQL Server Big Data Cluster Layout
[Diagram: the cluster runs as Kubernetes pods on nodes with persistent storage. The control plane holds the Controller and the SQL Server master instance. The compute plane holds compute pools made of SQL Compute nodes. The storage plane holds a data pool of SQL Data nodes with their own storage and a storage pool of HDFS Data Nodes, each running Spark and SQL Server. SQL Server reads directly from HDFS. IoT data and external data sources feed the cluster; analytics, custom apps, and BI consume from it.]
Architecture dissection
• Kubernetes (K8s) concepts
• SQL Server 2019 big data cluster (BDC) components
Kubernetes concepts
What is Kubernetes and what does it do?
• Kubernetes is a container orchestrator and is responsible for:
  ◦ Running a cluster of hosts
  ◦ Scheduling containers to run on different hosts
  ◦ Facilitating communication between the containers
  ◦ Providing and controlling access to and from the outside world
  ◦ Tracking and optimizing resource usage
• Similar solutions
  ◦ Docker Swarm, Mesos Marathon, Amazon ECS, HashiCorp Nomad
K8s architecture overview
[Diagram: worker nodes Node 1 to Node K, each running kube-proxy, the kubelet, and a set of pods (Pod 1 to Pod N, Pod 1 to Pod M), alongside three master nodes, each with a scheduler, a controller, an api-server, and a key-value store.]
Master Nodes
• Responsible for managing the cluster
• Typically more than one is installed
• In HA mode one master node is the leader
• Can be reached via CLI (kubectl), APIs, or the Dashboard
[Diagram: three master nodes, each with a scheduler, a controller, an api-server, and a key-value store.]
Master node components:
• Scheduler: schedules the work on different nodes
• Controller: takes care of 1) control loops and 2) desired state
• api-server: 1) performs administrative tasks and 2) stores cluster state
• Key-value store: etcd is used, and it can be 1) part of the master or 2) installed externally
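As a rough illustration of what "schedules the work on different nodes" means, here is a minimal sketch in Python. It assumes a toy model where each node has only a CPU capacity and each pod has only a CPU request; the `Node` class and `schedule` function are hypothetical names made up for this example and are not part of any Kubernetes API. The real scheduler considers many more constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy worker node (hypothetical model, CPU only)."""
    name: str
    cpu_capacity: float           # total CPU cores on the node
    cpu_used: float = 0.0         # cores already requested by placed pods
    pods: list = field(default_factory=list)

    @property
    def cpu_free(self) -> float:
        return self.cpu_capacity - self.cpu_used


def schedule(pod_name: str, cpu_request: float, nodes: list) -> str:
    """Place a pod on the feasible node that has the most free CPU."""
    # Filter step: keep only the nodes that can fit the request.
    feasible = [n for n in nodes if n.cpu_free >= cpu_request]
    if not feasible:
        raise RuntimeError(f"no node can fit pod {pod_name!r}")
    # Score step: prefer the node with the most free CPU.
    target = max(feasible, key=lambda n: n.cpu_free)
    target.pods.append(pod_name)
    target.cpu_used += cpu_request
    return target.name


nodes = [Node("node1", 4.0), Node("node2", 2.0)]
placements = [schedule(f"pod{i}", 1.5, nodes) for i in range(3)]
print(placements)
```

The first two pods land on node1 because it has the most free CPU; the third no longer fits there and goes to node2.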
(Worker) Nodes
• Initially called Minions
• Container runtime
  ◦ containerd, rkt, lxd
• Kubelet
  ◦ Communicates with the master
  ◦ Uses CRI shims
• kube-proxy
  ◦ Network proxy
[Diagram: a node running kube-proxy, the kubelet, and a container runtime that hosts Pod 1 and Pod 2.]
Pods (1)
• Smallest unit of scheduling
• Contains one or more containers
• Containers share the pod environment
• Scheduled on nodes
• Created via manifest files
[Diagram: a pod holding a main container and supporting containers that share one environment (net, mount, ...).]
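To make "created via manifest files" concrete, the sketch below builds a pod manifest as a Python dictionary and serializes it to JSON, which Kubernetes accepts alongside YAML. The pod name, labels, image names, and port are hypothetical values chosen for illustration; only the field layout (`apiVersion`, `kind`, `metadata`, `spec.containers`) follows the standard Pod schema.

```python
import json

# One main container plus one supporting container in the same pod.
# All names, images, and the port number are hypothetical examples.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "demo-pod",
        "labels": {"app": "myapp", "version": "v01"},
    },
    "spec": {
        "containers": [
            {
                "name": "main",
                "image": "example/main-app:1.0",
                "ports": [{"containerPort": 8080}],
            },
            {
                "name": "log-shipper",
                "image": "example/log-shipper:1.0",
            },
        ]
    },
}

manifest_json = json.dumps(pod_manifest, indent=2)
print(manifest_json)
```

Saved to a file, a manifest like this would typically be submitted with `kubectl apply -f <file>`.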
Pods (2)
• Each pod has a unique IP address
• Inter-pod communication is via a pod network
• Intra-pod communication is via localhost and port
[Diagram: Pod 1 (10.10.20.20) and Pod 2 (10.10.20.21) connected through the pod network; containers inside a pod talk over localhost.]
Replication Controllers
• Higher-level workload
• Looks after a pod or a set of pods
• Scales pods up and down
• Sets the desired state
[Diagram: nested layers labeled Pod, Replication Controller, and Deployment.]
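The idea of a desired state can be sketched as a small reconciliation step: compare how many pods should run with how many actually run, then start or stop pods to close the gap. This is a simplified model, not the actual controller code; the `reconcile` function and the pod naming scheme are assumptions made for this example.

```python
import itertools

# Hypothetical source of unique pod names for this sketch.
_pod_ids = itertools.count(1)


def reconcile(desired_replicas: int, running_pods: list) -> list:
    """Return the pod list after one pass of the control loop."""
    pods = list(running_pods)
    # Too few pods: start new ones until the desired count is reached.
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")
    # Too many pods: stop the surplus.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


pods = reconcile(3, [])             # initial rollout: three pods are started
pods.remove("pod-2")                # simulate a pod failure
pods = reconcile(3, pods)           # the loop starts a replacement
scaled_down = reconcile(1, pods)    # desired state lowered to one replica
print(pods, scaled_down)
```

Scaling up, scaling down, and recovering from a failed pod are all the same operation here: the desired count changes or the actual count drifts, and the loop closes the difference.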
Deployments
• Even higher-level workload
• Simplifies updates and rollbacks
• Declarative and imperative approach
• Self-documenting
• Suitable for versioning
[Diagram: nested layers labeled Pod and ReplicaSet.]
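A toy model can show why updates and rollbacks become simple once revisions are tracked. The `Deployment` class below is hypothetical and heavily simplified: it replaces pods one at a time and keeps a list of image versions as its revision history. The image tags are made up, and a real rollback records a new revision rather than discarding the latest one.

```python
class Deployment:
    """Toy deployment that keeps a revision history of image versions."""

    def __init__(self, image: str, replicas: int):
        self.replicas = replicas
        self.history = [image]             # revision history, newest last
        self.pods = [image] * replicas     # the image each pod is running

    def rollout(self, image: str) -> list:
        """Record a new revision and replace pods one at a time."""
        self.history.append(image)
        return self._replace_pods(image)

    def rollback(self) -> list:
        """Return to the previous revision, if there is one."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier revision to roll back to")
        self.history.pop()
        return self._replace_pods(self.history[-1])

    def _replace_pods(self, image: str) -> list:
        # Collect a snapshot after each single-pod replacement.
        steps = []
        for i in range(self.replicas):
            self.pods[i] = image
            steps.append(list(self.pods))
        return steps


deployment = Deployment("myapp:v01", replicas=3)
rollout_steps = deployment.rollout("myapp:v02")
pods_after_rollout = list(deployment.pods)
deployment.rollback()
print(rollout_steps[0], pods_after_rollout, deployment.pods)
```

During the rollout, old and new pods run side by side, so the application stays available while it is updated.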
Services (1)
• Provide a reliable network endpoint
  ◦ IP address
  ◦ DNS name
  ◦ Port
• Expose pods to the outside world
  ◦ NodePort (cluster-wide port)
  ◦ LoadBalancer (cloud-based)
• Use an Endpoints object to track pods
[Diagram: a Service with IP = 10.10.10.1, DNS = demo-svc, Port = 32000, backed by an Endpoints object listing Pod A IP, Pod B IP, ... Pod A (10.10.20.21) runs on Node 1 and Pod B (10.10.20.22) runs on Node 2.]
Services (2)
• Services use label selectors to do their magic
[Diagram, four animation steps:
1. The Service selects version=v01, app=myapp; two pods are running, both labeled version=v01, app=myapp.
2. The selector is unchanged; two pods labeled version=v02, app=myapp are added next to the two v01 pods.
3. The Service selector is switched to version=v02, app=myapp; all four pods are still running.
4. The v01 pods are removed; only the two v02 pods remain behind the Service.]
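The selector mechanism can be sketched in a few lines of Python. A pod matches when every key/value pair in the selector is present in the pod's labels; the matching pod IPs form the endpoint list of the Service. The `matches` and `endpoints` helpers are hypothetical names for this illustration, and the IP addresses ending in .23 and .24 are invented to extend the example to four pods.

```python
def matches(selector: dict, labels: dict) -> bool:
    """A pod matches when it carries every label the selector asks for."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict, pods: list) -> list:
    """Collect the IPs of all pods selected by the Service."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


pods = [
    {"ip": "10.10.20.21", "labels": {"app": "myapp", "version": "v01"}},
    {"ip": "10.10.20.22", "labels": {"app": "myapp", "version": "v01"}},
    {"ip": "10.10.20.23", "labels": {"app": "myapp", "version": "v02"}},
    {"ip": "10.10.20.24", "labels": {"app": "myapp", "version": "v02"}},
]

v01_endpoints = endpoints({"app": "myapp", "version": "v01"}, pods)
v02_endpoints = endpoints({"app": "myapp", "version": "v02"}, pods)
all_endpoints = endpoints({"app": "myapp"}, pods)
print(v01_endpoints, v02_endpoints, all_endpoints)
```

Changing only the `version` value in the selector moves traffic from the v01 pods to the v02 pods without touching the pods themselves.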
SQL Server 2019 big data cluster (BDC) components
SQL Server 2019 big data cluster
Base node configuration
Applies to nodes across all planes. Services:
• kubelet – K8s local agent
• kube-proxy – network config and forwarding
• supervisord – process monitor and control
• fluentd – node logging
• flanneld – software-defined network
• collectd – OS and application data collection
• SQL Big Data watchdog – config sync, watchdog, data collector (DMV, etc.)
[Diagram: a Kubernetes node running watchdog, kubelet, kube-proxy, supervisord, fluentd, flanneld, and collectd.]
Control Plane
External endpoints:
• Kubernetes (REST)
• Aris Control Service (REST)
• Knox Gateway (REST gateway for Hadoop APIs)
• SQL Server Master (TDS gateway for data marts and SQL Master Service)
Services:
• etcd
• Kubernetes Master Services Controller
• SQL Master instance
• SQL Big Data Admin Portal
• Knox Gateway
• HDFS Name Service
• YARN Master
• Hive Metastore
• InfluxDB (metrics store)
• Livy (REST interface for Spark)
• Spark Driver
[Diagram: three Kubernetes nodes, each running the base node services plus etcd.
Node 1: K8s Master service, Spark driver, SQL Big Data Admin portal, InfluxDB, Grafana.
Node 2: Controller, Proxy, SQL Master, HDFS Name Node, Kibana.
Node 3: Livy, Knox, Elasticsearch, Hive Metastore, YARN Master.]
Controller
• External REST/HTTPS endpoint
• Bootstrap and build-out
• Manage capacity
• Configure high availability and recover from failure (AGs)
• Security (authN, authZ, certificate rotation)
• Lifecycle (upgrade/downgrade/rollback)
• Configuration management
• Monitoring – capacity, health, metrics, logs
• Troubleshooting – performance, failures
• Cluster Admin Portal
[Diagram: the Controller service (build-out, upgrade/rollback, add/remove capacity, central AuthZ/AuthN, Cluster Admin Portal, troubleshooting) backed by the Controller metadata store.]
SQL Master Instance
• TDS endpoint into the cluster
• High-value data
• OLTP server
• Data connectors
• Machine learning & extensibility
• Scalable query engine
[Diagram: the master instance as an Availability Group with one primary and two readable secondaries.]
Compute plane
• Hosts one or more SQL compute pools
• A compute pool is a group of instances that forms a data, security, and resource boundary.
• A compute pool processes complex distributed queries against the data plane.
• Local storage is used for shuffling data if necessary.
[Diagram: four compute pool nodes, each running the base node services and a SQL engine.]
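One common pattern behind distributed queries is scatter-gather aggregation: every node aggregates its own rows, and the partial results are merged in a final step. The sketch below shows that pattern in plain Python. It is not how the SQL Server compute pool is implemented; the function names and sample rows are assumptions for illustration only.

```python
from collections import Counter


def partial_totals(rows):
    """Runs on each data node: sum the local amounts per key."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return totals


def merge(partials):
    """Runs on the coordinating node: combine the partial results."""
    result = Counter()
    for partial in partials:
        result.update(partial)   # Counter.update adds values key by key
    return dict(result)


# Hypothetical rows of (country, amount) held by two different nodes.
node_a_rows = [("bg", 10), ("us", 5)]
node_b_rows = [("bg", 7), ("de", 3)]

totals = merge([partial_totals(node_a_rows), partial_totals(node_b_rows)])
print(totals)
```

Only the small partial results travel between nodes, which is the main reason to push aggregation down to where the data lives.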
Data plane
Storage pool:
• Data ingestion through Spark (batch and streaming)
• Data storage in HDFS
• Data access through HDFS and SQL endpoints. The SQL engine reads files in HDFS directly
Data pool:
• Partitioned, in-memory cache for external data
• Scale-out data storage for append-only data sets
• Data ingestion through Spark
• Provides persistent SQL Server storage for the cluster
[Diagram: two storage pool nodes (base node services, SQL engine, HDFS, Spark) and one data pool node (base node services, SQL engine).]
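To picture what "partitioned" and "scale-out" mean for a data pool, the sketch below spreads incoming rows across a fixed number of nodes in round-robin order. Round-robin is chosen here as one simple distribution strategy for illustration; the function name and the sample data are hypothetical.

```python
def distribute_round_robin(rows, node_count: int) -> list:
    """Assign row i to shard i mod node_count and return one list per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    shards = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        shards[index % node_count].append(row)
    return shards


# Seven hypothetical rows appended to a pool of three nodes.
shards = distribute_round_robin(list(range(7)), node_count=3)
print(shards)
```

Because rows are only ever appended, each node receives a roughly equal share without any need to move existing data.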
Installation, configurations and tools
Installation methods:
• Cloud - a platform such as Azure Kubernetes Service (AKS)
• On-premises - VMs, bare metal
• Localhost - using minikube (to be used only for training and testing)
Configurations:
• All-in-one single-node and various multi-node options
Tools:
• mssqlctl, kubectl, Azure Data Studio, SQL Server 2019 extension
• Azure CLI (for AKS), mssql-cli, sqlcmd, curl
Demonstrations
Powered by

More Related Content

What's hot

Upgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaUpgrade your SQL Server like a Ninja
Upgrade your SQL Server like a Ninja
Amit Banerjee
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017
Travis Wright
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Eric Bragas
 
AliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core Features
Alibaba Cloud
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
HostedbyConfluent
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Lucas Jellema
 
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Databricks
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
Lynn Langit
 
Netflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraNetflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to Cassandra
Roopa Tangirala
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Travis Wright
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
Cask Data
 
Organizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George WaltersOrganizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George Walters
George Walters
 
Big Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data IntegrationBig Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data Integration
Alibaba Cloud
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
Peter Haase
 
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinBuilding a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Lynn Langit
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
rockplace
 
Spark
SparkSpark
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
Databricks
 
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with EaseBenchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Lynn Langit
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
giventocode
 

What's hot (20)

Upgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaUpgrade your SQL Server like a Ninja
Upgrade your SQL Server like a Ninja
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
AliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core Features
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
Netflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraNetflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to Cassandra
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
Organizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George WaltersOrganizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George Walters
 
Big Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data IntegrationBig Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data Integration
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinBuilding a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
 
Spark
SparkSpark
Spark
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
 
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with EaseBenchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
 

Similar to Discovery Day 2019 Sofia - Big data clusters

The roadmap for sql server 2019
The roadmap for sql server 2019The roadmap for sql server 2019
The roadmap for sql server 2019
Javier Villegas
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
Bob Ward
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk
Eran Gampel
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and docker
Bob Ward
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Travis Wright
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
ScyllaDB
 
Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...
ALI ANWAR, OCP®
 
Deploying windows containers with kubernetes
Deploying windows containers with kubernetesDeploying windows containers with kubernetes
Deploying windows containers with kubernetes
Ben Hall
 
Dockercon2015_paypal
Dockercon2015_paypalDockercon2015_paypal
Dockercon2015_paypal
ahunnargikar
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
confluent
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment Scenarios
Brian Benz
 
TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3
TranscendComputing
 
Best Practice SharePoint Architecture
Best Practice SharePoint ArchitectureBest Practice SharePoint Architecture
Best Practice SharePoint Architecture
Michael Noel
 
Kubernetes for Docker Developers
Kubernetes for Docker DevelopersKubernetes for Docker Developers
Kubernetes for Docker Developers
Red Hat Developers
 
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 PreviewCloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
Chip Childers
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
Shubhra Kar
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, Docker
Davinder Kohli
 
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on AzureGlobal Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Karim Vaes
 
SQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxSQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptx
QuyVo27
 
Enabling Microservices Frameworks to Solve Business Problems
Enabling Microservices Frameworks to Solve  Business ProblemsEnabling Microservices Frameworks to Solve  Business Problems
Enabling Microservices Frameworks to Solve Business Problems
Ken Owens
 

Similar to Discovery Day 2019 Sofia - Big data clusters (20)

The roadmap for sql server 2019
The roadmap for sql server 2019The roadmap for sql server 2019
The roadmap for sql server 2019
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and docker
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 
Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...
 
Deploying windows containers with kubernetes
Deploying windows containers with kubernetesDeploying windows containers with kubernetes
Deploying windows containers with kubernetes
 
Dockercon2015_paypal
Dockercon2015_paypalDockercon2015_paypal
Dockercon2015_paypal
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment Scenarios
 
TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3
 
Best Practice SharePoint Architecture
Best Practice SharePoint ArchitectureBest Practice SharePoint Architecture
Best Practice SharePoint Architecture
 
Kubernetes for Docker Developers
Kubernetes for Docker DevelopersKubernetes for Docker Developers
Kubernetes for Docker Developers
 
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 PreviewCloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, Docker
 
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on AzureGlobal Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
 
SQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxSQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptx
 
Enabling Microservices Frameworks to Solve Business Problems
Enabling Microservices Frameworks to Solve  Business ProblemsEnabling Microservices Frameworks to Solve  Business Problems
Enabling Microservices Frameworks to Solve Business Problems
 

More from Ivan Donev

Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Ivan Donev
 
Tips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and RestoreTips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and Restore
Ivan Donev
 
Get the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMsGet the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMs
Ivan Donev
 
Develop your database with Visual Studio
Develop your database with Visual StudioDevelop your database with Visual Studio
Develop your database with Visual Studio
Ivan Donev
 
Windows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMsWindows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMs
Ivan Donev
 
Building your first AS solution
Building your first AS solutionBuilding your first AS solution
Building your first AS solution
Ivan Donev
 
Sql server consolidation and virtualization
Sql server consolidation and virtualizationSql server consolidation and virtualization
Sql server consolidation and virtualization
Ivan Donev
 
Self-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerViewSelf-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerView
Ivan Donev
 
Is "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database worldIs "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database world
Ivan Donev
 

More from Ivan Donev (9)

Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19
 
Tips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and RestoreTips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and Restore
 
Get the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMsGet the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMs
 
Develop your database with Visual Studio
Develop your database with Visual StudioDevelop your database with Visual Studio
Develop your database with Visual Studio
 
Windows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMsWindows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMs
 
Building your first AS solution
Building your first AS solutionBuilding your first AS solution
Building your first AS solution
 
Sql server consolidation and virtualization
Sql server consolidation and virtualizationSql server consolidation and virtualization
Sql server consolidation and virtualization
 
Self-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerViewSelf-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerView
 
Is "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database worldIs "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database world
 

Recently uploaded

Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
Priyanka Aash
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
DianaGray10
 
Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024
Peter Caitens
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Zilliz
 
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
Fwdays
 
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Snarky Security
 
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partesExchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
jorgelebrato
 


Discovery Day 2019 Sofia - Big data clusters

  • 1. Powered by SQL Server 2019 Big Data Clusters Rozalina Zaharieva & Dimitar Zahariev
  • 2. SQL Server Big Data Cluster Layout
      • [Diagram: a Kubernetes cluster of nodes, each hosting Kubernetes pods, backed by persistent storage.]
      • Control plane: Controller and SQL Server master instance.
      • Compute plane: compute pools, each made up of SQL compute nodes.
      • Storage plane: a data pool of SQL data nodes with their own storage, and a storage pool of HDFS data nodes running Spark and SQL Server, read directly from HDFS.
      • Inputs: IoT data and external data sources (such as Microsoft SQL Server). Consumers: analytics, custom apps, BI.
  • 3. Architecture dissection
      • Kubernetes (K8s) concepts
      • SQL Server 2019 big data cluster (BDC) components
  • 5. What is Kubernetes and what does it do?
      • Kubernetes is a container orchestrator. It is responsible for:
          • Running a cluster of hosts
          • Scheduling containers to run on different hosts
          • Facilitating communication between the containers
          • Providing and controlling access to and from the outside world
          • Tracking and optimizing resource usage
      • Similar solutions:
          • Docker Swarm, Mesos Marathon, Amazon ECS, HashiCorp Nomad
  • 6. K8s architecture overview
      • [Diagram: three master nodes, each with a Scheduler, Controller, api-server, and Key-Value Store, managing worker nodes Node1 to NodeK. Each worker node runs kube-proxy and Kubelet and hosts a set of pods.]
  • 7. Master nodes
      • Responsible for managing the cluster
      • Typically more than one is installed
      • In HA mode, one master node is the leader
      • Can be reached via CLI (kubectl), APIs, or Dashboard
      • Components of each master node (from the diagram):
          • Scheduler: schedules the work on different nodes
          • Controller: takes care of control loops and the desired state
          • api-server: performs administrative tasks
          • Key-Value Store: stores the cluster state. etcd is used, and it can be part of the master or installed externally
  • 8. (Worker) nodes
      • Initially called Minions
      • Container runtime
          • containerd, rkt, lxd
      • Kubelet
          • Communicates with the master
          • Uses CRI shims
      • kube-proxy
          • Network proxy
      • [Diagram: a node running kube-proxy, Kubelet, and a container runtime that hosts Pod 1 and Pod 2.]
  • 9. Pods (1)
      • Smallest unit of scheduling
      • Contains one or more containers
      • Containers share the pod environment
      • Scheduled on nodes
      • Created via manifest files
      • [Diagram: a pod holding a main container and supporting containers that share one environment (net, mount, ...).]
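A minimal sketch of what "created via manifest files" can look like, assuming a pod with one main container and one supporting container. The pod name, container names, and image tags are hypothetical and do not come from the slides. The script only builds the manifest as a Python dictionary and prints it as JSON.

```python
import json


def make_pod_manifest(name, main_image, sidecar_image):
    """Build a Pod manifest with one main and one supporting container.

    All names and images passed in are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Both containers share the pod environment (network, volumes).
            "containers": [
                {
                    "name": "main",
                    "image": main_image,
                    "ports": [{"containerPort": 80}],
                },
                {
                    "name": "sidecar",
                    "image": sidecar_image,
                    # Keep the supporting container alive for the demo.
                    "command": ["sh", "-c", "sleep 3600"],
                },
            ]
        },
    }


manifest = make_pod_manifest("demo-pod", "nginx:1.25", "busybox:1.36")
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML manifests, so the printed output could be saved to a file and passed to `kubectl apply -f`.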
  • 10. Pods (2)
      • Each pod has a unique IP address
      • Inter-pod communication is via a pod network
      • Intra-pod communication is via localhost and port
      • [Diagram: Pod 1 (10.10.20.20) and Pod 2 (10.10.20.21) connected through the pod network; containers inside a pod talk over localhost.]
  • 11. Replication controllers
      • Higher-level workload
      • Looks after a pod or a set of pods
      • Scales pods up and down
      • Sets the desired state
      • [Diagram: a replication controller wrapping a pod.]
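The desired-state idea behind a replication controller can be sketched as a small reconciliation function. This is a toy simulation, not Kubernetes code: the function `reconcile` and the pod naming scheme are assumptions made for illustration.

```python
def reconcile(desired_replicas, running_pods, name_prefix="pod"):
    """Return the pod list after one pass of a toy control loop.

    Pods are plain strings here; a real controller would call the
    Kubernetes API instead of editing a list.
    """
    pods = list(running_pods)
    next_id = 0
    # Scale up: create missing pods until the desired count is reached.
    while len(pods) < desired_replicas:
        candidate = f"{name_prefix}-{next_id}"
        next_id += 1
        if candidate not in pods:
            pods.append(candidate)
    # Scale down: remove surplus pods from the end of the list.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


pods = reconcile(3, [])                # initial rollout
pods.remove("pod-1")                   # simulate a pod failure
healed = reconcile(3, pods)            # the loop restores the count
scaled_down = reconcile(1, healed)     # desired state lowered to 1
print(healed, scaled_down)
```

The point of the sketch is that the caller only states the target count; the loop works out whether pods need to be added or removed.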
  • 12. Deployments
      • Even higher-level workload
      • Simplifies updates and rollbacks
      • Supports both declarative and imperative approaches
      • Self-documenting
      • Suitable for versioning
      • [Diagram: a Deployment wrapping a ReplicaSet, which wraps a pod.]
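A sketch of the declarative approach, assuming a Deployment described as a Python dictionary. The application name `myapp` and the image tags are hypothetical. An update is expressed by producing a new version of the manifest with a different image, rather than by issuing step-by-step commands.

```python
import copy
import json


def make_deployment(name, image, replicas):
    """Build a Deployment manifest; name and image are illustrative."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Replace pods gradually so the application stays available.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def with_image(deployment, new_image):
    """Return a copy of the manifest that points at a new image."""
    updated = copy.deepcopy(deployment)
    updated["spec"]["template"]["spec"]["containers"][0]["image"] = new_image
    return updated


v1 = make_deployment("myapp", "myapp:v01", 3)
v2 = with_image(v1, "myapp:v02")
print(json.dumps(v2, indent=2))
```

Because each version is a complete document, the manifests can be kept under version control, which is what makes a Deployment self-documenting and suitable for versioning.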
  • 13. Services (1)
      • Provide a reliable network endpoint
          • IP address
          • DNS name
          • Port
      • Expose pods to the outside world
          • NodePort (cluster-wide port)
          • LoadBalancer (cloud-based)
      • Use an End Point object to track pods
      • [Diagram: a service with IP = 10.10.10.1, DNS = demo-svc, Port = 32000. Its End Point object lists the IPs of Pod A (10.10.20.21, on Node 1) and Pod B (10.10.20.22, on Node 2).]
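A sketch of a NodePort service manifest that reuses the `demo-svc` name and port 32000 from the diagram. The selector `app=myapp` and the container port 80 are assumptions added for illustration.

```python
import json


def make_nodeport_service(name, selector, port, node_port):
    """Build a Service manifest of type NodePort.

    The selector and the in-cluster port are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            # Pods carrying these labels receive the traffic.
            "selector": selector,
            "ports": [
                {"port": port, "targetPort": port, "nodePort": node_port}
            ],
        },
    }


service = make_nodeport_service("demo-svc", {"app": "myapp"}, 80, 32000)
print(json.dumps(service, indent=2))
```

With a manifest like this, the same port (32000) is opened on every node, and traffic arriving there is forwarded to the matching pods.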
  • 14. Services (2)
      • Services use label selectors to find the pods they route to
      • [Diagram: a service with selector version=v01, app=myapp matching two pods that both carry the labels version=v01, app=myapp.]
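Label selection can be sketched in a few lines: a pod matches when it carries every key/value pair in the selector. The function names below are hypothetical, and the third pod (version v02) is an invented counterexample that does not appear on the slide.

```python
def matches(selector, labels):
    """True when the pod labels contain every selector key/value pair."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector, pods):
    """Return the IPs of all pods that the selector matches."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


selector = {"version": "v01", "app": "myapp"}
pods = [
    {"ip": "10.10.20.21", "labels": {"version": "v01", "app": "myapp"}},
    {"ip": "10.10.20.22", "labels": {"version": "v01", "app": "myapp"}},
    # Hypothetical pod from a newer release; it should not be selected.
    {"ip": "10.10.20.23", "labels": {"version": "v02", "app": "myapp"}},
]

print(endpoints_for(selector, pods))
```

The resulting list plays the role of the End Point object: it is recomputed whenever pods with matching labels appear or disappear.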
  • 18. SQL Server 2019 big data cluster (BDC) components
  • 20. Base node configuration
      • Applies to nodes across all planes.
      • Services:
          • kubelet: K8s local agent
          • kube-proxy: network configuration and forwarding
          • supervisord: process monitor and control
          • fluentd: node logging
          • flanneld: software-defined network
          • collectd: OS and application data collection
          • SQL Big Data watchdog: config sync, watchdog, data collector (DMVs, etc.)
      • [Diagram: a Kubernetes node running watchdog, kubelet, kube-proxy, supervisord, fluentd, flanneld, and collectd.]
  • 21. Control plane
      • External endpoints:
          • Kubernetes (REST)
          • Aris Control Service (REST)
          • Knox Gateway (REST gateway for Hadoop APIs)
          • SQL Server Master (TDS gateway for data marts and SQL Master Service)
      • Services:
          • etcd
          • Kubernetes Master Services
          • Controller
          • SQL Master instance
          • SQL Big Data Admin Portal
          • Knox Gateway
          • HDFS Name Service
          • YARN Master
          • Hive Metastore
          • InfluxDB (metrics store)
          • Livy (REST interface for Spark)
          • Spark Driver
      • [Diagram: three Kubernetes nodes, each running the base node services plus etcd. Node 1: K8s Master service, Spark driver, SQL Big Data Admin portal, InfluxDB, Grafana. Node 2: Controller, Proxy, SQL Master, HDFS Name Node, Kibana. Node 3: Livy, Knox, Elasticsearch, Hive Metastore, YARN Master.]
  • 22. Controller
      • External REST/HTTPS endpoint
      • Bootstrap and build-out
      • Manage capacity
      • Configure high availability and recover from failure (AGs)
      • Security (authN, authZ, certificate rotation)
      • Lifecycle (upgrade/downgrade/rollback)
      • Configuration management
      • Monitoring: capacity, health, metrics, logs
      • Troubleshooting: performance, failures
      • Cluster Admin Portal
      • [Diagram: the controller service, covering build-out, upgrade/rollback, add/remove capacity, central AuthZ/AuthN, Cluster Admin Portal, and troubleshooting, backed by controller metadata.]
  • 23. SQL master instance
      • TDS endpoint into the cluster
      • High-value data
      • OLTP server
      • Data connectors
      • Machine learning and extensibility
      • Scalable query engine
      • [Diagram: the master instance as an availability group with one primary and two readable secondaries.]
  • 24. Compute plane
      • Hosts one or more SQL compute pools.
      • A compute pool is a group of instances that forms a data, security, and resource boundary.
      • A compute pool processes complex distributed queries against the data plane.
      • Local storage is used for shuffling data if necessary.
      • [Diagram: four compute pool nodes, each running the base node services and a SQL engine.]
  • 25. Data plane
      • Storage pool:
          • Data ingestion through Spark (batch and streaming)
          • Data storage in HDFS
          • Data access through HDFS and SQL endpoints. The SQL engine reads files in HDFS directly
      • Data pool:
          • Partitioned, in-memory cache for external data
          • Scale-out data storage for append-only data sets
          • Data ingestion through Spark
          • Provides persistent SQL Server storage for the cluster
      • [Diagram: storage pool nodes running the base node services, SQL engine, HDFS, and Spark; a data pool node running the base node services and SQL engine.]
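To make "partitioned" and "scale-out" concrete, here is a toy sketch of spreading appended rows across data pool nodes. It assumes a round-robin placement policy, which the slides do not state; the node names and the function are hypothetical, and no SQL Server API is involved.

```python
from itertools import cycle


def distribute_round_robin(rows, node_names):
    """Assign rows to nodes in turn and return the rows held by each node.

    Round-robin placement is an assumption made for this illustration.
    """
    shards = {name: [] for name in node_names}
    targets = cycle(node_names)
    for row in rows:
        shards[next(targets)].append(row)
    return shards


# Seven appended rows spread over two hypothetical data pool nodes.
shards = distribute_round_robin(list(range(7)), ["data-0", "data-1"])
print(shards)
```

Each node ends up with roughly the same share of the data, so a query over the whole set can be split across the nodes and run in parallel.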
  • 26. Installation, configurations, and tools
      • Installation methods:
          • Cloud: a platform such as Azure Kubernetes Service (AKS)
          • On-premises: VMs, bare metal
          • Localhost: using minikube (for training and testing only)
      • Configurations:
          • All-in-one single node and different multi-node options
      • Tools:
          • mssqlctl, kubectl, Azure Data Studio, SQL Server 2019 extension
          • Azure CLI (for AKS), mssql-cli, sqlcmd, curl