This document provides an overview of the architecture and components of SQL Server 2019 Big Data Clusters. It describes the key Kubernetes concepts used in Big Data Clusters like pods, services, and nodes. It then explains the different planes (control, compute, data) and nodes that make up a Big Data Cluster and their roles. Components in each plane like the SQL master instance, compute pools, storage pools, and data pools are also outlined.
Amit Banerjee is a senior program manager at Microsoft focusing on performance and high availability disaster recovery (HADR) for SQL Server. He has nearly a decade of experience with SQL Server and was previously part of Microsoft's SQL escalation services and premier field engineering teams. Banerjee is also an author of books on SQL Server internals and troubleshooting as well as migration to SQL Server on Azure. In this presentation, he discusses SQL Server 2017's focus on choice, intelligence, and easy migration. He also outlines the upgrade journey and provides an overview of tools and services for database assessment, migration, and modernization.
SQL Server 2017 Deep Dive - @Ignite 2017 (Travis Wright)
This was a presentation given at Ignite 2017 on SQL Server 2017. It covers the main new capabilities of SQL Server 2017. The video recording of the session is available here: https://myignite.microsoft.com/sessions/54946?source=sessions
Modern ETL: Azure Data Factory, Data Lake, and SQL Database (Eric Bragas)
This document discusses modern Extract, Transform, Load (ETL) tools in Azure, including Azure Data Factory, Azure Data Lake, and Azure SQL Database. It provides an overview of each tool and how they can be used together in a data warehouse architecture with Azure Data Lake acting as the data hub and Azure SQL Database being used for analytics and reporting through the creation of data marts. It also includes two demonstrations, one on Azure Data Factory and another showing Azure Data Lake Store and Analytics.
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf... (HostedbyConfluent)
The challenge with today’s “data explosion” is finding the most appropriate answer to the question, “So where do I put my data?” while avoiding the longer-term problem: data warehouses, data lakes, cloud storage, NoSQL databases, … are often the places where “big” data goes to die.
Enter Physics 101, and my corollary to Newton’s First Law of Motion:
Data in motion tends to stay in motion until it comes to rest on disk. Similarly, if data is at rest, it will remain at rest until an external “force” puts it in motion again.
Data inevitably comes to rest at some point. Without “external forces”, data often gets lost or becomes stale where it lands. “Modern” architectures tend to involve data pipelines where downstream consumers of data make use of data generated upstream, often with built-for-purpose repositories at each stage. This session will explore how data that has come to rest can be put in motion again; how Kafka can keep it in motion longer; and how pipelined architectures might be created to make use of that data.
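The pipeline idea above, where downstream consumers pick up data published upstream, can be sketched with a minimal in-memory stand-in for a Kafka-style topic. The `Topic` class and the stage names here are illustrative assumptions, not the Qlik Replicate or Kafka APIs:

```python
from collections import deque

class Topic:
    """A toy stand-in for a Kafka topic: an append-only buffer that consumers poll."""
    def __init__(self):
        self._buffer = deque()

    def produce(self, record):
        self._buffer.append(record)

    def poll(self):
        return self._buffer.popleft() if self._buffer else None

# Upstream stage: put "resting" rows back in motion by publishing them.
source_rows = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
topic = Topic()
for row in source_rows:
    topic.produce(row)

# Downstream stage: consume, transform, and land the data in a
# built-for-purpose store for the next consumer.
store = []
while (record := topic.poll()) is not None:
    record["status"] = "processed"
    store.append(record)

print(store)  # every record flowed through the pipeline exactly once
```

A real deployment would replace `Topic` with a durable, partitioned log so records stay "in motion" for multiple independent consumers rather than a single one.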
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14... (Lucas Jellema)
This presentation gives a brief overview of the history of relational databases, ACID and SQL and presents some of their key strengths and potential weaknesses. It introduces the rise of NoSQL - why it arose, what it entails, when to use it. The presentation focuses on MongoDB as a prime example of a NoSQL document store and shows how to interact with MongoDB from JavaScript (NodeJS) and Java.
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi... (Databricks)
Predictive intelligence from machine learning has the potential to change everything in our day to day experiences, from education to entertainment, from travel to healthcare, from business to leisure and everything in between. Modern ML frameworks are batch by nature and cannot pivot on the fly to changing user data or situations. Many simple ML applications such as those that enhance the user experience, can benefit from real-time robust predictive models that adapt on the fly.
Join this session to learn how common practices in machine learning such as running a trained model in production can be substantially accelerated and radically simplified by using Redis modules that natively store and execute common models generated by Spark ML and TensorFlow algorithms. We will also discuss the implementation of simple, real-time feed-forward neural networks with Neural Redis and scenarios that can benefit from such efficient, accelerated artificial intelligence.
Real-life implementations of these new techniques at a large consumer credit company for fraud analytics, at an online e-commerce provider for user recommendations and at a large media company for targeting content will also be discussed.
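The feed-forward networks mentioned above are, at inference time, just a few rounds of weighted sums and activations, which is why a module evaluating them inside the datastore can serve predictions per request. The sketch below shows that arithmetic in plain Python; the weights are hypothetical, not a trained fraud or recommendation model, and this is not the Neural Redis API:

```python
import math

def forward(x, w1, b1, w2, b2):
    """Forward pass of a tiny one-hidden-layer network: ReLU hidden
    units, then a sigmoid output producing a score in (0, 1)."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    z = sum(wi * hi for wi, hi in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights only; a real model would come from Spark ML or TensorFlow.
score = forward([0.5, 1.0],
                w1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.1],
                w2=[1.0, 1.0], b2=-0.5)
print(round(score, 3))
```

Because the whole pass is a handful of multiply-adds, co-locating it with the data (as the session proposes) removes the round trip to a separate model server.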
The document discusses building data pipelines in the cloud. It covers serverless data pipeline patterns using services like BigQuery, Cloud Storage, Cloud Dataflow, and Cloud Pub/Sub. It also compares Cloud Dataflow and Cloud Dataproc for ETL workflows. Key questions around ingestion and ETL are discussed, focusing on volume, variety, velocity and veracity of data. Cloud vendor offerings for streaming and ETL are also compared.
This document summarizes Netflix's migration from Oracle to Cassandra. It discusses how Netflix moved its backend database from Oracle to Cassandra to gain scalability and reduce costs. The migration strategy involved dual writes to both databases, fork lifting the existing Oracle dataset, and a consistency checker. Challenges included security, denormalization, and engineering effort. Real use cases like APIs and viewing history are discussed along with lessons learned around data modeling, performance testing, and thinking of Cassandra as just storage.
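The dual-write-plus-checker strategy described in the migration above can be sketched in a few lines. The dict-backed stores and key names here are a hypothetical stand-in for Oracle and Cassandra, not Netflix's actual tooling:

```python
legacy_db = {}   # stands in for the Oracle source of truth
new_db = {}      # stands in for the Cassandra target

def dual_write(key, value):
    """During migration, every write lands in both stores."""
    legacy_db[key] = value
    new_db[key] = value

def consistency_check():
    """Return keys whose values differ or are missing between the stores."""
    all_keys = legacy_db.keys() | new_db.keys()
    return [k for k in all_keys if legacy_db.get(k) != new_db.get(k)]

dual_write("user:1", {"plan": "standard"})
dual_write("user:2", {"plan": "premium"})
new_db["user:3"] = {"plan": "trial"}   # a drifted record the checker should flag

print(consistency_check())  # → ['user:3']
```

The fork-lift of the existing dataset would backfill `new_db` from `legacy_db` in bulk, after which the checker's output shrinking to empty signals it is safe to cut reads over.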
Logging infrastructure for Microservices using StreamSets Data Collector (Cask Data)
This document discusses using StreamSets Data Collector (SDC) to build a logging infrastructure for microservices. SDC can ingest logs from microservices running in containers and handle issues like schema changes and new log formats. It processes and transforms the logs, sending them to destinations like Kafka. SDC pipelines can run on Spark clusters on Yarn and Mesos to handle large volumes of log data and load it into systems like HDFS, HBase and Elasticsearch for analysis.
Organizational compliance and security SQL 2012-2019 (George Walters)
The compliance and security aspects of SQL Server, and the greater platform, are covered here. This goes through CTP 2.3 of SQL 2019. I start with the history of security in SQL Server, from the changes with SQL 2005, then into SQL 2008, 2008r2, 2012, 2014, 2016, 2017. We cover the requirement for installation, auditing, encryption, compliance, and so forth.
Big Data Quickstart Series 3: Perform Data Integration (Alibaba Cloud)
This document summarizes Derek Meng's presentation on data integration using Alibaba Cloud's MaxCompute big data platform. It discusses the general process of data integration including data acquisition, transformation, and governance. It provides an overview of MaxCompute basics, including its architecture, basic concepts such as projects and tables, and how to use MaxCompute's data channel and SQL. The document concludes with a brief introduction to DataWorks for data integration and a demo.
Cloud-based Linked Data Management for Self-service Application Development (Peter Haase)
Peter Haase and Michael Schmidt of fluid Operations AG presented on developing applications using linked open data. They discussed the increasing amount of linked open data available and challenges in building applications that integrate data from different sources and domains. Their Information Workbench platform aims to address these challenges by allowing users to discover, integrate, and customize applications using linked data in a no-code environment. Key components of the platform include virtualized integration of data sources and the vision of accessing linked data as a cloud-based data service.
This document provides an overview of Apache Spark, including:
- Spark is an open-source cluster computing framework that supports in-memory processing of large datasets across clusters of computers using a concept called resilient distributed datasets (RDDs).
- RDDs allow data to be partitioned across nodes in a fault-tolerant way, and support operations like map, filter, and reduce.
- Spark SQL, DataFrames, and Datasets provide interfaces for structured and semi-structured data processing.
- The document discusses Spark's performance advantages over Hadoop MapReduce and provides examples of common Spark applications like word count, Pi estimation, and stream processing.
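The word-count example mentioned above is the canonical illustration of the map/filter/reduce operations RDDs support. The sketch below runs in plain Python so it needs no cluster; the PySpark equivalent noted in the comment is the shape the document's example would take:

```python
from collections import Counter

# Pure-Python stand-in for the classic Spark word count. In PySpark the
# same pipeline would read roughly:
#   rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(operator.add)
lines = ["spark makes big data simple", "big data big results"]

words = [w for line in lines for w in line.split()]   # flatMap: lines -> words
pairs = [(w, 1) for w in words]                       # map: word -> (word, 1)
counts = Counter()
for word, one in pairs:                               # reduceByKey: sum per word
    counts[word] += one

print(counts["big"])  # → 3
```

On a real RDD these steps would be partitioned across nodes, with the reduce stage shuffling pairs so each key is summed on a single partition.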
10 Things Learned Releasing Databricks Enterprise Wide (Databricks)
Implementing tools, let alone an entire Unified Data Platform, like Databricks, can be quite the undertaking. Implementing a tool which you have not yet learned all the ins and outs of can be even more frustrating. Have you ever wished that you could take some of that uncertainty away? Four years ago, Western Governors University (WGU) took on the task of rewriting all of our ETL pipelines in Scala/Python, as well as migrating our Enterprise Data Warehouse into Delta, all on the Databricks platform. Starting with 4 users and rapidly growing to over 120 users across 8 business units, our Databricks environment turned into an entire unified platform, being used by individuals of all skill levels, data requirements, and internal security requirements.
Through this process, our team has had the chance and opportunity to learn while making a lot of mistakes. Taking a look back at those mistakes, there are a lot of things we wish we had known before opening the platform to our enterprise.
We would like to share with you 10 things we wish we had known before WGU started operating in our Databricks environment. Covering topics surrounding user management from both an AWS and Databricks perspective, understanding and managing costs, creating custom pipelines for efficient code management, learning about new Apache Spark snippets that helped save us a fortune, and more. We would like to provide our recommendations on how one can overcome these pitfalls to help new, current and prospective users to make their environments easier, safer, and more reliable to work in.
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease (Lynn Langit)
Deck from a blog post detailing our work with Aerospike to verify their performance benchmark of 4 million TPS on the Google Cloud, using GCE (Google Compute Engine) instances. Blog post is here -- http://googlecloudplatform.blogspot.com/2015/10/speed-with-Ease-NoSQL-on-the-Google-Cloud-Platform.html
This document summarizes key components of Microsoft Azure's data platform, including SQL Database, NoSQL options like Azure Tables, Blob Storage, and Azure Files. It provides an overview of each service, how they work, common use cases, and demos of creating resources and accessing data. The document is aimed at helping readers understand Azure's database and data storage options for building cloud applications.
This document discusses SQL Server 2019 and provides the following information:
1. It introduces Javier Villegas, a technical speaker and SQL Server expert.
2. It outlines several new capabilities in SQL Server 2019 including artificial intelligence, container support, and big data analytics capabilities using Apache Spark.
3. It compares editions and capabilities of SQL Server on Windows and Linux and notes they are largely the same.
Experience SQL Server on Linux and Docker (Bob Ward)
Microsoft SQL Server provides a full-featured database for Linux that offers high performance, security and flexibility across languages and platforms at a lower cost compared to other commercial databases. It has the most consistent data platform with industry-leading performance on Linux and Windows and supports machine learning and artificial intelligence capabilities. SQL Server on Linux allows customers to deploy the database on their choice of Linux distribution for both traditional and container-based workloads.
Neutron Done the SDN Way
Dragonflow is an open source distributed control plane implementation of Neutron which is an integral part of OpenStack. Dragonflow introduces innovative solutions and features to implement networking and distributed network services in a manner that is both lightweight and simple to extend, yet targeted towards performance-intensive and latency-sensitive applications. Dragonflow aims at solving the performance
This session shows an overview of the features and architecture of SQL Server on Linux and Containers. It covers install, config, performance, security, HADR, Docker containers, and tools. Find the demos on http://aka.ms/bobwardms
Event Streaming Architectures with Confluent and ScyllaDB (ScyllaDB)
Jeff Bean will lead a discussion of event-driven architectures, Apache Kafka, Kafka Connect, KSQL and Confluent Cloud. Then we'll talk about some uses of Confluent and Scylla together, including a co-deployment with Lookout, ScyllaDB and Confluent in the IoT space, and the upcoming native connector.
Migrate or modernize your database applications using Azure SQL Database Mana... (ALI ANWAR, OCP®)
Data Platform Summit 2019 is a community initiative by eDominer Systems. The agenda included presentations on Azure SQL Database Managed Instance, migration to the cloud with Azure SQL Database, and a demo. Azure SQL Database Managed Instance provides fully managed SQL Server instances in Azure with built-in intelligence and security. It offers several options for migrating SQL Server workloads to the cloud.
Deploying Windows containers with Kubernetes (Ben Hall)
The document discusses deploying Windows containers with Kubernetes. It covers building Windows containers, deploying containers on Kubernetes, and operating Kubernetes. Specifically, it shows how to:
- Build a Windows container with SQL Server using Docker
- Deploy a .NET Core app container to Kubernetes and expose it using a load balancer
- Scale the deployment to multiple replicas and observe traffic distribution
- Perform rolling updates to deploy new versions of the application
This document discusses how PayPal uses Docker and PaaS to support the scale of its operations, which include 165 million active accounts and processing over 12.5 million payment transactions daily. It describes challenges such as firewall blocking and Elasticsearch issues in production environments. The solutions implemented registry high availability using Supervisord, Nginx, and Swift storage. It also discusses using Dockerized development environments for consistency and simulating production.
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME (confluent)
Confluent Platform is supporting London Metal Exchange’s Kafka Centre of Excellence across a number of projects with the main objective to provide a reliable, resilient, scalable and overall efficient Kafka as a Service model to the teams across the entire London Metal Exchange estate.
Azure Virtual Machines Deployment Scenarios (Brian Benz)
Architecture and Scenarios for deploying Database and middleware applications on Azure Virtual Machines including SQL Server, Oracle, Hadoop, and others.
The document provides an overview of the TopStack architecture, which delivers Platform as a Service (PaaS) capabilities by extending Infrastructure as a Service (IaaS) solutions. TopStack implements many popular AWS services and runs on private and public clouds. The focus for Q3 2013 is to complement OpenStack. TopStack uses common components like service registration, orchestration, logging, and configuration management. It offers services like load balancing, databases, queues and monitoring.
This document discusses best practice recommendations for SharePoint farm architecture. It recommends having a dedicated SQL database server and at least two web/application servers for high availability. It also recommends virtualizing servers to reduce hardware costs and enable easy scaling and failover. For high availability, it recommends using network load balancing and SQL database mirroring across multiple servers and database instances. The document provides guidance on logical architecture, hardware/software requirements, the installation and configuration process, and enabling Kerberos authentication for security.
Yes, Docker is great! We are all very aware of that but now it’s time to take the next step: wrapping it all and deploying to a production environment. For this scenario we need something more. For that “more” we have Kubernetes by Google - a container platform based on the same technology used to deploy billions of containers per month on Google’s infrastructure.
Ready to leverage your Docker skills? Come to this session to see how your current Docker skillset can be easily mapped to Kubernetes concepts and commands. And get ready to deploy your containers in production!
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 PreviewChip Childers
Chip Childers is the VP of Apache CloudStack and Principal Engineer at SunGard Availability Services.
Apache CloudStack is open source software that can deploy and manage large networks of virtual machines as a scalable IaaS cloud platform. It is a top-level project at the Apache Software Foundation.
CloudStack enables cloud operators to design, install, support, upgrade and scale diverse cloud environments. It also allows application owners to easily consume infrastructure services so that infrastructure does not get in the way of delivering applications to end users.
OpenSource API Server based on Node.js API framework built on supported Node.js platform with Tooling and DevOps. Use cases are Omni-channel API Server, Mobile Backend as a Service (mBaaS) or Next Generation Enterprise Service Bus. Key functionality includes built-in enterprise connectors, ORM, Offline Sync, Mobile and JS SDKs, Isomorphic JavaScript and a graphical API creation tool.
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure (Karim Vaes)
This document provides an overview of the Global Azure Bootcamp event and Azure platform services. It discusses Infrastructure Services, Platform Services, Domain Services, and Security & Management services available on Azure. It then summarizes Azure SQL Database service tiers and flavors, including databases that can be scaled up/down or out/in depending on predictable or unpredictable workloads. The document concludes with a discussion of storage workload benchmarks performed with Storage Spaces Direct and Scale-Out File Server on Azure, comparing the performance of different drive types.
This document summarizes new features in SQL Server 2019 including intelligent query processing, data classification and auditing, accelerated database recovery, data virtualization, SQL Server replication in one command, additional capabilities and migration tools, and a modern platform with Linux, containers, and machine learning services. It provides examples of how these features can help solve modern data challenges and gain performance without changing applications.
Enabling Microservices Frameworks to Solve Business Problems (Ken Owens)
Opening keynote at Mesoscon 2015 with announcements on creating an ecosystem for developing solutions to business problems leveraging Mesos, Mantl.io, Mesosphere Infinity, ZoomData, and Project Calico to create Fog nodes for IoE use cases.
Similar to Discovery Day 2019 Sofia - Big data clusters (20)
Tips and tricks to optimize SQL Server Backup and Restore (Ivan Donev)
The backup strategy of every company running SQL Server is the main thing keeping the DBA happy (and able to get his beauty sleep). In the era of enormous data inputs, it is not only important to back up your data, but to back it up fast and to know you can restore it. In this session we will talk about backup strategies, tips and tricks on optimizing SQL Server backups (both on disk and with third-party software) and, last but not least, how to be sure that you can recover and do it in time.
Get the most out of your Windows Azure VMs (Ivan Donev)
This document provides tips for optimizing SQL Server performance on Windows Azure VMs. It recommends starting with standard A2 VMs for basic workloads and using larger VMs like A8 or A9 for intensive workloads. It suggests keeping storage accounts close to VMs, disabling caching on data disks, using multiple disks for I/O bandwidth, and avoiding mixing storage accounts. The document also provides tips on NTFS allocation size, instant file initialization, data compression, locked pages in memory, and growing and shrinking databases.
Develop your database with Visual Studio (Ivan Donev)
Ivan Donev presented on SQL Server Data Tools (SSDT) and database development in Visual Studio. SSDT allows developers to design, develop, and deploy databases in a single tool within Visual Studio. It supports SQL Server 2014 and enables connected development with features like SQL Server Object Explorer, a multi-mode table editor, and debugging with LocalDB. SSDT improves integration with Windows Azure and cloud development. While SSDT for Business Intelligence is still separate, the full SSDT tool brings all database development capabilities into Visual Studio.
Windows Azure Bootcamp - Microsoft BI in Azure VMs (Ivan Donev)
This is the presentation from the Global Windows Azure Bootcamp event in Sofia, Bulgaria. The presentation covers topics regarding Business Intelligence components for Microsoft SQL Server 2008 R2 and 2012 on Windows Azure Virtual Machines.
This document outlines the history of Microsoft's OLAP Services product line from 1998 to 2012. It includes the major releases in 1998, 2000, 2005, 2008, and 2012. It also provides an overview of key concepts in multidimensional analysis including cubes, measures, dimensions, and star and snowflake schemas. Finally, it shows how an Analysis Services database can contain multiple cubes that reference shared dimensions and leverage a single data source.
SQL Server consolidation and virtualization (Ivan Donev)
This document discusses SQL Server consolidation and virtualization. It begins with defining consolidation as combining units into more efficient larger units to improve cost efficiency. It then discusses approaches to consolidation like combining databases or instances. Considerations for consolidation like workloads, applications, and manageability are covered. SQL Server virtualization is also discussed, noting the benefits of isolation, migration, and simplification. The market section outlines products that can help like SQL Server 2008 R2 and the HP ProLiant DL980 server. It concludes with discussing how to start a consolidation project through inventory, testing, and migration planning. Tools to help are also listed.
Self-service BI with PowerPivot and PowerView (Ivan Donev)
This document discusses how to use PowerPivot and Power View for business intelligence and data analysis. PowerPivot allows users to load large amounts of data into Excel and perform self-service analytics. Power View builds on PowerPivot and allows users to visualize and explore data through interactive reports and dashboards. The document outlines several real-world scenarios where PowerPivot and Power View can be used and highlights their key features and capabilities like in-memory processing, tabular data models, and integration with SharePoint. It also provides demonstrations of building reports in Power View.
Is "the bigger the better" valid in the database world? (Ivan Donev)
What are the best practices for building and managing very large database environments? How can we get an advantage from implementing HP's FastTrack Solution? We will share some real-world tips and tricks around the management of multi-terabyte databases.
Keynote: AI & Future Of Offensive Security (Priyanka Aash)
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
Discovery Series - Zero to Hero - Task Mining Session 1 (DianaGray10)
This session is focused on providing you with an introduction to task mining. We will go over different types of task mining and provide you with a real-world demo on each type of task mining in detail.
Increase Quality with User Access Policies - July 2024 (Peter Caitens)
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
Retrieval Augmented Generation Evaluation with Ragas (Zilliz)
Retrieval Augmented Generation (RAG) enhances chatbots by incorporating custom data in the prompt. Using large language models (LLMs) as judge has gained prominence in modern RAG systems. This talk will demo Ragas, an open-source automation tool for RAG evaluations. Christy will talk about and demo evaluating a RAG pipeline using Milvus and RAG metrics like context F1-score and answer correctness.
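Metrics like the context F1-score mentioned above reduce to the familiar harmonic mean of precision and recall over retrieved chunks. The sketch below shows that calculation; the retrieval numbers are hypothetical and this is not the Ragas API itself:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, the shape of a
    context F1-style retrieval metric in RAG evaluation."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical retrieval result: 3 of 4 retrieved chunks are relevant
# (precision), and those 3 cover 3 of the 5 chunks the reference
# answer actually needed (recall).
precision = 3 / 4
recall = 3 / 5
print(round(f1(precision, recall), 3))  # → 0.667
```

The harmonic mean punishes imbalance: a retriever that returns everything (high recall, low precision) or almost nothing (the reverse) scores poorly, which is why F1 is preferred over a simple average for judging a RAG pipeline's context step.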
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx (Fwdays)
I will share my personal experience of full-time development on wasm Blazor
What difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, which technology stack and architectural patterns we chose
What conclusions we made and what mistakes we committed
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated... (Snarky Security)
How wonderful it is that in our modern age, every bit of our biological data can be digitized, stored, and potentially pilfered by cyber thieves! Isn't it just splendid to think that while scientists are busy pushing the boundaries of biotechnology, hackers could be plotting the next big bio-data heist? This delightful scenario is brought to you by the ever-expanding digital landscape of biology and biotechnology, where the integration of computer science, engineering, and data science transforms our understanding and manipulation of biological systems.
While the fusion of technology and biology offers immense benefits, it also necessitates a careful consideration of the ethical, security, and associated social implications. But let's be honest, in the grand scheme of things, what's a little risk compared to potential scientific achievements? After all, progress in biotechnology waits for no one, and we're just along for the ride in this thrilling, slightly terrifying, adventure.
So, as we continue to navigate this complex landscape, let's not forget the importance of robust data protection measures and collaborative international efforts to safeguard sensitive biological information. After all, what could possibly go wrong?
-------------------------
This document provides a comprehensive analysis of the security implications of biological data use. The analysis explores various aspects of biological data security, including the vulnerabilities associated with data access, the potential for misuse by state and non-state actors, and the implications for national and transnational security. Key aspects considered include the impact of technological advancements on data security, the role of international policies in data governance, and the strategies for mitigating risks associated with unauthorized data access.
This view offers valuable insights for security professionals, policymakers, and industry leaders across various sectors, highlighting the importance of robust data protection measures and collaborative international efforts to safeguard sensitive biological information. The analysis serves as a crucial resource for understanding the complex dynamics at the intersection of biotechnology and security, providing actionable recommendations to enhance biosecurity in a digital and interconnected world.
The evolving landscape of biology and biotechnology, significantly influenced by advancements in computer science, engineering, and data science, is reshaping our understanding and manipulation of biological systems. The integration of these disciplines has led to the development of fields such as computational biology and synthetic biology, which utilize computational power and engineering principles to solve complex biological problems and innovate new biotechnological applications. This interdisciplinary approach has not only accelerated research and development but also introduced new capabilities such as gene editing and biomanufacturing.
The Zaitechno Handheld Raman Spectrometer is a powerful and portable tool for rapid, non-destructive chemical analysis. It utilizes Raman spectroscopy, a technique that analyzes the vibrational fingerprint of molecules to identify their chemical composition. This handheld instrument allows for on-site analysis of materials, making it ideal for a variety of applications, including:
Material identification: Identify unknown materials, minerals, and contaminants.
Quality control: Ensure the quality and consistency of raw materials and finished products.
Pharmaceutical analysis: Verify the identity and purity of pharmaceutical compounds.
Food safety testing: Detect contaminants and adulterants in food products.
Field analysis: Analyze materials in the field, such as during environmental monitoring or forensic investigations.
The Zaitechno Handheld Raman Spectrometer is easy to use and features a user-friendly interface. It is compact and lightweight, making it ideal for field applications. With its rapid analysis capabilities, the Zaitechno Handheld Raman Spectrometer can help you improve efficiency and productivity in your research or quality control workflows.
The Challenge of Interpretability in Generative AI Models.pdf (Sara Kroft)
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
Choosing the Best Outlook OST to PST Converter: Key Features and Considerations (webbyacad software)
When looking for a good software utility to convert Outlook OST files to PST format, it is important to find one that is easy to use and has useful features. WebbyAcad OST to PST Converter Tool is a great choice because it is simple to use for anyone, whether you are tech-savvy or not. It can smoothly change your files to PST while keeping all your data safe and secure. Plus, it can handle large amounts of data and convert multiple files at once, which can save you a lot of time. It even comes with 24*7 technical support assistance and a free trial, so you can try it out before making a decision. Whether you need to recover, move, or back up your data, Webbyacad OST to PST Converter is a reliable option that gives you all the support you need to manage your Outlook data effectively.
How UiPath Discovery Suite supports identification of Agentic Process Automat...DianaGray10
📚 Understand the basics of the newly persona-based LLM-powered Agentic Process Automation and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
Generative AI technology is a fascinating field that focuses on creating comp...Nohoax Kanont
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
Top 12 AI Technology Trends For 2024.pdfMarrie Morris
Technology has become an irreplaceable component of our daily lives. The role of AI in technology revolutionizes our lives for the betterment of the future. In this article, we will learn about the top 12 AI technology trends for 2024.
2. SQL Server Big Data Cluster Layout
[Architecture diagram: a Kubernetes cluster spanning multiple nodes. The control plane hosts the controller and the SQL Server master instance. The compute plane contains one or more compute pools, each made up of SQL compute nodes. The storage plane contains a data pool (SQL data nodes backed by persistent storage) and a storage pool, whose Kubernetes pods co-locate an HDFS data node, Spark, and SQL Server so that SQL can read directly from HDFS. External data sources and IoT data feed the cluster; analytics, custom apps, and BI tools consume from it.]
5. What is Kubernetes and what does it do?
Kubernetes is a container orchestrator responsible for:
Running a cluster of hosts
Scheduling containers to run on different hosts
Facilitating communication between the containers
Providing and controlling access to/from the outside world
Tracking and optimizing resource usage
Similar solutions:
Docker Swarm, Mesos Marathon, Amazon ECS, HashiCorp Nomad
7. Master Nodes
Responsible for managing the cluster
Typically more than one is installed
In HA mode one master node is the leader
Can be reached via CLI (kubectl), APIs, or Dashboard
[Diagram: multiple master nodes, each running the Scheduler, Controller, api-server, and key-value store.]
Scheduler: schedules the work on different nodes
Controller: takes care of (1) control loops and (2) desired state
api-server: performs (1) administrative tasks and (2) stores cluster state
Key-value store: etcd is used; it can be (1) part of the master or (2) installed externally
8. (Worker) Nodes
Initially called minions
Container runtime: containerd, rkt, lxd
Kubelet: communicates with the master; uses CRI shims
kube-proxy: network proxy
[Diagram: a worker node running kube-proxy, the kubelet, and a container runtime hosting pods.]
9. Pods (1)
Smallest unit of scheduling
Contains one or more containers
Containers share the pod environment (network, mounts, ...)
Scheduled on nodes
Created via manifest files
[Diagram: a pod holding a main container and supporting containers that share the pod environment.]
10. Pods (2)
Each pod has a unique IP address
Inter-pod communication is via a pod network
Intra-pod communication is via localhost and port
[Diagram: Pod 1 (10.10.20.20) and Pod 2 (10.10.20.21) communicate over the pod network; containers inside a pod reach each other over localhost.]
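The slides note that pods are created via manifest files. As a minimal illustrative sketch (the names and images here are hypothetical, not part of the deck), a two-container pod sharing one network namespace might look like:

```yaml
# Hypothetical pod manifest: a main container plus a supporting
# (sidecar) container that share the pod's environment.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: main
    image: nginx:1.25
    ports:
    - containerPort: 80
  - name: supporting
    image: busybox:1.36
    # The sidecar reaches the main container over localhost:80,
    # since all containers in a pod share one network namespace.
    command: ["sh", "-c",
      "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 30; done"]
```

Applying the file with `kubectl apply -f pod.yaml` schedules the pod onto a worker node.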
11. Replication Controllers
Higher-level workload
Looks after a pod or set of pods
Scales pods up/down
Sets the desired state
[Diagram: a replication controller managing a set of pods.]
12. Deployments
Even higher-level workload
Simplifies updates and rollbacks
Declarative and imperative approach
Self-documenting
Suitable for versioning
[Diagram: a deployment managing a ReplicaSet, which in turn manages pods.]
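To make the declarative approach concrete, here is a hedged sketch of a deployment manifest (the app name, image, and port are hypothetical): the desired state is three replicas, and Kubernetes creates a ReplicaSet to maintain it.

```yaml
# Hypothetical deployment manifest: declares the desired state
# (3 replicas of the labeled pod template).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: v01
    spec:
      containers:
      - name: myapp
        image: myrepo/myapp:v01   # hypothetical image
        ports:
        - containerPort: 8080
```

Because the manifest is a plain file, it can be versioned; changing the image tag and re-applying it triggers a rolling update, and `kubectl rollout undo` reverts it.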
13. Services (1)
Provide a reliable network endpoint: IP address, DNS name, port
Expose pods to the outside world: NodePort (cluster-wide port), LoadBalancer (cloud-based)
Use an Endpoints object to track pods
[Diagram: a service (IP 10.10.10.1, DNS demo-svc, port 32000) backed by an Endpoints object listing Pod A (10.10.20.21) on Node 1 and Pod B (10.10.20.22) on Node 2.]
14. Services (2)
Services use label selectors to find the pods they route traffic to
[Diagram: a service with selector version=v01, app=myapp matching pods carrying the same labels.]
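Putting the two services slides together, a manifest along these lines (a sketch reusing the slide's `demo-svc` name, port 32000, and labels; the target port is an assumption) would give the pods a stable endpoint:

```yaml
# Hypothetical service manifest: traffic to the stable endpoint is
# routed to any pod carrying the labels version=v01 and app=myapp.
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  type: NodePort
  selector:
    app: myapp
    version: v01
  ports:
  - port: 80          # service port inside the cluster
    targetPort: 8080  # container port on the pods (assumed)
    nodePort: 32000   # cluster-wide port opened on every node
```

As pods with matching labels come and go, the Endpoints object is updated automatically, so clients keep using the same IP, DNS name, and port.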
20. Base node configuration
Applies to nodes across all planes. Services:
kubelet – K8s local agent
kube-proxy – network configuration and forwarding
supervisord – process monitoring and control
fluentd – node logging
flanneld – software-defined network
collectd – OS and application data collection
SQL Big Data watchdog – config sync, watchdog, data collection (DMVs, etc.)
[Diagram: every Kubernetes node runs the watchdog alongside kubelet, kube-proxy, supervisord, fluentd, flanneld, and collectd.]
21. Control Plane
External endpoints:
Kubernetes (REST)
Aris Control Service (REST)
Knox Gateway (REST gateway for Hadoop APIs)
SQL Server Master (TDS gateway for data marts and the SQL Master service)
Services:
etcd
Kubernetes Master Services Controller
SQL Master instance
SQL Big Data Admin Portal
Knox Gateway
HDFS Name Service
YARN Master
Hive Metastore
InfluxDB (metrics store)
Livy (REST interface for Spark)
Spark Driver
[Diagram: control-plane services spread across Kubernetes nodes, each running the base node services plus etcd. One node hosts the K8s master service, Spark driver, SQL Big Data Admin portal, InfluxDB, and Grafana; another hosts the controller, proxy, SQL master, HDFS name node, and Kibana; a third hosts Livy, Knox, Elasticsearch, the Hive metastore, and the YARN master.]
22. Controller
External REST/HTTPS endpoint
Bootstrap and build-out
Manage capacity
Configure high availability and recover from failures (Availability Groups)
Security (authN, authZ, certificate rotation)
Lifecycle (upgrade/downgrade/rollback)
Configuration management
Monitoring – capacity, health, metrics, logs
Troubleshooting – performance, failures
Cluster Admin Portal
[Diagram: the controller service (build-out, upgrade/rollback, add/remove capacity, central authZ/authN, cluster admin portal, troubleshooting) backed by a controller metadata store.]
23. SQL Master Instance
TDS endpoint into the cluster
High-value data
OLTP server
Data connectors
Machine learning and extensibility
Scalable query engine
[Diagram: the master instance Availability Group, with a primary replica and two readable secondaries.]
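As an illustrative sketch only (the real manifests are generated by the cluster tooling, and the label and node port here are assumptions), exposing the master instance's TDS endpoint outside the cluster could look like a NodePort service in front of the Availability Group primary:

```yaml
# Hypothetical sketch: expose the SQL master instance's TDS
# port (1433) on a cluster-wide node port. Not the actual
# Big Data Cluster manifest, which the tooling generates.
apiVersion: v1
kind: Service
metadata:
  name: sql-master-svc
spec:
  type: NodePort
  selector:
    app: sql-master   # hypothetical label
  ports:
  - port: 1433
    targetPort: 1433
    nodePort: 31433   # hypothetical cluster-wide port
```

A client would then connect over TDS with the usual tools, e.g. `sqlcmd -S <node-ip>,31433`.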
24. Compute plane
Hosts one or more SQL compute pools
A compute pool is a group of instances that forms a data, security, and resource boundary
Compute pools process complex distributed queries against the data plane
Local storage is used for shuffling data if necessary
[Diagram: compute pool nodes, each running the base node services and the SQL engine.]
25. Data plane
Storage pool:
Data ingestion through Spark (batch and streaming)
Data storage in HDFS
Data access through HDFS and SQL endpoints; the SQL engine reads files in HDFS directly
Data pool:
Partitioned, in-memory cache for external data
Scale-out data storage for append-only data sets
Data ingestion through Spark
Provides persistent SQL Server storage for the cluster
[Diagram: storage pool nodes run the base node services, the SQL engine, HDFS, and Spark; data pool nodes run the base node services and the SQL engine.]
26. Installation, configurations and tools
Installation methods:
• Cloud – a managed platform such as Azure Kubernetes Service (AKS)
• On-premises – VMs or bare metal
• Localhost – using minikube (for training and testing only)
Configurations:
• All-in-one single node, or various multi-node options
Tools:
• mssqlctl, kubectl, Azure Data Studio, SQL Server 2019 extension,
• Azure CLI (for AKS), mssql-cli, sqlcmd, curl