This document provides an overview of Azure SQL Data Warehouse. It discusses what Azure SQL Data Warehouse is, how it is provisioned and scaled, best practices for designing tables in Azure SQL DW including distribution keys and data types, and methods for loading and querying data including PolyBase and labeling queries for monitoring. The presentation also covers tuning aspects like statistics, indexing, and resource classes.
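Distribution keys matter because a hash-distributed table spreads its rows across 60 distributions. A key with few distinct values piles rows into a handful of them. The Python sketch below models that effect under stated assumptions. MD5 is only an illustrative stand-in for the service's internal hash function. `distribution_for` and `skew_ratio` are hypothetical helpers, not part of any Azure API.

```python
import hashlib
from collections import Counter
from collections.abc import Hashable, Iterable

# Dedicated SQL pools (Azure SQL DW / Synapse) split each table into 60 distributions.
NUM_DISTRIBUTIONS = 60


def distribution_for(key: Hashable) -> int:
    """Map a distribution-key value to a distribution id.

    MD5 is an illustrative stand-in; the service's real hash function differs.
    """
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_DISTRIBUTIONS


def skew_ratio(keys: Iterable[Hashable]) -> float:
    """Rows in the fullest distribution divided by the average rows per distribution.

    1.0 means perfectly even; 60.0 means every row landed in one distribution.
    """
    counts = Counter(distribution_for(k) for k in keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / (total / NUM_DISTRIBUTIONS)


# A high-cardinality key (e.g., an order id) versus a constant key (e.g., one region).
print(round(skew_ratio(range(60_000)), 2))  # close to 1.0
print(skew_ratio(["EMEA"] * 6_000))         # 60.0
```

A skew ratio near 1.0 means every distribution does roughly equal work. A high ratio means one distribution becomes the bottleneck for every query on the table.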
This session covers how to use the PySpark interface to develop Spark applications, from loading and ingesting data to applying transformations to it. It shows how to work with different data sources, apply transformations, and follow Python best practices when developing Spark apps. The demo covers integrating Apache Spark apps, in-memory processing capabilities, working with notebooks, and integrating analytics tools into Spark applications.
1- Introduction of Azure data factory.pptx, by BRIJESH KUMAR
Azure Data Factory is a cloud-based data integration service that allows users to easily construct extract, transform, load (ETL) and extract, load, transform (ELT) processes without code. It offers job scheduling, security for data in transit, integration with source control for continuous delivery, and scalability for large data volumes. The document demonstrates how to create an Azure Data Factory from the Azure portal.
This document provides an overview of Azure Data Factory (ADF), including why it is used, its key components and activities, how it works, and differences between versions 1 and 2. It describes the main steps in ADF as connect and collect, transform and enrich, publish, and monitor. The main components are pipelines, activities, datasets, and linked services. Activities include data movement, transformation, and control. Integration runtime and system variables are also summarized.
Dustin Vannoy presented on using Delta Lake with Azure Databricks. He began with an introduction to Spark and Databricks, demonstrating how to set up a workspace. He then discussed limitations of Spark, including its lack of ACID compliance and the small-file problem. Delta Lake addresses these issues with a transaction log that provides ACID transactions, along with schema enforcement, automatic file compaction, performance optimizations, and time travel. The presentation included demos of Delta Lake capabilities such as schema validation, merging, and querying past versions of data.
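Delta Lake's ACID commits, compaction, and time travel all build on one idea: a transaction log. The following is a toy, in-memory Python model of that idea. It is not Delta Lake's log format or API, and `ToyTransactionLog`, `commit`, and `snapshot` are hypothetical names. Each commit atomically adds and removes data files. Replaying the log up to an earlier version reconstructs a past snapshot of the table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Commit:
    """One atomic log entry: data files added and removed together."""
    add: frozenset
    remove: frozenset


class ToyTransactionLog:
    """Toy model of a table whose state is defined by an ordered log of commits."""

    def __init__(self) -> None:
        self.commits: list[Commit] = []

    def commit(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> int:
        """Append one commit atomically and return its version number."""
        remove = frozenset(remove)
        missing = remove - self.snapshot()
        if missing:
            # Reject the whole commit so readers never see a half-applied change.
            raise ValueError(f"cannot remove files not in the table: {sorted(missing)}")
        self.commits.append(Commit(frozenset(add), remove))
        return len(self.commits) - 1

    def snapshot(self, version: Optional[int] = None) -> set:
        """Replay the log up to `version` (default: latest) to get the live files."""
        if version is None:
            version = len(self.commits) - 1
        if version < -1 or version >= len(self.commits):
            raise ValueError(f"version {version} does not exist")
        files: set = set()
        for entry in self.commits[: version + 1]:
            files -= entry.remove
            files |= entry.add
        return files


table_log = ToyTransactionLog()
v0 = table_log.commit(add={"part-0.parquet", "part-1.parquet"})
# Compaction: replace two small files with one larger file in a single commit.
v1 = table_log.commit(add={"compacted-0.parquet"}, remove={"part-0.parquet", "part-1.parquet"})
print(sorted(table_log.snapshot()))    # ['compacted-0.parquet']
print(sorted(table_log.snapshot(v0)))  # ['part-0.parquet', 'part-1.parquet']
```

Reading version `v0` after compaction still returns the original files. The log records which files were live at each version, so time travel needs no extra copies of the data.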
This document provides an overview of AWS Lake Formation and related services for building a secure data lake. It discusses how Lake Formation provides a centralized management layer for data ingestion, cleaning, security and access. It also describes how Lake Formation integrates with services like AWS Glue (including its ML transforms) and Amazon S3 to simplify and automate many data lake tasks. Finally, it provides an example workflow for using Lake Formation to deduplicate data from various sources and grant secure access for analysis.
A Step By Step Guide To Put DB2 On Amazon Cloud, by Deepak Rao
This document provides steps for setting up DB2 9.7 on Amazon Web Services (AWS). It discusses key AWS services like EC2, S3, EBS, and AMIs. The steps include creating an AWS account, launching a pre-configured DB2 AMI instance on EC2, accepting the product license, configuring security and storage, creating databases, and testing connectivity. It also estimates the cost of running DB2 on AWS for 5 hours.
Organizations are struggling to manually classify and inventory distributed, heterogeneous data assets in order to deliver value. Azure Synapse Analytics, a new Azure service for enterprises, is poised to help them by filling the gap between data warehouses and data lakes.
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017, by Amazon Web Services
PostgreSQL is an open source database growing in popularity because of its rich features, vibrant community, and compatibility with commercial databases. Learn about ways to run PostgreSQL on AWS, including self-managed deployments and the managed database services from AWS: Amazon Relational Database Service (Amazon RDS) and the Amazon Aurora PostgreSQL-compatible Edition. This talk covers key Amazon RDS for PostgreSQL functionality, availability, and management. We also review general guidelines for common user operations and activities such as migration, tuning, and monitoring for RDS for PostgreSQL instances.
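Amazon Kinesis Data Analytics is a serverless service for processing and analyzing streaming data in real time. With Kinesis Data Analytics, you can quickly and flexibly build applications that process large-scale streams for log analytics, clickstream analytics, the Internet of Things (IoT), ad tech, gaming, and more, without the burden of ongoing maintenance. In this session, we explain how Kinesis Data Analytics works, its features, and operational best practices, and then use demos to show how to develop streaming applications and use Studio Notebooks.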
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature..., by Amazon Web Services Korea
Join this session for a detailed look at the new features of Amazon Redshift. It is designed for users who want to learn about Amazon Redshift's new capabilities through in-depth explanations and demos of Amazon Data Sharing, Amazon Redshift Serverless, Redshift Streaming, Redshift ML, and auto copy.
This deck covers Snowflake concepts and hands-on expertise to help you get started implementing data warehouses with Snowflake, along with the information and skills you need to master Snowflake essentials.
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020, by Timothy McAliley
Jim Boriotti presents an overview and demo of Azure Synapse Analytics, an integrated data platform for business intelligence, artificial intelligence, and continuous intelligence. Azure Synapse Analytics includes Synapse SQL for querying with T-SQL, Synapse Spark for notebooks in Python, Scala, and .NET, and Synapse Pipelines for data workflows. The demo shows how Azure Synapse Analytics provides a unified environment for all data tasks through the Synapse Studio interface.
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you to focus on your applications and business.
Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, APIs, clickstreams, and unstructured and log data. However, they are often limited by legacy data warehouses and ETL processes that were designed for transactional data. In this session, we introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We discuss how to build scalable, efficient, and serverless ETL pipelines using AWS Glue. Additionally, Merck will share how they built an end-to-end ETL pipeline for their application release management system, and launched it in production in less than a week using AWS Glue.
This document outlines an agenda for a 90-minute workshop on Snowflake. The agenda includes introductions, an overview of Snowflake and data warehousing, demonstrations of how users utilize Snowflake, hands-on exercises loading sample data and running queries, and discussions of Snowflake architecture and capabilities. Real-world customer examples are also presented, such as a pharmacy building new applications on Snowflake and an education company using it to unify their data sources and achieve a 16x performance improvement.
The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data warehouse-as-a-service and a Massively Parallel Processing (MPP) solution for "big data" with true enterprise-class features. The SQL DW service is built for data warehouse workloads ranging from a few hundred gigabytes to petabytes of data, with distinctive features like disaggregated compute and storage that let customers use the service to match their needs. In this presentation, we take an in-depth look at implementing a SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via PolyBase, allowing for a true SQL experience across structured and unstructured data.
This document provides an introduction and overview of Azure DocumentDB. It discusses how DocumentDB is a fully managed NoSQL database service that provides fast and predictable performance for JSON data through SQL querying capabilities. It also describes how DocumentDB offers features like elastic scaling, high availability, global distribution and ease of development. The document then provides information on starting with DocumentDB, writing queries, and programming capabilities within DocumentDB like stored procedures and triggers.
SQL Server 2016 introduces several new features for In-Memory OLTP including support for up to 2 TB of user data in memory, system-versioned tables, row-level security, and Transparent Data Encryption. The in-memory processing has also been updated to support more T-SQL functionality such as foreign keys, LOB data types, outer joins, and subqueries. The garbage collection process for removing unused memory has also been improved.
This document provides an introduction and overview of Azure Data Lake. It describes Azure Data Lake as a single store of all data ranging from raw to processed that can be used for reporting, analytics and machine learning. It discusses key Azure Data Lake components like Data Lake Store, Data Lake Analytics, HDInsight and the U-SQL language. It compares Data Lakes to data warehouses and explains how Azure Data Lake Store, Analytics and U-SQL process and transform data at scale.
Row-Level Security (RLS) enables implementation of row-level access restrictions in SQL Server. RLS uses predicate functions to define the security logic and filters the rows a query can see based on that logic. Security predicates bind the predicate functions to tables and come in two forms: filter predicates, which silently filter rows, and block predicates, which prevent write operations that violate the predicate. Best practices include keeping the security logic simple and placing RLS objects in a separate schema for easier maintenance. RLS has some limitations, including incompatibility with FILESTREAM and PolyBase.
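The difference between filter and block predicates is easiest to see in a toy model. The Python sketch below models the semantics only and is not how SQL Server implements them. In SQL Server the predicate is an inline table-valued function bound to a table by a security policy. Here, `rep_predicate`, `SecuredTable`, and the sales-rep rule are hypothetical stand-ins, and only an insert-time block predicate is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    order_id: int
    sales_rep: str


def rep_predicate(row: Row, session_user: str) -> bool:
    """Hypothetical security logic: a rep may only see and write their own rows."""
    return row.sales_rep == session_user


class SecuredTable:
    """Toy model of RLS: a filter predicate on reads, a block predicate on inserts."""

    def __init__(self, predicate) -> None:
        self.rows: list[Row] = []
        self.predicate = predicate

    def select(self, session_user: str) -> list[Row]:
        # Filter predicate: rows that fail the check are silently omitted, no error.
        return [r for r in self.rows if self.predicate(r, session_user)]

    def insert(self, row: Row, session_user: str) -> None:
        # Block predicate: a write that would violate the check fails with an error.
        if not self.predicate(row, session_user):
            raise PermissionError(f"{session_user} may not insert a row for {row.sales_rep}")
        self.rows.append(row)


orders = SecuredTable(rep_predicate)
orders.insert(Row(1, "alice"), session_user="alice")
orders.insert(Row(2, "bob"), session_user="bob")
print(orders.select("alice"))  # [Row(order_id=1, sales_rep='alice')]
```

Keeping the predicate a small, side-effect-free comparison mirrors the "keep the security logic simple" advice. The check runs once per row on every query.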
This document provides an introduction and background about the presenter along with information about SQL Database. The presenter has over 30,000 hours of training experience with SQL Server and various Microsoft certifications. They created SQL School Greece as a resource for IT professionals and others interested in SQL Server. The presentation will cover what SQL Database is on Azure, its service tiers including basic, standard, and premium, database transaction units (DTUs), the Azure SQL Database logical server, management tools for SQL Database, and securing SQL Database. It concludes with an invitation to sign up for SQL PASS and follow the presenter on social media.
Live Query Statistics and Query Store are new features in SQL Server 2016 that provide insights into query performance. Live Query Statistics allows users to view live execution plans and operator statistics to troubleshoot long-running or problematic queries. Query Store automatically captures query histories, plans, and runtime statistics to help users identify performance regressions and force previous high-performing plans. Both features aim to simplify performance troubleshooting and provide greater visibility into the query optimization and execution process.
This document provides an introduction and overview of machine learning concepts and Azure Machine Learning. It defines machine learning as finding patterns in data and using those patterns to predict the future. It outlines the machine learning workflow and lifecycle, including preparing data, applying algorithms to find patterns, iterating to create the best model, and deploying the final model. It also describes machine learning concepts like supervised and unsupervised learning, and different problem types like regression, classification, and clustering. Finally, it discusses options for using Azure Machine Learning, including free and full-featured paid accounts, and demonstrates its use.
SQL Server 2016 introduces new features for business intelligence and reporting. PolyBase allows querying data across SQL Server and Hadoop using T-SQL. Integration Services has improved support for AlwaysOn availability groups and incremental package deployment. Reporting Services adds HTML5 rendering, PowerPoint export, and the ability to pin report items to Power BI dashboards. Mobile Report Publisher enables developing and publishing mobile reports.
Dynamic data masking is a data protection feature in SQL Server 2016 that masks sensitive data in query results without altering the actual data. It can help protect private information by exposing only obfuscated data to unauthorized users. Administrators can configure masking rules for specific columns using various masking functions like default, email, random, or custom string masking. The underlying data remains intact, but masked data is returned to users without the UNMASK permission. It provides data security with minimal performance impact by masking results on the fly.
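The masking functions can be modeled in Python to show what a reader without the UNMASK permission would see. This approximates each function's documented output and is not SQL Server's implementation. In SQL Server, rules are declared on columns with `MASKED WITH (FUNCTION = '...')`. The column names, the `RULES` table, `read_row`, and the short-value behavior of `mask_partial` are assumptions made for illustration.

```python
import random
from datetime import datetime

_rng = random.Random()  # source of randomness for the random() mask


def mask_default(value):
    """Approximates default(): full masking chosen by data type."""
    if isinstance(value, str):
        return "XXXX"  # SQL Server uses fewer X's for string columns narrower than 4
    if isinstance(value, datetime):
        return datetime(1900, 1, 1)
    if isinstance(value, (int, float)):
        return 0
    raise TypeError(f"no default mask modeled for {type(value).__name__}")


def mask_email(value: str) -> str:
    """Approximates email(): keep the first letter, then a constant pattern."""
    return value[:1] + "XXX@XXXX.com"


def mask_partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Approximates partial(prefix, padding, suffix): expose the ends, pad the middle."""
    if len(value) <= prefix + suffix:
        return padding  # assumption: values too short to split are fully padded
    return value[:prefix] + padding + value[len(value) - suffix:]


def mask_random(low: int, high: int) -> int:
    """Approximates random(low, high) for an integer column."""
    return _rng.randint(low, high)


# Hypothetical per-column rules, standing in for MASKED WITH (FUNCTION = '...').
RULES = {
    "name": mask_default,
    "email": mask_email,
    "phone": lambda v: mask_partial(v, 0, "XXX-XXX-", 4),
    "salary": lambda v: mask_random(1, 100),
}


def read_row(row: dict, has_unmask: bool) -> dict:
    """Return the row as a given reader sees it; the stored row is never modified."""
    if has_unmask:
        return dict(row)
    return {col: (RULES[col](val) if col in RULES else val) for col, val in row.items()}


row = {"id": 7, "name": "Alice", "email": "alice@contoso.com",
       "phone": "555-123-4567", "salary": 85000}
print(read_row(row, has_unmask=False)["email"])  # aXXX@XXXX.com
```

Masking happens only on the read path, so the stored `row` is unchanged and a privileged reader still gets the original values.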
The document discusses technologies within the Microsoft SQL family and Azure SQL that can help organizations address requirements of the General Data Protection Regulation (GDPR). It covers features for discovering and classifying personal data, managing access and controlling how data is used, and protecting data through encryption, auditing and other security controls. Built-in technologies like dynamic data masking, row-level security, authentication options, and transparent data encryption are described as ways SQL Server and Azure SQL Database can help organizations comply with GDPR.
In this session, we present the different ways to use SQL Server in a cloud infrastructure (Microsoft Azure). We cover hybrid, migration, and backup scenarios, as well as hosting SQL Server databases in IaaS or PaaS mode.
This technical workshop equips you with the insights to modernize your legacy Windows and SQL Server applications. We will walk through the common Amazon Web Services (AWS) solutions and proven customer approaches to deploy and migrate SQL Server 2008 to the cloud.
The document discusses assessing and planning SQL database migrations to Azure. It outlines the steps involved, including initiating and discovering databases, assessing requirements and dependencies, planning the target platform of IaaS or PaaS, migrating the databases with various tools depending on downtime windows, and optimizing workloads in the cloud. It provides examples of tools like the Microsoft Assessment and Planning Toolkit (MAP) and Data Migration Assistant (DMA), and migration options like transactional replication or Azure Database Migration Service.
In this presentation, we will assess the on-premises environment, determine which workloads and databases are ready to make the move, and show what you can do to improve their Azure readiness while reducing downtime during the migration. Planning and assessment play a critical role in moving to the cloud. We will look at a wide range of resources and tools for completing an assessment with ease and identifying workload dependencies, with practical tips and tricks focused on sizing and costs. Finally, we'll assess the SQL instances and identify their readiness for Azure.
What is in a modern BI architecture? In this presentation, we explore PaaS, Azure Active Directory, and storage options including SQL Database and SQL Data Warehouse.
This document provides an overview of Azure SQL Data Warehouse (SQL DWH), a cloud data warehouse service. It discusses SQL DWH's massively parallel processing (MPP) architecture that allows independent scaling of compute and storage. The document demonstrates how to create a SQL DWH, load data using PolyBase, and use common tools. It is intended to help users understand what SQL DWH is, how it works, and common scenarios it can be used for, such as processing large volumes of data without needing to purchase and manage hardware.
This document provides an overview of a course on implementing a modern data platform architecture using Azure services. The course objectives are to understand cloud and big data concepts, the role of Azure data services in a modern data platform, and how to implement a reference architecture using Azure data services. The course will provide an ARM template for a data platform solution that can address most data challenges.
This document provides an agenda and summary for a Data Analytics Meetup (DAM) on March 27, 2018. The agenda covers topics such as disruption opportunities in a changing data landscape, transitioning from traditional to modern BI architectures using Azure, Azure SQL Database vs Data Warehouse, data integration with Azure Data Factory and SSIS, Analysis Services, Power BI reporting, and a wrap-up. The document discusses challenges around data growth, digital transformation, and the shrinking time for companies to adapt to disruption. It provides overviews and comparisons of Azure SQL Database, Data Warehouse, and related Azure services to help modernize analytics architectures.
Integration Monday - Analysing StackExchange data with Azure Data Lake, by Tom Kerkhove
Big data is the new big thing, and storing the data is the easy part. Gaining insights from your pile of data is something else entirely.
Based on a data dump of the well-known StackExchange websites, we will store and analyse 150+ GB of data with Azure Data Lake Store & Analytics to gain insights about their users. After that, we will use Power BI to give an at-a-glance overview of what we learned.
If you are a developer who is interested in big data, this is your time to shine! We will use our existing SQL & C# skills to analyse everything without having to worry about running clusters.
This document discusses the future of data and the Azure data ecosystem. It highlights that by 2025 there will be 175 zettabytes of data in the world and the average person will have over 5,000 digital interactions per day. It promotes Azure services like Power BI, Azure Synapse Analytics, Azure Data Factory and Azure Machine Learning for extracting value from data through analytics, visualization and machine learning. The document provides overviews of key Azure data and analytics services and how they fit together in an end-to-end data platform for business intelligence, artificial intelligence and continuous intelligence applications.
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS, by Mark Kromer
The document discusses tools for building ETL pipelines to consume hybrid data sources and load data into analytics systems at scale. It describes how Azure Data Factory and SQL Server Integration Services can be used to automate pipelines that extract, transform, and load data from both on-premises and cloud data stores into data warehouses and data lakes for analytics. Specific patterns shown include analyzing blog comments, sentiment analysis with machine learning, and loading a modern data warehouse.
Azure provides several data-related services for storing, processing, and analyzing data in the cloud at scale. Key services include Azure SQL Database for relational data, Azure DocumentDB for NoSQL data, Azure SQL Data Warehouse for analytics, Azure Data Lake Store for big data storage, and Azure Storage for binary data. These services provide scalability, high availability, and manageability. Azure SQL Database provides fully managed SQL databases with options for single databases, elastic pools, and geo-replication. Azure SQL Data Warehouse enables petabyte-scale analytics with massively parallel processing.
This document provides an overview of cloud computing concepts and Azure cloud services. It discusses cloud service models including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). It introduces Azure, the Microsoft cloud computing platform, and key Azure services like Azure Storage, Azure Portal, Azure Accounts, Azure Data Factory, and Azure Data Flow. Azure Data Factory allows building data integration solutions using activities, linked services, datasets and triggers without writing code. Azure Data Flow enables visually designing data transformations using a Spark optimizer without code.
This document provides a summary of Antonios Chatzipavlis's background and experience working with SQL Server. It details his career starting with SQL Server 6.0 in 1996 and earning his first Microsoft certification. It lists the various Microsoft certifications and roles he has held, including becoming an MVP for SQL Server. It also introduces his creation of SQL School Greece in 2012 to share his knowledge.
Slides from QSSUG Aug 2017 by David Alzamendi:
Now that on-premises data warehouses are not the only option, many questions arise about Azure SQL Data Warehouse.
In this session, David will cover the fundamentals of using Azure SQL Data Warehouse from a beginner's perspective. He'll discuss the benefits, demystify the pricing measurements and explain the difference between Azure SQL Database and Big Data.
By the end of this session, you will know how to deploy this service in just a few minutes, using some of the latest techniques such as extracting data from Azure Data Lake and accessing Azure Blob Storage through PolyBase.
This document provides an overview of Microsoft Azure cloud services and why businesses use the cloud. It discusses Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) models. Key Azure services are mentioned, including Virtual Machines, SQL Database, storage, and web apps. The cloud allows businesses to rapidly setup environments, scale as needed, and increase efficiency at a lower cost compared to on-premises infrastructure.
This document provides an overview of using PolyBase for data virtualization in SQL Server. It discusses installing and configuring PolyBase, connecting external data sources like Azure Blob Storage and SQL Server, using PolyBase DMVs for monitoring and troubleshooting, and techniques for optimizing performance like predicate pushdown and creating statistics on external tables. The presentation aims to explain how PolyBase can be leveraged to virtually access and query external data using T-SQL without needing to know the physical data locations or move the data.
Antonios Chatzipavlis presented on SQL Server backup and restore. The presentation covered database architecture basics including data files, transaction log files, and the buffer cache. It also discussed backup types like full, differential, transaction log, copy-only, and partial backups. Backup strategies and restore processes were explained, including restoring to a point in time and restoring system databases. The internals of how SQL Server performs backups using buffers and I/O threads were also summarized.
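The restore sequence for point-in-time recovery follows from the backup types. Restore the latest full backup taken before the target. Then restore the newest differential based on it, if there is one. Then apply every log backup after that, up to and including the one that covers the target. The Python sketch below works on completion times only. Real restores are driven by log sequence numbers (LSNs) and must also handle copy-only backups and broken log chains. `Backup` and `restore_chain` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str           # "full", "diff", or "log" (hypothetical labels)
    finished: datetime  # completion time; real restores use LSNs instead


def restore_chain(history: list[Backup], target: datetime) -> list[Backup]:
    """Return the backups to restore, in order, to recover to `target`.

    Assumes `history` holds one database's backups and no copy-only backups.
    """
    backups = sorted(history, key=lambda b: b.finished)
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup finished before the target time")
    chain = [fulls[-1]]
    # The newest differential after that full (and not after the target) stands in
    # for all of the log backups taken between the two.
    diffs = [b for b in backups
             if b.kind == "diff" and chain[0].finished < b.finished <= target]
    if diffs:
        chain.append(diffs[-1])
    base = chain[-1]
    if base.finished == target:
        return chain
    # Apply log backups in order until one covers the target (restored WITH STOPAT).
    for log in (b for b in backups if b.kind == "log" and b.finished > base.finished):
        chain.append(log)
        if log.finished >= target:
            return chain
    raise ValueError("the log backups do not reach the target time")


def at(hour: int, minute: int = 0) -> datetime:
    return datetime(2024, 1, 1, hour, minute)


history = [Backup("full", at(0)), Backup("log", at(1)), Backup("diff", at(2)),
           Backup("log", at(3)), Backup("log", at(4)), Backup("log", at(5))]
for b in restore_chain(history, at(3, 30)):
    print(b.kind, b.finished.time())
# full 00:00:00
# diff 02:00:00
# log 03:00:00
# log 04:00:00
```

Note that the 01:00 log backup is skipped. The differential already contains its changes, which is why a differential shortens recovery.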
Antonios Chatzipavlis presented on migrating SQL workloads to Azure. He discussed modernizing data platforms by discovering, assessing, planning, transforming, optimizing, testing and remediating. Key migration considerations include remaining, rehosting, refactoring, rearchitecting, rebuilding or replacing workloads. Tools for migrating data include Microsoft Assessment and Planning Toolkit, Data Migration Assistant, Database Experimentation Assistant, SQL Server Migration Assistant, and Azure Database Migration Service. Workloads can be migrated to Azure VMs, Azure SQL Databases or Azure SQL Managed Instances.
This document summarizes a webinar presentation about workload management in SQL Server 2019. It discusses how SQL Server's Resource Governor feature can be used to provide multitenancy, predictable performance, and isolation for multiple workloads running on a single SQL Server instance. Key concepts covered include resource pools, workload groups, and classification functions to assign sessions to different pools and groups. The presentation also reviews best practices for using lookup tables in classification functions and shows some DMVs for monitoring Resource Governor configuration and statistics.
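A classifier function backed by a lookup table maps session attributes to a workload group, and each group is bound to a resource pool. In SQL Server, the classifier is a T-SQL scalar function in master that runs for every new connection, so it should stay cheap. If it returns NULL or an unknown group name, the session goes to the default group. The Python sketch below models only that decision logic. The logins, application names, groups, and pools are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    login_name: str
    app_name: str


# Hypothetical lookup table: (login, application) -> workload group.
# An application of None matches any application for that login.
CLASSIFIER_LOOKUP = {
    ("etl_svc", None): "EtlGroup",
    ("report_user", "Power BI"): "ReportingGroup",
}

# Hypothetical mapping of workload groups to the resource pools that back them.
GROUP_TO_POOL = {
    "default": "default",
    "EtlGroup": "EtlPool",
    "ReportingGroup": "ReportingPool",
}


def classify(session: Session) -> tuple[str, str]:
    """Return (workload group, resource pool) for a new session."""
    group: Optional[str] = CLASSIFIER_LOOKUP.get((session.login_name, session.app_name))
    if group is None:
        group = CLASSIFIER_LOOKUP.get((session.login_name, None))
    # As in Resource Governor: no match or an unknown group falls back to "default".
    if group not in GROUP_TO_POOL:
        group = "default"
    return group, GROUP_TO_POOL[group]


print(classify(Session("etl_svc", "SSIS")))  # ('EtlGroup', 'EtlPool')
print(classify(Session("report_user", "SSMS")))  # ('default', 'default')
```

Keeping the lookup a single keyed read mirrors the advice to keep classifier lookups small and fast, because slow classification delays every login.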
This document provides an overview of loading data into Azure SQL DW (Synapse Analytics). It discusses extracting source data into text files, landing the data into Azure Data Lake Store Gen2, preparing the data for loading into staging tables using PolyBase or COPY commands, transforming the data, and inserting it into production tables. It also compares ETL vs ELT approaches and SSIS vs Azure Data Factory for data integration. The presenter then demonstrates loading data in Synapse SQL pool and invites any questions.
The document provides an overview of the DAX language. It explains that DAX is the formula language used in Power BI, Power Pivot, and Analysis Services for data modeling, reporting, and analytics. It describes the basic components of a DAX data model, including tables, columns, relationships, measures, and hierarchies. It also covers DAX syntax, functions, and operators, and how row context and filter context work in DAX calculations and queries.
The document introduces Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs) in SQL Server. It explains that DMVs and DMFs return server state information and can be used to monitor server health, diagnose problems, and tune performance. It provides examples of common DMVs and DMFs used for query execution and the query plan cache. Finally, it notes that the presentation will demonstrate troubleshooting with DMVs and DMFs.
This document summarizes common T-SQL anti-patterns that can negatively impact query performance, including using SELECT *, functions in predicates, OR operators, implicit conversions, unnecessary sorts, correlated subqueries, and dynamic SQL execution. The presentation provides explanations of why each anti-pattern hurts performance and recommendations for more optimized alternatives such as using indexes, temporary tables, parameterization, and execution plan analysis.
This document discusses designing a modern data warehouse in Azure. It provides an overview of traditional vs. self-service data warehouses and their limitations. It also outlines challenges with current data warehouses around timeliness, flexibility, quality and findability. The document then discusses why organizations need a modern data warehouse based on criteria like customer experience, quality assurance and operational efficiency. It covers various approaches to ingesting, storing, preparing and modeling data in Azure. Finally, it discusses architectures like the lambda architecture and common data models.
Modernizing Your Database with SQL Server 2019 discusses SQL Server 2019 features that can help modernize a database, including:
- The Hybrid Buffer Pool, which supports persistent memory to improve performance on read-heavy workloads.
- Memory-Optimized TempDB Metadata, which stores TempDB metadata in memory-optimized tables to avoid certain blocking issues.
- Intelligent Query Processing features like Adaptive Query Processing, Batch Mode processing on rowstores, and Scalar UDF Inlining, which improve query performance.
- Approximate Count Distinct (APPROX_COUNT_DISTINCT), a new function that returns an estimated count of distinct values in a column faster than an exact count.
- Lightweight profiling, enabled by default, which provides query plan runtime statistics with low overhead.
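Approximate Count Distinct trades exactness for speed and a small, fixed memory footprint. Approximate distinct counting is commonly implemented with HyperLogLog. The following is a minimal, generic HyperLogLog sketch in Python for illustration, not SQL Server's internal implementation. The SHA-1 hash and the precision `p = 12` (4,096 registers) are choices made for this example.

```python
import hashlib
import math


def _hash64(value) -> int:
    """64-bit hash of a value's string form (SHA-1 is an arbitrary choice here)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HyperLogLog:
    """Generic HyperLogLog with 2**p registers (standard error about 1.04/sqrt(2**p))."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, value) -> None:
        x = _hash64(value)
        index = x >> (64 - self.p)                     # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> float:
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range (linear counting) fix
        return estimate


hll = HyperLogLog()
for i in range(100_000):
    hll.add(i)
    hll.add(i)  # duplicates never change the registers
print(round(hll.count()))  # an estimate near 100000
```

The sketch keeps 4,096 small counters no matter how many rows it sees. An exact distinct count must remember every value, which is where the speed and memory savings come from.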
This document discusses designing a modern data warehouse in Azure. It provides an overview of traditional vs. self-service data warehouses and their limitations. It also outlines challenges with current data warehouses around timeliness, flexibility, quality and findability. The document then discusses why organizations need a modern data warehouse based on criteria like customer experience, quality assurance and operational efficiency. It covers various approaches to ingesting, storing, preparing, modeling and serving data on Azure. Finally, it discusses architectures like the lambda architecture and common data models.
The document provides details about a SQL expert's background and certifications. It summarizes the expert's career, from first working with computers in 1982 to entering the computer industry in 1988. In 1996, they started working with SQL Server 6.0 and have since earned multiple Microsoft certifications. The expert now provides training and consultation services, and created an online school called SQL School Greece to teach SQL Server.
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018, by Antonios Chatzipavlis
Azure SQL Database is a managed database service hosted in Microsoft's Azure cloud. Some key differences from SQL Server include: the service is paid by the hour based on the selected service tier; users can dynamically scale resources up or down; backups and high availability are managed by the service provider; and common administration tasks are handled by the provider rather than the user. The service offers automatic backups, point-in-time restore, and geo-restore capabilities along with built-in high availability through replication across three copies in the primary region.
The document provides biographical information about Antonios Chatzipavlis, a SQL Server expert and evangelist. It then summarizes his presentation on statistics and index internals in SQL Server, which covers topics like cardinality estimation, inspecting and updating statistics, index structure and types, and identifying missing indexes. The presentation includes demonstrations of analyzing cardinality estimation and picking the right index key.
Implementing Mobile Reports in SQL Server 2016 Reporting Services, by Antonios Chatzipavlis
The document provides an overview of implementing mobile reports in SQL Server 2016 Reporting Services. It discusses preparing data for mobile reports, using the SQL Server Mobile Report Publisher tool, and publishing mobile reports. The presenter has extensive experience with SQL Server and provides their qualifications. The presentation also provides information on optimizing reports, formatting time data, using filters and Excel files in reports, and designing reports using navigators and visualizations in the Mobile Report Publisher tool. It demonstrates the tool's interface and capabilities.
This document provides an overview of auditing data access in SQL Server. It discusses various methods for auditing, such as Common Criteria compliance, SQL Trace, DML triggers, temporal tables, and SQL Server Audit. SQL Server Audit is described as the primary auditing tool in SQL Server, able to track both server-level and database-level events. Considerations for implementing and managing SQL Server Audit are also covered.
This document provides information about a webinar on SQL Server 2016 Stretch Database presented by Antonios Chatzipavlis. The webinar covers an introduction to Stretch Database, its limitations and pricing, backup and restore of Stretch databases, and frequently asked questions. Antonios Chatzipavlis has over 30 years of experience working with computers and SQL Server. He is a Microsoft Certified Trainer and SQL Server Evangelist who runs the SQL School Greece training organization.
The document discusses SQL Server monitoring and troubleshooting. It provides an overview of SQL Server monitoring, including why it is important and common monitoring tools. It also describes the SQL Server threading model, including threads, schedulers, states, the waiter list, and runnable queue. Methods for using wait statistics like the DMVs sys.dm_os_waiting_tasks and sys.dm_os_wait_stats are presented. Extended Events are introduced as an alternative to SQL Trace. The importance of establishing a performance baseline is also noted.
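sys.dm_os_wait_stats accumulates totals since the last restart or manual clear. A common technique is to capture two snapshots and compare them to see what the server waited on during a specific interval. The Python sketch below assumes each snapshot has already been read into a dict of wait_type to wait_time_ms. The sample numbers and the ignore list are hypothetical. In practice, ignore lists hold waits that are usually benign, such as SLEEP_TASK.

```python
from typing import Iterable


def top_waits(before: dict[str, int], after: dict[str, int],
              ignore: Iterable[str] = ()) -> list[tuple[str, int, float]]:
    """Rank wait types by wait time accrued between two snapshots.

    Returns (wait_type, delta_ms, percent_of_total) sorted by delta, largest first.
    """
    skip = set(ignore)
    deltas = {}
    for wait_type, total_ms in after.items():
        if wait_type in skip:
            continue
        delta = total_ms - before.get(wait_type, 0)
        # A negative delta means the counters were reset; skip it and unchanged waits.
        if delta > 0:
            deltas[wait_type] = delta
    grand_total = sum(deltas.values())
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return [(w, d, 100.0 * d / grand_total) for w, d in ranked]


before = {"PAGEIOLATCH_SH": 1_000, "CXPACKET": 500, "SLEEP_TASK": 100}
after = {"PAGEIOLATCH_SH": 4_000, "CXPACKET": 1_500, "SLEEP_TASK": 9_100}
for entry in top_waits(before, after, ignore={"SLEEP_TASK"}):
    print(entry)
# ('PAGEIOLATCH_SH', 3000, 75.0)
# ('CXPACKET', 1000, 25.0)
```

Comparing an interval this way, instead of reading raw cumulative totals, is also a simple way to build the performance baseline the talk recommends. Snapshots from a normal period can be compared later against a problem period.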
The Challenge of Interpretability in Generative AI Models.pdf, by Sara Kroft
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
"Making .NET Application Even Faster", Sergey Teplyakov.pptxFwdays
In this talk we're going to explore performance improvement lifecycle, starting with setting the performance goals, using profilers to figure out the bottle necks, making a fix and validating that the fix works by benchmarking it. The talk will be useful for novice and seasoned .NET developers and architects interested in making their application fast and understanding how things work under the hood.
Keynote : AI & Future Of Offensive SecurityPriyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
Self-Healing Test Automation Framework - HealeniumKnoldus Inc.
Revolutionize your test automation with Healenium's self-healing framework. Automate test maintenance, reduce flakes, and increase efficiency. Learn how to build a robust test automation foundation. Discover the power of self-healing tests. Transform your testing experience.
Finetuning GenAI For Hacking and DefendingPriyanka Aash
Generative AI, particularly through the lens of large language models (LLMs), represents a transformative leap in artificial intelligence. With advancements that have fundamentally altered our approach to AI, understanding and leveraging these technologies is crucial for innovators and practitioners alike. This comprehensive exploration delves into the intricacies of GenAI, from its foundational principles and historical evolution to its practical applications in security and beyond.
Increase Quality with User Access Policies - July 2024Peter Caitens
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...Fwdays
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem. So, this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the re-invention of Serverless from a DevOps perspective as a Platform Engineering trend with Backstage and the recent Radius project from Microsoft. So now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
Generative AI technology is a fascinating field that focuses on creating comp...Nohoax Kanont
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptxFwdays
I will share my personal experience of full-time development on wasm Blazor
What difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, which technology stack and architectural patterns we chose
What conclusions we made and what mistakes we committed
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Zilliz
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
Discovery Series - Zero to Hero - Task Mining Session 1DianaGray10
This session is focused on providing you with an introduction to task mining. We will go over different types of task mining and provide you with a real-world demo on each type of task mining in detail.
3. 1982 I started working with computers
1988 I started my professional career in the computer industry
1996 I started working with SQL Server 6.0
1998 I earned my first Microsoft certification, as Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training to date!
2010 I became a Microsoft MVP on Data Platform for the first time
I created SQL School Greece (www.sqlschool.gr)
2012 I became an MCT Regional Lead in the Microsoft Learning Program
2013 I was certified as MCSE: Data Platform
I was certified as MCSE: Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
Antonios Chatzipavlis
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA,
MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
4. A source of information about Microsoft SQL Server for Greek IT professionals, DBAs, developers, information workers, and hobbyists who simply like SQL Server.
Help line: help@sqlschool.gr
What we are doing here
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources
Follow us on social media
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group
SELECT KNOWLEDGE FROM SQL SERVER
5. Azure SQL Data Warehouse SQLschool.gr GWAB Athens 2017
Presentation Content
• A First Look at Azure SQL DW
• Designing for Azure SQL DW
• Loading Data into Azure SQL DW
• Querying and Tuning Azure SQL DW
6. A First Look at Azure SQL Data Warehouse
7. What is Azure SQL Data Warehouse?
• A service in Microsoft Azure
• A PaaS offering
• A Massively Parallel Processing (MPP) system
• Distributed storage
• Distributed compute
8. SMP vs MPP
[Diagram: Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)]
9. Data Warehouse Unit (DWU)
A measure of the underlying compute power of the database
10. Data Warehouse Unit (DWU)
For example:
100 DWU: 3 tables loaded in 15 minutes, 20 minutes to run a report
500 DWU: 3 tables loaded in 3 minutes, 4 minutes to run a report
In this example, five times the DWUs makes both the load and the report five times faster.
11. Why Choose Cloud Over an On-Premises DW?
• No large capital expenditure (CAPEX) is needed to get started
• No large operational expenditure (OPEX) is needed
• Storage and compute can be scaled up or down on demand
12. What Do You Pay For, and How?
• Storage
– Billed per GB
– Standard or Premium geo-redundant storage
– No cost for storage transactions
– Outbound data transfer is billed
• Compute power
– Billed by DWUs
– Ranges from 100 to 2000 DWUs
– Billed per hour
When the DW is not in use, its compute power can be paused completely for maximum savings; only storage is billed while it is paused.
13. Provisioning Azure SQL Data Warehouse
1. Select a region
2. Select or create a server
3. Pick the origin of the data
4. Pick the DWU level
14. Methods of Provisioning
• Azure Portal
– Select New > Data + Storage
• PowerShell
– New-AzureRmSqlDatabase cmdlet
• T-SQL
– CREATE DATABASE statement
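As a minimal sketch of the T-SQL method, the statement below creates a data warehouse while connected to the master database of an existing logical server. The database name AWDW (borrowed from the scaling slide) and the DW100 service objective are illustrative assumptions; the service objective names accepted today may differ from those available in 2017.

```sql
-- Run while connected to the master database of the logical server.
-- AWDW and DW100 are illustrative values, not requirements.
CREATE DATABASE AWDW
COLLATE SQL_Latin1_General_CP1_CI_AS
(
    EDITION = 'datawarehouse',
    SERVICE_OBJECTIVE = 'DW100'
);
```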
15. Provision a Data Warehouse
DEMO
16. Designing for Azure SQL Data Warehouse
17. SQL Server != Azure SQL DW
An Azure SQL DW database requires design decisions that differ from those made for SQL Server.
18. Distribution Key
The distribution key determines how Azure SQL Data Warehouse spreads a table's data across multiple nodes.
Azure SQL Data Warehouse always divides each table into 60 distributions, which are spread across the Compute nodes.
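For example, a hash-distributed table could be declared as below. The table dbo.FactInvoice and its columns are hypothetical; the point is the DISTRIBUTION = HASH(...) option, which sends every row with the same CustomerKey value to the same distribution.

```sql
-- Hypothetical fact table, hash-distributed on CustomerKey:
-- all rows sharing a CustomerKey value land in the same one of the 60 distributions.
CREATE TABLE dbo.FactInvoice
(
    InvoiceKey  BIGINT        NOT NULL,
    CustomerKey INT           NOT NULL,
    InvoiceDate DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);
```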
20. Round-Robin Distribution
RecordNo | CustomerID | InvoiceDate
1 | 1000 | 2017-04-21
2 | 1000 | 2017-04-22
3 | 2000 | 2017-04-22
4 | 3000 | 2017-04-22
5 | 4000 | 2017-04-22
Rows are spread evenly across all distributions, regardless of their values (rows 1 and 2 share a CustomerID but can still land in different distributions).
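A round-robin table for rows like the ones above might be declared as follows; the table name dbo.StageInvoice is hypothetical. ROUND_ROBIN is also what you get when the DISTRIBUTION option is omitted.

```sql
-- Hypothetical staging table matching the example rows:
-- rows are assigned to distributions without looking at any column value.
CREATE TABLE dbo.StageInvoice
(
    RecordNo    INT  NOT NULL,
    CustomerID  INT  NOT NULL,
    InvoiceDate DATE NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```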
21. Data Distribution Best Practice
[Diagram: even distribution (recommended) vs. uneven, skewed distribution]
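One way to check whether a table is evenly distributed is DBCC PDW_SHOWSPACEUSED, which reports row counts and space used per distribution. The table name below is a hypothetical example.

```sql
-- Returns one row per distribution (DISTRIBUTION_ID, PDW_NODE_ID, ROWS, space columns).
-- Large differences in ROWS between distributions indicate data skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactInvoice');
```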
22. Good Hash Key
A good hash key:
• Distributes evenly
• Is used for grouping
• Is used as a join condition
• Is not updated
• Has more than 60 distinct values
Round-robin always provides a uniform distribution, but not necessarily the best performance, because joins and aggregations on a round-robin table usually require moving data between distributions.
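Before choosing a hash key, a quick profile of the candidate column can check the "distinct values" and "distributes evenly" criteria. This is a sketch against a hypothetical dbo.FactInvoice.CustomerKey column; note that all NULLs hash to the same distribution, so a column with many NULLs also causes skew.

```sql
-- 1) A good hash key should have far more than 60 distinct values.
SELECT COUNT(DISTINCT CustomerKey) AS distinct_values
FROM dbo.FactInvoice;

-- 2) A few very frequent values would pile rows into the same distribution.
SELECT TOP 10
       CustomerKey,
       COUNT_BIG(*) AS row_count
FROM dbo.FactInvoice
GROUP BY CustomerKey
ORDER BY row_count DESC;
```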
23. Data Types
• Use the smallest data type that will support your data
• Avoid defining all character columns with a large default length
• Define columns as VARCHAR instead of NVARCHAR if you don't need Unicode
The goal is not only to save space but also to move data as efficiently as possible.
Some complex data types (xml, geography, etc.) are not supported yet.
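A sketch of applying these rules: measure the longest value actually stored, then size the columns accordingly. The tables dbo.StageCustomer and dbo.DimCustomer and their columns are hypothetical.

```sql
-- Measure the longest values actually present before choosing column lengths.
SELECT MAX(LEN(CustomerName)) AS max_name_length,
       MAX(LEN(City))         AS max_city_length
FROM dbo.StageCustomer;

-- Size the columns to fit, using VARCHAR where Unicode is not needed.
CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT          NOT NULL,  -- INT rather than BIGINT when the range fits
    CustomerName VARCHAR(100) NOT NULL,  -- not VARCHAR(8000) or NVARCHAR(4000) "just in case"
    City         VARCHAR(50)  NULL,
    BirthDate    DATE         NULL       -- DATE rather than DATETIME when no time part is stored
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX (CustomerKey)
);
```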
24. Table Types
• Clustered Columnstore
– Default table type
– High compression ratio
– Ideally, segments of 1M rows
– No secondary indexes
• Heap
– No index on the data
– Fast load
– No compression
– Allows secondary indexes
• Clustered B-Tree
– Sorted index on the data
– Fast singleton lookup
– No compression
– Allows secondary indexes
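The three table types map onto index options in CREATE TABLE, as sketched below with hypothetical table names. Omitting the index option yields a clustered columnstore table, the default.

```sql
-- Clustered columnstore: the default table type (hypothetical fact table).
CREATE TABLE dbo.FactSalesCci
(
    SaleKey    BIGINT        NOT NULL,
    ProductKey INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL
)
WITH ( DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX );

-- Heap: no index on the data, fastest to load (hypothetical staging table).
CREATE TABLE dbo.StageSales
(
    SaleKey    BIGINT        NOT NULL,
    ProductKey INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );

-- A heap allows secondary (nonclustered) indexes.
CREATE INDEX ix_StageSales_ProductKey ON dbo.StageSales (ProductKey);

-- Clustered B-tree: rows sorted by ProductKey, good for singleton lookups.
CREATE TABLE dbo.DimProductLookup
(
    ProductKey  INT          NOT NULL,
    ProductName VARCHAR(100) NOT NULL
)
WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED INDEX (ProductKey) );
```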
25. Table Partitioning
Partitioning is very common in SQL Server data warehouses for three reasons:
1. Ease of loading and removing data from a partitioned table
2. Targeting specific partitions in table maintenance operations
3. Performance improvements due to partition elimination
A highly granular partitioning scheme can work in SQL Server but hurt performance in Azure SQL DW:
60 distributions × 365 partitions = 21,900 data buckets
21,900 data buckets × 1M rows (ideal segment size) = 21,900,000,000 rows
In other words, a table partitioned by day would need about 21.9 billion rows per year before every columnstore segment could reach the ideal size.
Lower granularity (week, month) can perform better, depending on how much data you have.
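A sketch of a lower-granularity, monthly partitioning scheme on a hypothetical fact table keyed by an integer yyyymmdd date. With RANGE RIGHT, the 12 boundary values below create 13 partitions: one per month of 2017 plus one for earlier dates.

```sql
-- Monthly partitions: 60 distributions x 12 months = 720 buckets per year,
-- so roughly 720 million rows per year fill every 1M-row segment
-- (compared with 21.9 billion rows for daily partitions).
CREATE TABLE dbo.FactSalesByMonth
(
    OrderDateKey INT           NOT NULL,  -- yyyymmdd, e.g. 20170415
    CustomerKey  INT           NOT NULL,
    Amount       DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( OrderDateKey RANGE RIGHT FOR VALUES
        (20170101, 20170201, 20170301, 20170401, 20170501, 20170601,
         20170701, 20170801, 20170901, 20171001, 20171101, 20171201) )
);
```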
26. How do we apply these principles to a dimensional model?
• Fact tables
– Large ones are better as columnstores
– Distribute by a hash key as much as possible, as long as the distribution is even
– Partition only if the table is large enough to fill up each segment
• Dimension tables
– Can be hash distributed, or round-robin if there is no clear candidate join key
– Columnstore for large dimensions
– Heap or clustered index for small dimensions
– Add secondary indexes for alternate join columns
– Partitioning not recommended
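For example, a small dimension with no obvious hash candidate might be set up as below: round-robin, a clustered index, and a secondary index on an alternate join column. All names are hypothetical.

```sql
-- Small dimension: round-robin distribution with a clustered B-tree index.
CREATE TABLE dbo.DimGeography
(
    GeographyKey          INT         NOT NULL,
    GeographyAlternateKey VARCHAR(20) NOT NULL,
    City                  VARCHAR(50) NOT NULL,
    CountryRegionCode     CHAR(3)     NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX (GeographyKey)
);

-- Secondary index for queries that join on the alternate key instead.
CREATE INDEX ix_DimGeography_AlternateKey
    ON dbo.DimGeography (GeographyAlternateKey);
```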
27. Analyzing distribution and data types for DW tables
DEMO
28. Loading Data into Azure SQL Data Warehouse
29. Loading an MPP System
The main principle of loading data into Azure SQL DW is to do as much work in parallel as possible.
30. Data Warehouse Readers
DWU | 100 | 200 | 300 | 400 | 500 | 600 | 1000 | 1200 | 1500 | 2000
Readers | 8 | 16 | 24 | 32 | 40 | 48 | 60 | 60 | 60 | 60
Writers | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60
Your DWUs have a direct impact on how fast you can load data in parallel.
• Azure SQL Data Warehouse introduces the concept of Data Warehouse Readers.
• These are threads that read data in parallel and then pass it off to Writer threads.
31. Optimize Insert Batch Size
• Avoid the trickle-insert pattern
– The ideal batch size is 1 million rows or more, whether inserted directly or loaded from a file
• Avoid ordered data
– Data ordered by the distribution key can introduce hot spots that slow down the load operation
• Use temporary tables
– Stage and transform data in a temporary heap table before moving it to permanent storage
• Use the CREATE TABLE AS SELECT (CTAS) statement
– It is a fully parallel operation
– It is minimally logged
– It can change the distribution, table type, and partitioning
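The staging and CTAS advice can be combined into a two-step load, sketched below. The landing table dbo.StageInvoice, the target dbo.FactInvoiceLoaded, and the transformation are hypothetical; check the temporary-table options against the current documentation.

```sql
-- 1) Stage raw rows in a temporary heap: fast to load, no compression.
CREATE TABLE #StageInvoice
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP )
AS
SELECT RecordNo, CustomerID, InvoiceDate
FROM dbo.StageInvoice;                 -- hypothetical landing table

-- 2) Transform and write to permanent storage with CTAS:
--    fully parallel, minimally logged, and free to choose a new
--    distribution, table type, and partitioning.
CREATE TABLE dbo.FactInvoiceLoaded
WITH ( DISTRIBUTION = HASH(CustomerID), CLUSTERED COLUMNSTORE INDEX )
AS
SELECT RecordNo,
       CustomerID,
       InvoiceDate,
       YEAR(InvoiceDate) AS InvoiceYear  -- example transformation
FROM #StageInvoice
WHERE InvoiceDate IS NOT NULL;
```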
33. User Resource Class
Class | smallrc | mediumrc | largerc | xlargerc
Default | 8 | 16 | 24 | 32
Memory | 100 MB | 100-1600 MB | 200-3200 MB | 400-6400 MB
The lower end of each memory range corresponds to DW100 and the upper end to DW2000.
User resource classes are database roles that govern how many resources are given to a query.
For fast, high-quality loads, create a user just for loading that uses a medium or large resource class.
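A dedicated loading user could be set up as below. The login, user, and password are hypothetical placeholders. In Azure, the two parts run over separate connections, because USE cannot switch between master and a user database.

```sql
-- Connection 1, master database: a login used only for loading.
CREATE LOGIN LoaderLogin WITH PASSWORD = '<StrongPassword1!>';

-- Connection 2, data warehouse database: map a user to the login
-- and give it the mediumrc resource class for loads.
CREATE USER LoaderUser FOR LOGIN LoaderLogin;
EXEC sp_addrolemember 'mediumrc', 'LoaderUser';
```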
34. Loading Methods
• Single-client loading methods
– SSIS
– Azure Data Factory
– BCP
– Can add some parallelism, but are bottlenecked at the Control node
• Parallel-reader loading methods
– PolyBase
– Reads from Azure Blob Storage and loads the content into Azure SQL DW
– Bypasses the Control node and loads directly into the Compute nodes
35. Control Node
• The Control node receives connections and orchestrates the queries
• The Compute nodes process the data and scale with the DWUs
36. Loading with SSIS
[Diagram: SSIS loading data through the Control Node]
37. Loading data with SSIS
DEMO
38. Loading with PolyBase
[Diagram: Control Node and Azure Blob Storage]
PolyBase can load data from UTF-8 delimited text files and popular Hadoop file formats (RCFile, ORC, and Parquet).
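A minimal PolyBase load from Azure Blob Storage, sketched under assumptions: the storage account, container, folder, key, password, and table names are placeholders, and the files are pipe-delimited UTF-8 text. The final CTAS is where the Compute nodes read the files in parallel.

```sql
-- One-time setup: a master key protects the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword1!>';

-- Storage account key for the container (IDENTITY is not used for key authentication).
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user',
     SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AzureBlobSales
WITH
(
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<storage-account>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- Pipe-delimited UTF-8 text files.
CREATE EXTERNAL FILE FORMAT PipeDelimitedUtf8
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', ENCODING = 'UTF8')
);

-- External table: metadata only; the data stays in blob storage.
CREATE EXTERNAL TABLE dbo.FactInvoice_ext
(
    InvoiceKey  BIGINT        NOT NULL,
    CustomerKey INT           NOT NULL,
    InvoiceDate DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    LOCATION    = '/invoices/',
    DATA_SOURCE = AzureBlobSales,
    FILE_FORMAT = PipeDelimitedUtf8
);

-- Parallel load into a distributed columnstore table.
CREATE TABLE dbo.FactInvoice
WITH ( DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX )
AS
SELECT * FROM dbo.FactInvoice_ext;
```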
39. Loading data with PolyBase
DEMO
40. Migration Utility
• Supports SQL Server 2012+ and Azure SQL Database
• Provides a migration report that points out possible issues
• Assists with schema migration
• Assists with data migration
41. Using the Azure SQL DW migration utility
DEMO
42. Querying and Tuning Azure SQL Data Warehouse
43. Workload Management Principles
• User resource class
• Concurrency model
• Transaction size
Two maximum limits:
– 1,024 connections
– 32 concurrent queries
45. Resource Class and Concurrency Slots
Class | smallrc | mediumrc | largerc | xlargerc
DWU range | 100-2000 | 100-2000 | 100-2000 | 100-2000
Slots | 1 | 1-16 | 2-32 | 4-64
SELECT queries against system views, statistics, and other management commands do not use concurrency slots.
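To see how queries are using concurrency slots, look at sys.dm_pdw_exec_requests: queries queued for slots appear with status 'Suspended'. This is a sketch; the resource_class column may not be present in every service version.

```sql
-- Active and queued requests, with the resource class each one runs under.
-- Requests waiting for concurrency slots are reported as 'Suspended'.
SELECT request_id,
       [status],
       resource_class,
       submit_time,
       start_time,
       command
FROM sys.dm_pdw_exec_requests
WHERE [status] IN ('Running', 'Suspended')
ORDER BY submit_time;
```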
46. Transaction Size Limits
DWU | 100 | 200 | 300 | 400 | 500 | 600 | 1000 | 1200 | 1500 | 2000
GB per distribution | 1 | 1.5 | 2.25 | 3 | 3.75 | 4.5 | 7.5 | 9 | 11.25 | 15
At DW200, a transaction that does equal work on every distribution can consume up to 60 × 1.5 GB = 90 GB of space.
47. Maintaining Statistics
• The service does not create or maintain statistics automatically
• Creating new statistics
– Sampled single-column statistics are a good start
– Use multi-column statistics for joins involving multiple columns
– Focus on columns used in JOIN, GROUP BY, HAVING, and WHERE clauses
– Increase the sample size if necessary
• Updating existing statistics
– When new dates or dimension categories are added
– When new data loads have completed
– When an UPDATE or DELETE changes the distribution of data
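The statistics guidance translates into statements like the following, written against a hypothetical dbo.FactInvoice table; the 20 percent sample is only an example value.

```sql
-- Sampled single-column statistics on a join/filter column.
CREATE STATISTICS stat_FactInvoice_CustomerKey
    ON dbo.FactInvoice (CustomerKey)
    WITH SAMPLE = 20 PERCENT;

-- Multi-column statistics for a join on two columns (default sampling).
CREATE STATISTICS stat_FactInvoice_Customer_Date
    ON dbo.FactInvoice (CustomerKey, InvoiceDate);

-- Increase the sample if cardinality estimates are poor.
UPDATE STATISTICS dbo.FactInvoice (stat_FactInvoice_CustomerKey) WITH FULLSCAN;

-- After a load adds new dates or categories, refresh all statistics on the table.
UPDATE STATISTICS dbo.FactInvoice;
```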
48. Index Defrag
• Heap
– Does not have a defrag option
• B-Tree index
– Useful for removing low levels of fragmentation
• Columnstore
– Proactively compresses CLOSED rowgroups
• On a large table with heavy fragmentation, it is often faster to recreate the table with CREATE TABLE AS SELECT and switch it with the old one
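A sketch of both defragmentation paths for a hypothetical columnstore table: reorganize in place for light fragmentation, or rewrite the table with CTAS and swap names with RENAME OBJECT for heavy fragmentation. Statistics created on the old table are not carried over to the CTAS copy, so recreate them afterwards.

```sql
-- Light fragmentation: reorganize in place (compresses CLOSED rowgroups).
ALTER INDEX ALL ON dbo.FactInvoice REORGANIZE;

-- Heavy fragmentation on a large table: rewrite it with CTAS ...
CREATE TABLE dbo.FactInvoice_new
WITH ( DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX )
AS
SELECT * FROM dbo.FactInvoice;

-- ... then switch the names and drop the old copy.
RENAME OBJECT dbo.FactInvoice     TO FactInvoice_old;
RENAME OBJECT dbo.FactInvoice_new TO FactInvoice;
DROP TABLE dbo.FactInvoice_old;
```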
49. Index Rebuild
• Heap
– Can be rebuilt to remove forwarding pointers
• B-Tree index
– Removes high levels of fragmentation
• Columnstore
– Can increase the density of segments
• Rebuilding an index is an OFFLINE operation in Azure SQL DW
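The rebuild options look roughly like this in T-SQL. The table names are hypothetical, and the exact options supported should be verified for your service version.

```sql
-- Rebuild every index on a table (OFFLINE: the table is unavailable meanwhile).
ALTER INDEX ALL ON dbo.FactInvoice REBUILD;

-- Rebuild a single partition of a large partitioned table to limit the work.
ALTER INDEX ALL ON dbo.FactSalesByMonth REBUILD PARTITION = 5;

-- A heap has no index to rebuild; rebuild the table itself instead.
ALTER TABLE dbo.StageInvoice REBUILD;
```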
50. Scaling Performance
• Increase the user resource class
– EXEC sp_addrolemember 'largerc', 'loaduser';
– A higher resource class gives a query more memory and CPU
– It also uses more concurrency slots, so fewer queries can run at the same time
– The highest role assigned takes precedence
• Increase the Data Warehouse Units
– ALTER DATABASE AWDW MODIFY (SERVICE_OBJECTIVE = 'DW1000'); (run while connected to master)
– It is an OFFLINE operation
– Make sure there are no loads or transactions in progress
– It can also be done through the Azure Portal
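After changing resource classes or DWUs, the result can be checked with catalog views, as sketched below. The user name loaduser comes from the slide; the rest is illustrative.

```sql
-- In the data warehouse database: which resource-class roles does each user hold?
-- Users in none of these roles run in the default smallrc.
SELECT m.name AS member_name,
       r.name AS resource_class
FROM sys.database_role_members AS rm
JOIN sys.database_principals AS r
  ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals AS m
  ON rm.member_principal_id = m.principal_id
WHERE r.name IN ('mediumrc', 'largerc', 'xlargerc');

-- Return the user to the default smallrc by removing the larger role.
EXEC sp_droprolemember 'largerc', 'loaduser';

-- In master: the current service objective (DWU level) of each database.
SELECT db.name AS database_name,
       ds.service_objective
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db
  ON ds.database_id = db.database_id;
```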
51. Tracking Queries with Labels
User query:
SELECT SUM(Qty)
FROM dbo.FactInternetSales
OPTION (LABEL = 'mylabel');
Admin query:
SELECT *
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'mylabel';
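To see where a labelled query spends its time, its request can be joined to sys.dm_pdw_request_steps, which lists the distributed steps of each request. A sketch using the label from the slide:

```sql
-- One row per distributed step of every request carrying the label.
SELECT r.request_id,
       r.[status]           AS request_status,
       r.total_elapsed_time AS request_elapsed_ms,
       s.step_index,
       s.operation_type,
       s.[status]           AS step_status,
       s.total_elapsed_time AS step_elapsed_ms,
       s.row_count
FROM sys.dm_pdw_exec_requests AS r
JOIN sys.dm_pdw_request_steps AS s
  ON r.request_id = s.request_id
WHERE r.[label] = 'mylabel'
ORDER BY r.request_id, s.step_index;
```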
52. Labeling a query and tracking its execution
DEMO