Google Analytics 101 provides an introductory lesson on features of Google Analytics including Audience, Behavior, Conversion and A/B Testing. If you are considering a website redesign, you should review this to get familiar with how to integrate analytics. Check out http://insivia.com to learn more.
PowerPivot allows for industrial-strength data analysis in Excel by removing sheet size constraints, treating data more like a database than individual sheets. It combines data from multiple sources into a fast local database that can be analyzed while disconnected. PowerPivot features include relationships across tables, easy filtering with slicers, and the DAX programming language for writing formulas across rows rather than cells. Resources for learning more include the PowerPivot website and blog.
This document provides an overview of DAX (Data Analysis Expressions) and how it can be used for data analysis in Power BI and Analysis Services Tabular models. It discusses key DAX concepts like calculated columns, calculated measures, and filter context. It also covers common DAX functions and how to work with dates in DAX. The document provides examples of how to define security and write DAX queries against the BI Semantic Model.
An introductory session to DAX and common analytic patterns that we've built and used in enterprise environments. This session was originally presented at SQL Saturday Silicon Valley 2016.
The document discusses dimensional modeling concepts used in data warehouse design. Dimensional modeling organizes data into facts and dimensions. Facts are measures that are analyzed, while dimensions provide context for the facts. The dimensional model uses star and snowflake schemas to store data in denormalized tables optimized for querying. Key aspects covered include fact and dimension tables, slowly changing dimensions, and handling many-to-many and recursive relationships.
Slides from the Manchester Power BI User Group meeting on June 27th, 2019.
Subject: Power BI for Developers, covering Power BI Embedded and Power BI Custom Visuals.
This document provides an introduction to data mining and big data. It defines data mining as the process of analyzing data from different perspectives to discover useful patterns and relationships. The document lists some common applications of data mining in industries like finance, insurance, and telecommunications. It also outlines the typical steps involved in data mining, including data integration, cleaning, transformation, and knowledge presentation. Big data is defined as extremely large data sets that are difficult to process using traditional tools. The rapid growth of data from sources like social media and mobile devices is driving the need for tools to handle big data's volume, velocity, and variety of data types.
Tech Due Diligence from CTO's perspective - Talk at code.talks commerce (Chris Philipps)
This document provides guidance from a Chief Technology Officer's perspective on conducting technical due diligence for startups and investments. It discusses what technical due diligence involves, including identifying assets, risks, leadership, team skills, and technology scalability. The CTO recommends being prepared with documentation, demonstrating an understanding of investors' expectations, and knowing one's own company story. Key topics to review include products, architecture, processes, and more. Potential red flags and sample questions are also provided to help companies survive the technical due diligence process. Overall, the document aims to help technology founders and leaders effectively collaborate and address any issues that may arise during an evaluation of their company's technical strengths and weaknesses.
Report authoring in Business Intelligence (Ravi Pandit)
In this presentation, we discuss what Business Intelligence is, how it works, and why it is important, along with reporting in Business Intelligence, report authoring in IBM Cognos Analytics, the challenges of report authoring, solutions to these challenges, how IBM Cognos helps report authors, and authoring reports with Cognos.
Category: Education, Business, Engineering
This document provides an overview of Microsoft Power BI, including its history, key features, and capabilities. It describes how Power BI allows users to connect to various data sources, perform data transformation using Power Query, build interactive reports with Power View and Power Pivot, and create visualizations and dashboards to share insights. The document also discusses Power BI Desktop, the Power BI service, and how to publish reports and dashboards to the web for sharing.
Data Quality Patterns in the Cloud with Azure Data Factory (Mark Kromer)
This document discusses data quality patterns when using Azure Data Factory (ADF). It presents two modern data warehouse patterns that use ADF for orchestration: one using traditional ADF activities and another leveraging ADF mapping data flows. It also provides links to additional resources on ADF data flows, data quality patterns, expressions, performance, and connectors.
Power BI data modeling is the process of creating a relationship between common columns of multiple tables. If the column headings are the same across tables, then Power BI auto-detects the relationship between tables. Using these columns, we can merge the tables as well.
The document discusses data warehousing and OLAP (online analytical processing). It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used to support management decision making. The document outlines common data warehouse architectures like star schemas and snowflake schemas and discusses how data is modeled and organized in multidimensional data cubes. It also describes typical OLAP operations for analyzing and exploring cube data like roll-up, drill-down, slice and dice.
Data Visualization Design Best Practices Workshop (JSI)
This document provides guidance on effective data visualization. It emphasizes starting with the audience and their needs, identifying the key story or message in the data, and using simple, clear design principles. Charts should be designed so the audience can grasp them within 5-8 seconds. The document recommends several resources for choosing effective chart types and improving visualization skills. Overall, it stresses the importance of visualization in empowering stakeholders to make informed decisions.
The document provides tips for designing visually stunning reports in Power BI, including using appropriate background colors, spacing, alignment and grids. It recommends focusing 80% of effort on cleaning and enhancing the data, and using techniques like gauges/indicators, charts, images and text in reports, panels, dashboards and infographics. Specific tips include using fewer than 4 colors from a consistent palette, drawing on a grid, leaving empty space and enhancing important page elements. The overall goal is to present data and insights in a simple, visually appealing and easy to understand format.
Data Science in the Real World: Making a Difference (Srinath Perera)
We use the terms "Big Data" and "Data Science" for the use of data processing to make sense of the world around us. Spanning many fields, Big Data brings together technologies like distributed systems, machine learning, statistics, and the Internet of Things. It is a multi-billion-dollar industry including use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like smart cities, smart health, and smart agriculture.
These use cases rely on basic analytics, advanced statistical methods, and predictive technologies like machine learning. However, it is not just about crunching the data. Some use cases, like urban planning, can be slow, and there is enough time to process the data. With use cases like traffic, patient monitoring, and surveillance, however, the value of results degrades much faster with time, and results are needed within milliseconds to seconds. Collecting data from many sources, cleaning it up, processing it using computation clusters, and doing all of this fast is a major challenge.
This talk will discuss the motivation behind big data and data science and how they can make a difference. Then it will discuss the challenges, systems, and methodologies for implementing and sustaining a data science pipeline.
The document discusses data cubes and multidimensional data models. It provides examples of 2D and 3D data cubes to represent sales data with dimensions of time, item, and location. A data cube is a metaphor for storing multidimensional data without redundancy. Common schemas for multidimensional data include star schemas with a central fact table linked to dimension tables, snowflake schemas with some normalized dimension tables, and fact constellations with multiple linked fact tables. Dimension hierarchies allow mapping of low-level concepts like cities to higher-level concepts like states/provinces.
What Is Power BI? | Introduction To Microsoft Power BI | Power BI Training | ... (Edureka!)
( Power BI Training - https://www.edureka.co/power-bi-training )
This Edureka "What Is Power BI?" tutorial gives you an introduction to Power BI. This video helps you learn the following topics:
1. Why Power BI?
2. What Is Power BI?
3. Who Uses Power BI?
4. Components Of Power BI
5. Building Blocks Of Power BI
Check out our Power BI Playlist: https://goo.gl/97sJv1
Power BI is a business intelligence tool that converts data from different sources into attractive dashboards and reports. It includes Power BI Desktop for creating reports, Power BI Service for publishing reports, and Power BI mobile apps for viewing reports and dashboards. Power BI Desktop can import or directly query data from various sources like files, databases, and the web. It allows users to transform, visualize, and analyze data to gain insights. The imported data is stored in the Power BI service, while direct query leaves the data in its source.
Learn best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your data warehouse performance.
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices (Amazon Web Services)
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use work load management, tune your queries, and use Amazon Redshift's interleaved sorting features. Finally, learn how TripAdvisor uses these best practices to give their entire organization access to analytic insights at scale.
This document provides an overview and best practices for using Amazon Redshift as a data warehouse. It discusses ingestion best practices like using multiple files for COPY and primary keys. It also covers data hygiene practices like analyzing tables and vacuuming regularly. Recent features like automatic compression, table restore, UDFs and interleaved sort keys are described. The document provides guidance on migrating workloads and tuning queries, including using WLM queues and the performance monitor in the console.
Deep Dive: Amazon Redshift (March 2017) (Julien SIMON)
This document provides an overview of optimizing performance in Amazon Redshift. It discusses the architecture of Redshift including columnar storage and compression. It also covers best practices for schema design such as choosing distribution styles, sort keys and column widths. Additional topics include ingestion strategies, regular maintenance of statistics and vacuuming, and workload management using queues.
Redshift is Amazon's cloud data warehousing service that allows users to interact with S3 storage and EC2 compute. It uses a columnar data structure and zone maps to optimize analytic queries. Data is distributed across nodes using either an even or keyed approach. Sort keys and queries are optimized using statistics from ANALYZE operations while VACUUM reclaims space. Security, monitoring, and backups are managed natively with Redshift.
Brad McGehee's presentation on "How to Interpret Query Execution Plans in SQL Server 2005/2008".
Presented to the San Francisco SQL Server User Group on March 11, 2009.
The document provides an overview of various techniques for optimizing database and application performance. It discusses fundamentals like minimizing logical I/O, balancing workload, and serial processing. It also covers the cost-based optimizer, column constraints and indexes, SQL tuning tips, subqueries vs joins, and non-SQL issues like undo storage and data migrations. Key recommendations include using column constraints, focusing on serial processing per table, and not over-relying on statistics to solve all performance problems.
This document discusses how to implement operations like selection, joining, grouping, and sorting in Cassandra without SQL. It explains that Cassandra uses a nested data model to efficiently store and retrieve related data. Operations like selection can be performed by creating additional column families that index data by fields like birthdate and allow fast retrieval of records by those fields. Joining can be implemented by nesting related entity data within the same column family. Grouping and sorting are also achieved through additional indexing column families. While this requires duplicating data for different queries, it takes advantage of Cassandra's strengths in scalable updates.
As a developer, it is important to understand MySQL storage engines and how they can impact performance. The key factors to consider include the type of data being stored, concurrency needs, and requirements for transactions. The storage engine chosen affects aspects like locking granularity, indexing support, and performance for queries, inserts, and updates. Explain statements help analyze query execution plans and identify opportunities to improve performance through proper indexing.
MySQL: Know more about open Source Database (Mahesh Salaria)
- As a developer, it is important to understand MySQL's storage engines, data types, indexing, and normalization to build high-performing applications.
- MySQL has several storage engines that handle different table types differently in terms of transactions, locking, storage, and memory usage. Choosing the right engine depends on data usage.
- Properly normalizing data, using optimal data types, and adding indexes improves performance by reducing storage needs, memory usage, and speeding up queries.
This document provides guidance on optimizing database performance through techniques like indexing, query tuning, avoiding unnecessary operations, and following best practices for objects like stored procedures, triggers, views and transactions. It emphasizes strategies like indexing frequently accessed columns, avoiding correlated subqueries and unnecessary joins, tuning queries to select only required columns, and keeping transactions and locks as short as possible.
Optimizing queries is very important for improving database performance. Analyze queries using the query execution plan, create clustered and non-clustered indexes, and create indexed views.
This document discusses best practices for building large scale relational data warehouses. It recommends partitioning large fact tables, building indexes on the date key of fact tables and foreign keys of dimension tables. Choosing the right partition grain and designing dimension tables efficiently with surrogate keys is also discussed. Maintaining data using sliding window techniques and efficiently loading, deleting, and backing up data are covered.
A quick overview of Redshift and common use cases, followed by tools and links for performance tuning, and how Redshift fits into the AWS data services. It lists key new features since the last meetup in September 2016, including Redshift Spectrum, which allows one to run SQL directly on data sitting in Amazon S3. It also covers the Redshift ecosystem with data integration, BI, consultancy, and data modelling partners.
SQL Server 2008 Development for Programmers (Adam Hutson)
The document outlines a presentation by Adam Hutson on SQL Server 2008 development for programmers, including an overview of CRUD and JOIN basics, dynamic versus compiled statements, indexes and execution plans, performance issues, scaling databases, and Adam's personal toolbox of SQL scripts and templates. Adam has 11 years of database development experience and maintains a blog with resources for SQL topics.
Use EXPLAIN to profile query execution plans. Optimize queries by using proper indexes, limiting unnecessary DISTINCT and ORDER BY clauses, batching INSERTs, and avoiding correlated subqueries. Know your storage engines and choose the best one for your data needs. Monitor configuration variables, indexes, and queries to ensure optimal performance. Design schemas thoughtfully with normalization and denormalization in mind.
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics (Amazon Web Services)
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and tune query and database performance.
Learning Objectives:
Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Similar to How to Fine-Tune Performance Using Amazon Redshift (20)
Analytics Web Day | From Theory to Practice: Big Data Stories from the Field (AWS Germany)
The document discusses three case studies of companies using big data technologies:
1) An insurance company modernized its data warehouse by using AWS services like S3, EMR and Zeppelin for analytics at minimal cost.
2) A telecom company implemented advanced analytics and stream processing on AWS to better understand customers and enhance systems.
3) An industrial use case uses stream processing, machine learning and AWS services for predictive maintenance and error detection.
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ... (AWS Germany)
The previous presentation showed how events can be ingested and analyzed continuously in real time. One of Big Data's principles is to store raw data as long as possible - to be able to answer future questions. If the data is permanently stored in Amazon Simple Storage Service (S3), it can be queried at any time with Amazon Athena without spinning up a database.
This session shows step by step how the data should be structured so that both costs and response times are reduced when using Athena. The details and effects of compression, partitions, and column storage formats are compared. Finally, AWS Glue is used as a fully managed service for Extract Transform Load (ETL) to derive optimized views from the raw data for frequently issued queries.
Speaker: Steffen Grunwald, Senior Solutions Architect, AWS
Modern Applications Web Day | Impress Your Friends with Your First Serverless... (AWS Germany)
"Build and run applications without thinking about servers". You want it? You get it! We will start this session with a motivation why serverless applications are a thing. Once we got there, we will actually start building one, of course with making use of a serverless CI/CD pipeline. After we will have looked into how we can still test it locally, we shall also dive into analyzing and debugging our app - of course in a serverless manner.
Speaker: Dirk Fröhner, Senior Solutions Architect, AWS
Modern Applications Web Day | Manage Your Infrastructure and Configuration on... (AWS Germany)
It's easy to say - "Hey I will use the cloud and be scalable and elastic!" - But it is not easy managing all that at scale and keeping it flexible! Let's talk about Infrastructure as Code and Configuration as Code! This session will help you grasp the available toolset and best practices when it comes to managing your infrastructure and configuration on AWS. It will show you how you can make any changes to your workload with a single 'git push origin master'.
Speaker: Darko Meszaros, Solutions Architect, AWS
Modern Applications Web Day | Container Workloads on AWS (AWS Germany)
Containers gained strong traction since day one for both enterprises and startups. Today AWS customers are launching hundreds of millions of new containers – each week. Join us as we cover the state of containerized application development and deployment trends. This session will dive deep on new container capabilities that help customers deploying and running container-based workloads for web services and batches.
Speaker: Steffen Grunwald, Senior Solutions Architect, AWS & Sascha Möllering, Senior Solutions Architect, AWS
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker (AWS Germany)
With more and more application workloads moving to Kubernetes, the interest in managed Kubernetes services in enterprises is increasing. While Amazon EKS will make operations easier, an efficient and transparent delivery pipeline becomes more important than ever. This will provide increased application development velocity that directly converts into a competitive advantage with fast-paced digital services. While established tools such as Jenkins can be used quite efficiently for CI tasks, modern cloud-native tools like Spinnaker are gaining attention by focusing more on the continuous delivery process. We will show you how Spinnaker and its new Kubernetes v2 provider can be utilized together with Amazon EKS to streamline your application deployments.
Speaker: Jukka Forsgren, nordcloud
The most common way to start developing for Alexa is with custom skills, while not many of us, except for device manufacturers, get in touch with Smart Home skills on Alexa. This session introduces and demonstrates the power of Smart Home skills, and it takes a look behind the technical scene of what happens between an "Alexa, turn on the lights" and Alexa's final "Ok" confirmation. Once you are familiar with the concept of Smart Home skills, you will find out that it's not just for implementing large-scale Smart Home solutions, as the Smart Home API is also a great playground for your next do-it-yourself project. At the end of this session you will have learned about probably the simplest way to build a Smart Home project with Raspberry Pi and AWS IoT - and you will be equipped with essential knowledge on how to build your own voice-controlled "thing".
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure (AWS Germany)
Automating the boring task of submitting travel expenses, we developed an ML model for classifying receipts. Using AWS EC2, Lambda, S3, SageMaker, and Rekognition, we evaluated different ways of training the model and serving predictions, as well as different modeling approaches (classical ML vs. deep learning).
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop (AWS Germany)
This is a hands-on workshop where every participant will not only learn how to architect and implement a serverless application on Amazon Web Services using nothing but serverless resources for all layers in theory, but actually do it in practice, with all the necessary support from the speakers. Serverless computing allows you to build and run applications and services without thinking about servers. Serverless applications don't require you to provision, scale, and manage any servers. You can build them for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you. Building serverless applications means that developers can focus on their core product instead of worrying about managing and operating servers or runtimes. This reduced overhead lets developers reclaim time and energy that can be spent on developing great products which scale and that are reliable.
Nearly everything in IT - servers, applications, websites, connected devices, and other things - generate discrete, time-stamped records of events called logs. Processing and analyzing these logs to gain actionable insights is log analytics. We'll look at how to use centralized log analytics across multiple sources with Amazon Elasticsearch Service.
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS (AWS Germany)
Querying streaming data with SQL to derive actionable insights at the point of impact in a timely and continuous fashion offers various benefits over querying data in a traditional database. However, although it is desirable for many use cases to transition to a stream based paradigm, stream processing systems and traditional databases are fundamentally different: in a database, the data is (more or less) fixed and the queries are executed in an ad-hoc manner, whereas in stream processing systems, the queries are fixed and the data flows through the system in real-time. This leads to different primitives that are required to model and query streaming data.
In this session, we will introduce basic stream processing concepts and discuss strategies that are commonly used to address the challenges that arise from querying of streaming data. We will discuss different time semantics, processing guarantees and elaborate how to deal with reordering and late arriving of events. Finally, we will compare how different streaming use cases can be implemented on AWS by leveraging Amazon Kinesis Data Analytics and Apache Flink.
Tens of thousands of nonprofit and non-governmental organizations worldwide use AWS so they can focus on their actual mission instead of managing their IT infrastructure. The use cases of nonprofits and NGOs are just as diverse as those of enterprises, start-ups, or other AWS users in the public sector. Nonprofit organizations and NGOs use AWS, for example, to build highly available and highly scalable websites, to manage their fundraising campaigns and public relations efficiently, or to derive value from big data applications.
In this session we will take a look at the various AWS programs that make it easier for nonprofit organizations to get started with AWS and to implement their IT projects. In particular, we will also cover the offering with Stifter-Helfen.de - the German TechSoup partner. This offering provides eligible organizations with $2,000 in AWS credit codes per year.
The session is aimed at everyone who wants to get involved in a good cause without giving up innovative cloud services for implementing their IT projects. No technical background is required to attend the session.
The document discusses data architecture challenges and best practices for microservices. It covers challenges like distributed transactions, eventual consistency, and choosing appropriate data stores. It provides recommendations for handling errors and rollbacks in a distributed system using techniques like correlation IDs, transaction managers, and event-driven architectures with DynamoDB streams. The document also provides a framework for classifying non-functional requirements and mapping them to suitable AWS data services.
Serverless vs. Developers – the real crash (AWS Germany)
With serverless, things are getting really different. Commodity building blocks from our cloud providers, functional billing, serverless marketplaces, etc. are going to hit the usual "Not invented here" syndrome in organizations.
Many beloved things have to be un- or re-learned by software developers. How can we prepare our organizations and people for unlearning old patterns and behaviours? Let’s have a look from a knowledge management perspective.
Objective of the talk:
Intro into systemic knowledge management
Query your data in S3 with SQL and optimize for cost and performance (AWS Germany)
Streaming services allow you to ingest and analyze events continuously in real time. One of Big Data's principles is to store raw data as long as possible - to be able to answer future questions. If the data is permanently stored in Amazon Simple Storage Service (S3), it can be queried at any time with Amazon Athena without spinning up a database.
This session shows step by step how the data should be structured so that both costs and response times are reduced when using Athena. The details and effects of compression, partitions, and column storage formats are compared. Finally, AWS Glue is used as a fully managed service for Extract Transform Load (ETL) to derive optimized views from the raw data for frequently issued queries.
Secret Management with HashiCorp's Vault (AWS Germany)
When running a Kubernetes Cluster in AWS there are secrets like AWS and Kubernetes credentials, access information for databases or integration with the company LDAP that need to be stored and managed.
HashiCorp's Vault secures, stores, and controls access to tokens, passwords, certificates, API keys, and other secrets. It handles leasing, key revocation, key rolling, and auditing.
This talk will give an overview of secret management in general and Vault’s concepts. The talk will explain how to make use of Vault’s extensive feature set and show patterns that implement integration between Kubernetes applications and Vault.
Running more than one containerized application in production makes teams look for solutions to quickly deploy and orchestrate containers. One of the most popular options is the open-source project Kubernetes. With the release of the Amazon Elastic Container Service for Kubernetes (EKS), engineering teams now have access to a fully managed Kubernetes control plane and time to focus on building applications. This workshop will deliver hands-on labs to support you getting familiar with Amazon's EKS.
Our challenge is to provide a container cluster as part of the Cloud Platform at Scout24. Our goal is to support all the different applications with varying requirements the Scout24 dev teams can throw at us. Up until now, we have run all of them on the same ECS cluster with the same parameters. As we get further into our AWS migration, we have learned this does not scale. We combat this by introducing categories in one cluster with different configurations for the service. We will introduce how we tune each category differently, with different resource limits, different scaling approaches and more…
Deploying and Scaling Your First Cloud Application with Amazon Lightsail (AWS Germany)
Are you looking to move to the cloud, but aren’t sure quite where to start? Are you already using AWS, and are looking for ways to simplify some of your workflows? If you answered “yes” (or even “maybe”) to either one of those questions, this session / hands-on workshop is for you. We’re going to take you through using Amazon Lightsail, an AWS service that provides the quickest way to get started in the cloud, to deploy and scale an application on AWS.
2. What is Amazon Redshift?
• Petabyte-scale columnar database
• Fast response time: ~10 times faster than a typical relational database
• Cost-effective (around $1,000 per TB per year)
3. When are customers using Amazon Redshift?
Replacing a traditional DWH
• Reduce cost by extending the DW instead of adding HW
• Migrate from an existing DWH system
• Respond faster to the business: provision in minutes
Analyzing Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
Providing services and SaaS
• Add analytics functionality to applications
• Scale DW capacity as demand grows
• Reduce SW and HW costs by an order of magnitude
5. Redshift reduces IO
Column storage
• With raw storage you do unnecessary IO
• With columnar storage you only read the data you need
Data compression
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
Zone maps
• Track the minimum and maximum value for each block
• Skip over blocks that do not contain relevant data
Direct-attached storage
• > 2 GB/sec scan rate
• Optimized for data processing
• High disk density
7. Main possible issues with Redshift performance
• Incorrect column encoding
• Skewed table data
• Queries not benefiting from sort keys – excessive scanning
• Tables without statistics or which need vacuum
• Tables with very large VARCHAR columns
• Queries waiting on queue slots
• Queries that are disk-based – incorrect sizing, GROUP BY distinct values, UNION, hash joins with DISTINCT values
• Commit queue waits
• Inefficient data loads
• Inefficient use of temporary tables
• Large nested loop JOINs
• Inappropriate join cardinality
8. Incorrect column encoding
Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from this best practice, you can use the v_extended_table_info view from the Amazon Redshift Utils GitHub repository.
If you find that you have tables without optimal column encoding, then use the Amazon Redshift Column Encoding Utility from the Utils repository to apply encoding. This command line utility uses the ANALYZE COMPRESSION command on each table. If encoding is required, it generates a SQL script which creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data.
Available encodings:
• Raw Encoding
• Byte-Dictionary Encoding
• Delta Encoding
• LZO Encoding
• Mostly Encoding
• Runlength Encoding
• Text255 and Text32k Encodings
• Zstandard Encoding
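As a quick manual check before running the utility (the table name below is just a placeholder), ANALYZE COMPRESSION can be run directly against a single table to see the recommended encoding per column:
ANALYZE COMPRESSION my_schema.sales;
The output lists, for each column, the suggested encoding and the estimated space reduction, which you can compare against the current table definition before rebuilding it.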
9. Skewed table data
If skew is a problem, you typically see that node performance is uneven on the cluster. Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster.
Consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:
CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT <column name> FROM <table name>;
You can use SCT for setting the proper distribution key and style during the migration process.
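If you only have the system views at hand rather than the admin scripts, a rough sketch for spotting skew candidates is the skew_rows column in SVV_TABLE_INFO (a value far above 1 means rows are spread unevenly across slices):
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC
LIMIT 20;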
10. Deal with tables with very large VARCHAR columns
During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance.
Use the following query to generate a list of tables that should have their maximum column widths reviewed:
SELECT database, schema || '.' || "table" AS "table", max_varchar FROM svv_table_info WHERE max_varchar > 150 ORDER BY 2;
Identify which tables have wide VARCHAR columns and then determine the true maximum width for each wide column:
SELECT max(len(rtrim(column_name))) FROM table_name;
If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries which include the JSON fragment column. If end users query these large columns but don't actually execute JSON functions against them, consider moving them into another table that only contains the primary key column of the original table and the JSON column.
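One possible way to carry out that split (table and column names here are hypothetical) is to copy the primary key and the JSON column into a narrow side table and then drop the wide column from the main table:
CREATE TABLE orders_json
DISTKEY (order_id)
AS
SELECT order_id, raw_json
FROM orders;

ALTER TABLE orders DROP COLUMN raw_json;
Queries that never touch the JSON column then scan a much narrower main table, and the few queries that do need it can join to orders_json on the primary key.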
11. Queries not benefiting from sort keys
Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases but does not incur a storage cost as with other platforms (for more information, see Choosing Sort Keys). A sort key should be created on those columns which are most commonly used in WHERE clauses. If you have a known query pattern, then COMPOUND sort keys give the best performance; if end users query different columns equally, then use an INTERLEAVED sort key. If using compound sort keys, review your queries to ensure that their WHERE clauses specify the sort columns in the same order they were defined in the compound key.
To determine which tables don't have sort keys, run the following query against the v_extended_table_info view from the Amazon Redshift Utils repository:
SELECT * FROM admin.v_extended_table_info WHERE sortkey IS null;
If you do not have a sort key, this can potentially lead to an excessive scanning issue.
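A minimal sketch of declaring a compound sort key at table creation time (table and column names are placeholders) looks like this; the leading sort column should be the one most commonly filtered on in WHERE clauses:
CREATE TABLE sales (
  sale_id    BIGINT,
  sale_date  DATE,
  region     VARCHAR(32),
  amount     DECIMAL(12,2)
)
DISTKEY (sale_id)
COMPOUND SORTKEY (sale_date, region);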
12. Queries waiting on queue slots (ICT)
Amazon Redshift runs queries using a queuing system known as workload management (WLM). You can define up to 8 queues to separate workloads from each other, and set the concurrency on each queue to meet your overall throughput requirements.
In some cases, the queue to which a user or query has been assigned is completely busy and a user's query must wait for a slot to be open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency.
First, you need to determine if any queries are queuing, using the queuing_queries.sql admin script. Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. Keep in mind that while increasing concurrency allows more queries to run, each query will get a smaller share of the memory allocated to its queue.
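If you prefer to check the system tables directly instead of the admin scripts, a simple sketch against STL_WLM_QUERY lists recent queries with the longest queue waits (the time columns are in microseconds):
SELECT query, service_class,
       total_queue_time / 1000000.0 AS queue_seconds,
       total_exec_time / 1000000.0 AS exec_seconds
FROM stl_wlm_query
WHERE total_queue_time > 0
ORDER BY total_queue_time DESC
LIMIT 20;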
13. Queries that are disk-based (ICT)
If a query isn't able to completely execute in memory, it may need to use disk-based temporary storage for parts of an explain plan. The additional disk I/O slows down the query; this can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation).
To determine if any queries have been writing to disk, use the following query:
SELECT q.query, trim(q.cat_text)
FROM (
  SELECT query,
         replace(listagg(text, ' ') WITHIN GROUP (ORDER BY sequence), '\\n', ' ') AS cat_text
  FROM stl_querytext
  WHERE userid > 1
  GROUP BY query) q
JOIN (
  SELECT DISTINCT query
  FROM svl_query_summary
  WHERE is_diskbased = 't'
    AND (LABEL LIKE 'hash%' OR LABEL LIKE 'sort%' OR LABEL LIKE 'aggr%')
    AND userid > 1) qs
  ON qs.query = q.query;
Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries needing to spill to disk to complete.
14. Commit queue waits (ICT)
Amazon Redshift is designed for analytics queries rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.
If you are committing too often on your database, you will start to see waits on the commit queue increase, which can be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress, or inefficient data loads.
One of the worst practices is to insert data into Amazon Redshift row by row. Use the COPY command or your ETL tool compatible with Amazon Redshift instead.
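As a minimal sketch of a bulk load (bucket, prefix, IAM role, and table name are placeholders), a single COPY from a set of files replaces many small INSERT/COMMIT cycles and lets every slice load in parallel:
COPY sales
FROM 's3://my-bucket/sales/2017/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS CSV
GZIP;
Splitting the input into multiple files (ideally a multiple of the number of slices) keeps all nodes busy during the load.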
15. Inefficient use of Temporary Tables
Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS commands use the input data to determine column names, sizes and data types, and use default storage properties.
These default storage properties may cause issues if not carefully considered. Amazon Redshift's default table structure is to use EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you are using the SELECT … INTO syntax you cannot set the column encoding or the distribution and sort keys. If you use the CREATE TABLE AS (CTAS) syntax, you can specify a distribution style and sort keys, and Redshift will automatically apply LZO encoding for everything other than sort keys, booleans, reals and doubles. If you consider this automatic encoding sub-optimal and require further control, use the CREATE TABLE syntax rather than CTAS.
If you are creating temporary tables, it is highly recommended that you convert all SELECT … INTO syntax to use the CREATE statement. This ensures that your temporary tables have column encodings and are distributed in a fashion that is sympathetic to the other entities that are part of the workflow.
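As an illustrative sketch of that conversion (names, types, and encodings are placeholders), a SELECT … INTO such as
SELECT order_id, amount INTO #order_totals FROM orders;
could instead be written with an explicit definition that controls distribution, sort key, and encoding:
CREATE TEMPORARY TABLE order_totals (
  order_id  BIGINT         ENCODE zstd,
  amount    DECIMAL(12,2)  ENCODE zstd
)
DISTKEY (order_id)
SORTKEY (order_id);

INSERT INTO order_totals
SELECT order_id, amount
FROM orders;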
16. Tables without statistics
• Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). Without good statistics, the optimiser may make suboptimal choices about the order in which to access tables, or how to join datasets together.
• The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics, and you can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics:
SELECT database, schema || '.' || "table" AS "table", stats_off FROM svv_table_info WHERE stats_off > 5 ORDER BY 2;
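Once stale or missing statistics are identified, the follow-up is simply to refresh them (the table name is a placeholder); running ANALYZE with no table name refreshes statistics for every table in the database, which can take a while on large clusters:
ANALYZE my_schema.sales;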
17. Tables which need VACUUM
In Amazon Redshift, data blocks are immutable. When rows are DELETED or UPDATED, they are simply logically deleted (flagged for deletion) but not physically removed from disk. Updates result in a new block being written with new data appended. Both of these operations cause the previous version of the row to continue consuming disk space and continue being scanned when a query scans the table. As a result, table storage space is increased and performance degraded due to otherwise avoidable disk I/O during scans. A VACUUM command recovers the space from deleted rows and restores the sort order.
You can use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days.
To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually need reorganization.
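If you want to reclaim space and restore sort order on a single table by hand rather than through the utility (the table name is a placeholder), the plain VACUUM forms look like this:
VACUUM FULL my_schema.sales;         -- reclaim space from deleted rows and re-sort
VACUUM DELETE ONLY my_schema.sales;  -- reclaim space without re-sorting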
18. Analyze the performance of the queries and address issues
Your best friends in a Redshift world are:
• ANALYZE – for identifying out-of-date statistics
• SVL_QUERY_SUMMARY view – for a summary of all the useful data around the behavior of your queries and a high-level overview of the cluster
• Query alerts – for receiving important information about deviations in the behavior of your queries
• SVL_QUERY_REPORT view – for collecting details about your queries' health and performance
19. Set up and fine-tune monitoring and maintenance procedures
In addition to your preferred monitoring methods and multiple partner solutions, we have some helpful tools:
• Amazon Redshift Column Encoding Utility
• Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster.
• Run the missing_table_stats.sql admin script to determine which tables are missing stats.
• Use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days.
• Use top_queries.sql to determine the top running queries.
• Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql.
• View the commit stats with the commit_stats.sql admin script.
20. Eliminate Nested Loops
Nested loops are usually due to a SQL query specifying a join condition that requires a "brute force" approach between two large tables. They are quite easy to spot: look for "Nested Loop" in the query plan or in STL_ALERT_EVENT_LOG. Typical triggers are join conditions such as:
ON a.date > b.date
ON a.text LIKE b.text
ON a.x = b.x OR a.y = b.y
Possible remedies:
• Rewrite the inequality JOIN condition as a window function
• Use a small nested loop instead of two large tables
• Maybe you can do a nested loop join and persist the results in a separate relational table
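A small sketch for spotting these from the system tables (no admin scripts required) is to look for recent nested-loop alerts; the exact event wording can vary, so the filter below is deliberately loose:
SELECT query, trim(event) AS event, trim(solution) AS solution, event_time
FROM stl_alert_event_log
WHERE event LIKE '%Nested Loop%'
ORDER BY event_time DESC
LIMIT 20;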
21. Inappropriate Join Cardinality
Watch out for query "fan-out": look for a high number of rows generated from joins (higher than the sum of rows of all scanned tables) in the query plan or execution metrics. Carefully review the join logic, and use SVL_QUERY_SUMMARY to detect it. A typical fan-out pattern looks like:
FROM house
JOIN rooms ON rooms.house_id = house.id
JOIN residents ON residents.house_id = house.id
Possible remedies:
• If possible, break large fan-out queries into several smaller queries
• Use derived tables to transform one-to-many joins into one-to-one joins
• Try out some advanced techniques like 1 and 2
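As an illustrative sketch of the derived-table remedy (reusing the house/rooms/residents names from the example above), each one-to-many join is pre-aggregated so it becomes one-to-one and no longer multiplies rows:
SELECT h.id,
       r.room_count,
       p.resident_count
FROM house h
JOIN (SELECT house_id, count(*) AS room_count
      FROM rooms
      GROUP BY house_id) r
  ON r.house_id = h.id
JOIN (SELECT house_id, count(*) AS resident_count
      FROM residents
      GROUP BY house_id) p
  ON p.house_id = h.id;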
22. Suboptimal WHERE clause
If your WHERE clause causes excessive table scans, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY. For more information, see Using the SVL_QUERY_SUMMARY View.
To fix this issue, add a WHERE clause to the query based on the primary sort column of the largest table. This approach will help minimize scanning time. For more information, see Amazon Redshift Best Practices for Designing Tables.
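As a small illustration (table and column names are placeholders, with sale_date assumed to be the leading sort key of the largest table), a range predicate on the sort column lets Redshift skip whole blocks via its zone maps:
SELECT region, sum(amount)
FROM sales
WHERE sale_date >= '2017-01-01' AND sale_date < '2017-04-01'
GROUP BY region;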
23. Insufficiently Restrictive Predicate
If your query has an insufficiently restrictive predicate, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY that has a very high rows value compared to the rows value in the final RETURN step of the query. For more information, see Using the SVL_QUERY_SUMMARY View.
To fix this issue, try adding a predicate to the query or making the existing predicate more restrictive to narrow the output.
24. Very Large Result Set
If your query returns a very large result set, consider rewriting the query to use UNLOAD to write the results to Amazon S3. This approach will improve the performance of the RETURN step by taking advantage of parallel processing. For more information on checking for a very large result set, see Using the SVL_QUERY_SUMMARY View.
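A minimal UNLOAD sketch (the inner query, bucket, prefix, and IAM role are placeholders) writes the result set to S3 in parallel, compressed, instead of funnelling it through the leader node:
UNLOAD ('SELECT region, sum(amount) FROM sales GROUP BY region')
TO 's3://my-bucket/exports/sales_by_region_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
GZIP
PARALLEL ON;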
25. Large SELECT List
If your query has an unusually large SELECT list, you might see a bytes value that is high relative to the rows value for any step (in comparison to other steps) in SVL_QUERY_SUMMARY. This high bytes value can be an indicator that you are selecting a lot of columns. For more information, see Using the SVL_QUERY_SUMMARY View.
To fix this issue, review the columns you are selecting and see if any can be removed.
26. Working with SVL_QUERY_SUMMARY
Select your query ID:
SELECT query, elapsed, substring FROM svl_qlog ORDER BY query DESC LIMIT 5;
Collect query data:
SELECT * FROM svl_query_summary WHERE query = MyQueryID ORDER BY stm, seg, step;
For analysis please refer to: https://docs.aws.amazon.com/redshift/latest/dg/using-SVL-Query-Summary.html
27. Some additional materials can be found here
• Best practices for designing queries
• Best practices for designing tables
• Top 10 performance tuning techniques
• Troubleshooting queries
Amazon Redshift Developer Guide: https://docs.aws.amazon.com/redshift/latest/dg/redshift-dg.pdf