Just when the world of “Data 1.0” showed some signs of maturing, “outside-in” demands have already set off disruptive changes to the data landscape. Parallel growth in the volume, velocity and variety of data, coupled with the incessant push to find newer insights and value from data, has posed a big question: Is Your Data Warehouse Relevant?
In short, these changes happening around us in real time are the new “Data 2.0”. It is characterized by feeding ever-hungry minds with sharper insights, whether related to regulation, finance, corporate actions, risk management, or purely aimed at improving operational efficiency. Data in this new “Data 2.0” world has to be commensurate with the outside-in demands from customers, regulators, stakeholders and business users; hence, you need a high-relformance (relevance + performance) data warehouse that is relevant to your business ecosystem and has the power to scale exponentially.
The webinar starts by giving the audience a sneak preview of what happened in the Data 1.0 world and which characteristics are shaping the new Data 2.0 world. It then delves into the challenges that growing data volumes have posed to data warehouse teams, and presents some practical and proven methodologies to address these performance challenges. Finally, it highlights some thought-provoking ways to turbo-charge your data warehouse initiatives by leveraging newer technologies such as Hadoop. Overall, the webinar shows the audience how to build a high-performance, relevant data warehouse that is capable of meeting these newer demands while significantly driving down the total cost of ownership.
1. Welcome to the webinar on
Designing High Performance Datawarehouse
Presented by
&
2. Contents
1. What happened in the Data 1.0 World
2. What is shaping the new Data 2.0 World
3. Designing High Performance Datawarehouse
4. Q&A
3. What happened in the Data 1.0 World?
Before 2000: Do we need a DWH?
2000s: select successes with both top-down and bottom-up approaches; advent of the ODS
Now: business-led initiatives; “we’ve got BI / DWH tools”; Volume | Variety | Velocity | Value; performance vs. volume becomes the game changer; insights are needed from non-structured data as well; drill-down reporting from the DWH is getting into the mainstream; analytics is a differentiator
Questions and pain points along the way: data silos; metrics for success?; OLAP = insights; painful implementations; show me the ROI; standardized KPIs; analytics as a differentiator?; (data) big, real-time, in-memory and what to do with existing initiatives; retaining skills and expertise
Data 2.0: scale, performance, knowledge, relevance
4. Challenges in current DW environment – Survey
42% say they can’t scale to big data volumes
27% say data load speed is inadequate
27% say query response is poor
25% say the existing DW is modeled for reports & OLAP only
Other frequently cited challenges (roughly 9–24% each): can’t score analytic models fast enough; cost of scaling up or out is too expensive; can’t support a high concurrent user count; inadequate support for in-memory processing; current platform needs great manual effort for performance; poorly suited to real-time workloads; can’t support in-database analytics; poor CPU speed and capacity; current platform is a legacy that must be phased out
TDWI research based on 278 respondents – top responses
5. Data 2.0 World
Data feeding the warehouse: social media data, text data, sensor data, syndicated data, numeric data. Every 18 months, non-rich structured and unstructured enterprise data doubles.
What the high performance data warehouse must provide: speed, concurrency enabled, able to handle complexity, ability to scale.
What big data analytics delivers: true sentiment, faster compliance, faster reach, analytics = competitive advantage, efficiencies driving down costs, better customer experience & service.
Business is now equipped to consume, identify and act upon this data for superior insights.
6. So what is a High Performance Datawarehouse?
Key Dimensions
8. The key dimensions: SPEED, CONCURRENCY, SCALE, COMPLEXITY
SPEED: streaming big data, event processing, real-time operation, operational BI, near-time analytics, dashboard refresh, fast queries
CONCURRENCY: competing workloads (OLAP, analytics), intraday data loads, thousands of users, ad hoc queries
SCALE: big data volumes, detailed source data, thousands of reports, scale out into cloud, clusters, grids, etc.
COMPLEXITY: big data variety (unstructured, sensor, social media), many sources / targets, complex models and SQL, high availability
10. Industry recognized top techniques
45% say creating summary tables
44% say adding indexes
33% say altering SQL statements or routines
Other commonly used techniques (roughly 6–24% each): changing physical data models; using in-memory databases; upgrading hardware; choosing between column- and row-oriented data storage; restricting or throttling user queries; moving an application to a separate data mart; applying workload management controls; shifting some workloads to off-peak hours; adjusting system parameters; others
TDWI research based on 329 responses from 114 respondents
12. Summary table design process
COLLECT: a good sampling of queries. These may come from user interviews, testing / QA queries, production queries, reports, or any other means that provides a good representation of expected production queries.
ANALYZE: the dimension hierarchy levels, dimension attributes, and fact table measures that are required by each query or report.
IDENTIFY: the row counts associated with each dimension level represented.
BALANCE: the most commonly required dimension levels against the number of rows in the resulting summary tables. A goal should be to design summary tables that are roughly 1/100th the size of the source fact tables in terms of rows (or less).
MINIMIZE: the columns that are carried in the summary table, in favor of joining back to the dimension table. The larger the summary table, the less performance advantage it provides.
Some of the best candidates for aggregation will be those where the row counts decrease the most from one level in a hierarchy to the next.
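As a rough illustration of where this process lands, here is a minimal, hypothetical sketch of an aggregate built at the District / Calendar Month grain. It assumes SQL Server-style syntax and illustrative FactSales, DimStore and DimDate tables; only the Sales_Qty and Sale_Amt measure names and the District / Calendar Month levels come from the deck, everything else is an assumption.

-- Hypothetical aggregate: sales rolled up to District / Calendar Month.
-- The resulting row count should be roughly 1/100th (or less) of the base fact table.
SELECT  s.District,
        d.CalendarYear,
        d.CalendarMonth,
        SUM(f.Sales_Qty) AS Sales_Qty,
        SUM(f.Sale_Amt)  AS Sale_Amt,
        COUNT_BIG(*)     AS Row_Count      -- a "count" column preserves some information for non-additive facts
INTO    Agg_Sales_District_Month           -- the aggregate lands in its own table (the "separate table per aggregate" option)
FROM    FactSales f
JOIN    DimStore  s ON f.StoreKey = s.StoreKey
JOIN    DimDate   d ON f.DateKey  = d.DateKey
GROUP BY s.District, d.CalendarYear, d.CalendarMonth;

Queries that previously scanned the base fact table can then be pointed at, or rewritten against, the much smaller aggregate.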
13. Capturing requirements for the summary table
Choosing aggregates to create – there are two basic pieces of information required to select the appropriate aggregates:
• Expected usage patterns of the data
• Data volumes and distributions in the fact table
The worksheet on this slide maps eleven sample reports (Report 1 through Report 11) against the dimension levels and measures each one needs: levels from the Date dimension (Calendar Year, Calendar Month, Fiscal Year, Fiscal Quarter, Fiscal Period, Fiscal Week), the Store Geography dimension (Division, Region, District, Store) and the Item Category dimension (Subject, Category, Department, Item), together with the measures Sales_Qty and Sale_Amt. Alongside each hierarchy it records the number of populated members at each level, ranging from single digits at the top of each hierarchy to several thousand at the lowest levels, which feeds the row-count analysis from the previous slide.
14. Summary table design considerations
Aggregate storage column selection: semi-additive and non-additive fact data need not be stored in the summary table; add as many “pre-calculated” columns as possible; “count” columns can be added for non-additive facts to preserve a portion of the information.
Recreating vs. updating aggregates: it is efficient for aggregation programs to update the aggregate tables with the newly loaded data; regeneration is more appropriate if there is a lot of program logic needed to determine what data must be updated in the aggregate table.
Storing aggregate rows: a combined table containing base-level fact rows and aggregate rows; a single aggregate table which holds all aggregate data for a single base fact table; or a separate table for each aggregate created (the most preferred option).
Storing aggregate dimension data: with multiple hierarchies in a single dimension, either store all of the aggregate dimension records together in a single table, use a separate table for each level in the dimension, or add the dimension data to the aggregate fact table.
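To illustrate the “update rather than recreate” option, here is a hedged sketch that folds only the newly loaded day into the aggregate sketched earlier; the @LoadDate parameter, the FullDate column and the MERGE approach are assumptions, not from the original deck.

-- Incremental refresh of the hypothetical aggregate: process only the batch just loaded.
DECLARE @LoadDate date = '2014-06-30';        -- date of the batch just loaded (assumed)

MERGE Agg_Sales_District_Month AS tgt
USING (
    SELECT s.District, d.CalendarYear, d.CalendarMonth,
           SUM(f.Sales_Qty) AS Sales_Qty, SUM(f.Sale_Amt) AS Sale_Amt, COUNT_BIG(*) AS Row_Count
    FROM   FactSales f
    JOIN   DimStore  s ON f.StoreKey = s.StoreKey
    JOIN   DimDate   d ON f.DateKey  = d.DateKey
    WHERE  d.FullDate = @LoadDate             -- aggregate only the newly loaded rows
    GROUP BY s.District, d.CalendarYear, d.CalendarMonth
) AS src
ON  tgt.District = src.District
AND tgt.CalendarYear = src.CalendarYear
AND tgt.CalendarMonth = src.CalendarMonth
WHEN MATCHED THEN UPDATE SET
    tgt.Sales_Qty = tgt.Sales_Qty + src.Sales_Qty,
    tgt.Sale_Amt  = tgt.Sale_Amt  + src.Sale_Amt,
    tgt.Row_Count = tgt.Row_Count + src.Row_Count
WHEN NOT MATCHED THEN INSERT (District, CalendarYear, CalendarMonth, Sales_Qty, Sale_Amt, Row_Count)
    VALUES (src.District, src.CalendarYear, src.CalendarMonth, src.Sales_Qty, src.Sale_Amt, src.Row_Count);

If the logic needed to work out which aggregate rows are affected becomes much more involved than this, regenerating the aggregate from scratch is usually the simpler choice, as the slide notes.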
16. Dimension table indexing
• Create a non-clustered primary key on the surrogate key of each dimension table.
• A clustered index on the business key should be considered. It enhances query response when the business key is used in the WHERE clause and helps avoid lock escalation during the ETL process.
• For large type 2 SCDs, create a four-part non-clustered index: business key, record begin date, record end date and surrogate key.
• Create non-clustered indexes on columns in the dimension that will be used for searching, sorting, or grouping.
• If there is a hierarchy in a dimension, such as Category – Sub-Category – Product ID, then create an index on the hierarchy.
Example index types and columns:
Non-clustered: EmployeeKey
Clustered: EmployeeNationalIDAlternateKey
Non-clustered: EmployeeNationalIDAlternateKey, StartDate, EndDate, EmployeeKey
Non-clustered: FirstName, LastName, DepartmentName
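A minimal T-SQL sketch of these recommendations, assuming a dimension table named dbo.DimEmployee with the columns listed above; the table and index names are illustrative, not prescribed by the deck.

-- Surrogate key: non-clustered primary key
ALTER TABLE dbo.DimEmployee
    ADD CONSTRAINT PK_DimEmployee_EmployeeKey PRIMARY KEY NONCLUSTERED (EmployeeKey);

-- Business key: clustered index to speed WHERE-clause lookups and ease ETL locking
CREATE CLUSTERED INDEX IX_DimEmployee_BusinessKey
    ON dbo.DimEmployee (EmployeeNationalIDAlternateKey);

-- Type 2 SCD lookups: business key + record begin/end dates + surrogate key
CREATE NONCLUSTERED INDEX IX_DimEmployee_SCD2
    ON dbo.DimEmployee (EmployeeNationalIDAlternateKey, StartDate, EndDate, EmployeeKey);

-- Columns commonly used for searching, sorting or grouping
CREATE NONCLUSTERED INDEX IX_DimEmployee_Name
    ON dbo.DimEmployee (LastName, FirstName, DepartmentName);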
17. Fact table indexing
• Create a clustered, composite index composed of each of the foreign keys in the fact table.
• Keep the most commonly queried date column as the leftmost column in the index.
• There can be more than one date in the fact table, but there is usually one date that is of the most interest to business users. A clustered index on this column has the effect of quickly segmenting the amount of data that must be evaluated for a given query.
Example index columns (clustered): OrderDateKey, ProductKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, DueDateKey
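A hedged T-SQL sketch of the recommendation, assuming a fact table named dbo.FactInternetSales with the foreign key columns listed above; table and index names are illustrative.

-- Clustered composite index over the fact table's foreign keys,
-- with the most commonly queried date key leftmost
CREATE CLUSTERED INDEX IX_FactInternetSales_Keys
    ON dbo.FactInternetSales
       (OrderDateKey, ProductKey, CustomerKey, PromotionKey,
        CurrencyKey, SalesTerritoryKey, DueDateKey);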
19. Row Store and Column Store
Most queries do not process all the attributes of a particular relation.
Row store: (+) easy to add or modify a record; (-) might read in unnecessary data.
Column store: (+) only needs to read in the relevant data; (-) tuple writes require multiple accesses.
• One can obtain the performance benefits of a column store using a row store by making some changes to the physical structure of the row store:
– Vertical partitioning
– Using index-only plans
– Using materialized views (see the sketch below)
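As a sketch of the third technique, SQL Server's indexed views are one way to persist a pre-aggregated, materialized result in a row-store engine; the view below assumes a hypothetical dbo.FactSales table whose Sale_Amt column is declared NOT NULL (a requirement for SUM in indexed views).

-- Indexed view (SQL Server's flavor of a materialized view) that pre-aggregates the fact table
CREATE VIEW dbo.vw_SalesByProduct
WITH SCHEMABINDING
AS
SELECT  f.ProductKey,
        SUM(f.Sale_Amt)  AS Sale_Amt,
        COUNT_BIG(*)     AS Row_Count     -- required by SQL Server for indexed views with GROUP BY
FROM    dbo.FactSales AS f
GROUP BY f.ProductKey;
GO
-- The unique clustered index is what actually materializes the aggregated rows
CREATE UNIQUE CLUSTERED INDEX IX_vw_SalesByProduct
    ON dbo.vw_SalesByProduct (ProductKey);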
20. Vertical Partitioning
• Process:
– Full vertical partitioning of each relation
• Each column = one physical table
• This can be achieved by adding an integer position column to every table
• Adding an integer position is better than adding the primary key
– Join on position for multi-column fetches
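A minimal sketch of this idea, assuming hypothetical single-column tables keyed by an integer row position; names and types are illustrative.

-- One physical table per column, keyed by an integer row position
CREATE TABLE Sales_Pos_Amt  (RowPos int NOT NULL, Sale_Amt  decimal(18,2));
CREATE TABLE Sales_Pos_Qty  (RowPos int NOT NULL, Sales_Qty int);
CREATE TABLE Sales_Pos_Date (RowPos int NOT NULL, DateKey   int);

-- A query touching two columns joins the column tables back together on position
SELECT d.DateKey, SUM(a.Sale_Amt) AS Sale_Amt
FROM   Sales_Pos_Amt  a
JOIN   Sales_Pos_Date d ON d.RowPos = a.RowPos
GROUP BY d.DateKey;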
21. Index-only plans
• Process:
– Add a B+Tree index for every Table.column
– Plans never access the actual tuples on disk
– Headers are not stored, so per-tuple overhead is less
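In most row-store engines a covering index is the practical way to get an index-only plan; the sketch below assumes a hypothetical dbo.FactSales table and SQL Server's INCLUDE syntax.

-- Covering (index-only) access: the non-clustered index alone answers the query,
-- so the plan never needs to touch the base rows
CREATE NONCLUSTERED INDEX IX_FactSales_Date_Amt
    ON dbo.FactSales (DateKey)
    INCLUDE (Sale_Amt);

SELECT DateKey, SUM(Sale_Amt) AS Sale_Amt
FROM   dbo.FactSales
GROUP BY DateKey;              -- satisfied entirely from IX_FactSales_Date_Amt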
23. Hadoop ecosystem
An ecosystem of open source projects hosted by the Apache Foundation, built on concepts that Google developed and shared.
• Distributed Storage (HDFS): a distributed file system that has the ability to scale out
• Distributed Processing (MapReduce)
• Metadata Management (HCatalog)
• Query / Scripting (Pig)
• Data Extraction & Loading (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop)
• Non-Relational Database (HBase)
• Workflow & Scheduling (Oozie)
• Management & Monitoring (Ambari, ZooKeeper)
24. Promising uses of Hadoop in a DW context
• Data staging: Hadoop allows organizations to deploy an extremely scalable and economical ETL environment.
• Data archiving: Hadoop’s scalability and low cost enable organizations to keep all data forever in a readily accessible online environment.
• Schema flexibility: Hadoop enables the growing practice of “late binding”; instead of transforming data as it is ingested by Hadoop, structure is applied at runtime.
• Processing flexibility: Hadoop can quickly and easily ingest any data format.
• Distributed DW architecture: off-load workloads for big data and advanced analytics to HDFS, discovery platforms and MapReduce.
25. What led to the data warehouse at Facebook
The problem: data, data and more data; 200 GB coming in per day in March 2008; 2+ TB (compressed) per day since.
The Hadoop experiment: superior in availability, scalability and manageability compared to commercial databases; uses the Hadoop File System (HDFS).
Challenges with Hadoop: programmability and metadata; MapReduce is hard to program; need to publish data in well-known schemas.
HIVE
What is Hive? A system for managing and querying structured data built on top of Hadoop; uses MapReduce for execution and HDFS for storage.
Key building principles: SQL on structured data as a familiar data warehousing tool; pluggable map/reduce scripts in the language of your choice; rich data types; performance.
Tables: each table has a corresponding directory in HDFS; each table points to existing data directories in HDFS; data is split based on the hash of a column, mainly for parallelism.
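A hedged HiveQL sketch of these table principles; the page_views schema, partition column and bucket count are illustrative, not from the deck.

-- HiveQL (illustrative): table backed by an HDFS directory, bucketed on a column hash
CREATE TABLE page_views (
    user_id   BIGINT,
    page_url  STRING,
    ip        STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS    -- split data on the hash of a column, mainly for parallelism
STORED AS TEXTFILE;

-- Familiar SQL on structured data; Hive compiles this into MapReduce jobs
SELECT dt, COUNT(DISTINCT user_id) AS daily_users
FROM   page_views
GROUP BY dt;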
27. Analytical platforms overview
Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.
Examples: 1010data, Aster Data (Teradata), Calpont, DATAllegro (Microsoft), Exasol, Greenplum (EMC), IBM Smart Analytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, ParAccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
Deployment options:
• Software only (ParAccel, Vertica)
• Appliance (SAP, Exadata, Netezza)
• Hosted (1010data, Kognitio)
In practice:
• Kelley Blue Book consolidates millions of auto transactions each week to calculate car valuations.
• AT&T Mobility tracks purchasing patterns for 80M customers daily to optimize targeted marketing.
28. Which platform do you choose?
The slide maps the three platform options (general purpose RDBMS, analytic database, Hadoop) against the spectrum of data they handle, from structured through semi-structured to unstructured.
29. Thank You
Please send your feedback and corporate training / consulting services requirements on BI to sameer@compulinkacademy.com