Welcome to the webinar on

Designing High Performance Datawarehouse

Contents

1. What happened in the Data 1.0 World
2. What is shaping the new Data 2.0 World
3. Designing High Performance Datawarehouse
4. Q&A
What happened in the Data 1.0 World?
Before 2000

Do we need a DWH?

2000s

Select success: top down & bottom up

Advent of ODS

Now

Business led

We’ve got BI / DWH Tools

Volume | Variety | Velocity | Value

Performance vs. Volume: Game Changer

Need insights from unstructured data as well

Drill-down reporting from DWH: getting into the mainstream

Analytics is a differentiator

Data Silos
Metrics for success?
OLAP = Insights
Painful Implementations

Show me the ROI
Standardized KPIs
Analytics as differentiator?

(DATA) Big, real-time, in-memory: what to do with existing initiatives?

Retaining skills and expertise
Data 2.0: scale, performance, knowledge, relevance
Challenges in current DW environment - Survey
42% say: Can’t scale to big data volumes
27% say: Inadequate data load speed
27% say: Poor query response
25% say: Existing DW modeled for reports & OLAP only

Remaining responses (chart values 24%, 24%, 23%, 19%, 18%, 18%, 15%, 15%, 9%; the pairing of values to labels is not recoverable from this transcript):
• Can’t score analytic models fast enough
• Cost of scaling up or out is too expensive
• Can’t support high concurrent user count
• Inadequate support for in-memory processing
• Current platform needs great manual effort for performance
• Poorly suited to real-time workloads
• Can’t support in-database analytics
• Poor CPU speed and capacity
• Current platform is a legacy, we must phase it out

TDWI research based on 278 respondents - Top Responses
Data 2.0 World

Data sources: Social Media Data, Text Data, Sensor Data, Syndicated Data, Numeric Data

High Performance Data Warehouse: Speed, Concurrency Enabled, Able to handle Complexity, Ability to Scale

Results: True Sentiment, Faster Compliance, Faster Reach

Every 18 months, non-rich structured and unstructured enterprise data doubles.

Big Data Analytics
• Analytics = Competitive Advantage
• Efficiencies driving down costs
• Customer experience & service

Business is now equipped to consume, identify and act upon this data for superior insights
So what is a High Performance Datawarehouse?

Key Dimensions of a HIGH PERFORMANCE DATA WAREHOUSE: SPEED, CONCURRENCY, SCALE, COMPLEXITY
High Performance Data Warehouse

CONCURRENCY
• Competing Workloads: OLAP, Analytics
• Intraday data loads
• Thousands of users
• Ad hoc queries

SPEED
• Streaming Big Data
• Event Processing
• Real time operation
• Operational BI
• Near time Analytics
• Dashboard Refresh
• Fast Queries

SCALE
• Big Data volumes
• Detailed source data
• Thousands of reports
• Scale out into: cloud, clusters, grids, etc.

COMPLEXITY
• Big Data variety
• Unstructured
• Sensor
• Social media
• Many sources / targets
• Complex models and SQL
• High availability
Designing High Performance Datawarehouse
Industry recognized top techniques
45% say: Creating Summary Tables
44% say: Adding Indexes
33% say: Altering SQL statements or routines

Remaining responses (chart values 24%, 24%, 21%, 20%, 16%, 16%, 16%, 15%, 10%; the pairing of values to labels is not recoverable from this transcript):
• Changing physical data models
• Using in-memory databases
• Upgrading hardware
• Choosing between column- and row-oriented data storage
• Restricting or throttling user queries
• Moving an application to a separate data mart
• Applying workload management controls
• Shifting some workloads to off-peak hours
• Adjusting system parameters

6% say: Others

TDWI research based on 329 responses from 114 respondents
Designing Summary Tables

45% say: Creating Summary Tables
Summary table design process

COLLECT: A good sampling of queries. These may come from user interviews, testing / QA queries, production queries, reports or any other means that provide a good representation of expected production queries.

ANALYZE: The dimension hierarchy levels, dimension attributes, and fact table measures that are required by each query or report.

IDENTIFY: The row counts associated with each dimension level represented.

BALANCE: The most commonly required dimension levels against the number of rows in the resulting summary tables. A goal should be to design summary tables that are roughly 1/100th the size of the source fact tables in terms of rows (or less).

MINIMIZE: The columns that are carried in the summary table in favor of joining back to the dimension table. The larger the summary table, the less performance advantage it provides.

Some of the best candidates for aggregation will be those where the row counts decrease the most from one level in a hierarchy to the next.
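The BALANCE step can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the tables `store_dim`, `sales_fact` and `sales_by_district_week`, their columns, and the generated data are all hypothetical and not taken from the webinar. The sketch builds a district / fiscal week summary that carries only keys and additive measures, then compares its row count with the 1/100th guideline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema; names are illustrative only.
cur.execute("CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, district TEXT)")
cur.execute(
    "CREATE TABLE sales_fact (store_key INTEGER, fiscal_week INTEGER, "
    "item_key INTEGER, sale_amt REAL, sales_qty INTEGER)"
)

# 200 stores spread over 2 districts.
cur.executemany(
    "INSERT INTO store_dim VALUES (?, ?)",
    [(s, f"D{s % 2}") for s in range(200)],
)

# One fact row per store, week and item: 200 * 4 * 5 = 4000 rows.
cur.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [(s, w, i, 10.0, 1) for s in range(200) for w in range(4) for i in range(5)],
)

# Summary at district / fiscal week grain, carrying only keys and measures.
cur.execute(
    """
    CREATE TABLE sales_by_district_week AS
    SELECT d.district,
           f.fiscal_week,
           SUM(f.sale_amt)  AS sale_amt,
           SUM(f.sales_qty) AS sales_qty,
           COUNT(*)         AS fact_rows
    FROM sales_fact f
    JOIN store_dim d ON d.store_key = f.store_key
    GROUP BY d.district, f.fiscal_week
    """
)

fact_rows = cur.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
summary_rows = cur.execute("SELECT COUNT(*) FROM sales_by_district_week").fetchone()[0]
ratio = summary_rows / fact_rows
meets_guideline = ratio <= 1 / 100

print(f"fact rows={fact_rows}, summary rows={summary_rows}, ratio={ratio:.4f}")
```

A summary that fails the check is a hint to move up a hierarchy level or drop a dimension from the grain before building it for real.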
Capturing requirements for Summary table
• Choosing aggregates to create: two basic pieces of information are required to select the appropriate aggregates.
   • Expected usage patterns of the data
   • Data volumes and distributions in the fact table

[Table: expected usage patterns. Reports 1 to 11, each listed with the Store Geography, Item Category and Date dimension levels it needs (for example District, Store, Region, Item, Category, Dept, Calendar Year, Calendar Month, Fiscal Quarter, Fiscal Period, Fiscal Week) and its measures (Sales_Qty, Sale_Amt).]

Dimension         Level            # of Populated Members
Store Geography   Division         1
Store Geography   Region           3
Store Geography   District         50
Store Geography   Store            3980
Item Category     Subject          279
Item Category     Category         1987
Item Category     Department       4145
Date              Fiscal Year      3
Date              Fiscal Quarter   12
Date              Fiscal Period    36
Date              Fiscal Week      156
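As a rough illustration of how the member counts feed the choice of aggregates, the sketch below ranks each step in the three hierarchies by how much the member count shrinks. The counts are the ones listed on the slide; the function and variable names are hypothetical. Member counts are only a proxy: the actual row reduction in a summary table also depends on how sparsely the fact table is populated.

```python
# Member counts per level as listed on the slide, coarsest level first.
hierarchies = {
    "Store Geography": [("Division", 1), ("Region", 3), ("District", 50), ("Store", 3980)],
    "Item Category": [("Subject", 279), ("Category", 1987), ("Department", 4145)],
    "Date": [("Fiscal Year", 3), ("Fiscal Quarter", 12), ("Fiscal Period", 36), ("Fiscal Week", 156)],
}


def reduction_steps(levels):
    """Yield (finer_level, coarser_level, shrink_factor) for adjacent levels."""
    for (coarse, n_coarse), (fine, n_fine) in zip(levels, levels[1:]):
        yield fine, coarse, n_fine / n_coarse


# Largest shrink factor first: these roll-ups are the strongest candidates.
ranked = sorted(
    (
        (dim, fine, coarse, factor)
        for dim, levels in hierarchies.items()
        for fine, coarse, factor in reduction_steps(levels)
    ),
    key=lambda step: step[3],
    reverse=True,
)

for dim, fine, coarse, factor in ranked:
    print(f"{dim}: {fine} -> {coarse} shrinks members by about {factor:.1f}x")
```

With these counts the Store to District roll-up stands out, which is consistent with how often District appears in the report requirements.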
Summary table design considerations

Aggregate storage column selection
• Semi-additive and all non-additive fact data need not be stored in the summary table
• Add as many “pre-calculated” columns as possible
• “Count” columns could be added for non-additive facts to preserve a portion of the information

Recreating vs. Updating Aggregates
• Efficient for aggregation programs to update the aggregate tables with the newly loaded data
• Regeneration is more appropriate if there is a lot of program logic to determine what data must be updated in the aggregate table

Storing Aggregate Rows
• A combined table containing basic level fact rows and aggregate rows
• A single aggregate table which holds all aggregate data for a single base fact table
• A separate table for each aggregate created (most preferred option)

Storing Aggregate Dimension Data
• Multiple hierarchies in a single dimension
• Store all of the aggregate dimension records together in a single table
• Use a separate table for each level in the dimension
• Add dimension data to aggregate fact table
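The trade-off between updating and regenerating an aggregate can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical aggregate keyed by (district, fiscal week) with one additive measure and a row count column; none of the names come from the webinar. For a purely additive measure both routes give the same result, and the incremental route touches only the newly loaded rows.

```python
from collections import defaultdict


def rebuild(fact_rows):
    """Regenerate the aggregate from every base fact row."""
    agg = defaultdict(lambda: {"sale_amt": 0.0, "row_count": 0})
    for district, week, sale_amt in fact_rows:
        cell = agg[(district, week)]
        cell["sale_amt"] += sale_amt
        cell["row_count"] += 1  # "count" column kept alongside the measure
    return dict(agg)


def apply_increment(agg, new_rows):
    """Update the aggregate in place using only the newly loaded rows."""
    for district, week, sale_amt in new_rows:
        cell = agg.setdefault((district, week), {"sale_amt": 0.0, "row_count": 0})
        cell["sale_amt"] += sale_amt
        cell["row_count"] += 1
    return agg


# Hypothetical data: (district, fiscal_week, sale_amt)
history = [("D1", 1, 100.0), ("D1", 1, 50.0), ("D2", 1, 75.0)]
todays_load = [("D1", 1, 25.0), ("D2", 2, 10.0)]

incremental = apply_increment(rebuild(history), todays_load)
regenerated = rebuild(history + todays_load)

print(incremental == regenerated)
print(incremental[("D1", 1)])
```

The stored row count is what lets a report derive an average from the summary without going back to the base fact table. Once late-arriving corrections or deletes enter the picture, the update logic grows, which is the case where the slide suggests regeneration instead.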
Efficient Indexing for Datawarehouse

44% say: Adding Indexes
Dimension table indexing
• Create a non-clustered primary key on the surrogate key of each dimension table.
• A clustered index on the business key should be considered. It can:
   • Enhance query response when the business key is used in the WHERE clause
   • Help avoid lock escalation during the ETL process
• For large type 2 SCDs, create a four-part non-clustered index: business key, record begin date, record end date and surrogate key.
• Create non-clustered indexes on columns in the dimension that will be used for searching, sorting, or grouping.
• If there is a hierarchy in a dimension, such as Category - Sub Category - Product ID, create an index on the hierarchy.

Example:
Index columns                                                      Index type
EmployeeKey                                                        Non-clustered
EmployeeNationalIDAlternateKey                                     Clustered
EmployeeNationalIDAlternateKey, StartDate, EndDate, EmployeeKey    Non-clustered
FirstName, LastName, DepartmentName                                Non-clustered
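The four-part index for type 2 SCDs exists mainly to serve the ETL lookup "which surrogate key was valid for this business key on this date?". The sketch below shows that lookup with Python's built-in sqlite3 module. It is an assumption-laden illustration: SQLite does not offer the clustered / non-clustered choice discussed on the slide, and the table `dim_employee`, its columns, the index name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_employee (
        employee_key INTEGER PRIMARY KEY,  -- surrogate key
        national_id  TEXT NOT NULL,        -- business key
        start_date   TEXT NOT NULL,        -- record begin date (ISO format)
        end_date     TEXT NOT NULL,        -- '9999-12-31' marks the current row
        department   TEXT
    )
    """
)

# Four-part index: business key, begin date, end date, surrogate key.
conn.execute(
    """
    CREATE INDEX ix_employee_scd2
    ON dim_employee (national_id, start_date, end_date, employee_key)
    """
)

conn.executemany(
    "INSERT INTO dim_employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "E-100", "2010-01-01", "2012-06-30", "Sales"),
        (2, "E-100", "2012-07-01", "9999-12-31", "Marketing"),
        (3, "E-200", "2011-03-01", "9999-12-31", "Finance"),
    ],
)

LOOKUP = """
    SELECT employee_key FROM dim_employee
    WHERE national_id = ? AND start_date <= ? AND end_date >= ?
"""


def surrogate_key_for(national_id, as_of):
    """Return the surrogate key valid for a business key on a given date."""
    row = conn.execute(LOOKUP, (national_id, as_of, as_of)).fetchone()
    return row[0] if row else None


print(surrogate_key_for("E-100", "2011-05-15"))
print(surrogate_key_for("E-100", "2013-01-01"))

# Show how SQLite plans the lookup; the wording varies by SQLite version.
for row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("E-100", "2011-05-15", "2011-05-15")):
    print(row[-1])
```

Because every column the lookup needs is in the index, the engine can answer it without touching the wider dimension row, which matters when the fact load performs this lookup millions of times.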
Fact table indexing
• Create a clustered, composite index composed of the foreign keys of the fact table.
• Keep the most commonly queried date column as the leftmost column in the index.
• There can be more than one date in the fact table, but there is usually one date that is of the most interest to business users. A clustered index on this column has the effect of quickly segmenting the amount of data that must be evaluated for a given query.

Example:
Index columns: OrderDateKey, ProductKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, DueDateKey
Index type: Clustered
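Why the leftmost date column "segments" the data can be shown without a database. The sketch below is a simplified model, not how any particular engine is implemented: it treats a clustered index as rows kept sorted by the composite key, so a date range becomes one contiguous slice found by binary search. The keys and amounts are made-up sample values.

```python
from bisect import bisect_left, bisect_right

# Hypothetical fact rows: (order_date_key, product_key, customer_key, sale_amt)
facts = [
    (20240103, 7, 501, 20.0),
    (20240101, 3, 500, 10.0),
    (20240102, 3, 502, 15.0),
    (20240101, 7, 501, 5.0),
    (20240103, 3, 500, 30.0),
]

# Model of a clustered index: rows stored in order of the composite key,
# with the date key as the leftmost column.
clustered = sorted(facts)
date_keys = [row[0] for row in clustered]


def rows_for_dates(first_key, last_key):
    """Return the contiguous slice of rows whose date key is in the range."""
    lo = bisect_left(date_keys, first_key)
    hi = bisect_right(date_keys, last_key)
    return clustered[lo:hi]


jan_1_to_2 = rows_for_dates(20240101, 20240102)
for row in jan_1_to_2:
    print(row)
```

If ProductKey were leftmost instead, rows for one date would be scattered across the whole ordering and a date filter could not be answered from a single slice.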
Column Oriented databases
Row Store and Column Store
Most queries do not process all the attributes of a particular relation.

Row Store
(+) Easy to add/modify a record
(-) Might read in unnecessary data

Column Store
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses

• One can approximate some of the performance benefits of a column store using a row store by making some changes to the physical structure of the row store:
– Vertical partitioning
– Using index-only plans
– Using materialized views
Vertical Partitioning
• Process:
– Full vertical partitioning of each relation
• Each column = 1 physical table
• This can be achieved by adding an integer position column to every table
• Adding an integer position is better than adding the primary key
– Join on position for multi-column fetch
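The process above can be sketched in a few lines of Python. This is a toy model with hypothetical data and function names: each column becomes its own list of (position, value) pairs, and a multi-column fetch joins only the requested column tables on position.

```python
# Hypothetical row-store relation.
rows = [
    {"id": 10, "name": "Ann", "city": "Pune", "salary": 50},
    {"id": 11, "name": "Bob", "city": "Delhi", "salary": 60},
    {"id": 12, "name": "Cid", "city": "Pune", "salary": 70},
]


def vertically_partition(rows):
    """Return {column: [(position, value), ...]}, one 'table' per column."""
    tables = {}
    for position, row in enumerate(rows):
        for column, value in row.items():
            tables.setdefault(column, []).append((position, value))
    return tables


def fetch(tables, columns):
    """Join the requested column tables on the integer position."""
    lookups = [dict(tables[c]) for c in columns]
    positions = sorted(lookups[0])
    return [tuple(lookup[p] for lookup in lookups) for p in positions]


tables = vertically_partition(rows)
name_salary = fetch(tables, ["name", "salary"])
print(name_salary)
```

Only the `name` and `salary` tables are read for this query. The price is the join on position, plus the position value stored once per column.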
Index-only plans
• Process:
– Add a B+Tree index for every Table.column
– Plans never access the actual tuples on disk
– Tuple headers are not stored in the index, so per-tuple overhead is lower
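A simplified variant of this idea can be tried with Python's built-in sqlite3 module. Note the substitution: instead of one index per column, the sketch uses a single composite index that contains every column the query needs, so the query can be answered from the index alone. The table, index name and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, "
    "product TEXT, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO sales (region, product, amount, notes) VALUES (?, ?, ?, ?)",
    [("East", "A", 10.0, "x"), ("West", "B", 20.0, "y"), ("East", "B", 5.0, "z")],
)

# The index holds both the filter column and the aggregated column.
conn.execute("CREATE INDEX ix_sales_region_amount ON sales (region, amount)")

query = "SELECT SUM(amount) FROM sales WHERE region = ?"
east_total = conn.execute(query, ("East",)).fetchone()[0]
print(east_total)

# Inspect the plan; the exact wording depends on the SQLite version.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("East",)):
    print(row[-1])
```

The wide `product` and `notes` columns are never read for this query, which is the row-store approximation of a column store reading only relevant data.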
Using Hadoop for Datawarehouse
Hadoop ecosystem
• Ecosystem of open source projects
• Hosted by the Apache Software Foundation
• Google developed and shared concepts
• Distributed file system that has the ability to scale out

Components
• Distributed Storage (HDFS)
• Distributed Processing (MapReduce)
• Metadata Management (HCatalog)
• Query (Hive)
• Scripting (Pig)
• Non-Relational Database (HBase)
• Data Extraction & Loading (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop)
• Workflow & Scheduling (Oozie)
• Management & Monitoring (Ambari, ZooKeeper)
Promising uses of Hadoop in DW context
• Data staging: Hadoop allows organizations to deploy an extremely scalable and economical ETL environment.
• Data archiving: Hadoop’s scalability and low cost enable organizations to keep all data forever in a readily accessible online environment.
• Schema flexibility: Hadoop can quickly and easily ingest any data format.
• Processing flexibility: Hadoop enables the growing practice of “late binding”. Instead of transforming data as it’s ingested by Hadoop, structure is applied at runtime.
• Distributed DW architecture: Offload workloads for big data and advanced analytics to HDFS, discovery platforms and MapReduce.
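"Late binding" can be illustrated in a few lines of Python. This is a toy sketch, not Hadoop code: raw records are stored exactly as they arrive, in mixed formats, and each reader applies the schema it needs at read time. The record layout, field names and sample values are hypothetical.

```python
import json

# Raw records kept exactly as ingested; formats are mixed on purpose.
raw_store = [
    '{"user": "u1", "event": "click", "ts": "2013-05-01T10:00:00"}',
    "u2,view,2013-05-01T10:05:00",
    '{"user": "u3", "event": "click", "ts": "2013-05-01T10:07:00", "page": "/home"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time; fields a record lacks come back as None."""
    for line in raw_lines:
        if line.lstrip().startswith("{"):
            record = json.loads(line)
        else:
            # Assumed layout for delimited lines: user, event, timestamp.
            record = dict(zip(("user", "event", "ts"), line.split(",")))
        yield tuple(record.get(field) for field in fields)


# Two consumers bind two different schemas to the same raw data.
clicks_by_user = [
    user
    for user, event in read_with_schema(raw_store, ("user", "event"))
    if event == "click"
]
pages = list(read_with_schema(raw_store, ("user", "page")))

print(clicks_by_user)
print(pages)
```

Nothing had to be reloaded when the `page` field appeared; only the reader that cares about it changed. The cost is that parsing work is repeated on every read.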
What led to Datawarehouse at Facebook

The Problem
• Data, data and more data
• 200 GB per day in March 2008
• 2+ TB (compressed) per day

The Hadoop Experiment
• Superior in availability, scalability and manageability compared to commercial databases
• Uses Hadoop File System (HDFS)

Challenges with Hadoop
• Programmability & metadata
• Map Reduce hard to program
• Need to publish data in well known schemas
HIVE

What is Hive?
• A system for managing and querying structured data built on top of Hadoop
• Uses Map Reduce for execution
• Uses HDFS for storage

Key Building Principles
• SQL on structured data as a familiar data warehousing tool
• Pluggable map/reduce scripts in the language of your choice; rich data types
• Performance

Tables
• Each table has a corresponding directory in HDFS
• Each table points to existing data directories in HDFS
• Split data based on hash of a column, mainly for parallelism
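Splitting data on the hash of a column, which Hive calls bucketing, can be sketched as follows. This is a plain Python illustration under stated assumptions: CRC32 stands in for the hash function, and the `/warehouse/<table>/bucket_N` layout, the table name and the data are hypothetical rather than Hive's actual on-disk naming.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4


def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Stable bucket number from a hash of the column value (CRC32 here)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucket_path(table, value):
    """Hypothetical layout: one directory per table, one file per bucket."""
    return f"/warehouse/{table}/bucket_{bucket_for(value):05d}"


# Hypothetical table rows, bucketed on user_id.
rows = [{"user_id": uid, "clicks": uid % 3} for uid in range(100)]

buckets = defaultdict(list)
for row in rows:
    buckets[bucket_for(row["user_id"])].append(row)

# Each bucket can be processed by a separate worker, then combined.
partial_sums = {b: sum(r["clicks"] for r in part) for b, part in buckets.items()}
total_clicks = sum(partial_sums.values())

print(bucket_path("page_views", 42))
print(total_clicks)
```

Because the same value always lands in the same bucket, work on one bucket never needs rows from another, which is what makes the split useful for parallel execution.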
Analytical platforms
Analytical platforms overview
1010data
Aster Data (Teradata)
Calpont
Datallegro (Microsoft)
Exasol
Greenplum (EMC)
IBM SmartAnalytics
Infobright
Kognitio
Netezza (IBM)
Oracle Exadata
Paraccel
Pervasive
Sand Technology
SAP HANA
Sybase IQ (SAP)
Teradata
Vertica (HP)

Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.

Deployment Options
• Software only (Paraccel, Vertica)
• Appliance (SAP, Exadata, Netezza)
• Hosted (1010data, Kognitio)

Examples
• Kelley Blue Book: consolidates millions of auto transactions each week to calculate car valuations
• AT&T Mobility: tracks purchasing patterns for 80M customers daily to optimize targeted marketing
Which platform do you choose?

Platforms: Hadoop, Analytic Database, General Purpose RDBMS

Data types: Structured, Semi-Structured, Unstructured
Thank You
Please send your feedback and Corporate Training / Consulting Services requirements on BI to sameer@compulinkacademy.com

Designing high performance datawarehouse

  • 1. Welcome to the webinar on Designing High Performance Datawarehouse Presented by &
  • 2. Contents: (1) What happened in the Data 1.0 World; (2) What is shaping the new Data 2.0 World; (3) Designing High Performance Datawarehouse; (4) Q&A
  • 3. What happened in the Data 1.0 World? A timeline across three eras: Before 2000, 2000s, Now.
      - Themes on the slide, in slide order: Do we need a DWH?; Select success: top down & bottom up; Advent of ODS; Business led; We’ve got BI / DWH tools; Volume | Variety | Velocity | Value; Performance vs. Volume: game changer; Need insights from non-structured data as well; Drill-down reporting from DWH getting into the mainstream; Analytics is a differentiator; Data silos; Metrics for success?; OLAP = Insights; Painful implementations; Show me the ROI; Standardized KPIs; Analytics as differentiator?; (DATA) Big, real time, in-memory: what to do with existing initiatives?; Retaining skills and expertise
      - Data 2.0: scale, performance, knowledge, relevance
  • 4. Challenges in current DW environment - Survey (TDWI research based on 278 respondents, top responses)
      - 42% say: Can’t scale to big data volumes
      - 27% say: Inadequate data load speed
      - 27% say: Poor query response
      - Remaining percentages, in slide order: 25%, 24%, 24%, 23%, 19%, 18%, 15%, 15%, 9%, 18%
      - Remaining responses, in slide order: Existing DW modeled for reports & OLAP only; Can’t score analytic models fast enough; Cost of scaling up or out is too expensive; Can’t support high concurrent user count; Inadequate support for in-memory processing; Current platform needs great manual effort for performance; Poorly suited to real-time workloads; Can’t support in-database analytics; Poor CPU speed and capacity; Current platform is a legacy, we must phase it out
  • 5. Data 2.0 World
      - Data sources: Social Media Data, Text Data, Sensor Data, Syndicated Data, Numeric Data
      - High Performance Data Warehouse: Concurrency Enabled, Able to handle Complexity, Ability to Scale, Speed
      - True Sentiment, Faster Compliance, Faster Reach
      - Every 18 months, non-rich structured and unstructured enterprise data doubles.
      - Big Data Analytics: Analytics = Competitive Advantage; Efficiencies driving down costs; Customer experience & service
      - Business is now equipped to consume, identify and act upon this data for superior insights
  • 6. So what is a High Performance Datawarehouse? Key Dimensions
  • 8. High Performance Data Warehouse
      - SPEED: Streaming big data; Event processing; Real-time operation; Operational BI; Near-time analytics; Dashboard refresh; Fast queries
      - CONCURRENCY: Competing workloads (OLAP, analytics); Intraday data loads; Thousands of users; Ad hoc queries
      - SCALE: Big data volumes; Detailed source data; Thousands of reports; Scale out into cloud, clusters, grids, etc.
      - COMPLEXITY: Big data variety (unstructured, sensor, social media); Many sources / targets; Complex models and SQL; High availability
  • 10. Industry recognized top techniques (TDWI research based on 329 responses from 114 respondents)
      - 45% say: Creating summary tables
      - 44% say: Adding indexes
      - 33% say: Altering SQL statements or routines
      - Remaining percentages, in slide order: 24%, 24%, 16%, 21%, 16%, 20%, 16%, 15%, 10%, 6%
      - Remaining techniques, in slide order: Changing physical data models; Using in-memory databases; Upgrading hardware; Choosing between column- and row-oriented data storage; Restricting or throttling user queries; Moving an application to a separate data mart; Applying workload management controls; Shifting some workloads to off-peak hours; Adjusting system parameters; Others
  • 12. Summary table design process
      - COLLECT: A good sampling of queries. These may come from user interviews, testing / QA queries, production queries, reports or any other means that provide a good representation of expected production queries.
      - ANALYZE: The dimension hierarchy levels, dimension attributes, and fact table measures that are required by each query or report.
      - IDENTIFY: The row counts associated with each dimension level represented.
      - BALANCE: The most commonly required dimension levels against the number of rows in the resulting summary tables. A goal should be to design summary tables that are roughly 1/100th the size of the source fact tables in terms of rows (or less).
      - MINIMIZE: The columns that are carried in the summary table, in favor of joining back to the dimension table. The larger the summary table, the less performance advantage it provides.
      - Some of the best candidates for aggregation will be those where the row counts decrease the most from one level in a hierarchy to the next.
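The BALANCE step on this slide can be sketched with Python's built-in `sqlite3` module. The table names, the data, and the roll-up rule (10 stores per district, 10 items per category) are hypothetical, chosen only so the summary table lands at the 1/100th goal; this is an illustration, not the presenters' implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (store_id INTEGER, item_id INTEGER, "
    "week INTEGER, sales_qty INTEGER, sale_amt REAL)"
)

# Hypothetical detail data: 20 stores x 50 items x 10 weeks = 10,000 rows.
detail_rows = [
    (store, item, week, 1, 2.5)
    for store in range(20)
    for item in range(50)
    for week in range(10)
]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", detail_rows)

# Assumed roll-up: integer division maps 10 stores to a district
# and 10 items to a category.
conn.execute(
    """
    CREATE TABLE agg_district_category_week AS
    SELECT store_id / 10  AS district_id,
           item_id / 10   AS category_id,
           week,
           SUM(sales_qty) AS sales_qty,
           SUM(sale_amt)  AS sale_amt
    FROM fact_sales
    GROUP BY store_id / 10, item_id / 10, week
    """
)

fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
agg_count = conn.execute(
    "SELECT COUNT(*) FROM agg_district_category_week"
).fetchone()[0]

# The slide's goal: summary rows at roughly 1/100th of the fact rows, or less.
meets_goal = agg_count * 100 <= fact_count

fact_total = conn.execute("SELECT SUM(sale_amt) FROM fact_sales").fetchone()[0]
agg_total = conn.execute(
    "SELECT SUM(sale_amt) FROM agg_district_category_week"
).fetchone()[0]
print(fact_count, agg_count, meets_goal)
```

The summary table carries only the grouping keys and the additive measures, in line with the MINIMIZE step; descriptive attributes would be fetched by joining back to the dimension tables.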
  • 13. Capturing requirements for Summary table
      - Choosing aggregates to create: two basic pieces of information are required to select the appropriate aggregates: (1) expected usage patterns of the data; (2) data volumes and distributions in the fact table.
      - Usage patterns: a matrix of Report 1 through Report 11 against the dimension level each report requires in the Store, Item and Date dimensions, plus the measures it uses (Sales_Qty, Sale_Amt). Levels appearing in the matrix: District, Store, Region; Item, Category, Dept; Calendar Year, Calendar Month, Fiscal Quarter, Fiscal Period, Fiscal Week.
      - Number of populated members per level, in slide order:
          Store (Geography): Division 1; Region 3; District 50; Store 3980
          Item: Subject 279; Category 1987; Department 4145
          Date: Fiscal Year 3; Fiscal Quarter 12; Fiscal Period 36; Fiscal Week 156
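As a rough worked example of using data volumes to pick aggregates, the member counts on this slide can be turned into reduction factors between adjacent hierarchy levels. The sketch assumes the eleven counts map, in order, to the eleven levels listed on the slide; the Item hierarchy is left out because the ordering of its levels is not clear from the slide text.

```python
# Member counts per level, lowest level first (assumed mapping, see above).
hierarchies = {
    "Store": [("Store", 3980), ("District", 50), ("Region", 3), ("Division", 1)],
    "Date": [
        ("Fiscal Week", 156),
        ("Fiscal Period", 36),
        ("Fiscal Quarter", 12),
        ("Fiscal Year", 3),
    ],
}


def reduction_factors(levels):
    """Return (lower level, upper level, shrink factor) for adjacent levels."""
    return [
        (lower, upper, lower_count / upper_count)
        for (lower, lower_count), (upper, upper_count) in zip(levels, levels[1:])
    ]


# Rank every roll-up step by how much it shrinks the member count.
ranked = sorted(
    (step for levels in hierarchies.values() for step in reduction_factors(levels)),
    key=lambda step: step[2],
    reverse=True,
)

for lower, upper, factor in ranked:
    print(f"{lower} -> {upper}: {factor:.1f}x fewer members")
```

Under these assumptions the Store to District roll-up shrinks the member count the most, which makes District-level aggregates a natural first candidate.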
  • 14. Summary table design considerations
      - Aggregate storage column selection:
          Semi-additive and all non-additive fact data need not be stored in the summary table
          Add as many “pre calculated” columns as possible
          “Count” columns could be added for non-additive facts to preserve a portion of the information
      - Recreating vs. updating aggregates:
          Efficient for aggregation programs to update the aggregate tables with the newly loaded data
          Regeneration is more appropriate if there is a lot of program logic to determine what data must be updated in the aggregate table
      - Storing aggregate rows:
          A combined table containing basic level fact rows and aggregate rows
          A single aggregate table which holds all aggregate data for a single base fact table
          A separate table for each aggregate created (most preferred option)
      - Storing aggregate dimension data:
          Multiple hierarchies in a single dimension
          Store all of the aggregate dimension records together in a single table
          Use a separate table for each level in the dimension
          Add dimension data to aggregate fact table
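The "recreating vs. updating" choice can be illustrated with a small `sqlite3` sketch. The table and column names are hypothetical, and the `row_count` column stands in for the "count" columns mentioned on the slide. `load_batch` folds newly loaded rows into the aggregate, while `rebuild` regenerates it from the fact table; both should end in the same state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (district TEXT, month TEXT, sale_amt REAL);
    CREATE TABLE agg_sales (
        district TEXT, month TEXT, sale_amt REAL, row_count INTEGER,
        PRIMARY KEY (district, month)
    );
    """
)


def load_batch(batch):
    """Append a batch to the fact table and update the aggregate in place."""
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", batch)
    totals = {}
    for district, month, amt in batch:
        amt_sum, count = totals.get((district, month), (0.0, 0))
        totals[(district, month)] = (amt_sum + amt, count + 1)
    for (district, month), (amt_sum, count) in totals.items():
        cur = conn.execute(
            "UPDATE agg_sales SET sale_amt = sale_amt + ?, "
            "row_count = row_count + ? WHERE district = ? AND month = ?",
            (amt_sum, count, district, month),
        )
        if cur.rowcount == 0:  # no aggregate row yet for this key
            conn.execute(
                "INSERT INTO agg_sales VALUES (?, ?, ?, ?)",
                (district, month, amt_sum, count),
            )


def rebuild():
    """Regenerate the aggregate from scratch."""
    conn.execute("DELETE FROM agg_sales")
    conn.execute(
        "INSERT INTO agg_sales "
        "SELECT district, month, SUM(sale_amt), COUNT(*) "
        "FROM fact_sales GROUP BY district, month"
    )


def snapshot():
    return conn.execute(
        "SELECT district, month, sale_amt, row_count "
        "FROM agg_sales ORDER BY district, month"
    ).fetchall()


load_batch([("North", "2013-01", 10.0), ("North", "2013-01", 5.0),
            ("South", "2013-01", 7.0)])
load_batch([("North", "2013-01", 2.5), ("South", "2013-02", 1.0)])
incremental = snapshot()
rebuild()
rebuilt = snapshot()
print(incremental == rebuilt)
```

The incremental path touches only the keys present in the new batch; the rebuild path is simpler to reason about when late-arriving corrections make it hard to tell which aggregate rows are affected.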
  • 15. Efficient Indexing for Datawarehouse 44% say Adding Indexes
  • 16. Dimension table indexing
      - Create a non-clustered primary key on the surrogate key of each dimension table.
      - Consider a clustered index on the business key. It improves query response when the business key is used in the WHERE clause and helps avoid lock escalation during the ETL process.
      - For large type 2 SCDs, create a four-part non-clustered index: business key, record begin date, record end date and surrogate key.
      - Create non-clustered indexes on columns in the dimension that will be used for searching, sorting, or grouping.
      - If there is a hierarchy in a dimension, such as Category - Sub Category - Product ID, create an index on the hierarchy.
      - Example (index columns: index type):
          EmployeeKey: non-clustered
          EmployeeNationalIDAlternateKey: clustered
          EmployeeNationalIDAlternateKey, StartDate, EndDate, EmployeeKey: non-clustered
          FirstName, LastName, DepartmentName: non-clustered
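A minimal sketch of the four-part type 2 SCD index, using Python's `sqlite3` module as a stand-in. SQLite has no clustered / non-clustered keyword as the slide's database does, so only the column order of the index is illustrated; the table, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_employee (
        employee_key INTEGER PRIMARY KEY,  -- surrogate key
        national_id  TEXT NOT NULL,        -- business key
        start_date   TEXT NOT NULL,        -- record begin date
        end_date     TEXT NOT NULL,        -- record end date
        department   TEXT NOT NULL
    );
    -- Four-part index: business key, begin date, end date, surrogate key.
    CREATE INDEX ix_employee_scd2
        ON dim_employee (national_id, start_date, end_date, employee_key);
    -- Index on a column used for searching and grouping.
    CREATE INDEX ix_employee_department ON dim_employee (department);
    """
)
conn.executemany(
    "INSERT INTO dim_employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "E100", "2010-01-01", "2012-01-01", "Sales"),
        (2, "E100", "2012-01-01", "9999-12-31", "Marketing"),
        (3, "E200", "2011-05-01", "9999-12-31", "Sales"),
    ],
)

# Typical ETL lookup: which surrogate key was valid for this business key
# on a given date?
lookup = (
    "SELECT employee_key FROM dim_employee "
    "WHERE national_id = ? AND start_date <= ? AND end_date > ?"
)
params = ("E100", "2013-06-01", "2013-06-01")
surrogate_key = conn.execute(lookup, params).fetchone()[0]

# Column 3 of each EXPLAIN QUERY PLAN row holds the human-readable detail.
plan = " ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + lookup, params)
)
print(surrogate_key, plan)
```

Because the index leads with the business key and also carries the dates and the surrogate key, the lookup can be answered from the index alone.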
  • 17. Fact table indexing
      - Create a clustered, composite index composed of each of the foreign keys of the fact table.
      - Keep the most commonly queried date column as the leftmost column in the index.
      - There can be more than one date in the fact table, but there is usually one date that is of the most interest to business users. A clustered index on this column has the effect of quickly segmenting the amount of data that must be evaluated for a given query.
      - Example (index type: clustered), index columns: OrderDateKey, ProductKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, DueDateKey
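The effect of putting the date key leftmost can be sketched with `sqlite3`. A SQLite `WITHOUT ROWID` table stores its rows ordered by the primary key, which is used here as a rough analogue of a clustered composite index; the table, the reduced set of key columns, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_sales (
        order_date_key INTEGER NOT NULL,  -- most commonly queried date, leftmost
        product_key    INTEGER NOT NULL,
        customer_key   INTEGER NOT NULL,
        sales_amount   REAL NOT NULL,
        PRIMARY KEY (order_date_key, product_key, customer_key)
    ) WITHOUT ROWID
    """
)
rows = [
    (20130100 + day, product, customer, 10.0)
    for day in range(1, 29)
    for product in range(1, 6)
    for customer in range(1, 4)
]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)


def plan(sql):
    """Return the detail text of EXPLAIN QUERY PLAN for a statement."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


by_date = (
    "SELECT SUM(sales_amount) FROM fact_sales "
    "WHERE order_date_key BETWEEN 20130101 AND 20130107"
)
by_product = "SELECT SUM(sales_amount) FROM fact_sales WHERE product_key = 3"

date_plan = plan(by_date)        # range search on the leftmost key column
product_plan = plan(by_product)  # no leading-column filter, so a full scan
week_total = conn.execute(by_date).fetchone()[0]
print(date_plan)
print(product_plan)
```

The date-range query is resolved by a key search over a contiguous slice of the table, while the product-only query has to scan every row, which is the segmenting effect the slide describes.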
  • 19. Row Store and Column Store
      - Most queries do not process all the attributes of a particular relation.
      - Row store: (+) easy to add/modify a record; (-) might read in unnecessary data
      - Column store: (+) only need to read in relevant data; (-) tuple writes require multiple accesses
      - A row store can approximate the performance benefits of a column store through changes to its physical structure: vertical partitioning, index-only plans, and materialized views.
  • 20. Vertical Partitioning
      - Process: full vertical partitioning of each relation
      - Each column = 1 physical table
      - This can be achieved by adding an integer position column to every table
      - Adding an integer position is better than adding the primary key
      - Join on position for multi-column fetch
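A small `sqlite3` sketch of full vertical partitioning: each logical column becomes its own two-column physical table keyed by an integer position, and a query joins only the columns it needs on that position. The relation, its column names and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

columns = ["product", "region", "qty", "amount"]
rows = [
    ("widget", "East", 3, 30.0),
    ("gadget", "West", 1, 12.5),
    ("widget", "West", 2, 20.0),
]

# One physical table (pos, value) per logical column of the relation.
for index, name in enumerate(columns):
    conn.execute(f"CREATE TABLE sales_{name} (pos INTEGER PRIMARY KEY, value)")
    conn.executemany(
        f"INSERT INTO sales_{name} VALUES (?, ?)",
        [(pos, row[index]) for pos, row in enumerate(rows)],
    )

# Total amount per region: only two of the four column tables are read,
# and they are stitched back together by joining on position.
result = conn.execute(
    """
    SELECT r.value, SUM(a.value)
    FROM sales_region AS r
    JOIN sales_amount AS a ON a.pos = r.pos
    GROUP BY r.value
    ORDER BY r.value
    """
).fetchall()
print(result)
```

The trade-off is visible even at this size: reads skip unused columns, but inserting one logical row means writing to every column table.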
  • 21. Index-only plans
      - Process: add a B+Tree index for every Table.column
      - Plans never access the actual tuples on disk
      - Headers are not stored, so per-tuple overhead is less
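An index-only plan can be observed with `sqlite3`, where it is reported as a covering index. As a simplification, the sketch uses one composite index over the two columns the query needs rather than a separate index on every column as the slide describes; the table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, "
    "amount REAL, note TEXT)"
)
conn.executemany(
    "INSERT INTO sales (region, amount, note) VALUES (?, ?, ?)",
    [
        ("East", 30.0, "x" * 200),  # wide column the query never needs
        ("West", 12.5, "y" * 200),
        ("West", 20.0, "z" * 200),
    ],
)

# The index holds every column the query below refers to.
conn.execute("CREATE INDEX ix_sales_region_amount ON sales (region, amount)")

query = "SELECT SUM(amount) FROM sales WHERE region = 'West'"
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
total = conn.execute(query).fetchone()[0]
print(plan)
```

When the plan reports a covering index, the wide `note` column and the rest of each table row are never read for this query.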
  • 22. Using Hadoop for Datawarehouse
  • 23. Hadoop ecosystem
      - Ecosystem of open source projects
      - Hosted by Apache Foundation
      - Google developed and shared concepts
      - Distributed file system that has the ability to scale out
      - Components: Distributed Storage (HDFS); Distributed Processing (MapReduce); Metadata Management (HCatalog); Query (Hive); Scripting (Pig); Non-Relational Database (HBase); Workflow & Scheduling (Oozie); Management & Monitoring (Ambari, ZooKeeper); Data Extraction & Loading (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop)
  • 24. Promising uses of Hadoop in DW context
      - Data staging: Hadoop allows organizations to deploy an extremely scalable and economical ETL environment
      - Data archiving: Hadoop’s scalability and low cost enable organizations to keep all data forever in a readily accessible online environment
      - Schema flexibility: Hadoop can quickly and easily ingest any data format
      - Processing flexibility: Hadoop enables the growing practice of “late binding”: instead of transforming data as it’s ingested by Hadoop, structure is applied at runtime
      - Distributed DW architecture: Off load workloads for big data and advanced analytics to HDFS, discovery platforms and MapReduce
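The "late binding" idea, applying structure at read time rather than at ingest, can be shown in plain Python without any Hadoop dependency. The raw event lines and the field names (`user`, `action`, `amount`) are hypothetical.

```python
import json

# Raw lines are stored exactly as they arrived; nothing is validated
# or transformed at ingest time.
raw_events = [
    '{"user": "a", "action": "view", "ms": 120}',
    '{"user": "b", "action": "buy", "amount": 19.99}',
    "not json at all",
    '{"user": "a", "action": "buy", "amount": 5.00}',
]


def read_purchases(lines):
    """Apply a purchase 'schema' at query time, skipping lines that do not fit."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("action") == "buy":
            yield record["user"], float(record["amount"])


totals = {}
for user, amount in read_purchases(raw_events):
    totals[user] = totals.get(user, 0.0) + amount
print(totals)
```

A different question tomorrow, such as view latency per user, would need only a new reader function over the same untouched raw lines.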
  • 25. What led to Datawarehouse at Facebook
      - The Problem: Data, data and more data; 200 GB per day in March 2008; 2+ TB (compressed) per day
      - The Hadoop Experiment: Superior in availability, scalability and manageability compared to commercial databases; uses Hadoop File System (HDFS)
      - Challenges with Hadoop: Programmability & metadata; Map Reduce hard to program; need to publish data in well known schemas
      - HIVE
          What is Hive? A system for managing and querying structured data built on top of Hadoop; uses Map Reduce for execution; uses HDFS for storage
          Key building principles: SQL on structured data as a familiar data warehousing tool; pluggable map/reduce scripts in language of your choice; rich data types; performance
          Tables: Each table has a corresponding directory in HDFS; each table points to existing data directories in HDFS; split data based on hash of a column, mainly for parallelism
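Splitting data on the hash of a column can be sketched in a few lines of Python. This illustrates the general idea only and is not Hive's implementation: `zlib.crc32` is used as a stable stand-in hash, and the rows, column name and bucket count are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(value, num_buckets):
    """Map a column value to a bucket number with a stable hash."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def split_into_buckets(rows, column, num_buckets):
    """Group rows by the hash of one column so buckets can be processed in parallel."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


rows = [{"user_id": uid, "clicks": uid % 7} for uid in range(100)]
buckets = split_into_buckets(rows, "user_id", 4)

for number in sorted(buckets):
    print(number, len(buckets[number]))
```

Because the bucket depends only on the column value, all rows for the same `user_id` land in the same bucket, so each bucket can be handed to a separate worker.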
  • 27. Analytical platforms overview
      - Platforms: 1010data; Aster Data (Teradata); Calpont; DATAllegro (Microsoft); Exasol; Greenplum (EMC); IBM SmartAnalytics; Infobright; Kognitio; Netezza (IBM); Oracle Exadata; ParAccel; Pervasive; Sand Technology; SAP HANA; Sybase IQ (SAP); Teradata; Vertica (HP)
      - Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.
      - Deployment options: software only (ParAccel, Vertica); appliance (SAP, Exadata, Netezza); hosted (1010data, Kognitio)
      - Kelley Blue Book: consolidates millions of auto transactions each week to calculate car valuations
      - AT&T Mobility: tracks purchasing patterns for 80M customers daily to optimize targeted marketing
  • 28. Which platform do you choose?
      - Platforms: Hadoop; Analytic Database; General Purpose RDBMS
      - Data types: Structured; Semi-Structured; Unstructured
  • 29. Thank You Please send your Feedback & Corporate Training /Consulting Services requirements on BI to sameer@compulinkacademy.com