Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote

@joe_Caserta
Presented by
Joe Caserta
May 17, 2017
Data Summit 2017
New York City
#DataSummit

About Joe Caserta
• Launched Big Data practice
• Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley)
• Data Analysis, Data Warehousing and Business Intelligence since 1996
• Began consulting in database programming and data modeling
• 25+ years hands-on experience building database solutions
• Founded Caserta Concepts in NYC
• Web log analytics solution published in Intelligent Enterprise magazine
• Launched Data Science, Data Interaction and Cloud practices
• Laser focus on extending Data Analytics with Big Data solutions
• Dedicated to Data Governance Techniques on Big Data (Innovation)
• Awarded Top 20 Big Data Companies 2016
• Top 20 Most Powerful Big Data consulting firms
• Launched Big Data Warehousing (BDW) Meetup NYC: 2,000+ Members
• Awarded Fastest Growing Big Data Companies 2016
• Established best practices for big data ecosystem implementations
Timeline years shown on the slide: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2014, 2016

About Caserta Concepts
– Consulting Data Innovation
– Award-winning company
– Internationally recognized workforce
– Strategy, Architecture, Implementation, Governance
– Innovation Partner
– Strategic Consulting
– Advanced Architecture
– Build & Deploy
– Leader in Enterprise Data Solutions
– Big Data Analytics
– Data Warehousing
– Business Intelligence
– Data Science
– Cloud Computing
– Data Governance

Why is Data so Important?
1500s: Printing Press
1840s: Penny Post
1850s: Telegraph
1850s: Rural Free Post
1890s: Telephone
1900s: Radio
1950s: TV
1970s: PCs
1980s: Internet
1990s: Web
2000s: Social Media, Mobile, Big Data, Cloud
Every 60 Seconds:
• 98,000+ Tweets
• 695,000 Status Updates
• 11 Million instant messages
• 698,445 Google Searches
• 168 million+ emails sent
• 1,829 TB of data created
• 217 new mobile web users

Understanding the Customer
Awareness, Consideration, Purchase, Service, Loyalty, Expansion
PR
Radio
TV
Print
Outdoor
Word of Mouth
Direct Mail
Customer Service
Physical Touchpoints
Digital Touchpoints
Search
Paid Content
email
Website/Landing Pages
Social Media
Community
Chat
Social Media
Call Center
Offers
Mailings
Survey
Loyalty Programs
email
Agents
Partners
Ads
Website
Mobile
3rd Party Sites
Offers
Web self-service

Life As We Know It
Business: “I need to analyze some new data”
• IT collects requirements
• Creates normalized and/or dimensional data models
• Profiles and conforms the data
• Develops sophisticated ETL programs and quality standards
• Loads it into data models
• Builds a BI semantic layer
• Creates dashboards and reports
IT: “You can access your data in 3-6 months to see if it has value!”
– Onboarding new data is difficult!
– Rigid Structures and Data Governance
– Disconnected/removed from business

The Problem: Shadow IT = Data Sprawl
• There is one application for every 5-10 employees generating copies of
the same files leading to massive amounts of duplicate idle data strewn
all across the enterprise. - Michael Vizard, ITBusinessEdge.com
• Employees spend 35% of their work time searching for information...
finding what they seek 50% of the time or less.
- “The High Cost of Not Finding Information,” IDC

The New Data Paradigm
OLD WAY:
• Structure Data → Ingest Data → Analyze Data
• Fully Governed
• Monolith
NEW WAY:
• Ingest Data → Analyze Data → Structure Data
• Just Enough Governance
• Dynamic
RECIPE:
• Data Officer & Data Organization
• Enterprise Data Lake
• Corporate Data Pyramid
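
The "new way" ordering (ingest, then analyze, then structure) can be illustrated with a minimal Python sketch. It is not taken from the talk: the sample events, the field names, and the MIN_COUNT threshold are all hypothetical, chosen only to show a schema being derived after the data has landed rather than before.

```python
import json
from collections import Counter

# Hypothetical raw events, landed as-is with no upfront schema.
RAW_EVENTS = [
    '{"user": "a1", "channel": "web", "amount": "19.99"}',
    '{"user": "b2", "channel": "mobile"}',
    '{"user": "a1", "channel": "web", "amount": "5.00", "coupon": "SPRING"}',
]

# Assumed rule of thumb: a field must appear in at least this many records
# before it earns a column in the structured output.
MIN_COUNT = 2


def ingest(lines):
    """Step 1 (Ingest): parse every record and keep every field."""
    return [json.loads(line) for line in lines]


def analyze(records):
    """Step 2 (Analyze): profile how often each field actually occurs."""
    return Counter(name for record in records for name in record)


def structure(records, columns):
    """Step 3 (Structure): project records onto the chosen columns.

    Fields a record does not have become None.
    """
    return [tuple(record.get(col) for col in columns) for record in records]


records = ingest(RAW_EVENTS)
profile = analyze(records)
columns = sorted(name for name, count in profile.items() if count >= MIN_COUNT)
table = structure(records, columns)

print(columns)
for row in table:
    print(row)
```

In the "old way" the column list would have been fixed before any record was loaded, so the rarely used coupon field would either have been modeled up front or rejected at load time.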

Big Data Analysis: The Ecosystem of the Future
Cloud-based Data Lake
Stages: Ingest, Persist, Analyze, Deploy
Capabilities: Data Integration, Identity Resolution, Data Quality, Discovery/Exploration, Machine Learning, Model Development, Reports/Dashboards, Applications, APIs
Storage: Structured Data, Unstructured Data; SQL, NoSQL, Object Store
Find, Share, Collaborate
Roles: Data Engineer, Data Scientist, Business Analyst, App Developer
Business Value: Provides innovative and industry-leading technologies that can be rapidly applied to the business without having to manage compatibility and data complexity.
Technical Value: Provides an open framework that reduces the number of integration points and testing environments needed to deliver business solutions.

Corporate Data Pyramid (CDP)
Columns: Usage Pattern, Data Governance
Ingest Raw Data
Organize, Define, Complete
Munging, Blending, Machine Learning
Data Quality and Monitoring
Metadata, ILM, Security
Data Catalog
Data Integration
Fully Governed (trusted)
Arbitrary/Ad-hoc Queries and Reporting
Metadata, ILM, Security

The Data Lake on the Cloud
Cloud Component                        AWS        Google     Microsoft
Scalable distributed storage           S3         GCS        Azure Storage
Pluggable fit-for-purpose processing   EMR        Dataproc   HDInsight
Compute Services                       EC2        GCE        VMs
Consistent extensible framework        Spark      Spark      Spark
Dimensional MPP Data Warehouse         Redshift   BigQuery   Azure SQL Data Warehouse
Data Streaming                         Kinesis    Pub/Sub    Azure Stream
Common Interface                       Jupyter    Datalab    Azure Notebook
• Remove barriers between data ingestion and analysis
• Democratize data with Just Enough Data Governance (JEDG)

Which Cloud?

The Clouds Coalesce
Percent of organizations with AWS as primary, also uses GCP
Percent of organizations with AWS as primary, also uses Azure
Percent of organizations with GCP as primary, also uses AWS
41%
32%
31%
Source: Clutch, 2016

Why Spark?
• Development local or distributed is identical
• Beautiful high-level APIs
• Full universe of Python modules
• Open source and free
• Blazing fast!
Spark has become our default processing engine for data engineering & science

Analytics Development Lifecycle
• Data science is performed in ephemeral workspaces
• The work products of data science are promoted from “insights” to real applications
• Rigorous Data Governance is applied
• Processes must be hardened, repeatable, and performant
Tiers:
• Big Data Warehouse
• Data Science Workspace
• Data Lake – Integrated Sandbox
• Landing Area – Source Data in “Full Fidelity”
Diagram labels: New Data, New Insights, Governance, Refinery
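
One way to picture promotion with governance tightening at each tier is the Python sketch below. The tier names come from the slide. Everything else is an assumption for illustration only: the check names (cataloged, quality_monitored, steward_approved), which checks each tier demands, and the Dataset and promote helpers.

```python
from dataclasses import dataclass, field

# Tier names as shown on the slide, from least to most governed.
TIERS = [
    "Landing Area",
    "Data Lake",
    "Data Science Workspace",
    "Big Data Warehouse",
]

# Hypothetical governance checks required to ENTER each tier.
# These are illustrative assumptions, not the speaker's actual rules.
REQUIRED_CHECKS = {
    "Landing Area": set(),
    "Data Lake": {"cataloged"},
    "Data Science Workspace": {"cataloged", "quality_monitored"},
    "Big Data Warehouse": {"cataloged", "quality_monitored", "steward_approved"},
}


@dataclass
class Dataset:
    name: str
    tier: str = TIERS[0]
    checks: set = field(default_factory=set)


def promote(dataset: Dataset) -> str:
    """Move a dataset up one tier if it satisfies that tier's checks."""
    position = TIERS.index(dataset.tier)
    if position == len(TIERS) - 1:
        raise ValueError(f"{dataset.name} is already in the top tier")
    target = TIERS[position + 1]
    missing = REQUIRED_CHECKS[target] - dataset.checks
    if missing:
        raise ValueError(
            f"cannot promote {dataset.name} to {target}: missing {sorted(missing)}"
        )
    dataset.tier = target
    return target


clickstream = Dataset("clickstream")
clickstream.checks.add("cataloged")
promote(clickstream)  # Landing Area -> Data Lake
print(clickstream.tier)
```

Calling promote again at this point raises ValueError, because the assumed rules require quality monitoring before a dataset can enter the Data Science Workspace.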

Unexpected Reaction to Change

Moving the Status Quo
Forces for Change:
• Global economics
• Intensity of competition
• Reduce costs
• Move to cross-functional teams
• New executive leadership
• Speed of technical change
• Social trends and changes
Forces Resisting Change:
• Period of time in present role
• Status & perks of office/dept under threat
• No apparent reasons for proposed changes
• Lack of understanding of proposed changes
• Fear of inability to cope with new technology
• Concern over job security
Status Quo
Source: http://www.change-management-coach.com/force-field-analysis.html

Introducing the Chief Data Officer
• Evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Monitor and enforce data quality in collaboration with data owners
• Monitor and enforce data security along with Legal/Security/Compliance
• Work with IT to develop/maintain an enterprise repository of strategic data
• Set standards for analytical reporting and generate data insights
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data
• Enrich and augment data by combining internal and external sources
• Support efficient and agile analytics through training and templates

The CDO: The Whole Brain Challenge
Front
Back
Analytics Oriented
• Data Science
• Research
Process Oriented
• Data Governance
• Compliance
Operations Oriented
• Shared Services
• Data Engineering
Revenue Oriented
• Revenue Goals
• Monetizing Data

Chief Data Organization (Oversight)
Vertical Business Area
[Sales/Finance/Marketing/Operations/Customer Svc]
Product Owner
SCRUM Master
Agile Development Team
Business Subject Matter Expertise
Data Librarian/Data Stewardship
Data Science/ Statistical Skills
Data Engineering / Architecture
Presentation/ BI Report Development Skills
Data Quality Assurance
DevOps
IT Organization (Oversight)
Enterprise Data Architect
Solution Engineers
Data Integration Practice
User Experience Practice
QA Practice
Operations Practice
Advanced Analytics
Business Analysts
Data Analysts
Data Scientists
Statisticians
Data Engineers
Planning Organization
Project Managers
Data Organization
Data Gov Coordinator
Data Librarians
Data Stewards
Agile Data Teams

Caution: Assembly Required
• Some of the most hopeful tools are brand new or in incubation
• Enterprise big data implementations typically combine products with custom-built components
The Buildout
People, Processes and Business commitment are still critical!
Data Integration & Quality Data Catalog & Governance Emerging Solutions

What the Future Holds
• DevOps for Analytics
• Search-Based BI (NLP)
• Artificial Intelligence (AI)
• Virtual Reality BI (VR)
• Virtual Assistant BI (Voice)
• Reporting/Predictions Converge
• Citizen Data Scientists Emerge

Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
@joe_caserta
Thank You!
