Strata 2017
Creating a DevOps
Practice for Analytics
Bob Eilbacher
September 28, 2017
• About Caserta
• DevOps
• DevOps for Analytics
• Organization and Teams
• Questions
About Caserta
• Data Intelligence Consulting and Modern Data Engineering
• Award-winning data innovation
• Internationally recognized workforce
• Strategy, Architecture, Governance, Implementation
About Caserta
• Architecture & Design
• Implementation Services
• Disruption Management
• Strategic Technical Consulting
• Training & Education
• Application Innovation
• Cloud Management
What is DevOps for Analytics?
First some terminology…
• DevOps
  ◦ A movement, primarily in the application development space, over the last 5-10 years
  ◦ Focused on very fast and continuous software product delivery
  ◦ Think intra-day Prod releases at Netflix, Amazon, etc.
  ◦ Convergence of development and operations methodologies to minimize TTR
  ◦ Tons of resources available, e.g., DZone
What is DevOps for Analytics?
Some more terminology…
• DataOps
  ◦ A re-emergent term
  ◦ Seems to have a broader context
  ◦ Applying DevOps to data management or to handling backend databases
  ◦ Also tends to carry a real legacy connotation: manual operation of database backups and restores
What is DevOps for Analytics?
And finally…
• AnalyticsOps
  ◦ A term we are starting to see used more often
  ◦ It’s focused on applying DevOps practices within a data analytics and data science context
  ◦ This is the area we’re interested in for this talk
  ◦ We’ll use the terms AnalyticsOps and the more explicit “DevOps for Analytics” interchangeably
• Speak with anyone and they will tell you first that DevOps is a culture
  ◦ Based primarily on teamwork
• Aims to address the underlying conflict between development and operations objectives
  ◦ Innovation @ speed vs. Performance @ quality
  ◦ Change vs. Stability
• Culture is not “implemented”
  ◦ It needs to evolve
  ◦ The good news is that it can be seeded
• It works!
  ◦ 75% of IT and product development organizations were successfully using DevOps to some extent
    Source: RightScale 2016 State of the Cloud Report
• It’s flexible
  ◦ No two companies’ DevOps approaches will look the same
  ◦ There are infinitely many ways to create teamwork
  ◦ A reflection of the organization itself
• DevOps tenets
  ◦ Continuous Integration
  ◦ Test Automation
  ◦ Continuous Delivery
  ◦ Continuous Deployment
• End-to-end automation is still aspirational for most
  ◦ Justify how much automation you need based on business needs
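To make the difference between the last two tenets concrete, here is a minimal sketch of a pipeline gate, assuming each stage can be modelled as a named check that returns True or False. The `run_pipeline` function, the `auto_deploy` flag, and the stage names are hypothetical illustrations, not any particular CI product's API. With `auto_deploy=False` a green build is releasable but waits for a human (continuous delivery); with `auto_deploy=True` it goes straight to Prod (continuous deployment).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A stage is a (name, check) pair; check() returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


@dataclass
class PipelineResult:
    passed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None
    deployed_to_prod: bool = False
    awaiting_approval: bool = False


def run_pipeline(stages: List[Stage], auto_deploy: bool) -> PipelineResult:
    """Run stages in order and stop at the first failure.

    auto_deploy=True models continuous deployment: a green build goes to Prod.
    auto_deploy=False models continuous delivery: a green build is releasable,
    but a human approves the push to Prod.
    """
    result = PipelineResult()
    for name, check in stages:
        if not check():
            result.failed_stage = name  # nothing after a failing stage runs
            return result
        result.passed.append(name)
    result.deployed_to_prod = auto_deploy
    result.awaiting_approval = not auto_deploy
    return result


if __name__ == "__main__":
    stages = [
        ("build", lambda: True),           # continuous integration
        ("unit_tests", lambda: True),      # test automation
        ("data_validation", lambda: True),
    ]
    print(run_pipeline(stages, auto_deploy=False))
```

How much of the chain runs unattended is exactly the business decision the slide mentions: the same stages serve both modes, and only the final gate changes.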
• DevOps is not a toolchain implementation
  ◦ Tools help the team execute within the culture
  ◦ Don’t run out and put an end-to-end toolchain in place and then expect adoption
• Let’s talk about tools for a minute…
  ◦ An explosion of both open-source and commercial DevOps tools
  ◦ They serve every discrete need: requirements management, SCM, test automation, defect tracking, build, deployment, monitoring and more
  ◦ 1,500+ tools available
• Tooling categories:
  ◦ Code: code development, version control tools, code merging
  ◦ Build: continuous integration tools, build status
  ◦ Test: tests and results that determine performance
  ◦ Package: artifact repository, application pre-deployment staging
  ◦ Release: change management, release automation
  ◦ Configure: infrastructure configuration and management, Infrastructure as Code tools
  ◦ Monitor: application performance monitoring, end-user experience
Source: XebiaLabs
Why DevOps for Analytics?
“The fact is that analytic teams are
being compared by their businesses to
Amazon Prime – 2-day delivery of
almost anything”
Source: Unknown
Why DevOps for Analytics?
• A couple of recent real-world examples…
  ◦ The Data Science Rock Star
  ◦ Process Overengineering
Why DevOps for Analytics?
• In analytics and data science projects, what used to take months is now happening in days or hours
  ◦ Businesses typically like that and want more…
• Enabled by the strong trend toward cloud analytics platforms
  ◦ Infrastructure as code (IaC) extends software development practices to servers and infrastructure
  ◦ We can automate the build of complex analytic pipelines (storage, processing engines, etc.) with relative ease
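The core idea behind IaC can be sketched without any cloud SDK: infrastructure is declared as data under version control, and a tool computes the difference between that declaration and what currently exists, in the spirit of the plan/apply workflow popularized by tools such as Terraform. The resource names and the `plan`/`apply` functions below are hypothetical; a real tool would call provider APIs where this sketch only updates a dictionary.

```python
import copy
from typing import Dict, List

Resources = Dict[str, dict]

# Desired infrastructure for a hypothetical analytics pipeline, declared as data
# and kept under version control alongside the pipeline code.
DESIRED: Resources = {
    "storage/landing": {"kind": "object_store"},
    "storage/lake": {"kind": "object_store"},
    "compute/spark": {"kind": "cluster", "workers": 4},
}


def plan(desired: Resources, current: Resources) -> Dict[str, List[str]]:
    """Diff desired vs. current state into create/update/delete actions."""
    return {
        "create": sorted(k for k in desired if k not in current),
        "update": sorted(k for k in desired if k in current and desired[k] != current[k]),
        "delete": sorted(k for k in current if k not in desired),
    }


def apply(desired: Resources, current: Resources) -> Resources:
    """Execute the plan against a copy of the current state and return the result."""
    state = copy.deepcopy(current)
    actions = plan(desired, current)
    for key in actions["create"] + actions["update"]:
        state[key] = copy.deepcopy(desired[key])  # a real tool would call provider APIs here
    for key in actions["delete"]:
        del state[key]
    return state


if __name__ == "__main__":
    current = {
        "storage/landing": {"kind": "object_store"},
        "compute/spark": {"kind": "cluster", "workers": 2},
        "storage/old_scratch": {"kind": "object_store"},
    }
    print(plan(DESIRED, current))
    new_state = apply(DESIRED, current)
    print(plan(DESIRED, new_state))  # converged: nothing left to create, update or delete
```

Because `apply` converges on the declaration, running it a second time is a no-op, which is what makes rebuilding a whole analytic pipeline environment routine rather than risky.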
DevOps for Analytics
• DevOps for Analytics combines the development and operations teams and establishes best practices that improve coordination between data science and operations
• BUT… data science and analytics are different from application development
  ◦ Especially in Big Data environments: you need big data to test big data applications
  ◦ A much more diverse mix of tools and technologies, not just Java
  ◦ Some differences in approach are needed
DevOps for Analytics
• AnalyticsOps is still in its early days
  ◦ There aren’t any really solid industry success stories published yet
  ◦ People are still trying to figure out what works and aren’t openly sharing their experiences just yet
  ◦ Not a lot of experienced practitioners
• But there are some early themes and guidelines emerging
DevOps for Analytics
• Environments
  ◦ Separate DEV and PROD environments
  ◦ Should you reuse any of the PROD data assets?
    ▪ Separate landing area, destination area (Data Lake), etc.
    ▪ Trickier with increasing data volumes: do it smartly to avoid doubling costs
  ◦ Sharing compute cluster resources is OK
  ◦ Make all job inputs and outputs configuration-driven (PROD and DEV code doesn’t change) to support CI
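A minimal sketch of configuration-driven job I/O, assuming each environment's locations are kept in a small mapping (in practice this would typically live in a separate configuration repository). The `PIPELINE_ENV` variable, the `resolve_io` helper and the `s3://example-…` locations are all hypothetical. The job code never mentions DEV or PROD; only the selected configuration changes.

```python
import os
from typing import Dict, Optional

# Hypothetical per-environment settings; normally loaded from a config repo.
CONFIG: Dict[str, Dict[str, str]] = {
    "dev": {"landing": "s3://example-dev/landing", "lake": "s3://example-dev/lake"},
    "prod": {"landing": "s3://example-prod/landing", "lake": "s3://example-prod/lake"},
}


def resolve_io(job: str, env: Optional[str] = None) -> Dict[str, str]:
    """Return the input/output locations for a job in the given environment.

    The environment defaults to the (hypothetical) PIPELINE_ENV variable, so the
    same job code runs unchanged in DEV and PROD.
    """
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    if env not in CONFIG:
        raise ValueError(f"unknown environment: {env!r}")
    cfg = CONFIG[env]
    return {"input": f"{cfg['landing']}/{job}/", "output": f"{cfg['lake']}/{job}/"}


if __name__ == "__main__":
    print(resolve_io("orders", "prod"))
```

Failing loudly on an unknown environment name is a deliberate choice: a typo in configuration should stop the job rather than silently write DEV output into PROD locations, or vice versa.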
DevOps for Analytics
• Automated Testing
  ◦ It’s almost impossible to get full code coverage
  ◦ How do you unit test Spark SQL scripts? Regression tests? Data validation?
  ◦ Test data is a complex problem; handle it as a cross-functional effort
  ◦ Analytic results are often buried in complex outputs, so QA becomes forensic data analysis
  ◦ Automate what you can, and supplement with community-based, real-world data testing in a parallel Dev/Test environment
  ◦ The role of the Test/QA Engineer is still really important
    ▪ Test/QA Engineers need Data Engineering experience
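One common answer to "how do you unit test a SQL script?" is to run the SQL against a tiny hand-built fixture and compare the result with the rows you expect. The sketch below uses Python's built-in `sqlite3` as a stand-in engine, which is an assumption: Spark SQL dialects differ, so a real suite would run the same pattern against a local Spark session. The `orders` table, the daily revenue query, and the validation rules are hypothetical examples.

```python
import sqlite3

# The transformation under test: a hypothetical daily revenue rollup.
DAILY_REVENUE_SQL = """
SELECT order_date, SUM(amount) AS revenue, COUNT(*) AS orders
FROM orders
WHERE status = 'complete'
GROUP BY order_date
ORDER BY order_date
"""


def run_transform(rows):
    """Load fixture rows into an in-memory table and run the SQL under test."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(DAILY_REVENUE_SQL).fetchall()
    finally:
        conn.close()


def validate_output(result):
    """Simple data-validation rules; returns a list of problems (empty means OK)."""
    problems = []
    for order_date, revenue, _n_orders in result:
        if order_date is None:
            problems.append("row with missing order_date")
        if revenue is None or revenue < 0:
            problems.append(f"{order_date}: invalid revenue {revenue!r}")
    return problems


def test_daily_revenue():
    fixture = [
        ("2017-09-27", 10.0, "complete"),
        ("2017-09-27", 5.0, "complete"),
        ("2017-09-27", 99.0, "cancelled"),  # must be excluded by the WHERE clause
        ("2017-09-28", 7.5, "complete"),
    ]
    result = run_transform(fixture)
    assert result == [("2017-09-27", 15.0, 2), ("2017-09-28", 7.5, 1)]
    assert validate_output(result) == []


if __name__ == "__main__":
    test_daily_revenue()
    print("ok")
```

The fixture is small enough to reason about by hand, which is the point: it pins down the business rule (cancelled orders are excluded) while the validation rules can also be run against real outputs in the parallel Dev/Test environment.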
DevOps for Analytics
• Monitoring
  ◦ Tracking and analyzing intra-day demand and longer-term trends in infrastructure performance (standard DevOps)
  ◦ But then…
    ▪ By their nature, analytics processes require monitoring and tuning over time with real-world inputs
    ▪ Data drifts; predictive models have a finite lifetime
    ▪ Silent failures
  ◦ Feedback to developers so they can see how their code is performing and affecting the Prod environment
  ◦ Continuous improvement
    ▪ The next wave is analytics on analytics…
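Data drift and silent failures can both be turned into simple automated checks. The sketch below uses the Population Stability Index (PSI), one widely used drift measure (not the only one), which compares how a feature is distributed across bins now versus when the model was built: PSI = Σ (a − e) · ln(a / e) over bins, where e and a are the baseline and current bin shares. The 0.25 alert level is a commonly quoted rule of thumb, not a universal constant, and the bin edges, sample data, and row-count ratio are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bin defined by sorted edges (len(edges) - 1 bins).

    Values outside the edges are clamped into the first or last bin.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), n_bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    total = 0.0
    for e, a in zip(bin_fractions(baseline, edges), bin_fractions(current, edges)):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


def looks_like_silent_failure(rows_written: int, baseline_rows: int,
                              min_ratio: float = 0.5) -> bool:
    """Flag a run that 'succeeded' but wrote far fewer rows than usual."""
    return baseline_rows > 0 and rows_written < min_ratio * baseline_rows


if __name__ == "__main__":
    edges = [0.0, 1.0, 2.0]
    training = [0.5] * 50 + [1.5] * 50   # feature values when the model was built
    this_week = [0.5] * 10 + [1.5] * 90  # feature values seen in Prod now
    score = psi(training, this_week, edges)
    print(f"PSI = {score:.3f}", "drift" if score > 0.25 else "stable")
    print(looks_like_silent_failure(rows_written=120, baseline_rows=10_000))
```

Feeding scores like these back to the developers who own the model is one concrete form of the feedback loop the slide describes.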
DevOps for Analytics
• Emerging DevOps for Analytics environments usually contain:
  ◦ CI
  ◦ A repo to store the analytics app
  ◦ A repo to store configuration
  ◦ An API to deploy to the cluster
  ◦ A mechanism to monitor behavior and performance
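As one possible "API to deploy to the cluster", the sketch below assumes an Apache Livy server in front of a Spark cluster and builds, without sending, a `POST /batches` request that combines an artifact produced by CI from the app repo with arguments taken from the configuration repo. Livy is an assumption here, not something the slides prescribe; the server URL, JAR path, and class name are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_livy_batch_request(livy_url: str, app_jar: str, main_class: str,
                             args: List[str],
                             spark_conf: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) a Livy POST /batches request for a Spark job."""
    body = {
        "file": app_jar,           # artifact built by CI from the analytics-app repo
        "className": main_class,   # application main class
        "args": args,              # e.g. values pulled from the configuration repo
        "conf": spark_conf or {},  # Spark configuration properties
    }
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_livy_batch_request(
        "http://livy.example.com:8998",
        "s3://example-artifacts/analytics-app-1.4.2.jar",
        "com.example.DailyRevenue",
        ["--env", "prod"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would submit the job to a real Livy server.
```

Keeping request construction separate from sending makes the deploy step itself testable in CI without a live cluster.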
DevOps for Analytics Organization
• Building a DevOps for Analytics culture is not an easy task
• It should fall under the purview of a dedicated data organization
  ◦ These organizations are typically led by the Chief Data Officer (CDO)
  ◦ More recently, by a Chief Data Scientist or a Chief Analytics Officer
• Key responsibilities include:
  ◦ Fostering adoption
  ◦ Clarifying and aligning to the business’s vision
  ◦ Securing reasonable funding
DevOps for Analytics Organization
• The goal over time is to create lean, high-performing, cross-functional, extremely effective teams
  ◦ Business Stakeholders
  ◦ Data Engineers
  ◦ Data Analysts & Data Scientists
  ◦ QA
  ◦ Operations
• All of these skills are important, but when in doubt, get more Data Engineers
• Everyone on the team has an equal voice
• Everyone codes, and everyone needs to know what Prod looks like
DevOps for Analytics Organization
• Start-up condition: bring in an experienced set of DevOps for Analytics Engineers
  ◦ Help define the culture, lead by example
  ◦ Identify the innovators and get them involved and leading
• The DevOps Engineer’s job is ultimately to engineer themselves out of the equation
Source: Matthew Skelton, DevOps Patterns - Team Topologies
Final Thoughts
“We aim to engineer systems and processes
to better integrate development and
operations, resulting in decreased time to
market and an application infrastructure
that is instrumented, scalable and fault
tolerant… and immortal!”
- Will Liu, Equinox Data Team
Final Thoughts
• There are plenty of benefits to establishing a DevOps for Analytics culture in your organization
  ◦ For the business: speed to insight
  ◦ For the teams: professional and personal satisfaction
• Be fearless: go build your own DevOps for Analytics culture!
Happy Birthday Joe Caserta!
Thank You
• Bob Eilbacher
• Vice President Operations, Caserta
Upcoming Training Opportunity:
Caserta is hosting 3 Days of Training Courses October 18-20th in NYC,
taught by Joe Caserta, co-author of The Data Warehouse ETL Toolkit:
Day 1: Agile Data Warehouse Design & Dimensional Modeling
Day 2: ETL Architecture & Design
Day 3: Big Data for Data Warehouse Practitioners
More info at

Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Snarky Security
NVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space ExplorationNVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space Exploration
Alison B. Lowndes
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptxFIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Alliance
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
Nohoax Kanont
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx

Recently uploaded (20)

Choosing the Best Outlook OST to PST Converter: Key Features and Considerations
Choosing the Best Outlook OST to PST Converter: Key Features and ConsiderationsChoosing the Best Outlook OST to PST Converter: Key Features and Considerations
Choosing the Best Outlook OST to PST Converter: Key Features and Considerations
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Finetuning GenAI For Hacking and Defending
Finetuning GenAI For Hacking and DefendingFinetuning GenAI For Hacking and Defending
Finetuning GenAI For Hacking and Defending
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
FIDO Munich Seminar: Biometrics and Passkeys for In-Vehicle Apps.pptx
FIDO Munich Seminar: Biometrics and Passkeys for In-Vehicle Apps.pptxFIDO Munich Seminar: Biometrics and Passkeys for In-Vehicle Apps.pptx
FIDO Munich Seminar: Biometrics and Passkeys for In-Vehicle Apps.pptx
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
FIDO Munich Seminar Introduction to FIDO.pptx
FIDO Munich Seminar Introduction to FIDO.pptxFIDO Munich Seminar Introduction to FIDO.pptx
FIDO Munich Seminar Introduction to FIDO.pptx
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Cracking AI Black Box - Strategies for Customer-centric Enterprise ExcellenceCracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partesExchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Exchange, Entra ID, Conectores, RAML: Todo, a la vez, en todas partes
Scaling Vector Search: How Milvus Handles Billions+
Scaling Vector Search: How Milvus Handles Billions+Scaling Vector Search: How Milvus Handles Billions+
Scaling Vector Search: How Milvus Handles Billions+
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
NVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space ExplorationNVIDIA at Breakthrough Discuss for Space Exploration
NVIDIA at Breakthrough Discuss for Space Exploration
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptxFIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptx
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx

Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017

  • 1. Strata 2017 Creating a DevOps Practice for Analytics Bob Eilbacher September 28, 2017
  • 2. Agenda  About Caserta  DevOps  DevOps for Analytics  Organization and Teams  Questions
  • 3. About Caserta  Data Intelligence Consulting and Modern Data Engineering  Award-winning data innovation  Internationally recognized work force  Strategy, Architecture, Governance, Implementation
  • 4. About Caserta  Architecture & Design  Implementation Services  Disruption Management  Strategic Technical Consulting  Training & Education  Application Innovation  Cloud Management
  • 5. What is DevOps for Analytics? First some terminology…  DevOps  Associated with a movement, primarily in the application development space, over the last 5-10 years  Focused on very fast and continuous software product releases  Think intra-day Prod releases at Netflix, Amazon, etc.  Convergence of development and operations methodologies to minimize TTR  Tons of resources –, DZone
  • 6. What is DevOps for Analytics? Some more terminology…  DataOps  Re-emergent term  Seems to have a broader context  Applying DevOps to data management or to handling backend databases  Also tends to carry a real legacy connotation  Manual operations of database backups and restores,
  • 7. What is DevOps for Analytics? And finally…  AnalyticsOps  A term we are starting to see used more often  It’s focused on applying DevOps practices within a data analytics and data science context  This is the area we’re interested in for this talk  We’ll use the terms AnalyticsOps and the more explicit DevOps for Analytics interchangeably
  • 8. DevOps…  Speak with anyone and they will tell you first that DevOps is a culture  Based primarily on teamwork
  • 10. DevOps…  Speak with anyone and they will tell you first that DevOps is a “culture”  Based primarily on teamwork  Aims to address the underlying conflict between development and operations objectives Innovation @ speed vs. Performance @ quality Change vs. Stability  Culture is not “implemented”  It needs to evolve  Good news is it can be seeded
  • 11. DevOps…  It works!  75% of IT and product dev organizations were successfully using DevOps to some extent – Source: RightScale 2016 State of the Cloud Report  It’s flexible  No two companies’ DevOps approaches will look the same  Infinite number of ways to create teamwork  A reflection of the organization itself
  • 12. DevOps…  DevOps tenets  Continuous Integration  Test Automation  Continuous Delivery  Continuous Deployment  End-to-end automation is still aspirational for most companies  Justify how much automation you need based on business requirements.
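The tenets chain together, and the business requirement decides where automation stops. A minimal Python sketch, assuming a hypothetical `Stage` type with in-process checks rather than any real CI server: every stage runs automatically except one marked as a manual gate, which is the practical difference between Continuous Delivery (release-ready, waiting for approval) and Continuous Deployment (released automatically).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One step of a hypothetical delivery pipeline."""
    name: str
    check: Callable[[], bool]  # returns True when the stage succeeds
    automated: bool = True     # False = a person must approve before it runs


def run_pipeline(stages: List[Stage], approved: bool = False) -> List[str]:
    """Run stages in order and return the names of those that completed."""
    completed = []
    for stage in stages:
        if not stage.automated and not approved:
            break  # Continuous Delivery: ready to release, waiting on a human
        if not stage.check():
            break  # fail fast so nothing downstream runs on a broken build
        completed.append(stage.name)
    return completed


pipeline = [
    Stage("integrate", lambda: True),
    Stage("test", lambda: True),
    Stage("deliver", lambda: True),
    Stage("deploy", lambda: True, automated=False),
]

print(run_pipeline(pipeline))                 # ['integrate', 'test', 'deliver']
print(run_pipeline(pipeline, approved=True))  # ['integrate', 'test', 'deliver', 'deploy']
```

Setting `automated=True` on the deploy stage turns this into Continuous Deployment; whether that extra automation is justified is the business question the slide raises.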
  • 13. DevOps…  DevOps is not a toolchain implementation  Tools help the team execute within the culture  Don’t rush to put an end-to-end toolchain in place and then expect adoption  Let’s talk about tools for a minute…  Explosion of both open-source and commercial DevOps tooling  Tools serve every discrete need  Requirements management, SCM, test automation, defect tracking, build, deployment, monitoring and more  1,500+ tools available
  • 14. DevOps…  Tooling categories:  Code: Code development, version control tools, code merging  Build: Continuous integration tools, build status  Test: Testing, with results used to assess performance  Package: Artifact repository, application pre-deployment staging  Release: Change management, release automation  Configure: Infrastructure configuration and management, Infrastructure as Code tools  Monitor: Application performance monitoring, end-user experience
  • 16. Why DevOps for Analytics? “The fact is that analytic teams are being compared by their businesses to Amazon Prime – 2-day delivery of almost anything” Source: Unknown
  • 17. Why DevOps for Analytics?
  • 18. Why DevOps for Analytics?  A couple of recent real world examples… Data Science Rock Star Process Overengineering
  • 19. Why DevOps for Analytics?  In analytics and data science projects, what used to take months now happens in days or hours  Businesses typically like that and want more…  Enabled by the strong trend toward cloud analytic platforms/services  Infrastructure as code (IaC) allows software development practices to be extended to servers and infrastructure  We can automate the build of complex analytic pipelines (storage, processing engines, etc.) with relative ease
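The core IaC idea is that infrastructure is declared as data, kept in version control, and reconciled against what actually exists. The sketch below is a toy version of the "plan" step that tools such as Terraform perform; the resource names and flat dictionary format are hypothetical, and real tools also handle dependencies, state and provider APIs.

```python
from typing import Dict, List

Resource = Dict[str, str]  # hypothetical flat description of one resource


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[str]:
    """Compare declared resources with existing ones and list the actions needed."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actions.append(f"update {name}")
    for name in sorted(actual):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


# Declared in version control alongside the analytics code.
desired = {
    "landing-bucket": {"type": "object_store"},
    "spark-cluster": {"type": "spark", "workers": "8"},
}
# What currently exists in the cloud account.
actual = {
    "spark-cluster": {"type": "spark", "workers": "4"},
    "old-scratch": {"type": "object_store"},
}

print(plan(desired, actual))
# ['create landing-bucket', 'update spark-cluster', 'delete old-scratch']
```

Because the plan is computed from a diff, re-running it after the changes are applied yields an empty list, which is what makes rebuilding an analytic environment repeatable.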
  • 20. DevOps for Analytics  DevOps for Analytics combines the development and operations teams and establishes best practices that improve coordination between data science and operations  BUT… Data Science and Analytics are different from application development  Especially in Big Data environments – you need big data to test big data applications  A much more diverse mix of tools and technologies – not just Java  Some differences in approach are needed
  • 21. DevOps for Analytics  AnalyticsOps is still in its early days  There are few solid, published industry success stories  People are still trying to figure out what works and aren’t openly sharing experiences just yet  Not a lot of experienced practitioners  But some early themes and guidelines are emerging
  • 22. DevOps for Analytics  Environments  Separate DEV and PROD environments  Should you reuse any of the PROD data assets?  Separate landing area, destination area (Data Lake), etc.  Trickier with increasing data volumes – plan carefully to avoid duplicating storage costs  Sharing compute cluster resources is OK  Make all job inputs and outputs configuration-driven so PROD and DEV run the same code – a prerequisite for CI
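A minimal sketch of configuration-driven jobs, assuming JSON settings, placeholder bucket names and a hypothetical `ANALYTICS_ENV` variable: the same job code is promoted from DEV to PROD unchanged, and only the configuration it loads differs. Sharing one compute cluster fits this pattern, since only the storage locations are separated.

```python
import json
import os
from dataclasses import dataclass

# Hypothetical per-environment settings. In practice each would be a file in a
# separate configuration repo; the bucket names are placeholders.
CONFIGS = {
    "dev": '{"input_path": "s3://example-dev/landing/", "output_path": "s3://example-dev/lake/"}',
    "prod": '{"input_path": "s3://example-prod/landing/", "output_path": "s3://example-prod/lake/"}',
}


@dataclass(frozen=True)
class JobConfig:
    input_path: str
    output_path: str


def load_config(env: str) -> JobConfig:
    """Parse the settings for one environment; unknown names raise KeyError."""
    return JobConfig(**json.loads(CONFIGS[env]))


def run_job(cfg: JobConfig) -> str:
    # The job body never branches on the environment; only its inputs differ.
    # A real job would read from cfg.input_path and write to cfg.output_path.
    return f"read {cfg.input_path} -> write {cfg.output_path}"


# ANALYTICS_ENV is a hypothetical variable set by the CI/CD system.
print(run_job(load_config(os.environ.get("ANALYTICS_ENV", "dev"))))
```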
  • 23. DevOps for Analytics  Automated Testing  It’s almost impossible to get full code coverage  How do you unit test Spark SQL scripts? Regression tests? Data validation?  Test data is a complex problem – handle it as a cross-functional initiative  Analytic results are often buried in complex outputs, so QA becomes forensic data analysis  Automate what you can, and supplement with community-based, real-world data testing in a parallel Dev/Test environment  The role of the Test/QA Engineer is still really important  Test/QA Engineers need Data Engineering experience
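Unit-testing Spark SQL text directly is hard, but its outputs can be checked automatically. A minimal sketch in plain Python, standing in for the same checks run against a Spark DataFrame; the column names and valid ranges are hypothetical. Checks like these catch obvious breakages, leaving the subtler forensic analysis to the Test/QA Engineers.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def validate(
    rows: List[Row],
    key: str,
    non_null: List[str],
    ranges: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return human-readable data-quality failures; an empty list means pass."""
    failures = []
    keys = [r.get(key) for r in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"duplicate values in {key}")
    for col in non_null:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls:
            failures.append(f"{nulls} null(s) in {col}")
    for col, (lo, hi) in ranges.items():
        # Nulls are reported by the non-null check, so skip them here.
        bad = sum(1 for r in rows if r.get(col) is not None and not lo <= r[col] <= hi)
        if bad:
            failures.append(f"{bad} value(s) of {col} outside [{lo}, {hi}]")
    return failures


# Hypothetical model output: customer_id should be unique, churn_score in [0, 1].
rows = [
    {"customer_id": 1, "churn_score": 0.2},
    {"customer_id": 2, "churn_score": 1.7},
    {"customer_id": 2, "churn_score": None},
]
for problem in validate(rows, "customer_id", ["churn_score"], {"churn_score": (0.0, 1.0)}):
    print(problem)
```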
  • 24. DevOps for Analytics  Monitoring  Tracking and analyzing intra-day demand and longer-term trends in infrastructure performance (standard DevOps)  But then…  By their nature, analytics processes require monitoring and tuning over time with real-world inputs  Data drifts; predictive models have a finite lifetime  Silent failures  Feedback to developers so they can see how their code is performing and affecting the Prod environment  Continuous improvement  The next wave is analytics on analytics…
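One way to catch data drift and silent failures is to compare the distribution a model sees in production with the one it was trained on. The Population Stability Index (PSI) is a common choice, used here as an illustrative technique rather than anything the talk prescribes; the bucket cut points, the sample data and the 0.2 alert threshold are assumptions.

```python
import bisect
import math
from typing import List, Sequence


def bucket_shares(values: Sequence[float], cuts: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Fraction of values falling in each bucket defined by sorted cut points."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect.bisect_right(cuts, v)] += 1
    # eps keeps empty buckets from producing log(0) below.
    return [max(c / len(values), eps) for c in counts]


def psi(baseline: Sequence[float], recent: Sequence[float], cuts: Sequence[float]) -> float:
    """Population Stability Index: 0 when distributions match, larger as they diverge."""
    b = bucket_shares(baseline, cuts)
    r = bucket_shares(recent, cuts)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))


baseline = [1, 2, 3, 4, 5, 6, 7, 8]  # e.g. a feature at training time
recent = [7, 8, 7, 8, 6, 5, 8, 7]    # the same feature in recent scoring data
cuts = [2.5, 4.5, 6.5]

print(psi(baseline, baseline, cuts))  # 0.0
drift = psi(baseline, recent, cuts)
if drift > 0.2:  # a common rule-of-thumb threshold, not a standard
    print(f"PSI {drift:.2f}: investigate or retrain")
```

Scheduling a check like this against each model's inputs is one concrete form of the "analytics on analytics" the slide anticipates.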
  • 25. DevOps for Analytics  Emerging DevOps for Analytics environments usually contain  SCM  CI  A repo to store the analytics app  A repo to store configuration  An API to deploy to the cluster  A mechanism to monitor behavior and performance
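These components meet at deployment time: a versioned build from the app repo is paired with a versioned entry from the config repo and handed to the cluster's deployment API. A minimal sketch, assuming a hypothetical `Release` record and payload shape rather than any particular cluster's real API, showing that promotion from DEV to PROD swaps only the configuration:

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Release:
    """What gets deployed: one build of the app plus one version of its config."""
    artifact: str     # immutable build from the analytics-app repo (hypothetical name)
    config_ref: str   # commit or tag in the separate configuration repo
    environment: str


def promote(release: Release, environment: str, config_ref: str) -> Release:
    """Move the same artifact to another environment with that environment's config."""
    return replace(release, environment=environment, config_ref=config_ref)


def deploy_payload(release: Release) -> dict:
    """Body for a hypothetical cluster deployment API; nothing is sent here."""
    if release.environment not in {"dev", "prod"}:
        raise ValueError(f"unknown environment: {release.environment}")
    return asdict(release)


dev = Release("analytics-app-1.4.2.jar", "config@a1b2c3", "dev")
prod = promote(dev, "prod", "config@d4e5f6")
print(deploy_payload(prod))
```

Keeping the artifact immutable and varying only `config_ref` is what lets the monitoring mechanism attribute a change in behavior to either the code or its configuration.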
  • 26. DevOps for Analytics Organization  Building a DevOps for Analytics culture is not an easy undertaking  It should fall under the purview of a dedicated data organization  These organizations are typically led by the Chief Data Officer  More recently, by a Chief Data Scientist or a Chief Analytics Officer  Key responsibilities include  Fostering adoption  Clarifying and aligning to the business' vision  Securing reasonable funding
  • 27. DevOps for Analytics Organization  The goal over time is to create lean, high-performing, cross-functional, extremely effective teams  Business Stakeholders  Data Engineers  Data Analysts & Data Scientists  QA  Operations  All of these skills are important, but when in doubt, get more Data Engineers!  Everyone on the team has an equal voice  Everyone codes & everyone needs to know what Prod looks like
  • 28. DevOps for Analytics Organization  Start-up Condition: Bring in an experienced set of DevOps for Analytics Engineers  Help define the culture, lead by example  Identify the Innovators and get them involved and leading  The DevOps Engineer’s job is ultimately to engineer themselves out of the equation Source: Matthew Skelton, DevOps Patterns - Team Topologies
  • 29. Final Thoughts “We aim to engineer systems and processes to better integrate development and operations, resulting in decreased time to market and an application infrastructure that is instrumented, scalable and fault tolerant… and immortal!” - Will Liu, Equinox Data Team
  • 30. Final Thoughts  There are plenty of benefits in establishing a DevOps for Analytics culture for your organization  For the business: Speed to insight  For the teams: Professional and personal satisfaction  Be Fearless – go build your own DevOps for Analytics culture!
  • 32. Happy Birthday Joe Caserta!
  • 33. Thank You  Bob Eilbacher  Vice President Operations, Caserta  Upcoming Training Opportunity: Caserta is hosting 3 Days of Training Courses October 18-20th in NYC, taught by Joe Caserta, co-author of The Data Warehouse ETL Toolkit: Day 1: Agile Data Warehouse Design & Dimensional Modeling Day 2: ETL Architecture & Design Day 3: Big Data for Data Warehouse Practitioners More info at