SlideShare a Scribd company logo
Learn to Use Databricks
for Data Science
Sean Owen, Principal Solutions Architect
Austin Ford, Sr. Product Manager
Data Science is a tough job
▪ Today, companies are becoming
more and more data-driven, and
the ones getting the most out of
their data will be the ones to
succeed
▪ As a result, Data Science is now a
core capability of many
businesses
▪ Unfortunately, it comes with a
challenging, complex workflow at
scale
What does a data science workflow look like?
I need the correctly sized compute
resource for my task
I need to be able to find and access
the right data sources to fuel my
analysis
I need to be sure my toolbox is ready
with the packages and libraries
required for my work
1. Setup
I’ve been given a business
question to answer with data.
Before I can even get started on
the data science, I need to set
up my development
environment.
What does a data science workflow look like?
I uncover insights through
statistical inference, modeling, or
other methods
I start with exploratory data
analysis to familiarize myself with
the data and form hypotheses
I synthesize the results of my work
and the answers to the original
business question
2. Data Science
Once the initial overhead of
setup is complete, the real work
begins.
At any point, I could be sent
back to the Setup phase to add
another data source, change
the size of my compute
resource, or pull in another
library.
What does a data science workflow look like?
I share the results with my business
stakeholders via email or Slack
I formulate the results into a report
or dashboard so they can be
consumed
I get feedback about my work from
my stakeholders and iterate with
them to have the biggest impact
3. Sharing Results
The most important step comes
once I finish the analysis:
sharing the results with my
stakeholders.
Our answer: The Databricks Lakehouse Platform
We want to remove the overhead so
you can focus on the most important
part of your work — data science
Structured Semi-structured Unstructured Streaming
BI &
SQL Analytics
Machine Learning
Real-time Data
Applications
Data Management & Governance
Open Data Storage
Data Science &
Engineering
Lakehouse Platform
Simple | Open | Collaborative
Reliable | Scalable | Secure
Structured Semi-structured Unstructured Streaming
BI &
SQL Analytics
Machine Learning
Real-time Data
Applications
Data Management & Governance
Open Data Storage
Data Science &
Engineering
Lakehouse Platform
Simple | Open | Collaborative
Reliable | Scalable | Secure
Our focus today
Databricks makes setup easy
1. Setup
The Lakehouse brings all
your company’s data
together into a single place
so you don’t have to go
digging through a variety of
data sources
Easily choose the right
compute resource for your
task and switch as needed
single-machine VMs
GPUs Spark clusters
Databricks’ runtimes come
prepackaged with the most
common data science tools,
and customization is easy
Add Python libraries on top of a
runtime with a single line of code
Databricks has the tools to enable you to focus on
your work
2. Data Science
Multi-language,
collaborative notebooks
with co-presence,
commenting, and co-editing
Built-in visualizations that
take you from raw data to
insights in two clicks
Auto-logged revision
history and a git integration
to ensure reproducibility and
enable version control
Databricks lets your share results and iterate
quickly
3. Sharing Results
Easily share your notebooks
with stakeholders, who can
view them as reports
Create a dashboard directly
from your notebook’s results
Iterate with your
stakeholders directly in the
notebook through comments
and co-presence
Getting practical: hands-on with an expert
Sean Owen
Principal Solutions Architect

More Related Content

What's hot

Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
Databricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Kai Wähner
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
WinWire Technologies Inc
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
Databricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 

What's hot (20)

Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 

Similar to Learn to Use Databricks for Data Science

What makes an effective data team?
What makes an effective data team?What makes an effective data team?
What makes an effective data team?
Snowplow Analytics
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
Jesus Rodriguez
 
How to Scale your Analytics in a Maturing Organization
How to Scale your Analytics in a Maturing OrganizationHow to Scale your Analytics in a Maturing Organization
How to Scale your Analytics in a Maturing Organization
Kissmetrics on SlideShare
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
NagarajanG35
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
How Can Analytics Improve Business?
How Can Analytics Improve Business?How Can Analytics Improve Business?
How Can Analytics Improve Business?
Inside Analysis
 
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Simplilearn
 
Rapid-fire BI
Rapid-fire BIRapid-fire BI
Rapid-fire BI
Brett Sheppard
 
Make data simple in the cognitive era
Make data simple in the cognitive eraMake data simple in the cognitive era
Make data simple in the cognitive era
IBM Analytics
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo
 
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...
Boston Data Engineering
 
Big data
Big dataBig data
Data_Harmonization_ClearStory
Data_Harmonization_ClearStoryData_Harmonization_ClearStory
Data_Harmonization_ClearStory
William Davis
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
GautamPopli1
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
SpringPeople
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data Science
Annie Flippo
 
Advanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout SessionAdvanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout Session
Splunk
 
Notebooks in IBM
Notebooks in IBMNotebooks in IBM
Notebooks in IBM
Rosario Cunha
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Denodo
 
H2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioH2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.io
Sri Ambati
 

Similar to Learn to Use Databricks for Data Science (20)

What makes an effective data team?
What makes an effective data team?What makes an effective data team?
What makes an effective data team?
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
How to Scale your Analytics in a Maturing Organization
How to Scale your Analytics in a Maturing OrganizationHow to Scale your Analytics in a Maturing Organization
How to Scale your Analytics in a Maturing Organization
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
How Can Analytics Improve Business?
How Can Analytics Improve Business?How Can Analytics Improve Business?
How Can Analytics Improve Business?
 
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
 
Rapid-fire BI
Rapid-fire BIRapid-fire BI
Rapid-fire BI
 
Make data simple in the cognitive era
Make data simple in the cognitive eraMake data simple in the cognitive era
Make data simple in the cognitive era
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
 
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company...
 
Big data
Big dataBig data
Big data
 
Data_Harmonization_ClearStory
Data_Harmonization_ClearStoryData_Harmonization_ClearStory
Data_Harmonization_ClearStory
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data Science
 
Advanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout SessionAdvanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout Session
 
Notebooks in IBM
Notebooks in IBMNotebooks in IBM
Notebooks in IBM
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
 
H2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioH2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.io
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 

Recently uploaded

Parcel Delivery - Intel Segmentation and Last Mile Opt.pdf
Parcel Delivery - Intel Segmentation and Last Mile Opt.pdfParcel Delivery - Intel Segmentation and Last Mile Opt.pdf
Parcel Delivery - Intel Segmentation and Last Mile Opt.pdf
AltanAtabarut
 
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Alireza Kamrani
 
Audits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdfAudits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdf
evwcarr
 
Aws MLOps Interview Questions with answers
Aws MLOps Interview Questions  with answersAws MLOps Interview Questions  with answers
Aws MLOps Interview Questions with answers
Sathiakumar Chandr
 
Vrinda store data analysis project using Excel
Vrinda store data analysis project using ExcelVrinda store data analysis project using Excel
Vrinda store data analysis project using Excel
SantuJana12
 
Field Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdfField Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdf
hritikbui
 
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
Riya Sen
 
Combined supervised and unsupervised neural networks for pulse shape discrimi...
Combined supervised and unsupervised neural networks for pulse shape discrimi...Combined supervised and unsupervised neural networks for pulse shape discrimi...
Combined supervised and unsupervised neural networks for pulse shape discrimi...
Samuel Jackson
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
Becky Burwell
 
Getting Started with Interactive Brokers API and Python.pdf
Getting Started with Interactive Brokers API and Python.pdfGetting Started with Interactive Brokers API and Python.pdf
Getting Started with Interactive Brokers API and Python.pdf
Riya Sen
 
How AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdfHow AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdf
PromptCloud
 
future-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-managementfuture-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-management
Aadee4
 
Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?
SomalyEng
 
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
rightmanforbloodline
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
talha2khan2k
 
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptxPRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
amazenolmedojeruel
 
Technology used in Ott data analysis project
Technology used in Ott data analysis  projectTechnology used in Ott data analysis  project
Technology used in Ott data analysis project
49AkshitYadav
 
SAMPLE PRODUCT RESEARCH PR - strikingly.pptx
SAMPLE PRODUCT RESEARCH PR - strikingly.pptxSAMPLE PRODUCT RESEARCH PR - strikingly.pptx
SAMPLE PRODUCT RESEARCH PR - strikingly.pptx
wojakmodern
 
Unit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptxUnit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptx
Priyanka Jadhav
 
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di MalaysiaBimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
aznidajailani
 

Recently uploaded (20)

Parcel Delivery - Intel Segmentation and Last Mile Opt.pdf
Parcel Delivery - Intel Segmentation and Last Mile Opt.pdfParcel Delivery - Intel Segmentation and Last Mile Opt.pdf
Parcel Delivery - Intel Segmentation and Last Mile Opt.pdf
 
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
 
Audits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdfAudits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdf
 
Aws MLOps Interview Questions with answers
Aws MLOps Interview Questions  with answersAws MLOps Interview Questions  with answers
Aws MLOps Interview Questions with answers
 
Vrinda store data analysis project using Excel
Vrinda store data analysis project using ExcelVrinda store data analysis project using Excel
Vrinda store data analysis project using Excel
 
Field Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdfField Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdf
 
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
 
Combined supervised and unsupervised neural networks for pulse shape discrimi...
Combined supervised and unsupervised neural networks for pulse shape discrimi...Combined supervised and unsupervised neural networks for pulse shape discrimi...
Combined supervised and unsupervised neural networks for pulse shape discrimi...
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
 
Getting Started with Interactive Brokers API and Python.pdf
Getting Started with Interactive Brokers API and Python.pdfGetting Started with Interactive Brokers API and Python.pdf
Getting Started with Interactive Brokers API and Python.pdf
 
How AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdfHow AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdf
 
future-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-managementfuture-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-management
 
Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?
 
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
 
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptxPRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
 
Technology used in Ott data analysis project
Technology used in Ott data analysis  projectTechnology used in Ott data analysis  project
Technology used in Ott data analysis project
 
SAMPLE PRODUCT RESEARCH PR - strikingly.pptx
SAMPLE PRODUCT RESEARCH PR - strikingly.pptxSAMPLE PRODUCT RESEARCH PR - strikingly.pptx
SAMPLE PRODUCT RESEARCH PR - strikingly.pptx
 
Unit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptxUnit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptx
 
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di MalaysiaBimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
 

Learn to Use Databricks for Data Science

  • 1. Learn to Use Databricks for Data Science Sean Owen, Principal Solutions Architect Austin Ford, Sr. Product Manager
  • 2. Data Science is a tough job ▪ Today, companies are becoming more and more data-driven, and the ones getting the most out of their data will be the ones to succeed ▪ As a result, Data Science is now a core capability of many businesses ▪ Unfortunately, it comes with a challenging, complex workflow at scale
  • 3. What does a data science workflow look like? I need the correctly sized compute resource for my task I need to be able to find and access the right data sources to fuel my analysis I need to be sure my toolbox is ready with the packages and libraries required for my work 1. Setup I’ve been given a business question to answer with data. Before I can even get started on the data science, I need to set up my development environment.
  • 4. What does a data science workflow look like? I uncover insights through statistical inference, modeling, or other methods I start with exploratory data analysis to familiarize myself with the data and form hypotheses I synthesize the results of my work and the answers to the original business question 2. Data Science Once the initial overhead of setup is complete, the real work begins. At any point, I could be sent back to the Setup phase to add another data source, change the size of my compute resource, or pull in another library.
  • 5. What does a data science workflow look like? I share the results with my business stakeholders via email or Slack I formulate the results into a report or dashboard so they can be consumed I get feedback about my work from my stakeholders and iterate with them to have the biggest impact 3. Sharing Results The most important step comes once I finish the analysis: sharing the results with my stakeholders.
  • 6. Our answer: The Databricks Lakehouse Platform We want to remove the overhead so you can focus on the most important part of your work — data science
  • 7. Structured Semi-structured Unstructured Streaming BI & SQL Analytics Machine Learning Real-time Data Applications Data Management & Governance Open Data Storage Data Science & Engineering Lakehouse Platform Simple | Open | Collaborative Reliable | Scalable | Secure
  • 8. Structured Semi-structured Unstructured Streaming BI & SQL Analytics Machine Learning Real-time Data Applications Data Management & Governance Open Data Storage Data Science & Engineering Lakehouse Platform Simple | Open | Collaborative Reliable | Scalable | Secure Our focus today
  • 9. Databricks makes setup easy 1. Setup The Lakehouse brings all your company’s data together into a single place so you don’t have to go digging through a variety of data sources Easily choose the right compute resource for your task and switch as needed single-machine VMs GPUs Spark clusters Databricks’ runtimes come prepackaged with the most common data science tools, and customization is easy Add Python libraries on top of a runtime with a single line of code
  • 10. Databricks has the tools to enable you to focus on your work 2. Data Science Multi-language, collaborative notebooks with co-presence, commenting, and co-editing Built-in visualizations that take you from raw data to insights in two clicks Auto-logged revision history and a git integration to ensure reproducibility and enable version control
  • 11. Databricks lets your share results and iterate quickly 3. Sharing Results Easily share your notebooks with stakeholders, who can view them as reports Create a dashboard directly from your notebook’s results Iterate with your stakeholders directly in the notebook through comments and co-presence
  • 12. Getting practical: hands-on with an expert Sean Owen Principal Solutions Architect