SlideShare a Scribd company logo
Sean Suchter
CTO @ Pepperdata
Spark performance is too hard,
let’s make it easier
Pepperdata does performance (for Big Data)
15
Thousand
Production
Nodes
50
Million
Jobs/Year
200
Trillion
Performance
Data Points
Today’s talk will cover…
• How code translates to execution
• How to find common, known problems
• For the rest of the problems…
– Why debugging performance problems is hard
– Data elements needed for complete view of application
performance from separate tools
– Bringing these elements together in a single tool
Brief terminology about Spark
• An app contains
multiple jobs
• A job contains
multiple stages
• A stage contains
multiple tasks
• Executors run tasks
Example App
A word count app:
val textFile = sc.textFile("hdfs:/dict.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:/wordcounts.txt")
1. Declares input from
external storage
2. Specifies
transformations
3. Triggers an action
Distributed Architecture
Spark executes a job using
multiple machines.
Spark
Driver
process
Spark
Executor 1
process
Spark
Executor 2
process
Spark
Executor N
process
Sends tasks
Stages
Image source.
val textFile = sc.textFile("hdfs:/dict.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:/wordcounts.txt")
Shuffle and Re-partitioning
Image source.
Stages and Tasks in Example Job
Task
0
Task
1
Task
n
Task
n+m
Task
n+1
Task
n+2
Debugging known problems
The easier case…
Spark History Server
11
Spark History Server
12
Intro: Dr Elephant (MapReduce)
What does Dr. Elephant do?
• Performance monitoring and tuning service
• Finds common mistakes, indicates best practices
14
Spark Application Heuristics
15
Spark Application Heuristics
16
3 Classes of Spark Heuristics
• Configuration Settings
• Simple Alarms on Stage/Job Failure
• Data-Dependent Tuning Suggestions
17
Configuration Heuristic
• Display some basic config settings for your app
• Complain if some settings not explicitly set
• Recommend configuring an external shuffle
service (especially if dynamic allocation is
enabled)
• These recommendations won’t change over
multiple runs of an application
18
Stages and Jobs Heuristics
• Simple alarms showing stage and job failure rates
• Good for seeing when there’s a problem
19
Executors Heuristic
• Looks at the distribution across executors of
several different metrics
• Outliers in these distributions probably indicate:
– Suboptimal partitioning.
– One or more slow executors due to external
circumstances (cluster weather)
20
Partitions Heuristic
• Ideally data for each task will fit into the RAM
available to that task.
• Sandy Ryza (once from Cloudera) has an
excellent blog on Spark tuning:
(observed shuffle write) * (observed shuffle spill memory) * (spark.executor.cores)
(observed shuffle spill disk) * (spark.executor.memory) * (spark.shuffle.memoryFraction) * (spark.shuffle.safetyFraction)
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
21
More Heuristics?
Yes, please! Dr. Elephant is open source.
https://github.com/linkedin/dr-elephant
22
Is there an enterprise version?
Pepperdata Application Profiler
• Benefits to our users:
– Provide simple answers to simple questions
– Combination of metrics for experts
– Simple actionable insights for all users
– Pepperdata support
• Why stay close to open source?
– Heuristics
24
Pepperdata Application Profiler
25
Debugging novel problems
The harder case…
2 reasons this is hard
Reason #1
Same external symptom (“too slow”), but many possible
causes:
• code
• data
• configuration
• cluster weather
Reason #2
Existing tools provide limited visibility
• Spark Web UI is the most popular
– Good view of query execution plan (job/stages/DAG)
– Limited view of aggregate performance data
• Time series
– Ganglia, Ambari, CM, etc provide time series data for cluster (but
not specific to Spark apps)
– Spark Sink metrics can be fed to InfluxDb/others, yielding partial
Spark app metrics
• Code execution not connected to resource consumption
• Load from other apps unaccounted
3 data elements form a complete picture
of Spark application performance
1. Code execution plan
– Indicates which block of code is being executed, where
2. Time series view
– Visual of resource consumption of application
– Outliers in resource usage very easy to detect
3. Cluster weather
– A view of all applications that run on the cluster
Spark Web UI
First half of solution
Logical code execution plan from Spark:
Jobs / Stages / DAG
Physical execution plan from Spark:
Executors / Tasks
Time series view
Second half of solution
Time series view of resource consumption
for the App
Bring them together
Best of both worlds
Code Analyzer = execution plan + time series
GC across all Stages of App
Let’s examine GC activity in Stage 4
Executor skew increased Stage duration 2x
Executor 6 does twice as much work: possible
solution increase number of partitions
What if it’s not your fault?
Cluster weather
How does cluster weather impact your app ?
No apparent reason for delay from Spark
Web UI
Time series shows slower run of app with
much lower resources
View cluster weather for slower run of app
Cluster weather reveals reason for CPU
constraints on slower app
Cluster weather reveals reason for
memory constraints on slower app
Cluster weather reveals reason for HDFS
constraints on slower app
Code Analyzer for Apache Spark
• Free during Early Access starting today
• Early Access is for development teams
• To learn more visit booth #101
• info@pepperdata.com
pepperdata.com/products/code-analyzer
Other performance tools mentioned
• Dr Elephant
– github.com/linkedin/dr-elephant
• Application Profiler
– www.pepperdata.com/products/application-profiler/
To recap
• Use heuristics to find known problems
• Execution plan + time series = powerful visualization
• Knowing cluster weather can prevent time wasted
debugging performance “issues” that aren’t the app’s
fault
Spark Summit Talk Plugs
Tuesday 11:40AM Connect Code to Resource Consumption to Scale Your
Production Spark Applications (Vinod @ Pepperdata)
Tuesday 12:50PM Kubernetes SIG Big Data Birds-of-a-Feather session
(many)
Tuesday 3:20PM Apache Spark on Kubernetes (Anirudh @ Google, Tim @
Hyperpilot)
Wednesday 11:00AM HDFS on Kubernetes – Lessons Learned (Kimoon @
Pepperdata)
Wednesday 11:00AM Dr Elephant for Monitoring and Tuning Apache Spark Jobs
on Hadoop (Carl @ LinkedIn, Simon @ Pepperdata)
Thank You.
www.pepperdata.com/products/code-analyzer/
ssuchter@pepperdata.com

More Related Content

What's hot

Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLP
Databricks
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Databricks
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Databricks
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Databricks
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
Databricks
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark Summit
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Databricks
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
Databricks
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Databricks
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Databricks
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Databricks
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
Databricks
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Anya Bida
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
Holden Karau
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Databricks
 
Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
Spark Summit
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Databricks
 

What's hot (20)

Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLP
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
 
Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 

Similar to Apache Spark Performance is too hard. Let's make it easier

Connect Code to Resource Consumption to Scale Your Production Spark Applicati...
Connect Code to Resource Consumption to Scale Your Production Spark Applicati...Connect Code to Resource Consumption to Scale Your Production Spark Applicati...
Connect Code to Resource Consumption to Scale Your Production Spark Applicati...
Databricks
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Eren Avşaroğulları
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Spark Summit
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Yao Yao
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
Adarsh Pannu
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Olalekan Fuad Elesin
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
Databricks
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
tsliwowicz
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
zmhassan
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache Spark
Dan Lynn
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
Knoldus Inc.
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
tsliwowicz
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Databricks
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Landon Robinson
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
DataWorks Summit/Hadoop Summit
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
Stavros Kontopoulos
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
Juan Pedro Moreno
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Databricks
 

Similar to Apache Spark Performance is too hard. Let's make it easier (20)

Connect Code to Resource Consumption to Scale Your Production Spark Applicati...
Connect Code to Resource Consumption to Scale Your Production Spark Applicati...Connect Code to Resource Consumption to Scale Your Production Spark Applicati...
Connect Code to Resource Consumption to Scale Your Production Spark Applicati...
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache Spark
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

CrushFTP 10.4.0.29 PC Software - WhizNews
CrushFTP 10.4.0.29 PC Software - WhizNewsCrushFTP 10.4.0.29 PC Software - WhizNews
CrushFTP 10.4.0.29 PC Software - WhizNews
Eman Nisar
 
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
Alina Tait
 
Unlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial IntelligenceUnlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial Intelligence
dorinIonescu
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
confluent
 
Crowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current StatusCrowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current Status
ramaganesan0504
 
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
quanhoangd129
 
Understanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdfUnderstanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdf
kalichargn70th171
 
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery SolutionBDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
praveene26
 
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
CS Kwak
 
01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching
quanhoangd129
 
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing ToolsOld Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Benjamin Bischoff
 
The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
NMahendiran
 
Literals - A Machine Independent Feature
Literals - A Machine Independent FeatureLiterals - A Machine Independent Feature
Literals - A Machine Independent Feature
21h16charis
 
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Andre Hora
 
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
Andre Hora
 
AI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docxAI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docx
zoondiacom
 
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdfTop 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Banibro IT Solutions
 
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
vijayatibirds
 
09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching
quanhoangd129
 
vSAN_Tutorial_Presentation with important topics
vSAN_Tutorial_Presentation with important  topicsvSAN_Tutorial_Presentation with important  topics
vSAN_Tutorial_Presentation with important topics
abhilashspt
 

Recently uploaded (20)

CrushFTP 10.4.0.29 PC Software - WhizNews
CrushFTP 10.4.0.29 PC Software - WhizNewsCrushFTP 10.4.0.29 PC Software - WhizNews
CrushFTP 10.4.0.29 PC Software - WhizNews
 
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
 
Unlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial IntelligenceUnlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial Intelligence
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
 
Crowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current StatusCrowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current Status
 
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
 
Understanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdfUnderstanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdf
 
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery SolutionBDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
 
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
 
01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching
 
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing ToolsOld Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
 
The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
 
Literals - A Machine Independent Feature
Literals - A Machine Independent FeatureLiterals - A Machine Independent Feature
Literals - A Machine Independent Feature
 
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
 
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
 
AI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docxAI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docx
 
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdfTop 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
 
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
 
09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching
 
vSAN_Tutorial_Presentation with important topics
vSAN_Tutorial_Presentation with important  topicsvSAN_Tutorial_Presentation with important  topics
vSAN_Tutorial_Presentation with important topics
 

Apache Spark Performance is too hard. Let's make it easier

  • 1. Sean Suchter CTO @ Pepperdata Spark performance is too hard, let’s make it easier
  • 2. Pepperdata does performance (for Big Data) 15 Thousand Production Nodes 50 Million Jobs/Year 200 Trillion Performance Data Points
  • 3. Today’s talk will cover… • How code translates to execution • How to find common, known problems • For the rest of the problems… – Why debugging performance problems is hard – Data elements needed for complete view of application performance from separate tools – Bringing these elements together in a single tool
  • 4. Brief terminology about Spark • An app contains multiple jobs • A job contains multiple stages • A stage contains multiple tasks • Executors run tasks
  • 5. Example App A word count app: val textFile = sc.textFile("hdfs:/dict.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs:/wordcounts.txt") 1. Declares input from external storage 2. Specifies transformations 3. Triggers an action
  • 6. Distributed Architecture Spark executes a job using multiple machines. Spark Driver process Spark Executor 1 process Spark Executor 2 process Spark Executor N process Sends tasks
  • 7. Stages Image source. val textFile = sc.textFile("hdfs:/dict.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs:/wordcounts.txt")
  • 9. Stages and Tasks in Example Job Task 0 Task 1 Task n Task n+m Task n+1 Task n+2
  • 13. Intro: Dr Elephant (MapReduce)
  • 14. What does Dr. Elephant do? • Performance monitoring and tuning service • Finds common mistakes, indicates best practices 14
  • 17. 3 Classes of Spark Heuristics • Configuration Settings • Simple Alarms on Stage/Job Failure • Data-Dependent Tuning Suggestions 17
  • 18. Configuration Heuristic • Display some basic config settings for your app • Complain if some settings not explicitly set • Recommend configuring an external shuffle service (especially if dynamic allocation is enabled) • These recommendations won’t change over multiple runs of an application 18
  • 19. Stages and Jobs Heuristics • Simple alarms showing stage and job failure rates • Good for seeing when there’s a problem 19
  • 20. Executors Heuristic • Looks at the distribution across executors of several different metrics • Outliers in these distributions probably indicate: – Suboptimal partitioning. – One or more slow executors due to external circumstances (cluster weather) 20
  • 21. Partitions Heuristic • Ideally data for each task will fit into the RAM available to that task. • Sandy Ryza (once from Cloudera) has an excellent blog on Spark tuning: (observed shuffle write) * (observed shuffle spill memory) * (spark.executor.cores) (observed shuffle spill disk) * (spark.executor.memory) * (spark.shuffle.memoryFraction) * (spark.shuffle.safetyFraction) http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ 21
  • 22. More Heuristics? Yes, please! Dr. Elephant is open source. https://github.com/linkedin/dr-elephant 22
  • 23. Is there an enterprise version?
  • 24. Pepperdata Application Profiler • Benefits to our users: – Provide simple answers to simple questions – Combination of metrics for experts – Simple actionable insights for all users – Pepperdata support • Why stay close to open source? – Heuristics 24
  • 27. 2 reasons this is hard
  • 28. Reason #1 Same external symptom (“too slow”), but many possible causes: • code • data • configuration • cluster weather
  • 29. Reason #2 Existing tools provide limited visibility • Spark Web UI is the most popular – Good view of query execution plan (job/stages/DAG) – Limited view of aggregate performance data • Time series – Ganglia, Ambari, CM, etc provide time series data for cluster (but not specific to Spark apps) – Spark Sink metrics can be fed to InfluxDb/others, yielding partial Spark app metrics • Code execution not connected to resource consumption • Load from other apps unaccounted
  • 30. 3 data elements form a complete picture of Spark application performance 1. Code execution plan – Indicates which block of code is being executed, where 2. Time series view – Visual of resource consumption of application – Outliers in resource usage very easy to detect 3. Cluster weather – A view of all applications that run on the cluster
  • 31. Spark Web UI First half of solution
  • 32. Logical code execution plan from Spark: Jobs / Stages / DAG
  • 33. Physical execution plan from Spark: Executors / Tasks
  • 34. Time series view Second half of solution
  • 35. Time series view of resource consumption for the App
  • 36. Bring them together Best of both worlds
  • 37. Code Analyzer = execution plan + time series
  • 38. GC across all Stages of App
  • 39. Let’s examine GC activity in Stage 4
  • 40. Executor skew increased Stage duration 2x
  • 41. Executor 6 does twice as much work: possible solution increase number of partitions
  • 42. What if it’s not your fault? Cluster weather
  • 43. How does cluster weather impact your app ?
  • 44. No apparent reason for delay from Spark Web UI
  • 45. Time series shows slower run of app with much lower resources
  • 46. View cluster weather for slower run of app
  • 47. Cluster weather reveals reason for CPU constraints on slower app
  • 48. Cluster weather reveals reason for memory constraints on slower app
  • 49. Cluster weather reveals reason for HDFS constraints on slower app
  • 50. Code Analyzer for Apache Spark • Free during Early Access starting today • Early Access is for development teams • To learn more visit booth #101 • info@pepperdata.com pepperdata.com/products/code-analyzer
  • 51. Other performance tools mentioned • Dr Elephant – github.com/linkedin/dr-elephant • Application Profiler – www.pepperdata.com/products/application-profiler/
  • 52. To recap • Use heuristics to find known problems • Execution plan + time series = powerful visualization • Knowing cluster weather can prevent time wasted debugging performance “issues” that aren’t the app’s fault
  • 53. Spark Summit Talk Plugs Tuesday 11:40AM Connect Code to Resource Consumption to Scale Your Production Spark Applications (Vinod @ Pepperdata) Tuesday 12:50PM Kubernetes SIG Big Data Birds-of-a-Feather session (many) Tuesday 3:20PM Apache Spark on Kubernetes (Anirudh @ Google, Tim @ Hyperpilot) Wednesday 11:00AM HDFS on Kubernetes – Lessons Learned (Kimoon @ Pepperdata) Wednesday 11:00AM Dr Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop (Carl @ LinkedIn, Simon @ Pepperdata)