Sean Suchter
CTO @ Pepperdata
Spark performance is too hard,
let’s make it easier
Pepperdata does performance (for Big Data)
Data Points
Today’s talk will cover…
• How code translates to execution
• How to find common, known problems
• For the rest of the problems…
– Why debugging performance problems is hard
– Data elements needed for complete view of application
performance from separate tools
– Bringing these elements together in a single tool
Brief terminology about Spark
• An app contains
multiple jobs
• A job contains
multiple stages
• A stage contains
multiple tasks
• Executors run tasks
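The containment hierarchy above can be sketched as plain data types. This is only an illustration: the type and field names are hypothetical and are not Spark's internal classes.

```scala
// Hypothetical names that mirror the slide's terminology; not Spark's own classes.
final case class Task(id: Int, executorId: Int)      // a task runs on one executor
final case class Stage(id: Int, tasks: Seq[Task])    // a stage contains multiple tasks
final case class Job(id: Int, stages: Seq[Stage])    // a job contains multiple stages
final case class App(name: String, jobs: Seq[Job])   // an app contains multiple jobs

object AppModel {
  // Flatten app -> jobs -> stages -> tasks.
  def allTasks(app: App): Seq[Task] =
    app.jobs.flatMap(_.stages).flatMap(_.tasks)

  // Number of tasks each executor ran, keyed by executor id.
  def tasksPerExecutor(app: App): Map[Int, Int] =
    allTasks(app).groupBy(_.executorId).map { case (executor, ts) => executor -> ts.size }
}
```

Walking this structure top-down is the same drill-down the later slides perform: app, then job, then stage, then the tasks on each executor.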
Example App
A word count app:
val textFile = sc.textFile("hdfs:/dict.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
1. Declares input from
external storage
2. Specifies transformations
3. Triggers an action
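Steps 2 and 3 rely on lazy evaluation: transformations only describe work, and nothing runs until an action asks for a result. A minimal local analogy, assuming Scala 2.13+ collections rather than Spark RDDs (the object name and the counter are hypothetical, and `groupMapReduce` stands in for `reduceByKey` followed by an action):

```scala
object LazyWordCount {
  /** Returns (lines touched before the "action", lines touched after it, word counts). */
  def run(lines: Seq[String]): (Int, Int, Map[String, Int]) = {
    var linesTouched = 0

    // "Transformations": a lazy view records what to do but does not do it yet.
    val pairs = lines.view
      .flatMap { line => linesTouched += 1; line.split(" ") }
      .map(word => (word, 1))

    val touchedBefore = linesTouched // nothing has been evaluated so far

    // "Action": asking for a concrete result forces the whole pipeline to run.
    val counts = pairs.groupMapReduce(_._1)(_._2)(_ + _)

    (touchedBefore, linesTouched, counts)
  }
}
```

Calling `LazyWordCount.run(Seq("a b a", "b c"))` touches no lines until the final step, which is why a Spark app that only declares transformations does no cluster work.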
Distributed Architecture
Spark executes a job using
multiple machines.
[Diagram: tasks are sent to Executor 1, Executor 2, … Executor N.]
val textFile = sc.textFile("hdfs:/dict.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
Shuffle and Re-partitioning
Stages and Tasks in Example Job
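The word count splits into two stages because `reduceByKey` needs all pairs for a given word in one place. A rough local simulation, assuming hash partitioning by key (non-negative `hashCode` modulo the partition count); the object and function names are hypothetical and this runs on Scala 2.13+ collections, not on Spark:

```scala
object ShuffleSketch {
  // Hash partitioning rule: non-negative (hashCode mod numPartitions).
  def partitionFor(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Stage 1: one task per input partition; each task emits (word, 1) pairs.
  def mapStage(inputPartitions: Seq[Seq[String]]): Seq[Seq[(String, Int)]] =
    inputPartitions.map { lines =>
      lines.flatMap(line => line.split(" ")).map(word => (word, 1))
    }

  // Shuffle: route every pair to the output partition chosen by its key.
  def shuffle(mapped: Seq[Seq[(String, Int)]], numPartitions: Int): Seq[Seq[(String, Int)]] = {
    val allPairs = mapped.flatten
    (0 until numPartitions).map { p =>
      allPairs.filter { case (key, _) => partitionFor(key, numPartitions) == p }
    }
  }

  // Stage 2: one task per shuffle partition; each task sums the counts per word.
  def reduceStage(shuffled: Seq[Seq[(String, Int)]]): Seq[Map[String, Int]] =
    shuffled.map(pairs => pairs.groupMapReduce(_._1)(_._2)(_ + _))
}
```

Because every word lands in exactly one shuffle partition, the per-partition results can simply be concatenated. The number of shuffle partitions is also the number of tasks in the second stage.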
Debugging known problems
The easier case…
Spark History Server
Intro: Dr. Elephant (MapReduce)
What does Dr. Elephant do?
• Performance monitoring and tuning service
• Finds common mistakes, indicates best practices
Spark Application Heuristics
3 Classes of Spark Heuristics
• Configuration Settings
• Simple Alarms on Stage/Job Failure
• Data-Dependent Tuning Suggestions
Configuration Heuristic
• Display some basic config settings for your app
• Complain if some settings are not explicitly set
• Recommend configuring an external shuffle
service (especially if dynamic allocation is enabled)
• These recommendations won’t change over
multiple runs of an application
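A configuration check of this kind can be sketched as a pure function over the app's settings. The property names are real Spark settings, but the list of "expected" keys and the message wording are illustrative assumptions, not Dr. Elephant's actual rules:

```scala
object ConfigHeuristic {
  // Illustrative choice of keys to insist on; an assumption, not Dr. Elephant's list.
  val expectedKeys: Seq[String] =
    Seq("spark.executor.memory", "spark.executor.cores", "spark.driver.memory")

  private def isTrue(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.toLowerCase == "true")

  // Returns one message per finding; an empty result means nothing to report.
  def check(conf: Map[String, String]): Seq[String] = {
    val missing = expectedKeys
      .filterNot(conf.contains)
      .map(key => s"$key is not explicitly set")

    val shuffleAdvice =
      if (isTrue(conf, "spark.dynamicAllocation.enabled") &&
          !isTrue(conf, "spark.shuffle.service.enabled"))
        Seq("dynamic allocation is on: enable spark.shuffle.service.enabled")
      else Nil

    missing ++ shuffleAdvice
  }
}
```

The output depends only on the configuration, which is why these recommendations stay the same from run to run.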
Stages and Jobs Heuristics
• Simple alarms showing stage and job failure rates
• Good for seeing when there’s a problem
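A failure-rate alarm amounts to a ratio compared against thresholds. A minimal sketch; the severity names and the 10% / 50% thresholds are assumptions chosen for illustration:

```scala
object FailureAlarm {
  sealed trait Severity
  case object Ok extends Severity
  case object Warning extends Severity
  case object Critical extends Severity

  // warnAt and critAt are illustrative thresholds, not values from any real tool.
  def severity(failed: Int, total: Int, warnAt: Double = 0.1, critAt: Double = 0.5): Severity = {
    val rate = if (total == 0) 0.0 else failed.toDouble / total
    if (rate >= critAt) Critical
    else if (rate >= warnAt) Warning
    else Ok
  }
}
```

The same function works for stages and for jobs: pass the failed and total counts for whichever level is being checked.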
Executors Heuristic
• Looks at the distribution across executors of
several different metrics
• Outliers in these distributions probably indicate:
– Suboptimal partitioning.
– One or more slow executors due to external
circumstances (cluster weather)
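One simple way to spot such outliers is to compare each executor's value with the median across executors. This is a sketch under assumptions: the metric could be task time or input bytes, and the 2x threshold is arbitrary rather than taken from Dr. Elephant.

```scala
object ExecutorSkew {
  // Median of a non-empty sequence.
  def median(xs: Seq[Double]): Double = {
    val sorted = xs.sorted
    val n = sorted.size
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  // Executors whose metric is at least ratioThreshold times the median.
  // ratioThreshold = 2.0 is an illustrative default.
  def outliers(metricByExecutor: Map[String, Double], ratioThreshold: Double = 2.0): Set[String] =
    if (metricByExecutor.isEmpty) Set.empty
    else {
      val med = median(metricByExecutor.values.toSeq)
      metricByExecutor.collect {
        case (executorId, value) if med > 0 && value / med >= ratioThreshold => executorId
      }.toSet
    }
}
```

A flagged executor does not say which cause applies; telling suboptimal partitioning apart from cluster weather needs the extra data discussed later in the talk.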
Partitions Heuristic
• Ideally data for each task will fit into the RAM
available to that task.
• Sandy Ryza (formerly of Cloudera) has an
excellent blog post on Spark tuning, which gives a rough estimate for the number of partitions:
(observed shuffle write) * (observed shuffle spill memory) * (spark.executor.cores)
/ ((observed shuffle spill disk) * (spark.executor.memory) * (spark.shuffle.memoryFraction) * (spark.shuffle.safetyFraction))
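The ratio on this slide can be computed directly from the observed metrics. A minimal sketch: the object and parameter names are hypothetical, and the default fractions (0.2 and 0.8) are the Spark 1.x defaults as far as I know, so check them against your version. Shuffle write and executor memory must use the same unit, as must spill memory and spill disk.

```scala
object PartitionEstimate {
  // Rough partition-count estimate following the ratio on the slide.
  def estimate(
      shuffleWrite: Double,          // observed shuffle write
      spillMemory: Double,           // observed shuffle spill (memory)
      spillDisk: Double,             // observed shuffle spill (disk)
      executorCores: Int,            // spark.executor.cores
      executorMemory: Double,        // spark.executor.memory
      shuffleMemoryFraction: Double = 0.2, // spark.shuffle.memoryFraction (assumed default)
      shuffleSafetyFraction: Double = 0.8  // spark.shuffle.safetyFraction (assumed default)
  ): Double = {
    require(spillDisk > 0 && executorMemory > 0, "spill disk and executor memory must be positive")
    (shuffleWrite * spillMemory * executorCores) /
      (spillDisk * executorMemory * shuffleMemoryFraction * shuffleSafetyFraction)
  }
}
```

For example, 10 GB of shuffle write, a 4:1 ratio of spill memory to spill disk, 4 cores and 8 GB per executor give an estimate of about 125 partitions. Treat the result as a starting point to round up and then tune, not as an exact answer.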
More Heuristics?
Yes, please! Dr. Elephant is open source.
Is there an enterprise version?
Pepperdata Application Profiler
• Benefits to our users:
– Provide simple answers to simple questions
– Combination of metrics for experts
– Simple actionable insights for all users
– Pepperdata support
• Why stay close to open source?
– Heuristics
Pepperdata Application Profiler
Debugging novel problems
The harder case…
2 reasons this is hard
Reason #1
Same external symptom (“too slow”), but many possible causes:
• code
• data
• configuration
• cluster weather
Reason #2
Existing tools provide limited visibility
• Spark Web UI is the most popular
– Good view of query execution plan (job/stages/DAG)
– Limited view of aggregate performance data
• Time series
– Ganglia, Ambari, CM, etc. provide time series data for the cluster (but
not specific to Spark apps)
– Spark Sink metrics can be fed to InfluxDB and others, yielding partial
Spark app metrics
• Code execution not connected to resource consumption
• Load from other apps is unaccounted for
3 data elements form a complete picture
of Spark application performance
1. Code execution plan
– Indicates which block of code is being executed, where
2. Time series view
– Visual of resource consumption of application
– Outliers in resource usage very easy to detect
3. Cluster weather
– A view of all applications that run on the cluster
Spark Web UI
First half of solution
Logical code execution plan from Spark:
Jobs / Stages / DAG
Physical execution plan from Spark:
Executors / Tasks
Time series view
Second half of solution
Time series view of resource consumption
for the App
Bring them together
Best of both worlds
Code Analyzer = execution plan + time series
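The idea of combining the two views can be sketched as a join on time: each resource sample is attributed to the stage that was running when it was taken. This illustrates the concept only and is not Pepperdata's implementation; all names are hypothetical.

```scala
// Hypothetical types: a stage's run window and one time-series sample.
final case class StageRun(stageId: Int, startMs: Long, endMs: Long)
final case class Sample(timestampMs: Long, value: Double)

object CodeToResources {
  // Average sample value observed while each stage was running (start inclusive,
  // end exclusive). Stages with no samples in their window are left out.
  def averagePerStage(stages: Seq[StageRun], samples: Seq[Sample]): Map[Int, Double] =
    stages.flatMap { stage =>
      val inWindow = samples.filter { s =>
        s.timestampMs >= stage.startMs && s.timestampMs < stage.endMs
      }
      if (inWindow.isEmpty) None
      else Some(stage.stageId -> (inWindow.map(_.value).sum / inWindow.size))
    }.toMap
}
```

With GC time or memory as the sampled value, the per-stage result shows which part of the execution plan drives resource consumption.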
GC across all Stages of App
Let’s examine GC activity in Stage 4
Executor skew increased Stage duration 2x
Executor 6 does twice as much work. Possible
solution: increase the number of partitions
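Why more partitions can help is easiest to see with a toy model. Assume equally sized partitions, one task at a time per executor, and greedy assignment of each task to the least-loaded executor; these are simplifying assumptions, not how Spark's scheduler actually works, and the names are hypothetical.

```scala
object PartitionBalance {
  // Greedy assignment: each task goes to the currently least-loaded executor.
  def tasksPerExecutor(numPartitions: Int, numExecutors: Int): Seq[Int] = {
    val load = Array.fill(numExecutors)(0)
    for (_ <- 0 until numPartitions) {
      val leastLoaded = load.indexOf(load.min)
      load(leastLoaded) += 1
    }
    load.toSeq
  }

  // Stage duration relative to a perfectly balanced run (1.0 means ideal).
  // The busiest executor determines when the stage finishes.
  def slowdown(numPartitions: Int, numExecutors: Int): Double =
    tasksPerExecutor(numPartitions, numExecutors).max.toDouble * numExecutors / numPartitions
}
```

With 7 partitions on 6 executors, one executor runs two tasks while the others run one, so the stage takes about 1.7x the ideal time. With 42 smaller partitions every executor runs 7 tasks and the slowdown disappears. If the skew comes from the data itself (a few very hot keys), adding partitions alone may not be enough.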
What if it’s not your fault?
Cluster weather
How does cluster weather impact your app?
No apparent reason for delay from Spark
Web UI
Time series shows slower run of app with
much lower resources
View cluster weather for slower run of app
Cluster weather reveals reason for CPU
constraints on slower app
Cluster weather reveals reason for
memory constraints on slower app
Cluster weather reveals reason for HDFS
constraints on slower app
Code Analyzer for Apache Spark
• Free during Early Access starting today
• Early Access is for development teams
• To learn more visit booth #101
Other performance tools mentioned
• Dr. Elephant
• Application Profiler
To recap
• Use heuristics to find known problems
• Execution plan + time series = powerful visualization
• Knowing cluster weather can prevent time wasted
debugging performance “issues” that aren’t the app’s fault
Spark Summit Talk Plugs
Tuesday 11:40AM Connect Code to Resource Consumption to Scale Your
Production Spark Applications (Vinod @ Pepperdata)
Tuesday 12:50PM Kubernetes SIG Big Data Birds-of-a-Feather session
Tuesday 3:20PM Apache Spark on Kubernetes (Anirudh @ Google, Tim @
Wednesday 11:00AM HDFS on Kubernetes – Lessons Learned (Kimoon @
Wednesday 11:00AM Dr Elephant for Monitoring and Tuning Apache Spark Jobs
on Hadoop (Carl @ LinkedIn, Simon @ Pepperdata)
Thank You.

Massive Data Processing in Adobe Using Delta Lake

Recently uploaded

CrushFTP PC Software - WhizNews
CrushFTP PC Software - WhizNewsCrushFTP PC Software - WhizNews
CrushFTP PC Software - WhizNews
Eman Nisar
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
Alina Tait
Unlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial IntelligenceUnlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial Intelligence
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
Crowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current StatusCrowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current Status
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
Understanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdfUnderstanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdf
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery SolutionBDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
CS Kwak
01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing ToolsOld Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Benjamin Bischoff
The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
Literals - A Machine Independent Feature
Literals - A Machine Independent FeatureLiterals - A Machine Independent Feature
Literals - A Machine Independent Feature
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Andre Hora
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
Andre Hora
AI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docxAI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docx
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdfTop 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Banibro IT Solutions
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching
vSAN_Tutorial_Presentation with important topics
vSAN_Tutorial_Presentation with important  topicsvSAN_Tutorial_Presentation with important  topics
vSAN_Tutorial_Presentation with important topics

Recently uploaded (20)

CrushFTP PC Software - WhizNews
CrushFTP PC Software - WhizNewsCrushFTP PC Software - WhizNews
CrushFTP PC Software - WhizNews
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
Unlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial IntelligenceUnlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial Intelligence
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
Crowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current StatusCrowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current Status
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
Understanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdfUnderstanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdf
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery SolutionBDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching01. Ruby Introduction - Ruby Core Teaching
01. Ruby Introduction - Ruby Core Teaching
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing ToolsOld Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing Tools
The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
Literals - A Machine Independent Feature
Literals - A Machine Independent FeatureLiterals - A Machine Independent Feature
Literals - A Machine Independent Feature
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is ...
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
PathSpotter: Exploring Tested Paths to Discover Missing Tests (FSE 2024)
AI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docxAI-driven Automation_ Transforming DevOps Practices.docx
AI-driven Automation_ Transforming DevOps Practices.docx
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdfTop 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
iBirds Services - Comprehensive Salesforce CRM and Software Development Solut...
09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching
vSAN_Tutorial_Presentation with important topics
vSAN_Tutorial_Presentation with important  topicsvSAN_Tutorial_Presentation with important  topics
vSAN_Tutorial_Presentation with important topics

Apache Spark Performance is too hard. Let's make it easier

  • 1. Sean Suchter CTO @ Pepperdata Spark performance is too hard, let’s make it easier
  • 2. Pepperdata does performance (for Big Data) • 15 Thousand Production Nodes • 50 Million Jobs/Year • 200 Trillion Performance Data Points
  • 3. Today’s talk will cover… • How code translates to execution • How to find common, known problems • For the rest of the problems… – Why debugging performance problems is hard – Data elements needed for complete view of application performance from separate tools – Bringing these elements together in a single tool
  • 4. Brief terminology about Spark • An app contains multiple jobs • A job contains multiple stages • A stage contains multiple tasks • Executors run tasks
  • 5. Example App A word count app: val textFile = sc.textFile("hdfs:/dict.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs:/wordcounts.txt") 1. Declares input from external storage 2. Specifies transformations 3. Triggers an action
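The transformation chain in the example can be tried without a cluster. The sketch below is a local analogue using plain Scala collections, not Spark itself: `LocalWordCount` and `wordCounts` are hypothetical names, `groupBy` stands in for the shuffle that `reduceByKey` triggers, and the filter for empty tokens is an addition that is not in the slide.

```scala
object LocalWordCount {
  // Local stand-in for the slide's RDD pipeline: split lines into words,
  // pair each word with 1, then sum the 1s per word.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // same as the slide's flatMap
      .filter(_.nonEmpty)                  // assumption: drop empty tokens
      .map(word => (word, 1))              // same as the slide's map
      .groupBy(_._1)                       // plays the role of the shuffle
      .map { case (word, pairs) => (word, pairs.map(_._2).reduce(_ + _)) }
}
```

In Spark, the grouping step is where data moves between executors, which is why `reduceByKey` marks a stage boundary in the example job.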
  • 6. Distributed Architecture Spark executes a job using multiple machines. Spark Driver process Spark Executor 1 process Spark Executor 2 process Spark Executor N process Sends tasks
  • 7. Stages val textFile = sc.textFile("hdfs:/dict.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs:/wordcounts.txt")
  • 9. Stages and Tasks in Example Job Task 0 Task 1 Task n Task n+m Task n+1 Task n+2
  • 13. Intro: Dr Elephant (MapReduce)
  • 14. What does Dr. Elephant do? • Performance monitoring and tuning service • Finds common mistakes, indicates best practices
  • 17. 3 Classes of Spark Heuristics • Configuration Settings • Simple Alarms on Stage/Job Failure • Data-Dependent Tuning Suggestions
  • 18. Configuration Heuristic • Display some basic config settings for your app • Complain if some settings not explicitly set • Recommend configuring an external shuffle service (especially if dynamic allocation is enabled) • These recommendations won’t change over multiple runs of an application
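A configuration check of this kind could look roughly like the sketch below. It is not Dr. Elephant's implementation: the object name, the list of expected keys, and the wording of the messages are all assumptions made for illustration. The property names themselves (`spark.dynamicAllocation.enabled`, `spark.shuffle.service.enabled`, and the memory and core settings) are standard Spark configuration keys.

```scala
object ConfigurationHeuristic {
  // Assumed list of settings the check expects to see set explicitly.
  val expectedKeys: Seq[String] = Seq(
    "spark.driver.memory",
    "spark.executor.memory",
    "spark.executor.cores"
  )

  // True only if the key is present and its value is "true".
  def isEnabled(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.trim.equalsIgnoreCase("true"))

  // Returns one message per finding; an empty result means nothing to report.
  def check(conf: Map[String, String]): Seq[String] = {
    val missing = expectedKeys
      .filterNot(conf.contains)
      .map(key => s"$key is not explicitly set")

    val dynamicAllocation = isEnabled(conf, "spark.dynamicAllocation.enabled")
    val shuffleService = isEnabled(conf, "spark.shuffle.service.enabled")

    val shuffleAdvice =
      if (shuffleService) Seq.empty[String]
      else if (dynamicAllocation)
        Seq("dynamic allocation is enabled without an external shuffle service")
      else
        Seq("consider configuring an external shuffle service")

    missing ++ shuffleAdvice
  }
}
```

Because the check reads only the submitted configuration, it returns the same findings on every run of the same application, which matches the last bullet of the slide.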
  • 19. Stages and Jobs Heuristics • Simple alarms showing stage and job failure rates • Good for seeing when there’s a problem
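A failure-rate alarm can be as small as the following sketch. The thresholds (10% for a warning, 30% for critical) and all names are assumptions chosen for illustration; the slides do not state which thresholds Dr. Elephant uses.

```scala
object FailureRateHeuristic {
  sealed trait Severity
  case object Ok extends Severity
  case object Warning extends Severity
  case object Critical extends Severity

  // Fraction of failed stages (or jobs); 0.0 when nothing ran.
  def failureRate(failed: Int, total: Int): Double =
    if (total <= 0) 0.0 else failed.toDouble / total

  // Maps a failure rate onto a severity using the assumed thresholds.
  def severity(
      failed: Int,
      total: Int,
      warnAt: Double = 0.1,
      criticalAt: Double = 0.3): Severity = {
    val rate = failureRate(failed, total)
    if (rate >= criticalAt) Critical
    else if (rate >= warnAt) Warning
    else Ok
  }
}
```

An alarm like this says that something went wrong, not why, which is the limitation the slide points to.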
  • 20. Executors Heuristic • Looks at the distribution across executors of several different metrics • Outliers in these distributions probably indicate: – Suboptimal partitioning – One or more slow executors due to external circumstances (cluster weather)
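One simple way to flag outliers in a per-executor metric is to compare each executor against the median. The sketch below takes that approach; the max-to-median comparison, the default threshold of 2.0, and the names are assumptions, not a description of how Dr. Elephant computes its distributions.

```scala
object ExecutorsHeuristic {
  // Median of a non-empty sequence of metric values.
  def median(values: Seq[Double]): Double = {
    require(values.nonEmpty, "need at least one executor")
    val sorted = values.sorted
    val mid = sorted.length / 2
    if (sorted.length % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2.0
  }

  // Executor ids whose metric is at least ratioThreshold times the median.
  // The metric could be task time, input bytes, shuffle bytes, and so on.
  def outliers(
      metricByExecutor: Map[String, Double],
      ratioThreshold: Double = 2.0): Seq[String] = {
    val med = median(metricByExecutor.values.toSeq)
    if (med <= 0.0) Seq.empty
    else metricByExecutor.collect {
      case (id, value) if value / med >= ratioThreshold => id
    }.toSeq.sorted
  }
}
```

For example, with task time in seconds per executor, `ExecutorsHeuristic.outliers(Map("1" -> 10.0, "2" -> 11.0, "3" -> 9.0, "6" -> 22.0))` flags executor `"6"`, since 22 is more than twice the median of 10.5.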
  • 21. Partitions Heuristic • Ideally data for each task will fit into the RAM available to that task. • Sandy Ryza (formerly of Cloudera) has an excellent blog post on Spark tuning, which estimates the number of partitions as: [(observed shuffle write) * (observed shuffle spill memory) * (spark.executor.cores)] / [(observed shuffle spill disk) * (spark.executor.memory) * (spark.shuffle.memoryFraction) * (spark.shuffle.safetyFraction)]
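Reading the expression on this slide as a fraction, with the first three terms in the numerator and the remaining four in the denominator, the estimate can be computed as in the sketch below. The function and parameter names are hypothetical, rounding up is an assumption, and the two fractions are passed in explicitly rather than defaulted because `spark.shuffle.memoryFraction` and `spark.shuffle.safetyFraction` are legacy settings whose values depend on the Spark version in use.

```scala
object PartitionsHeuristic {
  // Rough partition-count estimate from observed shuffle metrics.
  // All byte quantities must use the same unit.
  def suggestedPartitions(
      shuffleWriteBytes: Double,
      shuffleSpillMemoryBytes: Double,
      shuffleSpillDiskBytes: Double,
      executorMemoryBytes: Double,
      executorCores: Int,
      memoryFraction: Double,
      safetyFraction: Double): Long = {
    require(shuffleSpillDiskBytes > 0, "no disk spill observed, nothing to estimate")
    require(executorMemoryBytes > 0 && memoryFraction > 0 && safetyFraction > 0)

    val numerator = shuffleWriteBytes * shuffleSpillMemoryBytes * executorCores
    val denominator =
      shuffleSpillDiskBytes * executorMemoryBytes * memoryFraction * safetyFraction

    // Assumption: round up, so each task gets no more data than the estimate allows.
    math.ceil(numerator / denominator).toLong
  }
}
```

The ratio of spill memory to spill disk approximates how much the shuffled data expands in memory, and the denominator approximates the shuffle memory available to each task, so the result is the number of tasks needed for each one to fit in RAM.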
  • 22. More Heuristics? Yes, please! Dr. Elephant is open source.
  • 23. Is there an enterprise version?
  • 24. Pepperdata Application Profiler • Benefits to our users: – Provide simple answers to simple questions – Combination of metrics for experts – Simple actionable insights for all users – Pepperdata support • Why stay close to open source? – Heuristics
  • 27. 2 reasons this is hard
  • 28. Reason #1 Same external symptom (“too slow”), but many possible causes: • code • data • configuration • cluster weather
  • 29. Reason #2 Existing tools provide limited visibility • Spark Web UI is the most popular – Good view of query execution plan (job/stages/DAG) – Limited view of aggregate performance data • Time series – Ganglia, Ambari, CM, etc. provide time series data for the cluster (but not specific to Spark apps) – Spark Sink metrics can be fed to InfluxDB and others, yielding partial Spark app metrics • Code execution is not connected to resource consumption • Load from other apps is unaccounted for
  • 30. 3 data elements form a complete picture of Spark application performance 1. Code execution plan – Indicates which block of code is being executed, where 2. Time series view – Visual of resource consumption of application – Outliers in resource usage very easy to detect 3. Cluster weather – A view of all applications that run on the cluster
  • 31. Spark Web UI First half of solution
  • 32. Logical code execution plan from Spark: Jobs / Stages / DAG
  • 33. Physical execution plan from Spark: Executors / Tasks
  • 34. Time series view Second half of solution
  • 35. Time series view of resource consumption for the App
  • 36. Bring them together Best of both worlds
  • 37. Code Analyzer = execution plan + time series
  • 38. GC across all Stages of App
  • 39. Let’s examine GC activity in Stage 4
  • 40. Executor skew increased Stage duration 2x
  • 41. Executor 6 does twice as much work. Possible solution: increase the number of partitions
  • 42. What if it’s not your fault? Cluster weather
  • 43. How does cluster weather impact your app?
  • 44. No apparent reason for delay from Spark Web UI
  • 45. Time series shows slower run of app with much lower resources
  • 46. View cluster weather for slower run of app
  • 47. Cluster weather reveals reason for CPU constraints on slower app
  • 48. Cluster weather reveals reason for memory constraints on slower app
  • 49. Cluster weather reveals reason for HDFS constraints on slower app
  • 50. Code Analyzer for Apache Spark • Free during Early Access starting today • Early Access is for development teams • To learn more visit booth #101
  • 51. Other performance tools mentioned • Dr Elephant – • Application Profiler –
  • 52. To recap • Use heuristics to find known problems • Execution plan + time series = powerful visualization • Knowing cluster weather can prevent time wasted debugging performance “issues” that aren’t the app’s fault
  • 53. Spark Summit Talk Plugs Tuesday 11:40AM Connect Code to Resource Consumption to Scale Your Production Spark Applications (Vinod @ Pepperdata) Tuesday 12:50PM Kubernetes SIG Big Data Birds-of-a-Feather session (many) Tuesday 3:20PM Apache Spark on Kubernetes (Anirudh @ Google, Tim @ Hyperpilot) Wednesday 11:00AM HDFS on Kubernetes – Lessons Learned (Kimoon @ Pepperdata) Wednesday 11:00AM Dr Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop (Carl @ LinkedIn, Simon @ Pepperdata)