Jiaqi Tan, Soila Pertet, Xinghao Pan, Mike Kasick, Keith Bare, Eugene Marinelli, Rajeev Gandhi, Priya Narasimhan (Carnegie Mellon University)
Automated Problem Diagnosis. Diagnosing problems creates major headaches for administrators and worsens as scale and system complexity grow. Goal: automate it and get proactive, through failure detection and prediction, problem determination (or “fingerpointing”), and problem visualization. How: instrumentation plus statistical analysis. Priya Narasimhan © Oct 25, 2009 Carnegie Mellon University
Challenges in Problem Analysis. Challenging in large-scale networked environments: a single root cause can have multiple failure manifestations; a single failure manifestation can have multiple root causes; problems and/or their manifestations can “travel” among communicating components; a lot of information arrives from multiple sources (what to use? what to discard?). Automated fingerpointing: automatically discover the faulty node in a distributed system.
Exploration of Fingerpointing. Current explorations: Hadoop, an open-source implementation of Map/Reduce (Yahoo!); PVFS, a high-performance file system (Argonne National Labs); Lustre, a high-performance file system (Sun Microsystems). Studied: various types of problems, various kinds of instrumentation, and various kinds of data-analysis techniques.
Why? Hadoop is fault-tolerant: heartbeats detect lost nodes, and speculative re-execution recovers work from lost/laggard nodes. However, Hadoop’s fault-tolerance can mask performance problems, i.e., nodes that are alive but slow. Target failures for our diagnosis: performance degradations (slowdowns, hangs).
Hadoop Failure Survey. Hadoop Issue Tracker from Jan 07 to Dec 08 (https://issues.apache.org/jira). Targeted failures: 66%.
Hadoop Mailing List Survey. Hadoop users’ mailing list, Oct 08 to Apr 09. Examined queries on optimizing programs. Most queries concerned MapReduce-specific aspects, e.g., data skew and the number of maps and reduces.
M45 Job Performance. Failures due to bad configuration (e.g., missing files) are detected quickly. However, 20-30% of failed jobs run for over 1 hr before aborting, so early fault detection is desirable.
BEFORE: Hadoop Web Console. Admin/user sifts through a wealth of information: the problem is aggravated in large clusters, and it takes multiple clicks to chase down a problem. No support for historical comparison: the information displayed is a snapshot in time. Poor localization of correlated problems: progress indicators for all tasks are skewed by a correlated problem. No clear indicators of performance problems: are task slowdowns due to data skew or bugs? Unclear.
AFTER: Goals, Non-Goals. Goals: diagnose the faulty Master/Slave node to the user/admin; target production environments; don’t instrument Hadoop or applications additionally, i.e., use Hadoop logs as-is (white-box strategy) and use OS-level metrics (black-box strategy); work for various workloads and under workload changes; support online and offline diagnosis. Non-goals (for now): tracing the problem to the offending line of code.
Target Hadoop Clusters. Yahoo!’s 4000-processor M45 cluster: production environment (managed by Yahoo!); offered to CMU as a free cloud-computing resource; diverse kinds of real workloads and problems in the wild (massive machine-learning, language/machine-translation); permission to harvest all logs and OS data each week. Amazon’s 100-node EC2 cluster: production environment (managed by Amazon); commercial, pay-as-you-use cloud-computing resource; workloads under our control and problems injected by us (gridmix, nutch, pig, sort, randwriter); can harvest logs and OS data of only our workloads.
Performance Problems Studied. Studied Hadoop Issue Tracker (JIRA) from Jan-Dec 2007.
Fault          Description
Resource contention
  CPU hog      External process uses 70% of CPU
  Packet-loss  5% or 50% of incoming packets dropped
  Disk hog     20GB file repeatedly written to
  Disk full    Disk full
Application bugs (source: Hadoop JIRA)
  HADOOP-1036  Maps hang due to unhandled exception
  HADOOP-1152  Reduces fail while copying map output
  HADOOP-2080  Reduces fail due to incorrect checksum
  HADOOP-2051  Jobs hang due to unhandled exception
  HADOOP-1255  Infinite loop at NameNode
Hadoop: Instrumentation. [Figure: the MASTER NODE runs the JobTracker and NameNode; the SLAVE NODES run a TaskTracker and DataNode, handling Map/Reduce tasks and HDFS blocks; Hadoop logs and OS data are collected on both master and slave nodes.]
How About Those Metrics? White-box metrics (from Hadoop logs): event-driven (based on Hadoop’s activities); durations (Map-task durations, Reduce-task durations, ReduceCopy durations, etc.); system-wide dependencies between tasks and data blocks; heartbeat information (heartbeat rates, heartbeat-timestamp skew between the Master and Slave nodes). Black-box metrics (from OS /proc): 64 different time-driven metrics (sampled every second), e.g., memory used, context-switch rate, user-CPU usage, system-CPU usage, I/O wait time, run-queue size, number of bytes transmitted, number of bytes received, pages in, pages out, page faults.
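As a rough illustration of time-driven black-box sampling, the sketch below parses the text of Linux's `/proc/stat` and turns two snapshots into per-second rates. It is not the collector used in the talk: the function names, the handful of metrics chosen, and the sample strings are assumptions made for this example.

```python
# Illustrative sketch only; names and metric selection are assumptions.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait"]
GAUGES = {"procs_running"}  # instantaneous values, not cumulative counters


def parse_proc_stat(text):
    """Pull a few counters out of the text of Linux's /proc/stat."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":
            # Aggregate CPU line: the first five values are cumulative
            # clock ticks spent in user, nice, system, idle and iowait.
            for name, value in zip(CPU_FIELDS, fields[1:]):
                metrics["cpu_" + name] = int(value)
        elif fields[0] in ("ctxt", "procs_running"):
            # ctxt: context switches since boot; procs_running: run-queue size.
            metrics[fields[0]] = int(fields[1])
    return metrics


def per_second(prev, curr, interval_s=1.0):
    """Turn two snapshots into rates; gauges are passed through unchanged."""
    out = {}
    for name, value in curr.items():
        if name in GAUGES:
            out[name] = value
        elif name in prev:
            out[name] = (value - prev[name]) / interval_s
    return out
```

On a Linux host, the text could come from reading `/proc/stat` once per second and feeding consecutive snapshots to `per_second`.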
Intuition for Diagnosis. Slave nodes are doing approximately similar things for a given job. Gather metrics and extract statistics: determine the metrics of relevance, for both black-box and white-box data. Peer-compare histograms, means, etc. to determine the “odd-man out”.
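The peer-comparison idea can be sketched with a simple robust statistic. The example below flags nodes whose per-node mean of some metric lies far from the cluster median, measured in units of the median absolute deviation (MAD). The MAD rule and the threshold of 3 are choices made for this illustration, not the technique described in the talk.

```python
from statistics import median


def odd_man_out(metric_by_node, threshold=3.0):
    """Return the nodes whose metric deviates strongly from their peers.

    metric_by_node maps a node name to one summary value (e.g., the mean
    Map-task duration on that node). The MAD-based rule is an assumption.
    """
    values = list(metric_by_node.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All peers agree exactly; anything different is the outlier.
        return [n for n, v in metric_by_node.items() if v != med]
    return [n for n, v in metric_by_node.items()
            if abs(v - med) / mad > threshold]
```

For example, `odd_man_out({"node1": 10.0, "node2": 11.0, "node3": 9.5, "node4": 42.0})` returns `["node4"]`.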
Log-Analysis Approach. SALSA: Analyzing Logs as StAte Machines [USENIX WASL 2008]. Extract state-machine views of execution from Hadoop logs: a distributed control-flow view of logs and a distributed data-flow view of logs. Diagnose failures based on statistics of these extracted views: control-flow based diagnosis, and control-flow + data-flow based diagnosis. Perform analysis incrementally so that we can support it online.
Applying SALSA to Hadoop. Map task log: [t] Launch Map task : [t] Copy Map outputs : [t] Map task done. Reduce task log: [t] Launch Reduce task : [t] Reduce is idling, waiting for Map outputs : (repeat until all Map outputs copied) [t] Start Reduce Copy (of completed Map output) : [t] Finish Reduce Copy; [t] Reduce Merge Copy. Data-flow view: transfer of data to other nodes (Map outputs to Reduce tasks on other nodes; incoming Map outputs for this Reduce task). Control-flow view: state orders, durations.
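To make the control-flow view concrete, the sketch below pairs state-entry and state-exit events from log lines and emits per-state durations. The line format used here (`<seconds> <node> <task> <START|END> <state>`) is hypothetical and chosen only for the example; real Hadoop logs look different and need their own parsing rules.

```python
def state_durations(log_lines):
    """Pair START/END events into (node, task, state, duration) tuples.

    Assumes a made-up line format: "<seconds> <node> <task> <event> <state>".
    Lines are processed one at a time, so the same loop could run
    incrementally over a growing log.
    """
    open_states = {}   # (node, task, state) -> entry timestamp
    durations = []
    for line in log_lines:
        ts, node, task, event, state = line.split()
        key = (node, task, state)
        if event == "START":
            open_states[key] = float(ts)
        elif event == "END" and key in open_states:
            durations.append((node, task, state,
                              float(ts) - open_states.pop(key)))
    return durations
```

The resulting durations are the kind of per-node statistic that can then be compared across peers.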
Distributed Control+Data Flow. Distributed control-flow: causal flow of task execution across cluster nodes, i.e., Reduces waiting on Maps via Shuffles. Distributed data-flow: data paths of Map outputs shuffled to Reduces; HDFS data blocks read into and written out of jobs. Job-centric data-flows (fused control+data flows): correlate paths of data and execution; create conjoined causal paths from the data source before processing to the data destination after processing. http://www.pdl.cmu.edu/
Intuition: Peer Similarity. In fault-free conditions, metrics (e.g., WriteBlock durations) are similar across nodes. Faulty node: the same metric differs on the faulty node compared with non-faulty nodes. Histograms are compared using Kullback-Leibler divergence. [Figure: histograms (distributions) of WriteBlock durations over a 30-second window for one faulty node and two normal nodes; y-axis: normalized counts (total 1.0).]
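A minimal sketch of histogram comparison with Kullback-Leibler divergence follows. For discrete distributions P and Q over the same bins, KL(P || Q) = sum_i P_i * log(P_i / Q_i). The symmetrized form, the small smoothing constant, the bin edges, and the rule of scoring each node by its median divergence from its peers are all assumptions made for this illustration.

```python
import math
from statistics import median


def histogram(samples, edges, eps=1e-6):
    """Normalized histogram (sums to 1.0) with light smoothing.

    Smoothing keeps every bin non-zero so the logarithm is defined.
    Samples outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= s < edges[i + 1] or (is_last and s == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl(p, q):
    """KL(P || Q) for two distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    return kl(p, q) + kl(q, p)


def most_divergent_node(durations_by_node, edges):
    """Score each node by its median divergence from every other node."""
    hists = {n: histogram(d, edges) for n, d in durations_by_node.items()}
    scores = {}
    for node, h in hists.items():
        scores[node] = median(
            [symmetric_kl(h, other) for m, other in hists.items() if m != node]
        )
    return max(scores, key=scores.get), scores
```

In an online setting, the same comparison would be repeated for each window of samples (the slide uses 30-second windows).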
What Else Do We Do? Analyze black-box data with similar intuition: derive PDFs and use a clustering approach; find distinct behavior profiles of metric correlations; compare them across nodes; technique called Ganesha [HotMetrics 2009]. Analyze heartbeat traffic: compare heartbeat durations across nodes; compare heartbeat-timestamp skews across nodes. Different metrics, different viewpoints, different algorithms.
Putting the Elephant Together. BliMEy: Blind Men and the Elephant Framework [CMU-CS-09-135]. [Figure: views combined by the framework: JobTracker heartbeat timestamps, TaskTracker heartbeat timestamps, JobTracker durations views, TaskTracker durations views, job-centric data flows, black-box resource usage.]
Visualization. To uncover Hadoop’s execution in an insightful way; to reveal the outcome of diagnosis on sight; to allow developers/admins to get a handle as the system scales. Value to programmers [HotCloud 2009]: allows them to spot issues that might assist them in restructuring their code; allows them to spot faulty nodes.
Visualization (timeseries). DiskHog on slave node visible through lower heartbeat rate for that node.
Visualization (heatmaps). CPU hog on node 1 visible in Map-task durations.
Visualization (swimlanes). Long-tailed map delaying overall job completion time.
MIROS (Map Inputs, Reduce Outputs, Shuffles). Aggregates data volumes across all tasks per node and across the entire job. Shows skewed data flows and bottlenecks. Jiaqi Tan © July 09 http://www.pdl.cmu.edu/
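The aggregation behind such a view can be sketched as a pair of running sums. The record layout `(node, kind, nbytes)` and the kind labels `"map_input"`, `"shuffle"`, and `"reduce_output"` are hypothetical names picked for this example, not the format used by the authors' tool.

```python
from collections import defaultdict


def aggregate_volumes(records):
    """Sum data volumes per node and for the whole job.

    records: iterable of (node, kind, nbytes) tuples, one per task-level
    transfer. The tuple layout is an assumption for this sketch.
    """
    per_node = defaultdict(lambda: defaultdict(int))
    job_total = defaultdict(int)
    for node, kind, nbytes in records:
        per_node[node][kind] += nbytes
        job_total[kind] += nbytes
    # Convert to plain dicts so callers get ordinary, comparable values.
    return {n: dict(k) for n, k in per_node.items()}, dict(job_total)
```

Comparing one node's share of `job_total` with that of its peers is one way a skewed data flow would show up.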
Current Developments. State-machine extraction + visualization being implemented for the Hadoop Chukwa project, in collaboration with Yahoo!. Web-based visualization widgets for HICC (Hadoop Infrastructure Care Center). “Swimlanes” currently available in Chukwa trunk (CHUKWA-94).
Briefly: Online Fingerpointing. ASDF: Automated System for Diagnosing Failures. Can incorporate any number of different data sources; can use any number of analysis techniques to process this data; can support online or offline analyses for Hadoop. Currently plugging in our white-box & black-box algorithms.
Hard Problems. Understanding the limits of black-box fingerpointing: what failures are outside the reach of a black-box approach? What are the limits of “peer” comparison? What other kinds of black-box instrumentation exist? Scalability: scaling to run across large systems and understanding “growing pains”. Visualization: helping system administrators visualize problem diagnosis. Trade-offs: more instrumentation and more frequent data can improve accuracy of diagnosis, but at what performance cost? Virtualized environments: do these environments help/hurt problem diagnosis?
Summary. Automated problem diagnosis. Current targets: Hadoop, PVFS, Lustre. Initial set of failures: real-world bug databases, problems in the wild. Short-term: transition techniques into the Hadoop code-base, working with Yahoo!. Long-term: scalability, scalability, scalability, ...; expand fault study; improve visualization, working with users. Additional details: USENIX WASL 2008 (white-box log analysis); USENIX HotCloud 2009 (visualization); USENIX HotMetrics 2009 (black-box metric analysis); HotDep 2009 (black-box analysis for PVFS).
priya@cs.cmu.edu  Oct 25, 2009 Carnegie Mellon University

More Related Content

What's hot

Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Ian Foster
 
IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...
IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...
IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...
Kalman Graffi
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
Naoki Shibata
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based Software
Pooyan Jamshidi
 
An efficient scheduling policy for load balancing model for computational gri...
An efficient scheduling policy for load balancing model for computational gri...An efficient scheduling policy for load balancing model for computational gri...
An efficient scheduling policy for load balancing model for computational gri...
Alexander Decker
 
Dynamic Data Center concept
Dynamic Data Center concept  Dynamic Data Center concept
Dynamic Data Center concept
Miha Ahronovitz
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
Sudarsun Santhiappan
 
Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017
SERC at Carleton College
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Michel Bruley
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
jie cao
 
Fault tolerance on cloud computing
Fault tolerance on cloud computingFault tolerance on cloud computing
Fault tolerance on cloud computing
www.pixelsolutionbd.com
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
asimkadav
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Frederic Desprez
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
Arnaud Rachez
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
Srinath Perera
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
ivascucristian
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
Pier Luca Lanzi
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
Rafael Ferreira da Silva
 
Presentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopPresentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshop
balmanme
 

What's hot (20)

Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 
IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...
IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...
IEEE ICCCN 2013 - Continuous Gossip-based Aggregation through Dynamic Informa...
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based Software
 
An efficient scheduling policy for load balancing model for computational gri...
An efficient scheduling policy for load balancing model for computational gri...An efficient scheduling policy for load balancing model for computational gri...
An efficient scheduling policy for load balancing model for computational gri...
 
Dynamic Data Center concept
Dynamic Data Center concept  Dynamic Data Center concept
Dynamic Data Center concept
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
 
Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
 
Fault tolerance on cloud computing
Fault tolerance on cloud computingFault tolerance on cloud computing
Fault tolerance on cloud computing
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
Presentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopPresentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshop
 

Viewers also liked

Hw09 Matchmaking In The Cloud
Hw09   Matchmaking In The CloudHw09   Matchmaking In The Cloud
Hw09 Matchmaking In The Cloud
Cloudera, Inc.
 
Hw09 Cross Data Center Logs Processing
Hw09   Cross Data Center Logs ProcessingHw09   Cross Data Center Logs Processing
Hw09 Cross Data Center Logs Processing
Cloudera, Inc.
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
Cloudera, Inc.
 
Hw09 Optimizing Hadoop Deployments
Hw09   Optimizing Hadoop DeploymentsHw09   Optimizing Hadoop Deployments
Hw09 Optimizing Hadoop Deployments
Cloudera, Inc.
 
Hadoop Puzzlers
Hadoop PuzzlersHadoop Puzzlers
Hadoop Puzzlers
Cloudera, Inc.
 
Doug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemDoug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop Ecosystem
Cloudera, Inc.
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
Cloudera, Inc.
 

Viewers also liked (7)

Hw09 Matchmaking In The Cloud
Hw09   Matchmaking In The CloudHw09   Matchmaking In The Cloud
Hw09 Matchmaking In The Cloud
 
Hw09 Cross Data Center Logs Processing
Hw09   Cross Data Center Logs ProcessingHw09   Cross Data Center Logs Processing
Hw09 Cross Data Center Logs Processing
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
 
Hw09 Optimizing Hadoop Deployments
Hw09   Optimizing Hadoop DeploymentsHw09   Optimizing Hadoop Deployments
Hw09 Optimizing Hadoop Deployments
 
Hadoop Puzzlers
Hadoop PuzzlersHadoop Puzzlers
Hadoop Puzzlers
 
Doug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemDoug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop Ecosystem
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 

Similar to Hw09 Fingerpointing Sourcing Performance Issues

Vitus Masters Defense
Vitus Masters DefenseVitus Masters Defense
Vitus Masters Defense
derDoc
 
Cs6703 grid and cloud computing book
Cs6703 grid and cloud computing bookCs6703 grid and cloud computing book
Cs6703 grid and cloud computing book
kaleeswaranme
 
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Mahmud Hossain
 
An introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud ApplicationsAn introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud Applications
Ravi Yogesh
 
DIET_BLAST
DIET_BLASTDIET_BLAST
DIET_BLAST
Frederic Desprez
 
Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)
The HDF-EOS Tools and Information Center
 
CS4961-L1.ppt
CS4961-L1.pptCS4961-L1.ppt
CS4961-L1.ppt
MarlonMagtibay2
 
Machine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEMachine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AE
butest
 
University of Iowa Webmail
University of Iowa WebmailUniversity of Iowa Webmail
University of Iowa Webmail
David Shafer
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
Ian Foster
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
Joshua Bloom
 
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUDEPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
Nexgen Technology
 
Design (Cloud systems) for Failures
Design (Cloud systems) for FailuresDesign (Cloud systems) for Failures
Design (Cloud systems) for Failures
Rodolfo Kohn
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
QAware GmbH
 
Using a Cloud to Replenish Parched Groundwater Modeling Efforts
Using a Cloud to Replenish Parched Groundwater Modeling EffortsUsing a Cloud to Replenish Parched Groundwater Modeling Efforts
Using a Cloud to Replenish Parched Groundwater Modeling Efforts
Joseph Luchette
 
eResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software developmenteResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software development
Andrea Wiggins
 
Ajug april 2011
Ajug april 2011Ajug april 2011
Ajug april 2011
Christopher Curtin
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
confluent
 
Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...
Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...
Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...
Splunk
 
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
IJET - International Journal of Engineering and Techniques
 

Similar to Hw09 Fingerpointing Sourcing Performance Issues (20)

Vitus Masters Defense
Vitus Masters DefenseVitus Masters Defense
Vitus Masters Defense
 
Cs6703 grid and cloud computing book
Cs6703 grid and cloud computing bookCs6703 grid and cloud computing book
Cs6703 grid and cloud computing book
 
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
 
An introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud ApplicationsAn introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud Applications
 
DIET_BLAST
DIET_BLASTDIET_BLAST
DIET_BLAST
 
Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)
 
CS4961-L1.ppt
CS4961-L1.pptCS4961-L1.ppt
CS4961-L1.ppt
 
Machine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEMachine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AE
 
University of Iowa Webmail
University of Iowa WebmailUniversity of Iowa Webmail
University of Iowa Webmail
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUDEPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
 
Design (Cloud systems) for Failures
Design (Cloud systems) for FailuresDesign (Cloud systems) for Failures
Design (Cloud systems) for Failures
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Using a Cloud to Replenish Parched Groundwater Modeling Efforts
Using a Cloud to Replenish Parched Groundwater Modeling EffortsUsing a Cloud to Replenish Parched Groundwater Modeling Efforts
Using a Cloud to Replenish Parched Groundwater Modeling Efforts
 
eResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software developmenteResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software development
 
Ajug april 2011
Ajug april 2011Ajug april 2011
Ajug april 2011
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...
Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...
Virtual Gov Day - IT Operations Breakout - Jennifer Green, R&D Scientist, Los...
 
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
 

Hw09 Fingerpointing Sourcing Performance Issues

  • 1. Jiaqi Tan, Soila Pertet, Xinghao Pan, Mike Kasick, Keith Bare, Eugene Marinelli, Rajeev Gandhi, Priya Narasimhan. Carnegie Mellon University
  • 2. Automated Problem Diagnosis. Diagnosing problems: creates major headaches for administrators; worsens as scale and system complexity grow. Goal: automate it and get proactive: failure detection and prediction; problem determination (or “fingerpointing”); problem visualization. How: instrumentation plus statistical analysis. Priya Narasimhan © Oct 25, 2009 Carnegie Mellon University
  • 3. Challenges in Problem Analysis. Challenging in large-scale networked environments: a single root cause can have multiple failure manifestations; a single failure manifestation can have multiple root causes; problems and/or their manifestations can “travel” among communicating components; a lot of information arrives from multiple sources (what to use? what to discard?). Automated fingerpointing: automatically discover the faulty node in a distributed system.
  • 4. Exploration of Fingerpointing. Current explorations: Hadoop, an open-source implementation of Map/Reduce (Yahoo!); PVFS, a high-performance file system (Argonne National Labs); Lustre, a high-performance file system (Sun Microsystems). Studied: various types of problems, various kinds of instrumentation, and various kinds of data-analysis techniques.
  • 5. Why? Hadoop is fault-tolerant. Heartbeats: detect lost nodes. Speculative re-execution: recover work due to lost/laggard nodes. Hadoop’s fault-tolerance can mask performance problems: nodes alive but slow. Target failures for our diagnosis: performance degradations (slow, hangs).
  • 6. Hadoop Failure Survey. Hadoop Issue Tracker from Jan 07 to Dec 08 (https://issues.apache.org/jira). Targeted failures: 66%.
  • 7. Hadoop Mailing List Survey. Hadoop user’s mailing list: Oct 08 to Apr 09. Examined queries on optimizing programs. Most queries: MapReduce-specific aspects, e.g., data skew, number of maps and reduces.
  • 8. M45 Job Performance. Failures due to bad config (e.g., missing files) are detected quickly. However, 20-30% of failed jobs run for over 1 hr before aborting. Early fault detection is desirable.
  • 9. BEFORE: Hadoop Web Console. Admin/user sifts through a wealth of information: problem is aggravated in large clusters; multiple clicks to chase down a problem. No support for historical comparison: information displayed is a snapshot in time. Poor localization of correlated problems: progress indicators for all tasks are skewed by a correlated problem. No clear indicators of performance problems: are task slowdowns due to data skew or bugs? Unclear.
  • 10. AFTER: Goals, Non-Goals. Diagnose faulty Master/Slave node to user/admin. Target production environment: don’t instrument Hadoop or applications additionally; use Hadoop logs as-is (white-box strategy); use OS-level metrics (black-box strategy). Work for various workloads and under workload changes. Support online and offline diagnosis. Non-goals (for now): tracing a problem to the offending line of code.
  • 11. Target Hadoop Clusters. Yahoo!’s 4000-processor M45 cluster: production environment (managed by Yahoo!); offered to CMU as a free cloud-computing resource; diverse kinds of real workloads, problems in the wild (massive machine learning, language/machine translation); permission to harvest all logs and OS data each week. Amazon’s 100-node EC2 cluster: production environment (managed by Amazon); commercial, pay-as-you-use cloud-computing resource; workloads under our control, problems injected by us (gridmix, nutch, pig, sort, randwriter); can harvest logs and OS data of only our workloads.
  • 12. Performance Problems Studied. Studied Hadoop Issue Tracker (JIRA) from Jan-Dec 2007. Faults, with descriptions in parentheses. Resource contention: CPU hog (external process uses 70% of CPU); packet loss (5% or 50% of incoming packets dropped); disk hog (20GB file repeatedly written to); disk full (disk full). Application bugs (source: Hadoop JIRA): HADOOP-1036 (Maps hang due to unhandled exception); HADOOP-1152 (Reduces fail while copying map output); HADOOP-2080 (Reduces fail due to incorrect checksum); HADOOP-2051 (Jobs hang due to unhandled exception); HADOOP-1255 (infinite loop at NameNode).
  • 13. Hadoop: Instrumentation. [Diagram: the master node runs the JobTracker and NameNode; the slave nodes run the TaskTracker and DataNode, with Map/Reduce tasks and HDFS blocks. Hadoop logs and OS data are collected from both the master node and the slave nodes.]
  • 14. How About Those Metrics? White-box metrics (from Hadoop logs): event-driven (based on Hadoop’s activities); durations (Map-task durations, Reduce-task durations, ReduceCopy durations, etc.); system-wide dependencies between tasks and data blocks; heartbeat information (heartbeat rates, heartbeat-timestamp skew between the Master and Slave nodes). Black-box metrics (from OS /proc): 64 different time-driven metrics (sampled every second), including memory used, context-switch rate, user-CPU usage, system-CPU usage, I/O wait time, run-queue size, number of bytes transmitted, number of bytes received, pages in, pages out, page faults.
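As an illustration of the black-box metrics on slide 14, the sketch below derives user-CPU, system-CPU, and I/O-wait fractions from two snapshots of the Linux `/proc/stat` aggregate `cpu` line. It is a minimal sketch, not the authors' collector: the function names are hypothetical, and only the standard field order of that line (user, nice, system, idle, iowait, irq, softirq, steal) is assumed.

```python
def parse_cpu_line(stat_text):
    """Return the cumulative jiffy counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep the first 8 counters only; the later guest counters are
            # already included in 'user' and would be double counted.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_fractions(before, after):
    """Fractions of CPU time spent in user, system, and iowait between
    two /proc/stat snapshots (hypothetical helper, for illustration)."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = [y - x for x, y in zip(b, a)]
    total = sum(delta)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {
        "user": delta[0] / total,
        "system": delta[2] / total,
        "iowait": delta[4] / total,
    }
```

In a live setting the two snapshots would be read from `/proc/stat` one sampling interval apart (the slide mentions one-second sampling); here they are passed in as strings so the calculation can be checked offline.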
  • 15. Intuition for Diagnosis. Slave nodes are doing approximately similar things for a given job. Gather metrics and extract statistics: determine metrics of relevance, for both black-box and white-box data. Peer-compare histograms, means, etc. to determine the “odd man out”.
  • 16. Log-Analysis Approach. SALSA: Analyzing Logs as StAte Machines [USENIX WASL 2008]. Extract state-machine views of execution from Hadoop logs: distributed control-flow view of logs; distributed data-flow view of logs. Diagnose failures based on statistics of these extracted views: control-flow based diagnosis; control-flow + data-flow based diagnosis. Perform analysis incrementally so that we can support it online.
  • 17. Applying SALSA to Hadoop. Map task log: [t] Launch Map task … [t] Copy Map outputs … [t] Map task done. Reduce task log: [t] Launch Reduce task … [t] Reduce is idling, waiting for Map outputs … (repeat until all Map outputs copied: [t] Start Reduce Copy (of completed Map output) … [t] Finish Reduce Copy) [t] Reduce Merge Copy. Data-flow view: transfer of data to other nodes (Map outputs to Reduce tasks on other nodes; incoming Map outputs for this Reduce task). Control-flow view: state orders, durations.
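The control-flow view on slide 17 (state orders and durations recovered from timestamped log events) can be sketched as follows. This is an illustrative sketch only, not the SALSA implementation: the log line format and the `LaunchTask`/`TaskDone` event names are hypothetical stand-ins for real Hadoop log messages, and tasks are classified by the `_m_`/`_r_` marker in their identifiers.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical log format: "<date> <time> <event> <task-id>"
LINE = re.compile(r"^(\S+ \S+) (LaunchTask|TaskDone) (\S+)$")


def state_durations(lines):
    """Pair each task's launch event with its completion event and
    return the elapsed seconds, grouped by task type."""
    starts = {}
    durations = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # ignore lines that are not state transitions
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        event, task = m.group(2), m.group(3)
        if event == "LaunchTask":
            starts[task] = ts
        elif task in starts:
            kind = "map" if "_m_" in task else "reduce"
            durations[kind].append((ts - starts.pop(task)).total_seconds())
    return dict(durations)
```

Because each line is processed once and only the set of open tasks is kept in memory, the same loop can consume a log as it grows, which matches the slide 16 goal of performing the analysis incrementally.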
  • 18. Distributed Control+Data Flow. Distributed control-flow: causal flow of task execution across cluster nodes, i.e., Reduces waiting on Maps via Shuffles. Distributed data-flow: data paths of Map outputs shuffled to Reduces; HDFS data blocks read into and written out of jobs. Job-centric data-flows (fused control+data flows): correlate paths of data and execution; create conjoined causal paths from data source before, to data destination after, processing. http://www.pdl.cmu.edu/
  • 19. Intuition: Peer Similarity. In fault-free conditions, metrics (e.g., WriteBlock durations) are similar across nodes. Faulty node: the same metric is different on the faulty node, as compared to non-faulty nodes. Kullback-Leibler divergence (comparison of histograms). [Figure: histograms (distributions) of WriteBlock durations over a 30-second window for one faulty node and two normal nodes; y-axis: normalized counts (total 1.0).]
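The peer-similarity intuition on slide 19 can be sketched in a few lines: build a normalized histogram of a metric per node, compare histograms pairwise with a Kullback-Leibler divergence, and flag the node that differs from most of its peers. The sketch below rests on several assumptions that are not in the slides: the symmetrized form of the divergence, the smoothing constant `eps`, the `threshold` value, and the majority rule are all illustrative choices, and the function names are hypothetical.

```python
import bisect
import math


def histogram(samples, edges):
    """Normalized histogram (counts sum to 1.0) over fixed bin edges.
    Values outside the edges are clamped into the first or last bin."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        i = bisect.bisect_right(edges, x) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-6):
    """KL(p || q), with additive smoothing so empty bins avoid log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def symmetric_kl(p, q):
    """Average of both directions, so the distance does not depend on order."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def odd_nodes(durations_by_node, edges, threshold=1.0):
    """Flag nodes whose histogram is farther than `threshold` from
    more than half of their peers."""
    hists = {n: histogram(d, edges) for n, d in durations_by_node.items()}
    flagged = []
    for node, h in hists.items():
        dists = [symmetric_kl(h, o) for peer, o in hists.items() if peer != node]
        if 2 * sum(d > threshold for d in dists) > len(dists):
            flagged.append(node)
    return flagged
```

The majority rule matters with small peer groups: a healthy node is far from the faulty node but close to its other peers, so it is not flagged, whereas the faulty node is far from everyone.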
  • 20. What Else Do We Do? Analyze black-box data with similar intuition: derive PDFs and use a clustering approach; distinct behavior profiles of metric correlations; compare them across nodes; technique called Ganesha [HotMetrics 2009]. Analyze heartbeat traffic: compare heartbeat durations across nodes; compare heartbeat-timestamp skews across nodes. Different metrics, different viewpoints, different algorithms.
  • 21. Priya Narasimhan © Oct 25, 2009 Carnegie Mellon University
  • 22. Putting the Elephant Together. BliMEy: Blind Men and the Elephant Framework [CMU-CS-09-135]. [Diagram labels: TaskTracker heartbeat timestamps; black-box resource usage; JobTracker durations views; TaskTracker durations views; JobTracker heartbeat timestamps; job-centric data flows.]
  • 23. Visualization. To uncover Hadoop’s execution in an insightful way. To reveal the outcome of diagnosis on sight. To allow developers/admins to get a handle as the system scales. Value to programmers [HotCloud 2009]: allows them to spot issues that might assist them in restructuring their code; allows them to spot faulty nodes.
  • 24. Visualization (timeseries). DiskHog on slave node visible through lower heartbeat rate for that node.
  • 25. Visualization (heatmaps). CPU hog on node 1 visible in Map-task durations.
  • 26. Visualizations (swimlanes). Long-tailed map delaying overall job completion time.
  • 27. MIROS. MIROS (Map Inputs, Reduce Outputs, Shuffles) aggregates data volumes across all tasks per node and across the entire job. Shows skewed data flows, bottlenecks. Jiaqi Tan © July 09 http://www.pdl.cmu.edu/
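The aggregation that slide 27 describes (data volumes summed over all tasks per node, and over the entire job) can be sketched as below. The record layout `(node, kind, bytes)` and the names `KINDS` and `aggregate_volumes` are assumptions made for illustration; the slides do not specify how MIROS represents its inputs.

```python
from collections import defaultdict

# Hypothetical volume categories, named after the MIROS acronym.
KINDS = ("map_input", "shuffle", "reduce_output")


def aggregate_volumes(task_records):
    """Sum byte counts per node and per job from (node, kind, bytes) records."""
    per_node = defaultdict(lambda: dict.fromkeys(KINDS, 0))
    for node, kind, nbytes in task_records:
        if kind not in KINDS:
            raise ValueError(f"unknown volume kind: {kind}")
        per_node[node][kind] += nbytes
    per_job = {k: sum(v[k] for v in per_node.values()) for k in KINDS}
    return dict(per_node), per_job
```

Comparing a node's share of each category against the job total is one simple way to surface the skewed data flows the slide mentions, for example a node that receives far more map input than its peers.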
  • 28. Current Developments. State-machine extraction + visualization being implemented for the Hadoop Chukwa project, in collaboration with Yahoo!. Web-based visualization widgets for HICC (Hadoop Infrastructure Care Center). “Swimlanes” currently available in Chukwa trunk (CHUKWA-94).
  • 29. Briefly: Online Fingerpointing. ASDF: Automated System for Diagnosing Failures. Can incorporate any number of different data sources. Can use any number of analysis techniques to process this data. Can support online or offline analyses for Hadoop. Currently plugging in our white-box & black-box algorithms.
  • 30. Hard Problems. Understanding the limits of black-box fingerpointing: what failures are outside the reach of a black-box approach? What are the limits of “peer” comparison? What other kinds of black-box instrumentation exist? Scalability: scaling to run across large systems and understanding “growing pains”. Visualization: helping system administrators visualize problem diagnosis. Trade-offs: more instrumentation and more frequent data can improve accuracy of diagnosis, but at what performance cost? Virtualized environments: do these environments help/hurt problem diagnosis?
  • 31. Summary. Automated problem diagnosis. Current targets: Hadoop, PVFS, Lustre. Initial set of failures: real-world bug databases, problems in the wild. Short-term: transition techniques into the Hadoop code-base, working with Yahoo!. Long-term: scalability, scalability, scalability, …; expand fault study; improve visualization, working with users. Additional details: USENIX WASL 2008 (white-box log analysis); USENIX HotCloud 2009 (visualization); USENIX HotMetrics 2009 (black-box metric analysis); HotDep 2009 (black-box analysis for PVFS).
  • 32. priya@cs.cmu.edu Oct 25, 2009 Carnegie Mellon University

Editor's Notes

  1. Quick mention verbally of what Hadoop is: Distributed parallel processing runtime with a master-slave architecture. Focus on limping-but-alive: performance degradations not caught by heartbeats
  2. Describe x and y axes