Learn Apache Spark: A Comprehensive Guide
Content
▪ Introduction
▪ What is Apache Spark?
▪ Apache Spark Features
▪ Components of Apache Spark Ecosystem
▪ Apache Spark Languages
▪ Apache Spark History
▪ Why You Should Learn Apache Spark
▪ Do We Need Hadoop to Run Spark?
▪ Apache Spark Installation
▪ Apache Spark Example
▪ Apache Spark Use Cases
▪ Apache Spark Books
▪ Apache Spark Certifications
▪ Apache Spark Training
▪ Final Words
Introduction
The industry uses Apache Spark extensively for the analysis of big
data. Hadoop provides a flexible, scalable, cost-effective, and
fault-tolerant computing solution, but the main concern is maintaining
speed while processing big data. The industry needs a powerful engine
that can respond in under a second, perform in-memory processing, and
handle stream processing as well as batch processing of the data. This
need is what brought Apache Spark into existence!
This comprehensive guide will help you learn Apache Spark. Starting
from the introduction, I’ll show you everything you want to know about
Apache Spark. Sounds good? Let’s dive right in.
What is Apache Spark?
Spark is an Apache project, popularly described as “lightning-fast
cluster computing”. It is an open-source framework for processing
large datasets and one of the most active Apache projects. Spark is
written in Scala and provides APIs in Python, Scala, Java, and R.
The most important feature of Apache Spark is its in-memory cluster
computing, which increases the speed of data processing. Spark
provides a more general and faster data processing platform than
Hadoop: it can run programs up to 100 times faster in memory and up
to 10 times faster on disk.
Apache Spark Features
▪ Multiple Language Support
Apache Spark provides APIs in Scala, Java, Python, and R, so users can
write applications in the language they prefer.
▪ Fast Speed
The most important feature of Apache Spark is its processing speed. An
application on a Hadoop cluster can run up to 100 times faster in
memory and up to 10 times faster on disk.
▪ Runs Everywhere
Spark can run on multiple platforms without affecting the processing
speed. It can run on Hadoop, Kubernetes, Mesos, in standalone mode,
and even in the cloud.
▪ General Purpose
Spark is powered by a stack of libraries: MLlib for machine learning,
SQL and DataFrames, Spark Streaming, and GraphX. These libraries can be
combined coherently in a single application. The ability to combine streaming,
SQL, and complex analytics in the same application makes Spark a
general-purpose framework.
▪ Advanced Analytics
Apache Spark supports ‘Map’ and ‘Reduce’ operations. Beyond MapReduce, it
also supports streaming data, SQL queries, graph algorithms, and machine
learning. Thus, Apache Spark is a great means of performing advanced
analytics.
Apache Spark Components
The Apache Spark ecosystem comprises the components that are responsible for the
functioning of Apache Spark. There are 5 components that constitute the Apache
Spark ecosystem.
▪ Spark Core
Spark Core is the main execution engine of the Spark platform. All other Spark
functionality depends on it, including memory management, task scheduling, and
fault recovery. It enables in-memory processing and provides the API that defines
the RDD (Resilient Distributed Dataset), the programming abstraction of Spark.
▪ Spark SQL and DataFrames
Spark SQL is the Spark component for structured data processing. It comes with a
programming abstraction known as DataFrames. Spark SQL enables developers to
combine SQL queries with the programmatic data manipulations supported by RDDs
in different languages.
▪ Spark Streaming
This component processes live data streams, such as log files created by
production web servers. It provides an API for manipulating data streams, which
makes the component easy to learn. It is designed to provide the same throughput,
scalability, and fault tolerance as Spark Core.
▪ MLlib
MLlib is Spark’s built-in machine learning library. It provides various ML
algorithms such as clustering, classification, regression, and collaborative
filtering, along with supporting functionality. MLlib also contains many
low-level machine learning primitives.
▪ GraphX
GraphX is the library that enables graph computations. It provides an API for
graph computation that lets users create a directed graph with arbitrary
properties attached to each vertex and edge.
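To make the RDD abstraction from the Spark Core component more concrete, here
is a minimal plain-Python sketch. It is not the Spark API: the ToyRDD class, its
parallelize helper, and the partitioning scheme are hypothetical names made up
for illustration, and everything runs in a single process. It only shows the
general idea that transformations such as map and filter are recorded lazily and
are computed, partition by partition, when an action such as collect or count
is called.

```python
class ToyRDD:
    """A toy, single-process stand-in for an RDD (illustration only, not Spark)."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions  # list of lists, one per "partition"
        self._lineage = lineage        # recorded transformations, not yet applied

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split the data round-robin into num_partitions chunks.
        data = list(data)
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Transformation: only records the step and returns a new ToyRDD.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: only records the step and returns a new ToyRDD.
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute(self, partition):
        # Replay the recorded transformations over one partition.
        items = iter(partition)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        # Action: triggers the computation and gathers all results.
        result = []
        for partition in self._partitions:
            result.extend(self._compute(partition))
        return result

    def count(self):
        # Action: triggers the computation and counts the results.
        return sum(len(self._compute(p)) for p in self._partitions)


# Nothing is computed until collect() or count() is called.
even_squares = (
    ToyRDD.parallelize(range(1, 11), num_partitions=3)
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
)
print(sorted(even_squares.collect()))
print(even_squares.count())
```

Because each partition is computed independently from the recorded steps, a lost
partition could in principle be rebuilt by replaying those steps, which is the
intuition behind the word “resilient” in the RDD name.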
Apache Spark Languages
Apache Spark is written in Scala, so Scala is the native language
used to interact with Spark Core. Apache Spark provides APIs in
the following languages:
▪ Scala
▪ Java
▪ Python
▪ R
Because Spark is built on Scala, Scala offers some advantages over
the other Apache Spark languages, including access to the latest
features. According to a Spark Survey on Apache Spark Languages,
71% of Spark developers use Scala, 58% use Python, 31% use Java,
and 18% use R.
Apache Spark History
An introduction to Apache Spark would not be complete without its history.
In brief: Spark was first introduced in 2009 at UC Berkeley’s R&D lab, now
AMPLab, by Matei Zaharia. It was open-sourced under a BSD license in 2010.
In 2013, the Spark project was donated to the Apache Software Foundation
and the license changed from BSD to Apache 2.0. In 2014, Spark became a
top-level Apache project, known as Apache Spark.
In 2015, with the effort of over 1000 contributors, Apache Spark became
one of the most active Apache projects as well as one of the most active
open source big data projects. At the time of writing, the latest version
is Apache Spark 2.3.0, released on Feb 28th, 2018.
Why You Should Learn Apache Spark
With the generation of big data by businesses, it has become very
important to analyze that data to gain business insights. Spark is a
revolutionary framework in the big data processing landscape. Enterprises
are extensively adopting Spark, which in turn is increasing the demand for
Apache Spark developers.
According to the O'Reilly Data Science Salary Survey, the salary of
developers correlates with their Apache Spark skills. Scala and Apache
Spark skills give a good boost to your existing salary, and Apache Spark
developers are among the highest-paid programmers. With the increasing
demand for Apache Spark developers and their salary level, it is the right
time for development professionals to learn Apache Spark and thus help
enterprises analyze their data.
Here are the top 5 reasons you should learn Apache
Spark to boost your development career.
▪ To get more access to Big Data
▪ To grow with the growing Apache Spark Adoption
▪ To get benefits of existing big data investments
▪ To fulfill the demands for Spark developers
▪ To make big money
Do You Need Hadoop to Run Spark?
Spark and Hadoop are the most popular big data processing
frameworks. Being faster than MapReduce, Apache Spark has an edge
over Hadoop in terms of speed. Also, Spark can process different
kinds of data, including real-time data, whereas Hadoop MapReduce
can only be used for batch processing.
Although Hadoop and Spark don’t do the same thing, they can still
work together. Spark handles the faster and real-time processing of
data stored in Hadoop. To achieve maximum benefits, one can run
Spark in the distributed mode using HDFS.
So, we do not always need Hadoop to run Spark. But if you want to
run Spark with Hadoop, HDFS is the main requirement for running
Spark in the distributed mode.
Apache Spark Installation
Installing Apache Spark is not a single-step process; it requires
a series of steps. Note that Java and Scala are prerequisites
for installing Spark. The installation process has 7 steps:
Step 1: Verify if Java is Installed
Step 2: Verify if Scala is Installed
Step 3: Download Scala
Step 4: Install Scala
Step 5: Download Spark
Step 6: Install Spark
Step 7: Verify Spark Installation
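The verification steps above (Steps 1, 2, and 7) are usually done by hand in a
terminal, for example with java -version, scala -version, and spark-shell. As a
rough sketch of the same idea, the plain-Python snippet below only checks
whether those commands can be found on the PATH. The function name
check_prerequisites and the default command list are assumptions made for this
illustration, and finding a command on the PATH does not prove that the
installed version is compatible.

```python
import shutil


def check_prerequisites(commands=("java", "scala", "spark-shell")):
    """Return a dict mapping each command name to True if it is on the PATH."""
    return {cmd: shutil.which(cmd) is not None for cmd in commands}


# Report which of the assumed prerequisite commands were found.
for command, found in check_prerequisites().items():
    status = "found" if found else "NOT found"
    print(f"{command}: {status}")
```

If a command is reported as not found, the corresponding download and install
steps (Steps 3 to 6) still need to be completed, or the installation directory
has not been added to the PATH.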
Spark Example: Word Count Application
Let’s understand Spark with an example, i.e. how to run a word count
application. The word count application counts the number of occurrences
of each word in a document. Consider an input text that has been saved
as input.txt in the home directory.
The procedure to execute the word count application is as follows:
Step 1: Open Spark shell
Step 2: Create RDD
Step 3: Execute word count logic
Step 4: Apply action
Step 5: Check output
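The word count logic in Steps 2 to 4 can be sketched in plain Python, without
Spark, to show what each stage does. In the Spark shell these stages are
typically expressed with RDD operations such as flatMap, map, and reduceByKey;
the code below only imitates them on a small in-memory list. The function name
word_count and the sample lines are hypothetical and stand in for the contents
of input.txt, which are not reproduced here.

```python
from collections import defaultdict


def word_count(lines):
    """Count occurrences of each word in an iterable of text lines."""
    # Stage 1 (like flatMap): split every line into individual words.
    words = (word for line in lines for word in line.split())
    # Stage 2 (like map): pair every word with the count 1.
    pairs = ((word, 1) for word in words)
    # Stage 3 (like reduceByKey): sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


# Hypothetical sample text, used here in place of input.txt.
sample_lines = [
    "spark makes big data simple",
    "big data needs spark",
]
counts = word_count(sample_lines)
for word in sorted(counts):
    print(word, counts[word])
```

In Spark the same three stages run in parallel across the partitions of the
RDD, and nothing is computed until an action (Step 4) asks for the result.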
Apache Spark Use Cases
Having gone through the Apache Spark introduction and installation, it’s
time for an overview of the Apache Spark use cases, which explain where
Apache Spark can be used. Before reading them, let’s understand why
companies should use Apache Spark. Businesses adopt Apache Spark for its:
▪ Ease of use
▪ High-performance gains
▪ Advanced analytics
▪ Real-time data streaming
▪ Ease of deployment
Apache Spark use cases help businesses understand the types of
challenges and problems where Apache Spark can be used
effectively. Let’s have a quick sampling of top Apache Spark use
cases in different industries!
▪ E-Commerce Industry
▪ Healthcare Industry
▪ Travel Industry
▪ Game Industry
▪ Security Industry
Apache Spark Books
Here is the list of top 10 Apache Spark Books:
▪ Learning Spark: Lightning-Fast Big Data Analysis
▪ High-Performance Spark: Best Practices for Scaling and Optimizing Spark
▪ Mastering Apache Spark
▪ Apache Spark in 24 Hours, Sams Teach Yourself
▪ Spark Cookbook
▪ Apache Spark Graph Processing
▪ Advanced Analytics with Spark: Patterns for Learning from Data at Scale
▪ Spark: The Definitive Guide – Big Data Processing Made Simple
▪ Spark GraphX in Action
▪ Big Data Analytics with Spark
Apache Spark Certifications
With the increasing popularity of Apache Spark in the big data industry, the
demand for Apache Spark developers is also increasing. But companies are
looking for candidates with validated Apache Spark skills, i.e. professionals with
an Apache Spark Certification.
Apache Spark Certifications will help you start a big data career by validating
your Apache Spark skills and expertise. Getting an Apache Spark Certification will
make you stand out from the crowd by demonstrating your skills to employers and
peers. Here is the list of top 5 Apache Spark Certifications:
▪ HDP Certified Apache Spark Developer
▪ O’Reilly Developer Certification for Apache Spark
▪ Cloudera Spark and Hadoop Developer
▪ Databricks Certification for Apache Spark
▪ MapR Certified Spark Developer
Apache Spark Training
As the demand for Apache Spark developers is on the rise in the
industry, it becomes important to enhance your Apache Spark skills.
Good Apache Spark training helps big data professionals get hands-on
experience in line with industry standards. Nowadays, enterprises are
looking for Hadoop developers who are skilled in the implementation
of Apache Spark best practices.
Whizlabs Apache Spark Training helps you to learn Apache Spark and
prepares you for the HDPCD Certification exam. This Apache Spark
online training helps you get familiar with the deployment of Apache
Spark to develop complex and sophisticated solutions for the
enterprises.
Whizlabs online training for Apache Spark Certification is
among the best Apache Spark training in the industry. Whizlabs
Hortonworks Apache Spark Developer Certification Online
Training helps you to:
▪ validate your Apache Spark expertise
▪ demonstrate your Apache Spark skills
▪ remain updated with the latest releases
▪ solve your queries by industry experts
▪ get accredited as certified Spark developer
▪ earn a higher salary
Final Words
In this presentation, we have covered a comprehensive guide to Apache
Spark. It is a must-read guide for those who want to learn Apache Spark
and also for those who want to extend their Apache Spark skills. Whether
you want to learn about Apache Spark components or need to find the best
Apache Spark certifications, you can find it here!
This guide is a one-stop destination for answers to questions about
Apache Spark. Apache Spark has the power to simplify challenging
processing tasks on different types of large datasets. It performs
complex analytics with the integration of graph algorithms and machine
learning. Spark has brought big data processing to everyone. Just check
it out!
Thank You!

More Related Content

What's hot

Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
Naresh Rupareliya
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
Edureka!
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Databricks
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Anastasios Skarlatidis
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
sudhakara st
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
Russell Jurney
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Spark
SparkSpark
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Simplilearn
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
Cloudera, Inc.
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
Mostafa
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
Robert Sanders
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
Guido Schmutz
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Bo Yang
 

What's hot (20)

Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Spark
SparkSpark
Spark
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 

Similar to Learn Apache Spark: A Comprehensive Guide

Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
Edureka!
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
Home
 
Apache spark
Apache sparkApache spark
Apache spark
Dona Mary Philip
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
Edureka!
 
Detailed guide to the Apache Spark Framework
Detailed guide to the Apache Spark FrameworkDetailed guide to the Apache Spark Framework
Detailed guide to the Apache Spark Framework
Aegis Software Canada
 
Spark and Hadoop Technology
Spark and Hadoop Technology Spark and Hadoop Technology
Spark and Hadoop Technology
Avinash Gautam
 
Apache Spark Notes
Apache Spark NotesApache Spark Notes
Apache Spark Notes
Venkateswaran Kandasamy
 
Pyspark vs Spark Let's Unravel the Bond!
Pyspark vs Spark Let's Unravel the Bond!Pyspark vs Spark Let's Unravel the Bond!
Pyspark vs Spark Let's Unravel the Bond!
ankitbhandari32
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
A Master Guide To Apache Spark Application And Versatile Uses.pdf
A Master Guide To Apache Spark Application And Versatile Uses.pdfA Master Guide To Apache Spark Application And Versatile Uses.pdf
A Master Guide To Apache Spark Application And Versatile Uses.pdf
DataSpace Academy
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Dharmjit Singh
 
Spark introduction & Architecture.pptx
Spark introduction & Architecture.pptxSpark introduction & Architecture.pptx
Spark introduction & Architecture.pptx
MUMERSHARJEELCh
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Edureka!
 
[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing
[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing
[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing
Rakuten Group, Inc.
 
Spark_Part 1
Spark_Part 1Spark_Part 1
Spark_Part 1
Shashi Prakash
 
Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala
Edureka!
 
SparkPaper
SparkPaperSparkPaper
SparkPaper
Suraj Thapaliya
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
Shweta Patnaik
 

Similar to Learn Apache Spark: A Comprehensive Guide (20)

Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Apache spark
Apache sparkApache spark
Apache spark
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
 
Detailed guide to the Apache Spark Framework
Detailed guide to the Apache Spark FrameworkDetailed guide to the Apache Spark Framework
Detailed guide to the Apache Spark Framework
 
Spark and Hadoop Technology
Spark and Hadoop Technology Spark and Hadoop Technology
Spark and Hadoop Technology
 
Apache Spark Notes
Apache Spark NotesApache Spark Notes
Apache Spark Notes
 
Pyspark vs Spark Let's Unravel the Bond!
Pyspark vs Spark Let's Unravel the Bond!Pyspark vs Spark Let's Unravel the Bond!
Pyspark vs Spark Let's Unravel the Bond!
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
A Master Guide To Apache Spark Application And Versatile Uses.pdf
A Master Guide To Apache Spark Application And Versatile Uses.pdfA Master Guide To Apache Spark Application And Versatile Uses.pdf
A Master Guide To Apache Spark Application And Versatile Uses.pdf
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Spark introduction & Architecture.pptx
Spark introduction & Architecture.pptxSpark introduction & Architecture.pptx
Spark introduction & Architecture.pptx
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
 
[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing
[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing
[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing
 
Spark_Part 1
Spark_Part 1Spark_Part 1
Spark_Part 1
 
Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala
 
SparkPaper
SparkPaperSparkPaper
SparkPaper
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
 

More from Whizlabs

When Should You Use AWS Lambda?
When Should You Use AWS Lambda?When Should You Use AWS Lambda?
When Should You Use AWS Lambda?
Whizlabs
 
AWS Lambda Documentation
AWS Lambda DocumentationAWS Lambda Documentation
AWS Lambda Documentation
Whizlabs
 
AWS Lambda Tutorial
AWS Lambda TutorialAWS Lambda Tutorial
AWS Lambda Tutorial
Whizlabs
 
Detailed Analysis of AWS Lambda vs EC2
 Detailed Analysis of AWS Lambda vs EC2 Detailed Analysis of AWS Lambda vs EC2
Detailed Analysis of AWS Lambda vs EC2
Whizlabs
 
What is AWS lambda?
What is AWS lambda?What is AWS lambda?
What is AWS lambda?
Whizlabs
 
Amazon Elastic Block Storage and Balancer
Amazon Elastic Block Storage and BalancerAmazon Elastic Block Storage and Balancer
Amazon Elastic Block Storage and Balancer
Whizlabs
 
Amazon Elastic Compute Cloud
Amazon Elastic Compute CloudAmazon Elastic Compute Cloud
Amazon Elastic Compute Cloud
Whizlabs
 
AWS Virtual Private Cloud
AWS Virtual Private CloudAWS Virtual Private Cloud
AWS Virtual Private Cloud
Whizlabs
 
The Advantages of Using a Private Cloud Over a Virtual Private Cloud
The Advantages of Using a Private Cloud Over a Virtual Private CloudThe Advantages of Using a Private Cloud Over a Virtual Private Cloud
The Advantages of Using a Private Cloud Over a Virtual Private Cloud
Whizlabs
 
Virtual Private Cloud
Virtual Private CloudVirtual Private Cloud
Virtual Private Cloud
Whizlabs
 
Amazon Glacier vs Amazon S3
Amazon Glacier vs Amazon S3Amazon Glacier vs Amazon S3
Amazon Glacier vs Amazon S3
Whizlabs
 
What is Amazon Glacier?
What is Amazon Glacier?What is Amazon Glacier?
What is Amazon Glacier?
Whizlabs
 
Azure interview-questions-pdf
Azure interview-questions-pdfAzure interview-questions-pdf
Azure interview-questions-pdf
Whizlabs
 
Top 100 Java Interview Questions with Detailed Answers
Top 100 Java Interview Questions with Detailed AnswersTop 100 Java Interview Questions with Detailed Answers
Top 100 Java Interview Questions with Detailed Answers
Whizlabs
 
Top 25 Big Data Interview Questions and Answers
Top 25 Big Data Interview Questions and Answers Top 25 Big Data Interview Questions and Answers
Top 25 Big Data Interview Questions and Answers
Whizlabs
 
50 must read hadoop interview questions & answers - whizlabs
50 must read hadoop interview questions & answers - whizlabs50 must read hadoop interview questions & answers - whizlabs
50 must read hadoop interview questions & answers - whizlabs
Whizlabs
 
When to Target PMP Exam – PMBOK5 or PMBOK6?
When to Target PMP Exam – PMBOK5 or PMBOK6?When to Target PMP Exam – PMBOK5 or PMBOK6?
When to Target PMP Exam – PMBOK5 or PMBOK6?
Whizlabs
 
Secrets To Winning At Office Politics How To Get Things Done And Increase You...
Secrets To Winning At Office Politics How To Get Things Done And Increase You...Secrets To Winning At Office Politics How To Get Things Done And Increase You...
Secrets To Winning At Office Politics How To Get Things Done And Increase You...
Whizlabs
 
Tips For Managing A Diverse Project Team - PMP Webinar
Tips For Managing A Diverse Project Team - PMP WebinarTips For Managing A Diverse Project Team - PMP Webinar
Tips For Managing A Diverse Project Team - PMP Webinar
Whizlabs
 
Top Ten Reasons For Project Failure - PMP Webinar
Top Ten Reasons For Project Failure - PMP WebinarTop Ten Reasons For Project Failure - PMP Webinar
Top Ten Reasons For Project Failure - PMP Webinar
Whizlabs
 

Learn Apache Spark: A Comprehensive Guide

  • 2. Content ▪ Introduction ▪ What is Apache Spark? ▪ Apache Spark Features ▪ Components of Apache Spark Ecosystem ▪ Apache Spark Languages ▪ Apache Spark History ▪ Why You Should Learn Apache Spark ▪ Do We Need Hadoop to Run Spark?
  • 3. Content ▪ Apache Spark Installation ▪ Apache Spark Example ▪ Apache Spark Use Cases ▪ Apache Spark Books ▪ Apache Spark Certifications ▪ Apache Spark Training ▪ Final Words
  • 4. Introduction The industry uses Apache Spark extensively for big data analysis. Hadoop provides a flexible, scalable, cost-effective, and fault-tolerant computing solution, but the main concern is maintaining speed while processing big data. The industry needs a powerful engine that can respond in under a second, process data in memory, and handle stream processing as well as batch processing. This need is what brought Apache Spark into existence! This comprehensive guide will help you learn Apache Spark. Starting from the introduction, I’ll show you everything you want to know about Apache Spark. Sounds good? Let’s dive right in.
  • 5. What is Apache Spark? Spark is an Apache project, popularly described as “lightning fast cluster computing”. It is an open-source framework for processing large datasets and one of the most active Apache projects at present. Spark is written in Scala and provides APIs in Python, Scala, Java, and R. Its most important feature is in-memory cluster computing, which speeds up data processing. Spark is known as a more general and faster data processing platform: it can run programs up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk.
  • 6. Apache Spark Features ▪ Multiple Language Support Apache Spark provides APIs in Scala, Java, Python, and R, so users can write applications in the language they prefer. ▪ Fast Speed The most important feature of Apache Spark is its processing speed. An application on a Hadoop cluster can run up to 100 times faster in memory and up to 10 times faster on disk. ▪ Runs Everywhere Spark can run on multiple platforms without affecting the processing speed. It can run on Hadoop, Kubernetes, Mesos, in standalone mode, and in the cloud.
  • 8. Apache Spark Features ▪ General Purpose Spark is powered by a stack of libraries: MLlib for machine learning, DataFrames and SQL, Spark Streaming, and GraphX. These libraries can be combined coherently in one application. The ability to combine streaming, SQL, and complex analytics in the same application makes Spark a general-purpose framework. ▪ Advanced Analytics Apache Spark supports ‘Map’ and ‘Reduce’ operations. Along with MapReduce, it supports streaming data, SQL queries, graph algorithms, and machine learning. Thus, Apache Spark is a great means of performing advanced analytics.
  • 9. Apache Spark Components The Apache Spark ecosystem comprises five components that together are responsible for the functioning of Apache Spark. ▪ Spark Core The main execution engine of the Spark platform is known as Spark Core. All of Spark’s functionality depends on Spark Core, including memory management, task scheduling, fault recovery, and more. It enables in-memory processing and provides the API that defines the RDD (Resilient Distributed Dataset), Spark’s main programming abstraction. ▪ Spark SQL and DataFrames Spark SQL is the Spark component for structured data processing. It comes with a programming abstraction known as DataFrames. Spark SQL lets developers combine SQL queries with the programmatic data manipulations supported by RDDs in different languages.
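The RDD abstraction described under Spark Core can be pictured with a small plain-Python sketch. This is not Spark's API: `MiniRDD`, `parallelize`, and the partition layout here are hypothetical names chosen for illustration. It assumes only two ideas that Spark's RDDs are known for: data is split into partitions, and transformations such as `map` are recorded lazily until an action such as `reduce` runs them.

```python
from functools import reduce


class MiniRDD:
    """Toy stand-in for an RDD: partitioned data plus lazily recorded maps."""

    def __init__(self, partitions, funcs=()):
        self.partitions = partitions  # list of lists, one per hypothetical worker
        self.funcs = tuple(funcs)     # pending transformations, not yet executed

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Round-robin split of the input across the partitions.
        parts = [list(data[i::num_partitions]) for i in range(num_partitions)]
        return cls(parts)

    def map(self, func):
        # Transformation: records the function and returns a new MiniRDD.
        return MiniRDD(self.partitions, self.funcs + (func,))

    def _compute(self, partition):
        # Apply every recorded function to each element of one partition.
        for item in partition:
            for func in self.funcs:
                item = func(item)
            yield item

    def reduce(self, func):
        # Action: evaluates each non-empty partition, then merges the partial results.
        partials = [reduce(func, self._compute(p)) for p in self.partitions if p]
        return reduce(func, partials)


numbers = MiniRDD.parallelize(list(range(1, 11)), num_partitions=3)
squares = numbers.map(lambda x: x * x)      # nothing is computed yet
total = squares.reduce(lambda a, b: a + b)  # computation happens here
print(total)
```

In a real cluster the partitions would live on different machines and Spark Core would schedule the work and recover lost partitions; the sketch only shows the shape of the programming model.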
  • 11. Apache Spark Components ▪ Spark Streaming This component processes live streams of data, such as log files created by production web servers. It provides an API for manipulating data streams, which makes the project easy to learn. It is designed to offer the same throughput, scalability, and fault tolerance as Spark Core. ▪ MLlib MLlib is Spark’s built-in machine learning library. It provides various ML algorithms such as clustering, classification, regression, and collaborative filtering, plus supporting functionality. MLlib also contains many low-level machine learning primitives. ▪ GraphX GraphX is the library that enables graph computations. It provides an API that lets users create a directed graph with arbitrary properties attached to each vertex and edge.
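To make the GraphX description concrete, here is a minimal plain-Python sketch of a directed graph whose vertices and edges carry arbitrary properties. It does not use GraphX itself; the sample vertices, the edge properties, and the helper names `out_degrees` and `neighbors` are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Vertex id -> arbitrary properties (hypothetical sample data).
vertices = {
    1: {"name": "alice", "role": "analyst"},
    2: {"name": "bob", "role": "engineer"},
    3: {"name": "carol", "role": "manager"},
}

# Directed edges as (source id, destination id, properties).
edges = [
    (1, 2, {"relation": "follows"}),
    (2, 3, {"relation": "reports_to"}),
    (1, 3, {"relation": "reports_to"}),
]


def out_degrees(edge_list):
    """Count outgoing edges per source vertex."""
    return Counter(src for src, _dst, _props in edge_list)


def neighbors(edge_list, vertex_id, relation):
    """Names of vertices reached from vertex_id over edges with the given relation."""
    return [
        vertices[dst]["name"]
        for src, dst, props in edge_list
        if src == vertex_id and props["relation"] == relation
    ]


degrees = out_degrees(edges)
print(degrees[1], degrees[2], degrees[3])
print(neighbors(edges, 1, "reports_to"))
```

A graph library adds distributed storage and graph algorithms on top of this structure; the sketch only shows what "properties on vertices and edges" means.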
  • 12. Apache Spark Languages Apache Spark is written in Scala, so Scala is the native language for interacting with Spark Core. Spark provides APIs in the following languages: ▪ Scala ▪ Java ▪ Python ▪ R Because Spark is built on Scala, the Scala API can offer some features that the other Apache Spark languages lack, and it gives you access to the latest features. According to a Spark Survey on Apache Spark languages, 71% of Spark developers use Scala, 58% use Python, 31% use Java, and 18% use R.
  • 14. Apache Spark History An introduction to Apache Spark would be incomplete without its history. In brief, Spark was first introduced in 2009 at the UC Berkeley R&D Lab, now AMPLab, by Matei Zaharia. Spark was open-sourced under the BSD license in 2010. In 2013, the Spark project was donated to the Apache Software Foundation and the license changed from BSD to Apache 2.0. In 2014, Spark became a top-level project of the Apache Software Foundation, known as Apache Spark. In 2015, with the effort of over 1000 contributors, Apache Spark became one of the most active Apache projects as well as one of the most active open-source big data projects. At the time of writing, the latest version is Apache Spark 2.3.0, released on February 28, 2018.
  • 16. Why You Should Learn Apache Spark As businesses generate more big data, analyzing that data for business insights has become very important. Spark is a revolutionary framework in the big data processing landscape. Enterprises are adopting Spark extensively, which in turn is increasing the demand for Apache Spark developers. According to the O'Reilly Data Science Salary Survey, developers’ salaries depend on their Apache Spark skills: Scala and Apache Spark skills give a good boost to an existing salary. Apache Spark developers are known to be among the highest-paid programmers. With the demand for Apache Spark developers and their salary levels increasing, it is the right time for development professionals to learn Apache Spark and help enterprises analyze their data.
  • 17. Why You Should Learn Apache Spark Here are the top 5 reasons you should learn Apache Spark to boost your development career. ▪ To get more access to Big Data ▪ To grow with the growing Apache Spark Adoption ▪ To get benefits of existing big data investments ▪ To fulfill the demands for Spark developers ▪ To make big money
  • 18. Do You Need Hadoop to Run Spark? Spark and Hadoop are the most popular big data processing frameworks. Being faster than MapReduce, Apache Spark has an edge over Hadoop in terms of speed. Spark can also process different kinds of data, including real-time data, whereas Hadoop MapReduce can only be used for batch processing. Although Hadoop and Spark don’t do the same thing, they can still work together: Spark provides faster, real-time processing of the data stored in Hadoop. To achieve maximum benefits, one can run Spark in distributed mode using HDFS. So we do not always need Hadoop to run Spark, but if you want to run Spark with Hadoop, HDFS is the main requirement for running Spark in distributed mode.
  • 19. Apache Spark Installation Installing Apache Spark is not a single-step process; it takes a series of steps. Note that Java and Scala are prerequisites for installing Spark. Here is the 7-step Apache Spark installation process. Step 1: Verify if Java is Installed Step 2: Verify if Scala is Installed Step 3: Download Scala Step 4: Install Scala Step 5: Download Spark Step 6: Install Spark Step 7: Verify Spark Installation
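The verification steps (Steps 1, 2, and 7) can be approximated with a short script. The sketch below is an assumption about how one might check them, not part of the original procedure: it only tests whether the `java`, `scala`, and `spark-shell` launchers are discoverable on `PATH`, and the function name `check_prerequisites` is hypothetical. It does not check versions or confirm that the installation works.

```python
import shutil


def check_prerequisites(commands=("java", "scala", "spark-shell")):
    """Map each command name to True if an executable with that name is on PATH."""
    return {name: shutil.which(name) is not None for name in commands}


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'not found on PATH'}")
```

If a command is reported as missing, the corresponding download and install steps (Steps 3 to 6) still need to be done, or the install directory's `bin` folder has not been added to `PATH`.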
  • 20. Spark Example: Word Count Application Let’s understand Spark with an example: running a word count application, which counts the occurrences of each word in a document. Consider an input text saved as input.txt in the home directory. The procedure to execute the word count application is as follows: Step 1: Open Spark shell Step 2: Create RDD Step 3: Execute word count logic Step 4: Apply action Step 5: Check output
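The word count logic in Step 3 is typically built from three RDD operations in Spark: `flatMap` to split lines into words, `map` to pair each word with 1, and `reduceByKey` to sum the pairs per word. The sketch below mimics that pipeline in plain Python so it runs without a Spark installation. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins, and the two sample lines are invented because the slide's input text is not reproduced here.

```python
def flat_map(func, items):
    """Apply func to each item and flatten the resulting lists into one list."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Merge the values of pairs that share a key using func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Stand-in for the lines of input.txt (hypothetical sample text).
lines = [
    "spark makes big data simple",
    "big data needs fast processing",
]

words = flat_map(lambda line: line.split(), lines)  # split every line into words
pairs = [(word, 1) for word in words]               # pair each word with a count of 1
counts = reduce_by_key(lambda a, b: a + b, pairs)   # sum the counts per word

print(counts["big"], counts["data"], counts["spark"])
```

In the Spark shell the same three stages would run as transformations on the RDD created in Step 2, and nothing would be computed until the action in Step 4 is applied.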
  • 21. Apache Spark Use Cases After the Apache Spark introduction and installation, it’s time for an overview of Apache Spark use cases, which explain where Apache Spark can be used. Before reading them, let’s understand why companies should use Apache Spark. Businesses have adopted Apache Spark for its ▪ Ease of use ▪ High-performance gains ▪ Advanced analytics ▪ Real-time data streaming ▪ Ease of deployment
  • 23. Apache Spark Use Cases The use cases help businesses understand the types of challenges and problems where Apache Spark can be used effectively. Let’s have a quick sampling of top Apache Spark use cases in different industries! ▪ E-Commerce Industry ▪ Healthcare Industry ▪ Travel Industry ▪ Game Industry ▪ Security Industry
  • 24. Apache Spark Books Here is the list of top 10 Apache Spark books: ▪ Learning Spark: Lightning-Fast Big Data Analysis ▪ High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark ▪ Mastering Apache Spark ▪ Apache Spark in 24 Hours, Sams Teach Yourself ▪ Spark Cookbook ▪ Apache Spark Graph Processing ▪ Advanced Analytics with Spark: Patterns for Learning from Data at Scale ▪ Spark: The Definitive Guide – Big Data Processing Made Simple ▪ Spark GraphX in Action ▪ Big Data Analytics with Spark
  • 25. Apache Spark Certifications With the increasing popularity of Apache Spark in the big data industry, the demand for Apache Spark developers is also increasing. Companies, however, are looking for candidates with validated Apache Spark skills, that is, professionals with an Apache Spark certification. Apache Spark certifications help you start a big data career by validating your Apache Spark skills and expertise. Getting certified will make you stand out from the crowd by demonstrating your skills to employers and peers. Here is the list of top 5 Apache Spark Certifications: ▪ HDP Certified Apache Spark Developer ▪ O’Reilly Developer Certification for Apache Spark ▪ Cloudera Spark and Hadoop Developer ▪ Databricks Certification for Apache Spark ▪ MapR Certified Spark Developer
  • 26. Apache Spark Training As the demand for Apache Spark developers is on the rise in the industry, it becomes important to enhance your Apache Spark skills. Good Apache Spark training helps big data professionals get hands-on experience in line with industry standards. Nowadays, enterprises are looking for Hadoop developers who are skilled in implementing Apache Spark best practices. Whizlabs Apache Spark Training helps you learn Apache Spark and prepares you for the HDPCD Certification exam. This online training helps you get familiar with deploying Apache Spark to develop complex and sophisticated solutions for enterprises.
  • 27. Apache Spark Training Whizlabs online training for Apache Spark Certification is one of the best Apache Spark training courses in the industry. Whizlabs Hortonworks Apache Spark Developer Certification Online Training helps you to ▪ validate your Apache Spark expertise ▪ demonstrate your Apache Spark skills ▪ remain updated with the latest releases ▪ get your queries resolved by industry experts ▪ get accredited as a certified Spark developer ▪ earn a higher salary
  • 28. Final Words In this presentation, we have covered a comprehensive guide to Apache Spark. It is a must-read for those who want to learn Apache Spark and for those who want to extend their Apache Spark skills. Whether you want to learn about Apache Spark components or find the best Apache Spark certifications, you can find it here! This guide is a one-stop destination for answers to questions about Apache Spark. Apache Spark has the power to simplify challenging processing tasks on different types of large datasets. It performs complex analytics by integrating graph algorithms and machine learning. Spark has brought big data processing to everyone. Just check it out!
  • 29. Reference Links 1. https://spark.apache.org/ 2. https://www.whizlabs.com/blog/learn-apache-spark/ 3. https://www.whizlabs.com/blog/importance-of-apache-spark/ 4. https://www.whizlabs.com/blog/best-apache-spark-books/ 5. https://hortonworks.com/ 6. https://www.cloudera.com/ Thank You!