Er.Jay Nagar(Technology Researcher )
+91-9601957620
What is Apache Hadoop?
• Open-source software framework designed for the storage and processing of large-scale data on clusters of commodity hardware
• Created by Doug Cutting and Mike Cafarella in 2005
• Cutting named the project after his son's toy elephant
Uses for Hadoop
• Data-intensive text processing
• Assembly of large genomes
• Graph mining
• Machine learning and data mining
• Large-scale social network analysis
Who Uses Hadoop?
The Hadoop Ecosystem
• Hadoop Common: contains libraries and other modules
• HDFS: Hadoop Distributed File System
• Hadoop YARN: Yet Another Resource Negotiator
• Hadoop MapReduce: a programming model for large-scale data processing
How much data?
• Facebook
o 500 TB per day
• Yahoo
o Over 170 PB
• eBay
o Over 6 PB
• Getting the data to the processors becomes the bottleneck
• Hadoop:
o An open-source software framework that supports data-intensive distributed applications, licensed under the Apache License 2.0
• Goals / Requirements:
o Abstract and facilitate the storage and processing of large and/or rapidly growing data sets
o Structured and unstructured data
o Simple programming models
o High scalability and availability
o Use commodity (cheap!) hardware with little hardware-level redundancy
o Fault tolerance
o Move computation rather than data
Hadoop Framework Tools
Hadoop’s Architecture
• Distributed, with some centralization
• The main nodes of the cluster (the worker nodes) hold most of the system's computational power and storage
• Main nodes run a TaskTracker to accept and respond to MapReduce tasks, and a DataNode to store the needed blocks as close to the computation as possible
• A central control node runs the NameNode to keep track of HDFS directories and files, and the JobTracker to dispatch compute tasks to the TaskTrackers
Hadoop’s Architecture
• Hadoop Distributed Filesystem (HDFS)
o Tailored to the needs of MapReduce
o Targeted towards many reads of file streams
o Writes are more costly
o High degree of data replication (3x by default)
o No need for RAID on normal nodes
o Large block size (64 MB by default in Hadoop 1.x; 128 MB in later versions)
o Location awareness of DataNodes in the network
• Hadoop is in use at many organizations that handle big data:
o Yahoo!
o Facebook
o Amazon
o Netflix
o Etc.
• Some examples of scale:
o Yahoo!'s Search Webmap runs on a 10,000-core Linux cluster and powers Yahoo! Web search
o Facebook's Hadoop cluster hosted 100+ PB of data (July 2012) and was growing at ½ PB/day (November 2012)
Hadoop’s Architecture
NameNode:
• Stores metadata for the files, like the directory structure of a typical file system
• The server holding the NameNode instance is crucial, as there is only one (a single point of failure)
• Keeps a transaction log for file deletes, adds, etc. Transactions cover metadata only, not whole blocks or file streams
• Handles creation of more replica blocks when necessary after a DataNode failure
Hadoop’s Architecture
DataNode:
• Stores the actual data in HDFS
• Can run on any underlying filesystem (ext3/4, NTFS, etc.)
• Notifies the NameNode of which blocks it has
• With the default 3x replication, the NameNode places two replicas of a block in one rack and the third in a different rack
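The rack-aware placement rule above can be illustrated with a minimal Python sketch. It only models the "two replicas in one rack, one elsewhere" rule from this slide; the function name `place_replicas`, the `topology` dictionary, and the node names are hypothetical, and real HDFS placement policies handle more cases (node load, free space, pluggable policies).

```python
import random

def place_replicas(topology, writer_rack, replication=3, rng=None):
    """Pick DataNodes for one block: up to two in the writer's rack, the rest elsewhere.

    topology maps a rack name to the list of DataNode names in that rack.
    This is an illustrative simplification, not the HDFS implementation.
    """
    rng = rng or random.Random()
    local_nodes = list(topology[writer_rack])
    remote_nodes = [node for rack, nodes in topology.items()
                    if rack != writer_rack for node in nodes]
    # At most two replicas share the local rack.
    local_count = min(2, replication, len(local_nodes))
    chosen = rng.sample(local_nodes, local_count)
    # Remaining replicas go to other racks, if any nodes are available there.
    remote_count = min(replication - local_count, len(remote_nodes))
    chosen += rng.sample(remote_nodes, remote_count)
    return chosen

topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5"],
}
replicas = place_replicas(topology, "rack1", rng=random.Random(0))
print(replicas)
```

Keeping one replica in a second rack means a whole-rack failure does not lose the block, while the two same-rack replicas keep most replication traffic off the inter-rack links.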
Hadoop’s Architecture: MapReduce Engine
Apache Hadoop Big Data Technology
Hadoop’s Architecture
MapReduce Engine:
• JobTracker & TaskTracker
• The JobTracker splits a job into smaller tasks ("Map") and sends them to the TaskTracker process on each node
• Each TaskTracker reports back to the JobTracker node on task progress, sends data ("Reduce"), or requests new tasks
HDFS Basic Concepts
• HDFS is a file system written in Java, based on Google's GFS
• Provides redundant storage for massive amounts of data
HDFS Basic Concepts
• HDFS works best with a smaller number of large files
o Millions as opposed to billions of files
o Typically 100 MB or more per file
• Files in HDFS are write-once
• Optimized for streaming reads of large files rather than random reads
How Are Files Stored
• Files are split into blocks
• Blocks are distributed across many machines at load time
o Different blocks from the same file are stored on different machines
• Blocks are replicated across multiple machines
• The NameNode keeps track of which blocks make up a file and where they are stored
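As a rough illustration of the points above, the following Python sketch splits a file of a given size into fixed-size blocks and records which machines hold each replica, the way a NameNode-style block map might. The 64 MB block size and 3x replication are the figures quoted in this deck; `plan_blocks`, the round-robin placement, and the node names are assumptions made for the example, not HDFS behavior.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size quoted in this deck
REPLICATION = 3                # default replication factor quoted in this deck

def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a map: block index -> list of DataNodes holding a replica."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    block_map = {}
    for block in range(num_blocks):
        # Round-robin start so consecutive blocks begin on different machines;
        # the replicas of one block always land on distinct nodes.
        block_map[block] = [datanodes[(block + r) % len(datanodes)]
                            for r in range(replication)]
    return block_map

nodes = ["dn1", "dn2", "dn3", "dn4"]
block_map = plan_blocks(200 * 1024 * 1024, nodes)  # a 200 MB file
for block, holders in block_map.items():
    print(block, holders)
```

A 200 MB file needs four 64 MB blocks (the last one only partly filled), so with 3x replication the cluster stores twelve block replicas in total.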
Data Replication
• Default replication is 3-fold
MapReduce
Distributing computation across nodes
MapReduce Overview
• A method for distributing computation across multiple nodes
• Each node processes the data that is stored at that node
• Consists of two main phases
o Map
o Reduce
MapReduce Features
• Automatic parallelization and distribution
• Fault tolerance
• Provides a clean abstraction for programmers to use
The Mapper
• Reads data as key/value pairs
o The key is often discarded
• Outputs zero or more key/value pairs
Shuffle and Sort
• Output from the mapper is sorted by key
• All values with the same key are guaranteed to go to the same reducer, and therefore to the same machine
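The guarantee above is usually achieved by hashing each key to a reducer index. The Python sketch below shows the idea under stated assumptions: `partition` and `shuffle` are hypothetical helper names, and `zlib.crc32` stands in for whatever stable hash a real framework uses.

```python
import zlib

def partition(key, num_reducers):
    """Map a key to a reducer index; equal keys always get the same index."""
    # zlib.crc32 is a stand-in for a stable hash. Python's built-in hash()
    # of str is randomized per process, so it is avoided here.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    """Group (key, value) pairs by reducer, then sort each reducer's input by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in mapper_outputs:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket) for bucket in buckets]

pairs = [("hadoop", 1), ("hdfs", 1), ("hadoop", 1), ("yarn", 1)]
buckets = shuffle(pairs, 3)
for reducer_id, bucket in enumerate(buckets):
    print(reducer_id, bucket)
```

Because the reducer index depends only on the key, both `("hadoop", 1)` pairs end up in the same bucket no matter which mapper produced them.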
The Reducer
• Called once for each unique key
• Gets a list of all values associated with a key as input
• Outputs zero or more final key/value pairs
o Usually just one output per input key
MapReduce: Word Count
[Figure: word count data flow. Four mappers each emit intermediate key/value pairs; a partitioner after each mapper routes them; shuffling delivers the intermediates to three reducers.]
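The word count flow can be simulated in a single process with a short Python sketch. This is not Hadoop API code: `mapper`, `reducer`, and `run_job` are hypothetical names, and the shuffle is reduced to an in-memory grouping step.

```python
from collections import defaultdict

def mapper(_key, line):
    """Emit (word, 1) for every word; the input key (e.g. a line offset) is discarded."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Called once per unique word with all of its counts."""
    yield word, sum(counts)

def run_job(lines):
    # Map phase: every input record goes through the mapper.
    intermediates = []
    for offset, line in enumerate(lines):
        intermediates.extend(mapper(offset, line))
    # Shuffle and sort: group values by key, then visit keys in sorted order.
    grouped = defaultdict(list)
    for word, count in intermediates:
        grouped[word].append(count)
    # Reduce phase: one reducer call per unique key.
    results = []
    for word in sorted(grouped):
        results.extend(reducer(word, grouped[word]))
    return results

result = run_job(["the quick brown fox", "the lazy dog", "the fox"])
print(result)
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the grouping step is the distributed shuffle.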
Overview
• NameNode
o Holds the metadata for HDFS
• Secondary NameNode
o Performs housekeeping functions for the NameNode
• DataNode
o Stores the actual HDFS data blocks
• JobTracker
o Manages MapReduce jobs
• TaskTracker
o Monitors individual Map and Reduce tasks
The NameNode
• Stores the HDFS file system information in a file called the fsimage
• Updates to the file system (adding or removing blocks) do not change the fsimage file
o They are instead written to a log file (the edit log)
• When starting, the NameNode loads the fsimage file and then applies the changes in the log file
The Secondary NameNode
• NOT a backup for the NameNode
• Periodically reads the log file and applies the changes to the fsimage file, bringing it up to date
• Allows the NameNode to restart faster when required
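The interplay between the fsimage, the log file, and the Secondary NameNode can be sketched in a few lines of Python. The data layout here (a dictionary from path to block list, and log entries as `(operation, path, blocks)` tuples) is an assumption made for illustration; the real on-disk formats are different.

```python
def apply_edits(fsimage, edit_log):
    """Return a new namespace snapshot with every logged change applied in order."""
    namespace = dict(fsimage)
    for op, path, blocks in edit_log:
        if op == "add":
            namespace[path] = blocks
        elif op == "remove":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace

def checkpoint(fsimage, edit_log):
    """Secondary-NameNode-style housekeeping: fold the log into the image, empty the log."""
    return apply_edits(fsimage, edit_log), []

fsimage = {"/logs/a.txt": ["blk_1", "blk_2"]}
edit_log = [
    ("add", "/logs/b.txt", ["blk_3"]),
    ("remove", "/logs/a.txt", None),
]

# NameNode startup: load the image, then replay the log.
startup_view = apply_edits(fsimage, edit_log)

# After a checkpoint the same view is reached with nothing left to replay.
new_image, new_log = checkpoint(fsimage, edit_log)
print(startup_view, new_image, new_log)
```

The checkpoint does not change what the namespace looks like; it only shortens the log that has to be replayed, which is why it speeds up a restart without acting as a backup.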
JobTracker and TaskTracker
• JobTracker
o Determines the execution plan for the job
o Assigns individual tasks
• TaskTracker
o Keeps track of the performance of an individual mapper or reducer
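One way to picture how task assignment ties in with "move computation rather than data" is a locality-first scheduler. The Python sketch below is a deliberately simplified model, not the JobTracker's actual algorithm: `assign_tasks`, the slot counts, and the block and node names are all hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one map task per block, preferring a node that already stores the block.

    block_locations: block id -> list of nodes holding a replica
    free_slots: node -> number of map tasks it can still accept
    Returns block id -> (node, is_data_local).
    """
    slots = dict(free_slots)
    assignment = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with capacity; the block must be read remotely.
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment

block_locations = {"b0": ["dn1", "dn2"], "b1": ["dn1", "dn2"], "b2": ["dn1"]}
free_slots = {"dn1": 1, "dn2": 1, "dn3": 1}
assignment = assign_tasks(block_locations, free_slots)
print(assignment)
```

In this run the first two tasks are data-local, while the task for `b2` falls back to `dn3` because the only node holding that block has no free slot.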
Hadoop Ecosystem
Other available tools
Why do these tools exist?
• MapReduce is very powerful, but can be awkward to master
• These tools allow programmers who are familiar with other programming styles to take advantage of the power of MapReduce
Other Tools
• Hive
o Hadoop processing with SQL
• Pig
o Hadoop processing with scripting
• Cascading
o Pipe-and-filter processing model
• HBase
o Database model built on top of Hadoop
• Flume
o Designed for large-scale data movement

wojakmodern
 
Technology used in Ott data analysis project
Technology used in Ott data analysis  projectTechnology used in Ott data analysis  project
Technology used in Ott data analysis project
49AkshitYadav
 

Recently uploaded (20)

Full Disclosure Board Policy.docx BRGY LICUMA
Full  Disclosure Board Policy.docx BRGY LICUMAFull  Disclosure Board Policy.docx BRGY LICUMA
Full Disclosure Board Policy.docx BRGY LICUMA
 
Field Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdfField Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdf
 
Annex K RBF's The World Game pdf document
Annex K RBF's The World Game pdf documentAnnex K RBF's The World Game pdf document
Annex K RBF's The World Game pdf document
 
Unit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptxUnit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptx
 
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
 
Data Storytelling Final Project for MBA 635
Data Storytelling Final Project for MBA 635Data Storytelling Final Project for MBA 635
Data Storytelling Final Project for MBA 635
 
Accounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-RegulationsAccounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-Regulations
 
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptxParcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
 
Aws MLOps Interview Questions with answers
Aws MLOps Interview Questions  with answersAws MLOps Interview Questions  with answers
Aws MLOps Interview Questions with answers
 
Audits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdfAudits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdf
 
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di MalaysiaBimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
 
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
 
Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?
 
Big Data and Analytics Shaping the future of Payments
Big Data and Analytics Shaping the future of PaymentsBig Data and Analytics Shaping the future of Payments
Big Data and Analytics Shaping the future of Payments
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
 
DESIGN AND DEVELOPMENT OF AUTO OXYGEN CONCENTRATOR WITH SOS ALERT FOR HIKING ...
DESIGN AND DEVELOPMENT OF AUTO OXYGEN CONCENTRATOR WITH SOS ALERT FOR HIKING ...DESIGN AND DEVELOPMENT OF AUTO OXYGEN CONCENTRATOR WITH SOS ALERT FOR HIKING ...
DESIGN AND DEVELOPMENT OF AUTO OXYGEN CONCENTRATOR WITH SOS ALERT FOR HIKING ...
 
SAMPLE PRODUCT RESEARCH PR - strikingly.pptx
SAMPLE PRODUCT RESEARCH PR - strikingly.pptxSAMPLE PRODUCT RESEARCH PR - strikingly.pptx
SAMPLE PRODUCT RESEARCH PR - strikingly.pptx
 
Technology used in Ott data analysis project
Technology used in Ott data analysis  projectTechnology used in Ott data analysis  project
Technology used in Ott data analysis project
 

Apache Hadoop Big Data Technology

  • 2. What is Apache Hadoop?
      • Open source software framework designed for storage and processing of large-scale data on clusters of commodity hardware
      • Created by Doug Cutting and Mike Cafarella in 2005
      • Cutting named the program after his son’s toy elephant
  • 3. Uses for Hadoop
      • Data-intensive text processing
      • Assembly of large genomes
      • Graph mining
      • Machine learning and data mining
      • Large-scale social network analysis
  • 5. The Hadoop Ecosystem
      • Hadoop Common: contains libraries and other modules
      • HDFS: Hadoop Distributed File System
      • Hadoop YARN: Yet Another Resource Negotiator
      • Hadoop MapReduce: a programming model for large-scale data processing
  • 6. How much data?
      • Facebook: 500 TB per day
      • Yahoo: over 170 PB
      • eBay: over 6 PB
      • Getting the data to the processors becomes the bottleneck
  • 7.
      • Hadoop:
          o An open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license
      • Goals / Requirements:
          o Abstract and facilitate the storage and processing of large and/or rapidly growing data sets
          o Structured and non-structured data
          o Simple programming models
          o High scalability and availability
          o Use commodity (cheap!) hardware with little redundancy
          o Fault-tolerance
          o Move computation rather than data
  • 9. Hadoop’s Architecture
      • Distributed, with some centralization
      • The main nodes of the cluster are where most of the computational power and storage of the system lie
      • Main nodes run TaskTracker to accept and reply to MapReduce tasks, and also DataNode to store the needed blocks as close as possible
      • Central control node runs NameNode to keep track of HDFS directories and files, and JobTracker to dispatch compute tasks to TaskTracker
  • 10. Hadoop’s Architecture: Hadoop Distributed Filesystem
      • Tailored to needs of MapReduce
      • Targeted towards many reads of filestreams
      • Writes are more costly
      • High degree of data replication (3x by default)
      • No need for RAID on normal nodes
      • Large block size (64 MB)
      • Location awareness of DataNodes in network
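The block size and replication figures on this slide translate directly into a storage estimate. The following is a minimal sketch, assuming the 64 MB block size and 3x replication quoted above; the function name `hdfs_footprint` and its return keys are hypothetical and not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 64   # block size quoted on the slide
REPLICATION = 3      # default replication quoted on the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION):
    """Estimate block count and raw storage for one file (illustrative only)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # A partly filled last block only uses the space it holds, so raw
        # usage is file size times replication, not blocks times block size.
        "raw_storage_mb": file_size_mb * replication,
    }


print(hdfs_footprint(1024))
# {'blocks': 16, 'block_replicas': 48, 'raw_storage_mb': 3072}
```

A 1 GB file therefore becomes 16 blocks and 48 stored block replicas under these assumptions, which is one reason HDFS favors a small number of large files.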
  • 11.
      • Hadoop is in use at most organizations that handle big data:
          o Yahoo!
          o Facebook
          o Amazon
          o Netflix
          o Etc.
      • Some examples of scale:
          o Yahoo!’s Search Webmap runs on a 10,000-core Linux cluster and powers Yahoo! Web search
          o FB’s Hadoop cluster hosts 100+ PB of data (July 2012) and is growing at ½ PB/day (Nov 2012)
  • 12. Hadoop’s Architecture: NameNode
      • Stores metadata for the files, like the directory structure of a typical FS
      • The server holding the NameNode instance is quite crucial, as there is only one
      • Keeps a transaction log for file deletes, adds, etc. Does not use transactions for whole blocks or file-streams, only metadata
      • Handles creation of more replica blocks when necessary after a DataNode failure
  • 13. Hadoop’s Architecture: DataNode
      • Stores the actual data in HDFS
      • Can run on any underlying filesystem (ext3/4, NTFS, etc.)
      • Notifies NameNode of what blocks it has
      • NameNode replicates blocks 2x in local rack, 1x elsewhere
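The placement rule on this slide (two replicas in the local rack, one elsewhere) can be sketched in a few lines. This follows the rule only as the slide states it; the actual HDFS placement policy varies by release and applies further constraints. The function `place_replicas` and the `topology` layout are hypothetical names chosen for illustration.

```python
import random


def place_replicas(topology, writer_rack, rng=None):
    """Pick three DataNodes: two on the writer's rack, one on another rack.

    `topology` maps a rack name to its list of DataNode names. This mirrors
    the rule as stated on the slide, not any specific HDFS release.
    """
    rng = rng or random.Random()
    local_nodes = topology[writer_rack]
    remote_racks = [r for r in topology if r != writer_rack and topology[r]]
    if len(local_nodes) < 2 or not remote_racks:
        raise ValueError("need two local nodes and one non-empty remote rack")
    local = rng.sample(local_nodes, 2)            # two distinct local nodes
    remote_rack = rng.choice(remote_racks)
    remote = rng.choice(topology[remote_rack])    # one off-rack node
    return local + [remote]


topology = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5"]}
print(place_replicas(topology, "rack1"))
```

Keeping one replica off-rack means the loss of an entire rack still leaves a readable copy, while the two local replicas keep most write traffic inside one rack.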
  • 16. Hadoop’s Architecture: MapReduce Engine
      • JobTracker and TaskTracker
      • JobTracker splits up data into smaller tasks (“Map”) and sends them to the TaskTracker process in each node
      • TaskTracker reports back to the JobTracker node on job progress, sends data (“Reduce”), or requests new jobs
  • 17. HDFS Basic Concepts
      • HDFS is a file system written in Java, based on Google’s GFS
      • Provides redundant storage for massive amounts of data
  • 18. HDFS Basic Concepts
      • HDFS works best with a smaller number of large files
          o Millions as opposed to billions of files
          o Typically 100 MB or more per file
      • Files in HDFS are write once
      • Optimized for streaming reads of large files and not random reads
  • 19. How are Files Stored
      • Files are split into blocks
      • Blocks are split across many machines at load time
          o Different blocks from the same file will be stored on different machines
      • Blocks are replicated across multiple machines
      • The NameNode keeps track of which blocks make up a file and where they are stored
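The bookkeeping described on this slide can be sketched as a mapping from block IDs to the machines holding each replica. This is a toy model, assuming round-robin assignment and a made-up block ID format; a real NameNode uses rack awareness and load information rather than a simple ring. All names here (`build_block_map`, `logs.txt`, `dn1`) are hypothetical.

```python
import math
from itertools import cycle


def build_block_map(file_name, file_size_mb, datanodes,
                    block_size_mb=64, replication=3):
    """Return {block_id: [datanode, ...]} for one file, NameNode-style."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    ring = cycle(datanodes)
    block_map = {}
    for i in range(n_blocks):
        # Taking the next `replication` nodes from the ring puts the replicas
        # of one block on distinct machines and spreads blocks across nodes.
        block_map[f"{file_name}#blk_{i}"] = [next(ring) for _ in range(replication)]
    return block_map


for block_id, nodes in build_block_map("logs.txt", 200,
                                       ["dn1", "dn2", "dn3", "dn4"]).items():
    print(block_id, nodes)
```

A 200 MB file yields four blocks under these assumptions, and the resulting dictionary plays the role of the metadata a client consults to find where each block lives.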
  • 20. Data Replication
      • Default replication is 3-fold
  • 22. MapReduce Overview
      • A method for distributing computation across multiple nodes
      • Each node processes the data that is stored at that node
      • Consists of two main phases
          o Map
          o Reduce
  • 23. MapReduce Features
      • Automatic parallelization and distribution
      • Fault-tolerance
      • Provides a clean abstraction for programmers to use
  • 24. The Mapper
      • Reads data as key/value pairs
          o The key is often discarded
      • Outputs zero or more key/value pairs
  • 25. Shuffle and Sort
      • Output from the mapper is sorted by key
      • All values with the same key are guaranteed to go to the same machine
  • 26. The Reducer
      • Called once for each unique key
      • Gets a list of all values associated with a key as input
      • The reducer outputs zero or more final key/value pairs
          o Usually just one output per input key
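The mapper, shuffle-and-sort, and reducer stages described on the last three slides can be imitated in a single process. This is a minimal word-count sketch in plain Python, not Hadoop's Java API; the function names and the use of a line index as the input key are assumptions made for illustration.

```python
from collections import defaultdict


def mapper(_key, line):
    """Emit (word, 1) per word; the input key (here a line index) is discarded."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Called once per unique key; emits one (key, total) pair."""
    yield key, sum(values)


def run_job(lines):
    intermediate = [kv for index, line in enumerate(lines)
                    for kv in mapper(index, line)]
    return [out for key, values in shuffle_and_sort(intermediate)
            for out in reducer(key, values)]


print(run_job(["the cat sat", "the dog sat down"]))
# [('cat', 1), ('dog', 1), ('down', 1), ('sat', 2), ('the', 2)]
```

In a real cluster the mapper calls run on the nodes that hold the input blocks and the grouped values travel over the network, but the contract between the three stages is the same.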
  • 28. [Diagram: four Mappers each emit intermediates through a Partitioner; shuffling routes the intermediates to three Reducers]
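The guarantee that all values for one key reach the same machine comes from partitioning: each key is hashed to a reducer index. The sketch below uses CRC32 as a stand-in for a stable hash (Hadoop's default partitioner works from the key's Java hash code instead); `partition` and `shuffle` are hypothetical names.

```python
import zlib


def partition(key, num_reducers):
    """Map a key to a reducer index using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair from every mapper to its reducer's bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)].append((key, value))
    return buckets


# Two mappers' outputs routed to three reducers.
outputs = [[("cat", 1), ("the", 1)], [("the", 1), ("dog", 1)]]
for index, bucket in enumerate(shuffle(outputs, 3)):
    print(index, bucket)
```

Because the reducer index depends only on the key, both `("the", 1)` pairs land in the same bucket no matter which mapper produced them.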
  • 29. Overview
      • NameNode: holds the metadata for HDFS
      • Secondary NameNode: performs housekeeping functions for the NameNode
      • DataNode: stores the actual HDFS data blocks
      • JobTracker: manages MapReduce jobs
      • TaskTracker: monitors individual Map and Reduce tasks
  • 30. The NameNode
      • Stores the HDFS file system information in an fsimage file
      • Updates to the file system (add/remove blocks) do not change the fsimage file
          o They are instead written to a log file
      • When starting, the NameNode loads the fsimage file and then applies the changes in the log file
  • 31. The Secondary NameNode
      • NOT a backup for the NameNode
      • Periodically reads the log file and applies the changes to the fsimage file, bringing it up to date
      • Allows the NameNode to restart faster when required
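The relationship between the fsimage snapshot and the log file can be modeled as a dictionary plus a list of pending operations. This is a toy sketch under that assumption: the tuple format `(op, path, blocks)` and the function names are hypothetical and do not reflect the on-disk formats HDFS actually uses.

```python
def apply_edits(fsimage, edit_log):
    """Replay logged operations on top of a snapshot and return the new state."""
    state = dict(fsimage)              # never mutate the snapshot in place
    for op, path, blocks in edit_log:
        if op == "add":
            state[path] = blocks
        elif op == "remove":
            state.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


def checkpoint(fsimage, edit_log):
    """Secondary-NameNode-style merge: a new fsimage plus an emptied log."""
    return apply_edits(fsimage, edit_log), []


fsimage = {"/a.txt": ["blk_1"]}
edits = [("add", "/b.txt", ["blk_2", "blk_3"]), ("remove", "/a.txt", None)]
new_image, new_log = checkpoint(fsimage, edits)
print(new_image, new_log)
# {'/b.txt': ['blk_2', 'blk_3']} []
```

NameNode startup corresponds to `apply_edits`, so the shorter the log is at restart, the less there is to replay; that is the speed-up the periodic checkpoint provides.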
  • 32. JobTracker and TaskTracker
      • JobTracker
          o Determines the execution plan for the job
          o Assigns individual tasks
      • TaskTracker
          o Keeps track of the performance of an individual mapper or reducer
  • 34. Why do these tools exist?
      • MapReduce is very powerful, but can be awkward to master
      • These tools allow programmers who are familiar with other programming styles to take advantage of the power of MapReduce
  • 35. Other Tools
      • Hive: Hadoop processing with SQL
      • Pig: Hadoop processing with scripting
      • Cascading: pipe and filter processing model
      • HBase: database model built on top of Hadoop
      • Flume: designed for large-scale data movement
