SlideShare a Scribd company logo
Hadoop
        Framework for Distributed Applications




                                        Nishant M Gandhi
                                BE 4th YEAR Comp. Eng.
C K Pithawalla College of Engineering & Technology,Surat.
Hadoop

• Introduction
• History
• Key Technologies
  – MapReduce
  – HDFS
• Other Projects On Hadoop
• Conclusion
Introduction:

What is                       ?
 Hadoop is a framework for running applications on large clusters
 built of commodity hardware.
                          ----HADOOP WIKI


  Hadoop is a free, Java-based programming framework that
  supports the processing of large data sets in a distributed
  computing environment.
Introduction (conti..)


 #1 Open Source
 #2 Part of Apache group
 #3 Power of JAVA
 #4 Supported By Big Web Giant Companies




#1 Google’s Powerful Computation MapReduce Technology
#2 Hadoop Distributed File System(HDFS) inspired by Google File
System(GFS)
#3 Used for Cluster & Distributed Computing
#4 Support from…
History:
Inventor Doug Cutting, creator of Apache Lucene

The Origin of the Name “Hadoop”:

The name my kid gave a stuffed yellow elephant. Short, relatively easy to
spell and pronounce, meaningless, and not used elsewhere: those are my
naming criteria. ---Daug Cutting.

Started with building Web Search Engine
   •Nutch in 2002
   •Aim was to index billions of pages
   •Architecture can’t support billions of pages
Google’s GFS in 2003 solved storage problem
   •Nutch Distributed Filesystem(NDFS) in 2004
Google’s MapReduce in 2004
   •MapReduce implimented in Nutch 2005
Feb 2006 they moved out of Nutch to form an independent
subproject of Lucene called Hadoop.
History (conti..)
At around the same time, Doug Cutting joined Yahoo
February 2008 , Yahoo! announced that its production search index
was being generated by a 10,000-core Hadoop cluster




 In January 2008, Hadoop was made its own top-level project at
 apache, confirming its success and its diverse, active community.
 By this time Hadoop was being used by many other companies
 besides Yahoo! such as
     • Last.fm
     • Facebook
     • The New York Times
     • Twitter
     • Microsoft
     • IBM
Key Technologies:
   •MapReduce
     -Computational Parallel Programming Model
     -Technology developed by google




   •Hadoop Distributed File System
     -Distributed File System for large data set
     -Inspired by Google File System
Key Technologies: MapReduce
Key Technologies: MapReduce

 •   Programming model developed at Google

 •   Sort/merge based distributed computing

 •   Initially, it was intended for their internal search/indexing
     application, but now used extensively by more organizations
     (e.g., Yahoo, Amazon.com, IBM, etc.)

 •   It is functional style programming (e.g., LISP) that is naturally
     parallelizable across a large cluster of workstations or PCS.

 •   The underlying system takes care of the partitioning of the
     input data, scheduling the program’s execution across several
     machines, handling machine failures, and managing required
     inter-machine communication. (This is the key for Hadoop’s
     success)
Key Technologies: HDFS

 At Google MapReduce operation are run on a special file system
  called Google File System (GFS) that is highly optimized for this
  purpose.

 GFS is not open source.

 Doug Cutting and others at Yahoo! reverse engineered the GFS
  and called it Hadoop Distributed File System (HDFS).
Key Technologies: HDFS
Key Technologies: HDFS

  •   Very Large Distributed File System
      – 10K nodes, 100 million files, 10 PB
  •   Assumes Commodity Hardware
      – Files are replicated to handle hardware failure
      – Detect failures and recovers from them
  •   Optimized for Batch Processing
      – Data locations exposed so that computations can move to
      where data resides
      – Provides very high aggregate bandwidth
  •   User Space, runs on heterogeneous OS
Other Projects on Hadoop:

        ZooKeeper: co-ordination services



        Pig: A high-level data-flow language and execution
        framework for parallel computation.



        Hive:A data warehouse infrastructure that provides
        data summarization and ad hoc querying.



         Chukwa: A data collection system for managing
         large distributed systems.
Other Projects on Hadoop:

               Avro: Apache Avro is a data serialization system.
               Avro provides:
               •Rich data structures.
               •A compact, fast, binary data format.
               •A container file, to store persistent data.
               •Simple integration with dynamic languages.


               Just as Google's Bigtable leverages the
               distributed data storage provided by the
               Google File System, HBase provides
               Bigtable-like capabilities on top of
               Hadoop Core.
Hadoop Architecture on DELL C Series
Server:
Conclusion:



Hadoop has been very effective solution for companies dealing
 with the data in perabytes.

It has solved many problems in industry related to huge data
 management and distributed system.

As it is open source, so it is adopted by companies widely.
Thank You…..

More Related Content

What's hot

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Stanley Wang
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
joelcrabb
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Bhavesh Padharia
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
sunera pathan
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Simplilearn
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Apex
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
sudhakara st
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
Cloudera, Inc.
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
Dataflair Web Services Pvt Ltd
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Sandip Darwade
 
Hadoop
HadoopHadoop
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
Derek Stainer
 

What's hot (20)

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 

Viewers also liked

Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
Milind Bhandarkar
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
Manish Borkar
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
Kannappan Sirchabesan
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
tipanagiriharika
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
Atul Kushwaha
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
Takrim Ul Islam Laskar
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
Konstantin V. Shvachko
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
Konstantin V. Shvachko
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
DataWorks Summit
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
tcurdt
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Subhas Kumar Ghosh
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
DataWorks Summit/Hadoop Summit
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
Jay
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
sudhakara st
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
Bart Vandewoestyne
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
Phil Young
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
Rahul Jain
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
royans
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
Nick Dimiduk
 

Viewers also liked (20)

Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 

Similar to Hadoop

Hadoop training
Hadoop trainingHadoop training
Hadoop training
TIB Academy
 
Anju
AnjuAnju
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
Mahabubur Rahaman
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
Humoyun Ahmedov
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
Dr.Florence Dayana
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
Jesus Rodriguez
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
ElianaSalinas4
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
ElianaSalinas4
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
KennyPratheepKumar
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
Harshdeep Kaur
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
Mishika Bharadwaj
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Roorkee College of Engineering, Roorkee
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
sonukumar379092
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
arslanhaneef
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
tcloudcomputing-tw
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
Hadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptxHadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptx
Rahul Borate
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
Venneladonthireddy1
 
Hadoop
HadoopHadoop
Hadoop
chandinisanz
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
raghavanand36
 

Similar to Hadoop (20)

Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
Anju
AnjuAnju
Anju
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Hadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptxHadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptx
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Hadoop
HadoopHadoop
Hadoop
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 

More from Nishant Gandhi

Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks
Nishant Gandhi
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Nishant Gandhi
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
Nishant Gandhi
 
Graph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using HadoopGraph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using Hadoop
Nishant Gandhi
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
Nishant Gandhi
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
Nishant Gandhi
 
Packet tracer practical guide
Packet tracer practical guidePacket tracer practical guide
Packet tracer practical guide
Nishant Gandhi
 
Hadoop Report
Hadoop ReportHadoop Report
Hadoop Report
Nishant Gandhi
 

More from Nishant Gandhi (8)

Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
 
Graph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using HadoopGraph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using Hadoop
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
 
Packet tracer practical guide
Packet tracer practical guidePacket tracer practical guide
Packet tracer practical guide
 
Hadoop Report
Hadoop ReportHadoop Report
Hadoop Report
 

Hadoop

  • 1. Hadoop Framework for Distributed Applications Nishant M Gandhi BE 4th YEAR Comp. Eng. C K Pithawalla College of Engineering & Technology,Surat.
  • 2. Hadoop • Introduction • History • Key Technologies – MapReduce – HDFS • Other Projects On Hadoop • Conclusion
  • 3. Introduction: What is ? Hadoop is a framework for running applications on large clusters built of commodity hardware. ----HADOOP WIKI Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
  • 4. Introduction (conti..) #1 Open Source #2 Part of Apache group #3 Power of JAVA #4 Supported By Big Web Giant Companies #1 Google’s Powerful Computation MapReduce Technology #2 Hadoop Distributed File System(HDFS) inspired by Google File System(GFS) #3 Used for Cluster & Distributed Computing #4 Support from…
  • 5. History: Inventor Doug Cutting, creator of Apache Lucene The Origin of the Name “Hadoop”: The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. ---Daug Cutting. Started with building Web Search Engine •Nutch in 2002 •Aim was to index billions of pages •Architecture can’t support billions of pages Google’s GFS in 2003 solved storage problem •Nutch Distributed Filesystem(NDFS) in 2004 Google’s MapReduce in 2004 •MapReduce implimented in Nutch 2005 Feb 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop.
  • 6. History (conti..) At around the same time, Doug Cutting joined Yahoo February 2008 , Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster In January 2008, Hadoop was made its own top-level project at apache, confirming its success and its diverse, active community. By this time Hadoop was being used by many other companies besides Yahoo! such as • Last.fm • Facebook • The New York Times • Twitter • Microsoft • IBM
  • 7. Key Technologies: •MapReduce -Computational Parallel Programming Model -Technology developed by google •Hadoop Distributed File System -Distributed File System for large data set -Inspired by Google File System
  • 9. Key Technologies: MapReduce • Programming model developed at Google • Sort/merge based distributed computing • Initially, it was intended for their internal search/indexing application, but now used extensively by more organizations (e.g., Yahoo, Amazon.com, IBM, etc.) • It is functional style programming (e.g., LISP) that is naturally parallelizable across a large cluster of workstations or PCS. • The underlying system takes care of the partitioning of the input data, scheduling the program’s execution across several machines, handling machine failures, and managing required inter-machine communication. (This is the key for Hadoop’s success)
  • 10. Key Technologies: HDFS  At Google MapReduce operation are run on a special file system called Google File System (GFS) that is highly optimized for this purpose.  GFS is not open source.  Doug Cutting and others at Yahoo! reverse engineered the GFS and called it Hadoop Distributed File System (HDFS).
  • 12. Key Technologies: HDFS • Very Large Distributed File System – 10K nodes, 100 million files, 10 PB • Assumes Commodity Hardware – Files are replicated to handle hardware failure – Detect failures and recovers from them • Optimized for Batch Processing – Data locations exposed so that computations can move to where data resides – Provides very high aggregate bandwidth • User Space, runs on heterogeneous OS
  • 13. Other Projects on Hadoop: ZooKeeper: co-ordination services Pig: A high-level data-flow language and execution framework for parallel computation. Hive:A data warehouse infrastructure that provides data summarization and ad hoc querying. Chukwa: A data collection system for managing large distributed systems.
  • 14. Other Projects on Hadoop: Avro: Apache Avro is a data serialization system. Avro provides: •Rich data structures. •A compact, fast, binary data format. •A container file, to store persistent data. •Simple integration with dynamic languages. Just as Google's Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop Core.
  • 15. Hadoop Architecture on DELL C Series Server:
  • 16. Conclusion: Hadoop has been very effective solution for companies dealing with the data in perabytes. It has solved many problems in industry related to huge data management and distributed system. As it is open source, so it is adopted by companies widely.