SlideShare a Scribd company logo
CDH
What is CDH?
•Popular distribution of Apache
Hadoop and related projects.
•Delivers scalable storage and
distributed computing.
•Apache-licensed open source
Big Data
•Collection of large data sets that cannot be processed using traditional
computing techniques
•Big Data challenges
•Storage
•Capturing data
•Analyze data
Hadoop
•What is Hadoop?
• The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models
•Hadoop components
•HDFS – Storage layer
•MapReduce – Processing layer
•YARN – Resource management layer
Hadoop
ecosystem
HDFS
•Stores different types of large data sets (structured, semi-structured
and unstructured)
•HDFS creates a level of abstraction over the resources, from where we
can see the whole HDFS as a single unit.
•Stores data across various resources and maintains the log file about
the stored data.(metadata)
HDFS Cont...
MapReduce
•Core component in Hadoop ecosystem for processing
•Two functions
•Map
•Reduce
Kafka
• Distributed publish-subscribe messaging system and a robust queue
that can handle a high volume of data and enables to pass messages
from one end-point to another
• Built on top of the ZooKeeper synchronization service
• Integrates well with Apache Storm and Spark for real-time streaming
data analysis
Sqoop
• Sqoop − SQL to Hadoop and Hadoop to SQL
• Tool designed to transfer data between Hadoop and relational
database servers
YARN
• Performs all processing activities by allocating resources
• YARN is an attempt to take Apache Hadoop beyond MapReduce for
data-processing
• Consists
• Resource manager
• Node manager
YARN
Hive
•Data warehouse software
•Provide data summarization, query and analysis
•Query language – Hive Query language (HQL)
•Uses metastore to store meta-data about the data
•Familiar built in user defined functions
Pig
• Write complex MapReduce transformations using a simple scripting
language called Pig Latin
• Pig translates the Pig Latin script into MapReduce
• Makes Hadoop data accessible for a variety of batch processing
workloads
• Data preparation
• ETL
• Data mining
Impala
• It is an interactive SQL like query engine that runs on top of Hadoop
Distributed File System (HDFS)
• Parallel processing SQL query engine for processing huge volume of
data
• Provide unified platform for real time queries.
• Impala is faster than Apache Hive.
• Impala is memory intensive and does not run effectively for heavy
data operations like joins.
Impala
HBase
• Wide column store database (NoSQL).
• Database built in top of the HDFS
• HBase does not support a structured query language like SQL
• Provides random, real time access to data in Hadoop
Spark
• Open-source distributed general-purpose cluster computing
framework with in-memory data processing engine
• It can run in Hadoop clusters through YARN and it can process data in
HDFS .
• Fast and general engine for large-scale data processing.
• High level API's for programming languages: Java, Python, Scarla, R
• Supports SQL queries, Streaming data, Machine learning (ML), and
Graph algorithms.
Mahout
• Use for creating scalable machine learning algorithms
• Implemented on top of Apache Hadoop and using the MapReduce
paradigm.
• Lets applications to analyze large sets of data effectively and in
quicktime
• Mahout provides the data science tools to automatically find
meaningful patterns in big data sets in HDFS.
SolR
• Open source platform for searches of data stored in HDFS
• Advanced full-text search
• Near real-time indexing
• Standards based upon interfaces like JSON, XML, HTTP
• Comprehensive HTML administration interfaces
Kudu
• Apache Kudu completes Hadoop's storage layer to enable fast
analytics on fast data.
• It runs on commodity hardware, is horizontally scalable, and supports
highly available operation.
• Integration with MapReduce, Spark and other Hadoop ecosystem
components.
• Strong performance for running sequential and random workloads
simultaneously
Sentry
• Granular, role-based authorization module for Hadoop
• Provides the ability to control and enforce precise levels of privileges
on data for authenticated users and applications on a Hadoop cluster.
• Designed to be a pluggable authorization engine for Hadoop
components.
• Allows to define authorization rules to validate a user
Thank You

More Related Content

What's hot

Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
Harshdeep Kaur
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
Cloudera, Inc.
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
SANG WON PARK
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
DataArt
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Vadim Y. Bichutskiy
 
Introduction to Big Data and hadoop
Introduction to Big Data and hadoopIntroduction to Big Data and hadoop
Introduction to Big Data and hadoop
Sandeep Patil
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
DataWorks Summit
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
Edureka!
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
DataWorks Summit
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
Vinoth Chandar
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
DataWorks Summit
 
Hadoop
HadoopHadoop
Hadoop
Hadoop Hadoop
Hadoop
ABHIJEET RAJ
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High Availability
DataWorks Summit
 

What's hot (20)

Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Introduction to Big Data and hadoop
Introduction to Big Data and hadoopIntroduction to Big Data and hadoop
Introduction to Big Data and hadoop
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High Availability
 

Similar to Cloudera Hadoop Distribution

Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
sunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
raghavanand36
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
Kibrom Gebrehiwot
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
Krisshhna Daasaarii
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
InSemble
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
Jakub Stransky
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
NetajiGandi1
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptx
iaeronlineexm
 
Hadoop
HadoopHadoop
Hadoop
chandinisanz
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
yaevents
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
Shravan (Sean) Pabba
 
01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx
VIJAYAPRABAP
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
Ayyappan Paramesh
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
Farzad Nozarian
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
TIB Academy
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
Dr.Florence Dayana
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
BhavanaHotchandani
 

Similar to Cloudera Hadoop Distribution (20)

Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptx
 
Hadoop
HadoopHadoop
Hadoop
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
 
01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
 

Recently uploaded

Final ebook Keeping the Memory @live.pdf
Final ebook Keeping the Memory @live.pdfFinal ebook Keeping the Memory @live.pdf
Final ebook Keeping the Memory @live.pdf
Zuzana Mészárosová
 
Ardra Nakshatra (आर्द्रा): Understanding its Effects and Remedies
Ardra Nakshatra (आर्द्रा): Understanding its Effects and RemediesArdra Nakshatra (आर्द्रा): Understanding its Effects and Remedies
Ardra Nakshatra (आर्द्रा): Understanding its Effects and Remedies
Astro Pathshala
 
Chapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptx
Chapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptxChapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptx
Chapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptx
Brajeswar Paul
 
Webinar Innovative assessments for SOcial Emotional Skills
Webinar Innovative assessments for SOcial Emotional SkillsWebinar Innovative assessments for SOcial Emotional Skills
Webinar Innovative assessments for SOcial Emotional Skills
EduSkills OECD
 
NationalLearningCamp-2024-Orientation-for-RO-SDO.pptx
NationalLearningCamp-2024-Orientation-for-RO-SDO.pptxNationalLearningCamp-2024-Orientation-for-RO-SDO.pptx
NationalLearningCamp-2024-Orientation-for-RO-SDO.pptx
CelestineMiranda
 
Final_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptx
Final_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptxFinal_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptx
Final_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptx
shimeathdelrosario1
 
The membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERPThe membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERP
Celine George
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...
Nguyen Thanh Tu Collection
 
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdfThe Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
JackieSparrow3
 
Is Email Marketing Really Effective In 2024?
Is Email Marketing Really Effective In 2024?Is Email Marketing Really Effective In 2024?
Is Email Marketing Really Effective In 2024?
Rakesh Jalan
 
Righteous among Nations - eTwinning e-book (1).pdf
Righteous among Nations - eTwinning e-book (1).pdfRighteous among Nations - eTwinning e-book (1).pdf
Righteous among Nations - eTwinning e-book (1).pdf
Zuzana Mészárosová
 
2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference
KlettWorldLanguages
 
How to Store Data on the Odoo 17 Website
How to Store Data on the Odoo 17 WebsiteHow to Store Data on the Odoo 17 Website
How to Store Data on the Odoo 17 Website
Celine George
 
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUMENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
HappieMontevirgenCas
 
The basics of sentences session 9pptx.pptx
The basics of sentences session 9pptx.pptxThe basics of sentences session 9pptx.pptx
The basics of sentences session 9pptx.pptx
heathfieldcps1
 
Views in Odoo - Advanced Views - Pivot View in Odoo 17
Views in Odoo - Advanced Views - Pivot View in Odoo 17Views in Odoo - Advanced Views - Pivot View in Odoo 17
Views in Odoo - Advanced Views - Pivot View in Odoo 17
Celine George
 
Front Desk Management in the Odoo 17 ERP
Front Desk  Management in the Odoo 17 ERPFront Desk  Management in the Odoo 17 ERP
Front Desk Management in the Odoo 17 ERP
Celine George
 
How to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 NotebookHow to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 Notebook
Celine George
 
Capitol Doctoral Presentation -June 2024v2.pptx
Capitol Doctoral Presentation -June 2024v2.pptxCapitol Doctoral Presentation -June 2024v2.pptx
Capitol Doctoral Presentation -June 2024v2.pptx
CapitolTechU
 
(T.L.E.) Agriculture: Essentials of Gardening
(T.L.E.) Agriculture: Essentials of Gardening(T.L.E.) Agriculture: Essentials of Gardening
(T.L.E.) Agriculture: Essentials of Gardening
MJDuyan
 

Recently uploaded (20)

Final ebook Keeping the Memory @live.pdf
Final ebook Keeping the Memory @live.pdfFinal ebook Keeping the Memory @live.pdf
Final ebook Keeping the Memory @live.pdf
 
Ardra Nakshatra (आर्द्रा): Understanding its Effects and Remedies
Ardra Nakshatra (आर्द्रा): Understanding its Effects and RemediesArdra Nakshatra (आर्द्रा): Understanding its Effects and Remedies
Ardra Nakshatra (आर्द्रा): Understanding its Effects and Remedies
 
Chapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptx
Chapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptxChapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptx
Chapter-2-Era-of-One-party-Dominance-Class-12-Political-Science-Notes-2 (1).pptx
 
Webinar Innovative assessments for SOcial Emotional Skills
Webinar Innovative assessments for SOcial Emotional SkillsWebinar Innovative assessments for SOcial Emotional Skills
Webinar Innovative assessments for SOcial Emotional Skills
 
NationalLearningCamp-2024-Orientation-for-RO-SDO.pptx
NationalLearningCamp-2024-Orientation-for-RO-SDO.pptxNationalLearningCamp-2024-Orientation-for-RO-SDO.pptx
NationalLearningCamp-2024-Orientation-for-RO-SDO.pptx
 
Final_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptx
Final_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptxFinal_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptx
Final_SD_Session3_Ferriols, Ador Dionisio, Fajardo.pptx
 
The membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERPThe membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERP
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - HK1 (C...
 
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdfThe Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
 
Is Email Marketing Really Effective In 2024?
Is Email Marketing Really Effective In 2024?Is Email Marketing Really Effective In 2024?
Is Email Marketing Really Effective In 2024?
 
Righteous among Nations - eTwinning e-book (1).pdf
Righteous among Nations - eTwinning e-book (1).pdfRighteous among Nations - eTwinning e-book (1).pdf
Righteous among Nations - eTwinning e-book (1).pdf
 
2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference
 
How to Store Data on the Odoo 17 Website
How to Store Data on the Odoo 17 WebsiteHow to Store Data on the Odoo 17 Website
How to Store Data on the Odoo 17 Website
 
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUMENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
 
The basics of sentences session 9pptx.pptx
The basics of sentences session 9pptx.pptxThe basics of sentences session 9pptx.pptx
The basics of sentences session 9pptx.pptx
 
Views in Odoo - Advanced Views - Pivot View in Odoo 17
Views in Odoo - Advanced Views - Pivot View in Odoo 17Views in Odoo - Advanced Views - Pivot View in Odoo 17
Views in Odoo - Advanced Views - Pivot View in Odoo 17
 
Front Desk Management in the Odoo 17 ERP
Front Desk  Management in the Odoo 17 ERPFront Desk  Management in the Odoo 17 ERP
Front Desk Management in the Odoo 17 ERP
 
How to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 NotebookHow to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 Notebook
 
Capitol Doctoral Presentation -June 2024v2.pptx
Capitol Doctoral Presentation -June 2024v2.pptxCapitol Doctoral Presentation -June 2024v2.pptx
Capitol Doctoral Presentation -June 2024v2.pptx
 
(T.L.E.) Agriculture: Essentials of Gardening
(T.L.E.) Agriculture: Essentials of Gardening(T.L.E.) Agriculture: Essentials of Gardening
(T.L.E.) Agriculture: Essentials of Gardening
 

Cloudera Hadoop Distribution

  • 1. CDH
  • 2. What is CDH? •Popular distribution of Apache Hadoop and related projects. •Delivers scalable storage and distributed computing. •Apache-licensed open source
  • 3. Big Data •Collection of large data sets that cannot be processed using traditional computing techniques •Big Data challenges •Storage •Capturing data •Analyze data
  • 4. Hadoop •What is Hadoop? • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models •Hadoop components •HDFS – Storage layer •MapReduce – Processing layer •YARN – Resource management layer
  • 6. HDFS •Stores different types of large data sets (structured, semi-structured and unstructured) •HDFS creates a level of abstraction over the resources, from where we can see the whole HDFS as a single unit. •Stores data across various resources and maintains the log file about the stored data.(metadata)
  • 8. MapReduce •Core component in Hadoop ecosystem for processing •Two functions •Map •Reduce
  • 9. Kafka • Distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables to pass messages from one end-point to another • Built on top of the ZooKeeper synchronization service • Integrates well with Apache Storm and Spark for real-time streaming data analysis
  • 10. Sqoop • Sqoop − SQL to Hadoop and Hadoop to SQL • Tool designed to transfer data between Hadoop and relational database servers
  • 11. YARN • Performs all processing activities by allocating resources • YARN is an attempt to take Apache Hadoop beyond MapReduce for data-processing • Consists • Resource manager • Node manager
  • 12. YARN
  • 13. Hive •Data warehouse software •Provide data summarization, query and analysis •Query language – Hive Query language (HQL) •Uses metastore to store meta-data about the data •Familiar built in user defined functions
  • 14. Pig • Write complex MapReduce transformations using a simple scripting language called Pig Latin • Pig translates the Pig Latin script into MapReduce • Makes Hadoop data accessible for a variety of batch processing workloads • Data preparation • ETL • Data mining
  • 15. Impala • It is an interactive SQL like query engine that runs on top of Hadoop Distributed File System (HDFS) • Parallel processing SQL query engine for processing huge volume of data • Provide unified platform for real time queries. • Impala is faster than Apache Hive. • Impala is memory intensive and does not run effectively for heavy data operations like joins.
  • 17. HBase • Wide column store database (NoSQL). • Database built in top of the HDFS • HBase does not support a structured query language like SQL • Provides random, real time access to data in Hadoop
  • 18. Spark • Open-source distributed general-purpose cluster computing framework with in-memory data processing engine • It can run in Hadoop clusters through YARN and it can process data in HDFS . • Fast and general engine for large-scale data processing. • High level API's for programming languages: Java, Python, Scarla, R • Supports SQL queries, Streaming data, Machine learning (ML), and Graph algorithms.
  • 19. Mahout • Use for creating scalable machine learning algorithms • Implemented on top of Apache Hadoop and using the MapReduce paradigm. • Lets applications to analyze large sets of data effectively and in quicktime • Mahout provides the data science tools to automatically find meaningful patterns in big data sets in HDFS.
  • 20. SolR • Open source platform for searches of data stored in HDFS • Advanced full-text search • Near real-time indexing • Standards based upon interfaces like JSON, XML, HTTP • Comprehensive HTML administration interfaces
  • 21. Kudu • Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data. • It runs on commodity hardware, is horizontally scalable, and supports highly available operation. • Integration with MapReduce, Spark and other Hadoop ecosystem components. • Strong performance for running sequential and random workloads simultaneously
  • 22. Sentry • Granular, role-based authorization module for Hadoop • Provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. • Designed to be a pluggable authorization engine for Hadoop components. • Allows to define authorization rules to validate a user

Editor's Notes

  1. only Hadoop solution to offer unified batch processing, interactive SQL and interactive search, and role-based access controls.
  2. fault-tolerant and self-healing distributed file-system turn a cluster of industry-standard servers into a massively scalable pool of storage.  Features Scalability Flexibility Reliability
  3. Accessibility Reliability Flexibility Hadoop scalable
  4. we have two main challenges.The first challenge is how to collect large volume of data and the second challenge is to analyze the collected data
  5. front end for parsing SQL statements, generating logical plans, optimizing logical plans, translating them into physical plans which are executed by MapReduce jobs.
  6. 1 line of pig latin = approximately 100 lines of map reduce
  7. Nosql database Characteristics – fault tolerance(replication) , fast(near real time lookups), usable(Data model accommodates wide range of use cases)
  8. Apache Spark’s Streaming and SQL programming models with MLlib and GraphX make it easier for developers and data scientists to build applications that exploit machine learning and graph analytics.
  9. SUPPORTS Collaborative filtering Clustering Classification Frequent itemset mining
  10. Solr is highly reliable, scalable and fault tolerant. Hadoop operators put documents in Apache Solr by “indexing” via XML, JSON, CSV or binary over HTTP. Then users can query those petabytes of data via HTTP GET. They can receive XML, JSON, CSV or binary results. Apache Solr is optimized for high volume web traffic.
  11. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS 
  12. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS