Hive
Presented by: Mohammad Mashhoood Syed
What is Hive?
• Apache Hive is data warehouse software built on top of Hadoop that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.
• Hive provides a SQL abstraction, so SQL-like queries can be run against the data without implementing them in the low-level Java API.
• It allows structure to be projected onto data that is already in storage.
• It can create schemas (table definitions) that point to data in Hadoop, so that unstructured files can be queried as structured data.
• It lets you treat your data in Hadoop as tables, which can be partitioned and bucketed.
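As a minimal sketch of projecting a table definition onto files that already sit in HDFS, the statement below declares an external table over a directory of tab-separated files. The table name `web_logs`, its columns, and the path `/data/web_logs` are hypothetical and do not come from the slides.

```sql
-- Hypothetical example: table name, columns, and HDFS path are assumptions.
-- EXTERNAL means Hive only records the schema; the files stay where they are.
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  ip      STRING,
  ts      STRING,
  url     STRING,
  status  INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/web_logs';

-- The existing files can now be queried as a table.
SELECT status, count(*) AS hits
FROM web_logs
GROUP BY status;
```

Dropping an external table removes only the table definition, not the underlying files, which is why this form suits data that other tools also read.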
Hive is not
• A relational database
• A system designed for online transaction processing (OLTP)
• A language for real-time queries and row-level updates
Features of Hive
• Hive is fast and scalable.
• It provides SQL-like queries (i.e., HQL) that are
implicitly transformed to MapReduce or Spark
jobs.
• It is capable of analyzing large datasets stored in
HDFS.
• It can operate on compressed data stored in the
Hadoop ecosystem.
• It supports user-defined functions (UDFs), through which users can supply their own functionality.
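The two more technical features above, the choice of execution engine and UDFs, might look like this in a Hive session. This is only a sketch: the jar path, the function name `normalize_title`, and the class `com.example.hive.NormalizeTitle` are hypothetical placeholders, and which engines are available depends on the installation.

```sql
-- Choose the execution engine for this session (for example mr, tez, or spark,
-- depending on what the installation supports).
SET hive.execution.engine=mr;

-- Register a user-defined function packaged in a jar.
-- The jar path and class name below are hypothetical placeholders.
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_title AS 'com.example.hive.NormalizeTitle';

-- Use it like a built-in function.
SELECT normalize_title(primarytitle) FROM movies LIMIT 10;
```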
Hive Origination
• Hive originated as an internal project at Facebook
• Later it was adopted by Apache as an open-source project
• Facebook deals with a massive amount of data (petabyte scale) and needs to run more than 75k ad-hoc queries on it
Why Hive?
• Since the data is collected from multiple servers and is diverse in nature, an RDBMS was not a suitable solution
• MapReduce could be a natural choice, but it had its own limitations
Architecture
Working
1. Execute Query: A Hive interface such as the command line or web UI sends the query to the driver (for example, over JDBC or ODBC) for execution.
2. Get Plan: The driver asks the query compiler to parse the query, check its syntax, and work out the query plan and its requirements.
3. Get Metadata: The compiler sends a metadata request to the Metastore (which can be backed by any supported database).
4. Send Metadata: The Metastore sends the metadata back to the compiler.
5. Send Plan: The compiler checks the requirements and sends the plan back to the driver. At this point, parsing and compiling of the query is complete.
6. Execute Plan: The driver sends the execution plan to the execution engine.
7. Execute Job: Internally, the execution job is a MapReduce job. The execution engine sends the job to the JobTracker, which runs on the name node, and the JobTracker assigns it to TaskTrackers, which run on the data nodes. There, the query runs as a MapReduce job.
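The plan that the compiler hands back to the driver in step 5 can be inspected without running the query by prefixing it with `EXPLAIN`. The query below is only an illustration against the `movies` table used later in this deck.

```sql
-- EXPLAIN prints the plan the compiler produced instead of running the query.
EXPLAIN
SELECT startyear, count(*) AS movie_count
FROM movies
WHERE titletype = 'movie'
GROUP BY startyear;
```

The output lists the stages of the plan and how they depend on each other, which is what the execution engine receives in step 6.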
Data modeling
Tables
Partitions
Buckets
Tables are organized into partitions, which group rows that share the same value of the partition key.
Partitions are divided further into buckets based on some other column.
Tables in Hive are created the same way as in an RDBMS.
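A sketch of how tables, partitions, and buckets fit together in one table definition follows. The table name `movies_by_year`, the choice of `startyear` as partition key, `tconst` as bucketing column, and the bucket count of 8 are all assumptions made for illustration.

```sql
-- Hypothetical table: name, partition key, and bucket count are assumptions.
CREATE TABLE movies_by_year (
  tconst        STRING,
  titletype     STRING,
  primarytitle  STRING,
  genres        ARRAY<STRING>
)
PARTITIONED BY (startyear INT)          -- one directory per startyear value
CLUSTERED BY (tconst) INTO 8 BUCKETS    -- rows hashed on tconst into 8 buckets per partition
STORED AS ORC;

-- Populate it from the unpartitioned table using dynamic partitioning.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE movies_by_year PARTITION (startyear)
SELECT tconst, titletype, primarytitle, genres, startyear
FROM movies;

-- Filtering on the partition key lets Hive read only the matching partition.
SELECT count(*) FROM movies_by_year WHERE startyear = 2021;
```

Note that the partition column is declared in `PARTITIONED BY` rather than in the column list, and it must come last in the `SELECT` that feeds a dynamic-partition insert.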
Different modes of Hive
• Hive can operate in two modes, depending on the number of data nodes in Hadoop.
• These modes are:
• Local mode
• MapReduce mode
Local Mode
• If Hadoop is installed in pseudo-distributed mode with a single data node, we use Hive in this mode
• If the data is small enough to fit on a single local machine, we can use this mode
• Processing is very fast on small data sets present on the local machine
MapReduce mode
• If Hadoop has multiple data nodes and the data is distributed across them, we use Hive in this mode
• It operates on large data sets, and queries execute in parallel
• Better performance on large data sets can be achieved through this mode
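Switching between the two modes is typically a matter of session settings. The sketch below uses property names from the Hive and Hadoop documentation; the exact names can differ between versions, so treat them as something to check against your installation.

```sql
-- Run subsequent jobs in local mode (single process, no cluster).
SET mapreduce.framework.name=local;

-- Alternatively, let Hive pick local mode automatically for small inputs.
SET hive.exec.mode.local.auto=true;
```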
Advantages of Hive
• Keeps queries running fast
• Takes very little time to write a Hive query compared with MapReduce code
• HiveQL is a declarative language like SQL
• Multiple users can query the data simultaneously using HiveQL
• Queries, including joins, are very easy to write in Hive
• Simple to learn and use
Disadvantages of Hive
• It is not designed for online transaction processing (OLTP); it is intended for online analytical processing (OLAP).
• Hive supports overwriting or appending data; row-level updates and deletes are not available on ordinary (non-transactional) tables.
• Subquery support is limited compared with a full RDBMS.
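The two write paths Hive does support, appending and overwriting, look like this. The staging table `movies_staging` is hypothetical and is assumed to have the same columns as `movies`.

```sql
-- Hypothetical staging table with the same columns as movies.
-- Append: existing rows in movies are kept, new rows are added.
INSERT INTO TABLE movies
SELECT * FROM movies_staging;

-- Overwrite: the current contents of movies are replaced by the query result.
INSERT OVERWRITE TABLE movies
SELECT * FROM movies_staging;
```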
Copying a file from the local system into the Hadoop environment
• hdfs dfs -copyFromLocal <local file path> <destination path>
Creating a table
Give the full file path
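The table-creation slides were screenshots, so the statements below are a reconstruction rather than the presenter's exact commands. The column names come from the queries later in this deck; the column types, the delimiters, and the HDFS path `/user/hive/input/movies.tsv` are assumptions.

```sql
-- Column names are taken from the queries in this deck; the types,
-- delimiters, and the HDFS path are assumptions.
CREATE TABLE IF NOT EXISTS movies (
  tconst          STRING,
  titletype       STRING,
  primarytitle    STRING,
  startyear       INT,
  runtimeminutes  INT,
  genres          ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA INPATH moves the file from its HDFS location into the table's directory.
LOAD DATA INPATH '/user/hive/input/movies.tsv' INTO TABLE movies;
```

The column order in the table definition has to match the column order in the file, and declaring `genres` as an array is what makes `array_contains(genres, ...)` usable in the queries that follow.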
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
IMDB dataset
Number of movies per year
select startyear,count(*) as count from
movies where startyear > 2000 and
startyear < 2022 group by startyear
order by count;
Comedy movies
• select primarytitle,startyear,runtimeminutes,genres from
movies where array_contains(genres,"Comedy");
• select distinct titletype from movies;
Upcoming horror movies
select * from movies where titletype = 'movie'
and startyear > 2021 and
array_contains(genres,"Horror");
Movies in 2021 with rating more than 9
select m.startyear, m.titletype, m.primarytitle, r.averagerating, m.genres
from movies as m join rating as r on m.tconst = r.tconst
where m.titletype = 'movie' and m.startyear = 2021 and r.averagerating > 9;
Action series with rating more than 9
select m.startyear, m.titletype, m.primarytitle, r.averagerating, m.genres
from movies as m join rating as r on m.tconst = r.tconst
where m.titletype = 'tvSeries' and r.averagerating > 9 and
array_contains(m.genres,"Action");
Thank you!
