This presentation on Hadoop architecture will help you understand the architecture of Apache Hadoop in detail. In this video, you will learn what Hadoop is, the components of Hadoop, what HDFS is, the HDFS architecture, Hadoop MapReduce, a Hadoop MapReduce example, and Hadoop YARN, followed by a demo on MapReduce. Apache Hadoop offers a versatile, adaptable, and reliable distributed computing framework for big data, running on a group of systems that each contribute storage capacity and local computing power. After watching this video, you will also understand the Hadoop Distributed File System and its features, along with their practical implementation.
Below are the topics covered in this Hadoop Architecture presentation:
1. What is Hadoop?
2. Components of Hadoop
3. What is HDFS?
4. HDFS Architecture
5. Hadoop MapReduce
6. Hadoop MapReduce Example
7. Hadoop YARN
8. Demo on MapReduce
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, its architecture, sources, sinks, channels, and configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Who should take up this Big Data and Hadoop Certification Training Course?
Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology for the following professionals:
1. Software Developers and Architects
2. Analytics Professionals
3. Senior IT professionals
4. Testing and Mainframe professionals
5. Data Management Professionals
6. Business Intelligence Professionals
7. Project Managers
8. Aspiring Data Scientists
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a Master and Slave architecture with a NameNode that manages metadata and DataNodes that store data blocks. The NameNode tracks locations of data blocks and regulates access to files, while DataNodes store file blocks and manage read/write operations as directed by the NameNode. HDFS provides high-performance, scalable access to data across large Hadoop clusters.
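As a quick illustration of this division of labor, here is a minimal command-line sketch (the paths and file name are hypothetical; it assumes a configured Hadoop client):

hdfs dfs -put sales.csv /user/analyst/sales.csv   # client writes; the NameNode records metadata, DataNodes store the blocks
hdfs dfs -ls /user/analyst                        # listing is answered from the NameNode's metadata
hdfs dfs -cat /user/analyst/sales.csv             # reading streams the blocks back from the DataNodes

Each command consults the NameNode first for metadata; only the actual block data moves between the client and the DataNodes.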
This document provides an introduction to the Pig analytics platform for Hadoop. It begins with an overview of big data and Hadoop, then discusses the basics of Pig including its data model, language called Pig Latin, and components. Key points made are that Pig provides a high-level language for expressing data analysis processes, compiles queries into MapReduce programs for execution, and allows for easier programming than lower-level systems like Java MapReduce. The document also compares Pig to SQL and Hive, and demonstrates visualizing Pig jobs with the Twitter Ambrose tool.
The presentation covers the following topics: 1) Hadoop introduction 2) Hadoop nodes and daemons 3) Architecture 4) Hadoop's best features 5) Hadoop characteristics. For further knowledge of Hadoop, refer to the link: http://data-flair.training/blogs/hadoop-tutorial-for-beginners/
HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.
Apache Spark is an in-memory data processing solution that can work with existing data sources like HDFS and can make use of your existing computation infrastructure like YARN/Mesos. This talk covers a basic introduction to Apache Spark and its various components, such as MLlib, Shark, and GraphX, with a few examples.
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo... - Simplilearn
The document provides information about Hadoop training. It discusses the need for Hadoop in today's data-heavy world. It then describes what Hadoop is, its ecosystem including HDFS for storage and MapReduce for processing. It also discusses YARN and provides a bank use case. It further explains the architecture and working of HDFS and MapReduce in processing large datasets in parallel across clusters.
The document discusses Hadoop, an open-source software framework that allows distributed processing of large datasets across clusters of computers. It describes Hadoop as having two main components - the Hadoop Distributed File System (HDFS) which stores data across infrastructure, and MapReduce which processes the data in a parallel, distributed manner. HDFS provides redundancy, scalability, and fault tolerance. Together these components provide a solution for businesses to efficiently analyze the large, unstructured "Big Data" they collect.
Big Data raises challenges about how to process such a vast pool of raw data and how to derive value from it. To address these demands, an ecosystem of tools named Hadoop was conceived.
This document provides an overview of Apache Spark, including how it compares to Hadoop, the Spark ecosystem, Resilient Distributed Datasets (RDDs), transformations and actions on RDDs, the directed acyclic graph (DAG) scheduler, Spark Streaming, and the DataFrames API. Key points covered include Spark's faster performance versus Hadoop through its use of memory instead of disk, the RDD abstraction for distributed collections, common RDD operations, and Spark's capabilities for real-time streaming data processing and SQL queries on structured data.
This presentation provides an overview of Hadoop, including:
- A brief history of data and the rise of big data from various sources.
- An introduction to Hadoop as an open source framework used for distributed processing and storage of large datasets across clusters of computers.
- Descriptions of the key components of Hadoop - HDFS for storage, and MapReduce for processing - and how they work together in the Hadoop architecture.
- An explanation of how Hadoop can be installed and configured in standalone, pseudo-distributed and fully distributed modes.
- Examples of major companies that use Hadoop like Amazon, Facebook, Google and Yahoo to handle their large-scale data and analytics needs.
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop... - Simplilearn
This presentation about Hadoop for beginners will help you understand what Hadoop is, why Hadoop is needed, Hadoop HDFS, Hadoop MapReduce, Hadoop YARN, a use case of Hadoop, and finally a demo on HDFS (Hadoop Distributed File System), MapReduce, and YARN. Big Data is a massive amount of data which cannot be stored, processed, and analyzed using traditional systems. To overcome this problem, we use Hadoop: a framework which stores and handles Big Data in a distributed and parallel fashion, overcoming the challenges of Big Data. Hadoop has three components: HDFS, MapReduce, and YARN. HDFS is the storage unit of Hadoop, MapReduce is its processing unit, and YARN is its resource management unit. In this video, we will look into these units individually and also see a demo on each of them.
Below topics are explained in this Hadoop presentation:
1. What is Hadoop
2. Why Hadoop
3. Big Data generation
4. Hadoop HDFS
5. Hadoop MapReduce
6. Hadoop YARN
7. Use of Hadoop
8. Demo on HDFS, MapReduce and YARN
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
This presentation discusses the following topics:
Hadoop Distributed File System (HDFS)
How does HDFS work?
HDFS Architecture
Features of HDFS
Benefits of using HDFS
Examples: Target Marketing
HDFS data replication
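To make the replication topic concrete, here is a hedged one-liner (the path is hypothetical; -setrep is the standard shell command for changing a file's replication factor):

hdfs dfs -setrep -w 2 /user/analyst/sales.csv   # re-replicate this file with factor 2 and wait until done

The cluster-wide default comes from the dfs.replication property (3 out of the box), which individual files can override as shown.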
Here is how you can simulate this problem using MapReduce-style Unix commands:
Map step:
grep -o 'Blue\|Green' input.txt > map_output
This uses grep to search the input file for the strings "Blue" or "Green" and print one line per match, just as a mapper emits one record per key.
Shuffle step:
sort map_output > sorted_output
Sorting brings all identical matches together, which is exactly what the shuffle phase does.
Reduce step:
uniq -c sorted_output
uniq -c collapses each run of identical lines into a single line prefixed with its count, giving the number of occurrences of Blue and of Green.
So MapReduce has been simulated using Unix commands: grep extracts the relevant data (map), sort groups it by key (shuffle), and uniq -c aggregates each group (reduce).
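The same commands can run on a real cluster through Hadoop Streaming, which lets arbitrary executables serve as the map and reduce steps. A sketch, assuming a standard Apache layout (the jar path varies by distribution, and the HDFS paths are hypothetical):

# the framework supplies the shuffle, sorting mapper output by key before the reducer runs
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /data/input.txt \
  -output /data/color_counts \
  -mapper "grep -o 'Blue\|Green'" \
  -reducer "uniq -c"

One caveat: grep exits nonzero on a split with no matches, which streaming treats as a task failure, so in practice the mapper may need a wrapper such as sh -c "grep -o 'Blue\|Green'; true".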
The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. In-depth knowledge of concepts such as the Hadoop Distributed File System, setting up the Hadoop cluster, MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, etc. will be covered in the course.
Hadoop is the popular open source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. Hadoop enables you to explore complex data, using custom analyses tailored to your information and questions. Hadoop is the system that allows unstructured data to be distributed across hundreds or thousands of machines forming shared-nothing clusters, and the execution of Map/Reduce routines to run on the data in that cluster. Hadoop has its own filesystem which replicates data to multiple nodes to ensure that if one node holding data goes down, there are at least 2 other nodes from which to retrieve that piece of information. This protects data availability from node failure, something which is critical when there are many nodes in a cluster (aka RAID at a server level).
What is Hadoop? The data are stored in a relational database on your desktop computer, and this desktop computer has no problem handling the load. Then your company starts growing very quickly, and that data grows to 10GB. And then 100GB. And you start to reach the limits of your current desktop computer. So you scale up by investing in a larger computer, and you are then OK for a few more months. Then your data grows to 10TB, and then 100TB, and you are fast approaching the limits of that computer. Moreover, you are now asked to feed your application with unstructured data coming from sources like Facebook, Twitter, RFID readers, sensors, and so on. Your management wants to derive information from both the relational data and the unstructured data, and wants this information as soon as possible. What should you do? Hadoop may be the answer!
Hadoop is an open source project of the Apache Foundation. It is a framework written in Java, originally developed by Doug Cutting, who named it after his son's toy elephant. Hadoop uses Google's MapReduce and Google File System technologies as its foundation. It is optimized to handle massive quantities of data, which could be structured, unstructured or semi-structured, using commodity hardware, that is, relatively inexpensive computers. This massively parallel processing is done with great performance. However, it is a batch operation handling massive quantities of data, so the response time is not immediate. As of Hadoop version 0.20.2, updates are not possible, but appends will be possible starting in version 0.21. Hadoop replicates its data across different computers, so that if one goes down, the data are processed on one of the replicated computers. Hadoop is not suitable for OnLine Transaction Processing workloads, where data are randomly accessed on structured data like a relational database, nor for OnLine Analytical Processing or Decision Support System workloads, where data are sequentially accessed on structured data like a relational database to generate reports that provide business intelligence. Hadoop is used for Big Data; it complements OnLine Transaction Processing and OnLine Analytical Processing.
Introduction to Hadoop and Hadoop components - rebeccatho
This document provides an introduction to Apache Hadoop, which is an open-source software framework for distributed storage and processing of large datasets. It discusses Hadoop's main components of MapReduce and HDFS. MapReduce is a programming model for processing large datasets in a distributed manner, while HDFS provides distributed, fault-tolerant storage. Hadoop runs on commodity computer clusters and can scale to thousands of nodes.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It addresses problems with traditional systems like data growth, network/server failures, and high costs by allowing data to be stored in a distributed manner and processed in parallel. Hadoop has two main components - the Hadoop Distributed File System (HDFS) which provides high-throughput access to application data across servers, and the MapReduce programming model which processes large amounts of data in parallel by splitting work into map and reduce tasks.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It was created to support applications handling large datasets operating on many servers. Key Hadoop technologies include MapReduce for distributed computing, and HDFS for distributed file storage inspired by Google File System. Other related Apache projects extend Hadoop capabilities, like Pig for data flows, Hive for data warehousing, and HBase for NoSQL-like big data. Hadoop provides an effective solution for companies dealing with petabytes of data through distributed and parallel processing.
This Hadoop ecosystem presentation will help you understand the different tools present in the Hadoop ecosystem. This Hadoop video will take you through an overview of the important tools of the Hadoop ecosystem, which include Hadoop HDFS, Hadoop Pig, Hadoop YARN, Hadoop Hive, Apache Spark, Mahout, Apache Kafka, Storm, Sqoop, Apache Ranger, and Oozie, and also discuss the architecture of these tools. It will cover the different tasks of Hadoop such as data storage, data processing, cluster resource management, data ingestion, machine learning, streaming, and more. Now, let us get started and understand each of these tools in detail.
Below topics are explained in this Hadoop ecosystem presentation:
1. What is Hadoop ecosystem?
   1. Pig (Scripting)
   2. Hive (SQL queries)
   3. Apache Spark (Real-time data analysis)
   4. Mahout (Machine learning)
   5. Apache Ambari (Management and monitoring)
   6. Kafka & Storm
   7. Apache Ranger & Apache Knox (Security)
   8. Oozie (Workflow system)
   9. Hadoop MapReduce (Data processing)
   10. Hadoop Yarn (Cluster resource management)
   11. Hadoop HDFS (Data storage)
   12. Sqoop & Flume (Data collection and ingestion)
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training.
Apache Hadoop India Summit 2011 keynote talk "HDFS Federation" by Sanjay Radia - Yahoo Developer Network
This document discusses scaling HDFS through federation. HDFS currently uses a single namenode that limits scalability. Federation allows multiple independent namenodes to each manage a subset of the namespace, improving scalability. It also generalizes the block storage layer to use block pools, separating block management from namenodes. This paves the way for horizontal scaling of both namenodes and block storage in the future. Federation preserves namenode robustness while requiring few code changes. It also provides benefits like improved isolation and availability when scaling to extremely large clusters with billions of files and blocks.
Hadoop is an open-source software framework for distributed storage and processing of large datasets. It has three core components: HDFS for storage, MapReduce for processing, and YARN for resource management. HDFS stores data as blocks across clusters of commodity servers. MapReduce allows distributed processing of large datasets in parallel. YARN improves on MapReduce and provides a general framework for distributed applications beyond batch processing.
A brief introduction to Hadoop distributed file system. How a file is broken into blocks, written and replicated on HDFS. How missing replicas are taken care of. How a job is launched and its status is checked. Some advantages and disadvantages of HDFS-1.x
Understanding Hadoop Clusters and the Network - bradhedlund
This document provides an overview of Hadoop clusters and the network. It describes the typical roles in a Hadoop cluster including NameNode, DataNodes, JobTracker, and Secondary NameNode. It explains how data is written to HDFS in a distributed manner across multiple racks and DataNodes for redundancy. It also summarizes how MapReduce jobs are executed by distributing tasks to DataNodes where the data is located when possible for locality.
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ... - Simplilearn
This video on Hadoop interview questions, part 1, will take you through the general Hadoop questions and questions on HDFS, MapReduce, and YARN, which are very likely to be asked in any Hadoop interview. It covers all the topics on the major components of Hadoop. This Hadoop tutorial will give you an idea of the different scenario-based questions you could face, along with some multiple-choice questions. Now, let us dive into this Hadoop interview questions video and gear up for your next Hadoop interview.
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
HDFS is a distributed file system designed for large data sets and high-throughput access. It uses a master/slave architecture with a Namenode managing the file system namespace and Datanodes storing file data blocks. Blocks are replicated across Datanodes for fault tolerance. The system is highly scalable, handling large clusters and file sizes ranging from gigabytes to terabytes.
The document discusses Hadoop, its components, and how they work together. It covers HDFS, which stores and manages large files across commodity servers; MapReduce, which processes large datasets in parallel; and other tools like Pig and Hive that provide interfaces for Hadoop. Key points are that Hadoop is designed for large datasets and hardware failures, HDFS replicates data for reliability, and MapReduce moves computation instead of data for efficiency.
Hadoop is an open source framework for distributed storage and processing of large datasets across commodity hardware. It has two main components - the Hadoop Distributed File System (HDFS) for storage, and MapReduce for processing. HDFS stores data across clusters in a redundant and fault-tolerant manner. MapReduce allows distributed processing of large datasets in parallel using map and reduce functions. The architecture aims to provide reliable, scalable computing using commodity hardware.
The document discusses the key features and architecture of the Hadoop File System (HDFS). HDFS is designed for large data sets and high fault tolerance. It uses a master/slave architecture with one namenode that manages file metadata and multiple datanodes that store file data blocks. HDFS replicates blocks across datanodes for reliability and provides interfaces for applications to access file data.
This document provides an overview of Hadoop and MapReduce concepts. It discusses:
- HDFS architecture with NameNode and DataNodes for metadata and data storage. HDFS provides reliability through block replication across nodes.
- MapReduce framework for distributed processing of large datasets across clusters. It consists of map and reduce phases with intermediate shuffling and sorting of data.
- Hadoop was developed based on Google's papers describing their distributed file system GFS and MapReduce processing model. It allows processing of data in parallel across large clusters of commodity hardware.
Best Hadoop Institutes: Kelly Technologies is the best Hadoop training institute in Bangalore, providing Hadoop courses by real-time faculty in Bangalore.
Apache Hadoop is an open-source software framework that supports large-scale distributed applications and processing of multi-petabyte datasets across thousands of commodity servers. It implements the MapReduce programming model for distributed processing and the Hadoop Distributed File System (HDFS) for reliable data storage. HDFS stores data across commodity servers, provides high aggregate bandwidth, and detects/recovers from failures automatically.
A simple replication-based mechanism has been used to achieve high data reliability of Hadoop Distributed File System (HDFS). However, replication based mechanisms have high degree of disk storage requirement since it makes copies of full block without consideration of storage size. Studies have shown that erasure-coding mechanism can provide more storage space when used as an alternative to replication. Also, it can increase write throughput compared to replication mechanism. To improve both space efficiency and I/O performance of the HDFS while preserving the same data reliability level, we propose HDFS+, an erasure coding based Hadoop Distributed File System. The proposed scheme writes a full block on the primary DataNode and then performs erasure coding with Vandermonde-based Reed-Solomon algorithm that divides data into m data fragments and encode them into n data fragments (n>m), which are saved in N distinct DataNodes such that the original object can be reconstructed from any m fragments. The experimental results show that our scheme can save up to 33% of storage space while outperforming the original scheme in write performance by 1.4 times. Our scheme provides the same read performance as the original scheme as long as data can be read from the primary DataNode even under single-node or double-node failure. Otherwise, the read performance of the HDFS+ decreases to some extent. However, as the number of fragments increases, we show that the performance degradation becomes negligible.
In this session you will learn:
History of Hadoop
Hadoop Ecosystem
Hadoop Animal Planet
What is Hadoop?
Distinctions of Hadoop
Hadoop Components
The Hadoop Distributed Filesystem
Design of HDFS
When Not to use Hadoop?
HDFS Concepts
Anatomy of a File Read
Anatomy of a File Write
Replication & Rack awareness
MapReduce Components
Typical MapReduce Job
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
The document discusses data partitioning and distribution across multiple machines in a cluster. It explains that data replication does not scale well, but data partitioning, where each record exists on only one machine, allows write latency to scale with the number of machines in the cluster. Coherence provides a distributed cache that partitions data and offers functions for server-side processing near the data through tools like entry processors.
So you want to get started with Hadoop, but how? This session will show you how to get started with Hadoop development using Pig. Prior Hadoop experience is not needed.
Thursday, May 8th, 02:00pm-02:50pm
Hadoop Institutes: Kelly Technologies is the best Hadoop training institute in Hyderabad, providing Hadoop training by real-time faculty in Hyderabad.
HDFS is a distributed file system that stores large data across multiple nodes in a Hadoop cluster. It divides files into blocks and replicates them across nodes for reliability. The NameNode manages the file system namespace and regulates client access, while DataNodes store data blocks. HDFS provides interfaces for applications to access data blocks efficiently and is highly fault tolerant due to replication.
2. How Big Data evolved?
Back in the days when there was no internet, data used to be less and was often structured. This data was easily stored on a central server storage.
3. How Big Data evolved?
But then, the internet boomed and data grew at a very high rate. A lot of semi-structured and unstructured data was being generated.
4. How Big Data evolved?
Storing such huge volumes of data on a single server was not efficient.
5. How Big Data evolved?
There was a need for distributed storage machines where data could be stored and processed in parallel.
6. How Big Data evolved?
Solution: data can be stored and processed on multiple machines.
7. How Big Data evolved?
Hadoop, one of the big data technologies, is the solution: a framework that allows distributed storage and parallel processing of big data.
8. What’s in it for you?
1. What is Hadoop?
2. Components of Hadoop
3. What is HDFS?
4. HDFS Architecture
5. Hadoop MapReduce
6. Hadoop MapReduce Example
7. Hadoop YARN
8. Demo on MapReduce
10. What is Hadoop?
Hadoop is a framework that allows you to store large volumes of data on several node machines. It also helps in processing the data in a parallel manner.
(Diagram: a 3 TB dataset is split into three 1 TB parts, each stored on a different node.)
14. What is HDFS?
Hadoop Distributed File System (HDFS) is the storage layer of Hadoop that stores data in multiple data servers. Data is divided into multiple blocks, which are stored over multiple nodes of the cluster.
15. What is HDFS?
HDFS has 3 core components:
• Namenode: the master node; contains metadata in RAM and on disk
• Secondary Namenode: has a copy of the Namenode’s metadata on disk
• Slave node (Datanode): contains the actual data in the form of blocks
17. HDFS Blocks
HDFS divides large data into different blocks; each block by default holds 128 MB of data. Suppose we have a 542 MB file: it is split into four full 128 MB blocks plus one final 30 MB block.
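You can see exactly this split for any stored file with fsck (the file name is hypothetical):

hdfs fsck /user/data/bigfile.dat -files -blocks   # lists the file, its blocks, and each block's size

For the 542 MB file above, the report would show four 128 MB blocks plus one 30 MB block.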
19. Data Replication in HDFS
(Diagram: datanodes DN 1 to DN 12 spread across Rack 1, Rack 2, and Rack 3; A, B, C, and D denote blocks.)
Do you understand what's happening here? Each block of data is being replicated thrice, on different datanodes present in different racks.
20. Data Replication in HDFS
The initial copy of Block A is created in Rack 1, the initial copy of Block B in Rack 2, and the initial copies of Blocks C and D in Rack 3. Two identical blocks cannot be placed on the same datanode.
21. Data Replication in HDFS
When the cluster is rack aware, all the replicas of a block will not be placed on the same rack.
22. Data Replication in HDFS
Suppose datanode 7 crashes.
23. Data Replication in HDFS
We will still have 2 copies of Block C's data, on DN 4 of Rack 1 and DN 9 of Rack 3.
26. HDFS - Namenode
The Namenode is the master server. In a non-high-availability cluster there can be only one Namenode; in a high-availability Hadoop cluster, 2 Namenodes are possible.
(Diagram: the Namenode above Datanodes 1..N holding blocks B1, B2, B3 of File.txt. The Namenode keeps file system metadata both in RAM and on disk, the latter as an edit log plus an fsimage, e.g. metadata (name, replicas, ...): /home/foo/data, 3, ...)
27. HDFS - Namenode
The Namenode holds metadata information about the various Datanodes: their location, the size of each block, etc.
28. HDFS - Namenode
It helps execute file system namespace operations: opening, closing, and renaming files and directories.
29. HDFS - Namenode
Datanodes send regular heartbeats and block reports to the Namenode.
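The Namenode's cluster-wide view described on these slides can be inspected from the shell; a minimal sketch (requires HDFS admin rights on a running cluster):

hdfs dfsadmin -report   # prints total/remaining capacity plus each Datanode's status, as known to the Namenode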
30. HDFS - Datanode
The Datanode is a multiple-instance server: there can be N Datanode servers in a cluster.
(Diagram: clients performing metadata operations against the Namenode and block operations against Datanodes 1-5, which hold blocks B1-B4.)
32. HDFS - Datanode
The Datanode stores the blocks, and retrieves them when asked.
33. HDFS - Datanode
It serves the clients' read and write requests, and performs block creation, deletion, and replication on instruction from the Namenode, responding to the Namenode once the operation is successful.
34. HDFS - Secondary Namenode
The Secondary Namenode server is responsible for maintaining a copy of the metadata on disk (the edit log and fsimage); it performs checkpointing.
39. HDFS Read Mechanism
(Diagram: an HDFS client inside a client JVM, the Namenode, and datanodes DN 1 to DN 9 spread across Racks 1 to 3, linked by rack switches and a core switch; Blocks A and B each have replicas on several datanodes.)
1. The client requests to read Block A and Block B.
2. The Namenode sends back the location of the blocks (DN 1 and DN 2).
3-5. The data is read from those datanodes.
40. HDFS Read Mechanism
Blocks A and B are read from DN 1 and DN 2 because they are the closest replicas to the client, so the read consumes the least network bandwidth.
43. HDFS Write Mechanism
(Diagram: the HDFS client, the Namenode, and datanodes DN 1 to DN 9 across Racks 1 to 3.)
1. The client requests to write data for Block A.
2. The Namenode sends the locations of the target Datanodes (DN 1, DN 6, DN 8).
3-4. The data is written to Block A on DN 1.
44. HDFS Write Mechanism
5-6. The write is pipelined: replica 1 of Block A is created on DN 6 and replica 2 on DN 8.
45. HDFS Write Mechanism
7-10. Acknowledgements travel back along the pipeline: DN 8 acks DN 6, DN 6 acks DN 1, and DN 1 acks the client for all three nodes (DN 1, DN 6, DN 8).
11. The write operation is successful.
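A hedged end-to-end sketch of this write path (the file and directory names are hypothetical): the generic -D option sets the replication factor for this one write, and fsck then shows which Datanodes ended up in the pipeline.

hdfs dfs -D dfs.replication=3 -put logs.txt /data/logs.txt
hdfs fsck /data/logs.txt -files -blocks -locations   # shows the three replica locations chosen by the Namenode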
47. Hadoop MapReduce
MapReduce is a framework that performs distributed and parallel processing of large volumes of data.
• Map: reads and processes a data block and generates key-value pairs (key, value)
• Shuffle and sort: groups the intermediate pairs, e.g. (k1, v1), (k2, v2), (k3, v3)
• Reduce: receives the key-value pairs from the map jobs and aggregates them into smaller sets
48. Hadoop MapReduce
(Diagram: Input Data flows through parallel map() tasks, then Shuffle and Sort, then reduce() tasks, producing the Output Data.)
49. MapReduce Job Execution
Input data stored on HDFS -> InputFormat -> InputSplits -> RecordReaders (input key-value pairs) -> Mappers (intermediate key-value pairs) -> Combiners -> Partitioners -> shuffling and sorting -> Reducers (substitute intermediate key-value pairs) -> OutputFormat -> output data stored on HDFS
50. MapReduce Example
Input: "Big data comes in various formats. This data can be stored in multiple data servers"
Input split: the text is divided into two splits, "Big data comes in various formats" and "This data can be stored in multiple data servers".
Map: each split is converted into (word, 1) pairs: (Big, 1), (data, 1), (comes, 1), (in, 1), (various, 1), (formats, 1); and (This, 1), (data, 1), (can, 1), (be, 1), (stored, 1), (in, 1), (multiple, 1), (data, 1), (servers, 1).
Shuffle: the pairs are grouped by key: be (1); Big (1); can (1); comes (1); data (1, 1, 1); formats (1); in (1, 1); multiple (1); servers (1); stored (1); This (1); various (1).
Reduce: each key's values are summed: be 1; Big 1; can 1; comes 1; data 3; formats 1; in 2; multiple 1; servers 1; stored 1; This 1; various 1.
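The same three phases can be mimicked locally with ordinary Unix tools, which makes the slide's flow easy to trace (input.txt is hypothetical and would hold the sentence above):

tr -s ' ' '\n' < input.txt | sort | uniq -c

Here tr plays the mapper (one word per line), sort plays the shuffle (identical words end up adjacent), and uniq -c plays the reducer (each group collapses to a count).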
52. Hadoop YARN
YARN stands for Yet Another Resource Negotiator. Introduced in Hadoop 2.0, it is the middle layer between HDFS and MapReduce, and it manages cluster resources (memory, network bandwidth, disk I/O, CPU).
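Two standard YARN commands expose this resource-management role on a live cluster (output shapes vary by version; this is a sketch):

yarn node -list                              # Node Managers and their container counts
yarn application -list -appStates RUNNING    # applications currently holding cluster resources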
56. YARN Architecture – Scheduler
(Diagram: a Client submits jobs to the Resource Manager, which contains the Scheduler and the Applications Manager; Node Managers host containers and App Masters and exchange node status, resource requests, and MapReduce status with the Resource Manager.)
• The Scheduler allocates resources to the various running applications
• It schedules resources based on the requirements of the applications
• It does not monitor or track the status of the applications
57. YARN Architecture – Applications Manager
• The Applications Manager accepts job submissions
• It monitors and restarts Application Masters in case of failure
58. YARN Architecture – Node Manager
• The Node Manager is a tracker that tracks the jobs running on its node
• It monitors each container’s resource utilization
59. YARN Architecture – App Master
• The Application Master manages the resource needs of an individual application
• It interacts with the Scheduler to acquire the required resources
• It interacts with the Node Manager to execute and monitor tasks
60. YARN Architecture – Container
• A container is a collection of resources like RAM, CPU, and network bandwidth
• It grants an application the right to use a specific amount of resources
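As a closing sketch of how an application asks YARN for container resources: for a MapReduce job, per-container memory comes from standard properties that can be set at submission time. The jar, class, paths, and values below are hypothetical, and the -D generic options apply only if the job's driver uses ToolRunner.

# each map container gets a 2 GB allocation, each reduce container 4 GB
hadoop jar wordcount.jar WordCount \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.reduce.memory.mb=4096 \
  /data/input /data/output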