This document discusses Hadoop and big data. It begins with definitions of big data and how Hadoop can help with large, complex datasets. It then discusses how Hadoop works with other tools like Pig and Hive. The document outlines different scenarios for big data and whether Hadoop is suitable. It also discusses how big data frameworks have evolved from Google papers. Finally, it provides examples of big data use cases and how education is being democratized with big data tools.
This is the basis for some talks I've given at the Microsoft Technology Center, the Chicago Mercantile Exchange, and local user groups over the past two years. It's a bit dated now, but it might still be useful to some people. If you like it, have feedback, or would like someone to explain Hadoop or how it and other new tools can help your company, let me know.
3. What's Big Data?
According to Wikipedia (http://en.wikipedia.org/wiki/Big_data), the definition of Big Data is:
"In information technology, Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools."
5. Workload distribution across installations
Pig and Hive play an important role in the Hadoop ecosystem.
http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/
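To make that concrete, the sketch below shows the kind of Hive workload such installations commonly run: a SQL-like aggregation submitted through HiveServer2's JDBC driver and executed as batch jobs on the cluster. This is only an illustration, not code from the original deck; the endpoint (localhost:10000), credentials, and the web_logs table are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWorkloadSketch {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; the host, port, database, and table are placeholders.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement();
         // A typical analytic query: Hive turns it into batch jobs over data in HDFS.
         ResultSet rs = stmt.executeQuery(
             "SELECT page, COUNT(*) AS hits FROM web_logs "
             + "GROUP BY page ORDER BY hits DESC LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
      }
    }
  }
}
```

Pig scripts fill a similar role for pipeline-style transformations; both compile down to the same batch execution engine underneath.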
6. Different Big Data scenarios

Scenario                    | Is Hadoop good for it? | What are the alternatives?
Real-time processing        | No                     | HStreaming, Twitter Storm
Iterative processing        | No                     | Apache Hama, Apache Giraph, Jung
Ad-hoc interactive querying | No                     | Apache Drill, Open Dremel
Batch processing            | Yes                    | -
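Batch processing is where classic Hadoop MapReduce fits. As an illustration of that model, here is the canonical word-count job written against the standard org.apache.hadoop.mapreduce API; the input and output paths come from the command line. It is a minimal sketch of a batch job, not code from the original talk.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mappers run in parallel over HDFS blocks, the framework shuffles and sorts the intermediate (word, count) pairs, and the reducers aggregate them: a high-throughput, high-latency pattern suited to the batch row in the table above, not to real-time or interactive workloads.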
7. How have Big Data frameworks evolved?

There has been a 4-5 year gap between Google releasing a paper and us seeing an implementation of it.

Google Paper                                                                | Apache Component
The Google File System (October 2003)                                       | HDFS (became Apache TLP in 2008)
MapReduce: Simplified Data Processing on Large Clusters (December 2004)     | MapReduce (became Apache TLP in 2008)
Bigtable: A Distributed Storage System for Structured Data (November 2006)  | HBase (became Apache TLP in 2010), Cassandra (became Apache TLP in 2010)
Large-scale graph computing at Google (June 2009)                           | Hama, Giraph (became Apache TLP in 2012)
Dremel: Interactive Analysis of Web-Scale Datasets (2010)                   | Apache Drill (incubated in August 2012)
Spanner: Google's Globally-Distributed Database (September 2012)            | ????
8. What happens to the data once it is stored?

If you aren't taking advantage of big data, then you don't have big data, you have just a pile of data.

Descriptive analytics:
- What happened?
- When did it happen?
- What was its impact?

Predictive and prescriptive analytics:
- Why did it happen?
- When will it happen again?
- What caused it to happen?
- What can be done to avoid it?
9. Evolution of Big Data use cases

Hadoop evolved out of Web 2.0 companies like Yahoo and Google to meet their massive text-processing requirements, such as:
- log processing
- search indexing
- recommendations
- context-based advertising

Its use cases now span many domains: Ads & E-commerce, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation, Spatial Data Processing, Information Extraction and Text Processing, Artificial Intelligence/Machine Learning/Data Mining, Search Query Analysis, Information Retrieval (Search), Spam & Malware Detection, Image and Video Processing, Networking, Simulation, Statistics, Numerical Mathematics, Sets & Graphs.

http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
10. A few Big Data use cases
- The World Bank kicked off an initiative to improve sanitation and water that would impact 1B people.
- Neural Networks for Breast Cancer prize by Google.
- Fraud detection in the financial industry.
- Predictive maintenance scheduling (e.g., aircraft engines).
- Walmart and Sears Holdings use POS data to decide which products to stock in each store, and also for supply chain management (SCM).
- Customer profiling and segmentation for targeted campaigns.
- Follow the competitions on Kaggle for more use cases.
11. Democratization of Education

Courses on everything from Machine Learning to Music:
https://www.coursera.org/
http://www.udacity.com/
http://www.khanacademy.org/
http://www.youtube.com/user/nptelhrd/
https://www.edx.org/
12. Keep Looking Out

There is a lot more out there than Hadoop; some of these tools are mature and some are still evolving!