This document discusses Hadoop and big data. It begins with definitions of big data and how Hadoop can help with large, complex datasets. It then discusses how Hadoop works with other tools like Pig and Hive. The document outlines different scenarios for big data and whether Hadoop is suitable. It also discusses how big data frameworks have evolved from Google papers. Finally, it provides examples of big data use cases and how education is being democratized with big data tools.
This is the basis for some talks I've given at the Microsoft Technology Center, the Chicago Mercantile Exchange, and local user groups over the past two years. It's a bit dated now, but it might still be useful to some people. If you like it, have feedback, or would like someone to explain Hadoop or how it and other new tools can help your company, let me know.
3. What's Big Data?
According to Wikipedia (http://en.wikipedia.org/wiki/Big_data), the definition of Big Data is:
"In information technology, Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools."
5. Workload distribution across installations
Pig and Hive play an important role in the Hadoop ecosystem.
http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/
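To make that concrete, the sketch below shows the kind of Hive workload such installations commonly run: a SQL-like aggregation submitted through HiveServer2's JDBC driver and executed as batch jobs on the cluster. This is only an illustration, not code from the original deck; the endpoint (localhost:10000), credentials, and the web_logs table are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWorkloadSketch {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; the host, port, database, and table are placeholders.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement();
         // A typical analytic query: Hive turns it into batch jobs over data in HDFS.
         ResultSet rs = stmt.executeQuery(
             "SELECT page, COUNT(*) AS hits FROM web_logs "
             + "GROUP BY page ORDER BY hits DESC LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
      }
    }
  }
}
```

Pig scripts fill a similar role for pipeline-style transformations; both compile down to the same batch execution engine underneath.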
6. Different Big Data scenarios

Scenario                    | Is Hadoop good for it? | What are the alternatives?
Real-time processing        | No                     | HStreaming, Twitter Storm
Iterative processing        | No                     | Apache Hama, Apache Giraph, Jung
Ad-hoc interactive querying | No                     | Apache Drill, Open Dremel
Batch processing            | Yes                    | -
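Batch processing is where classic Hadoop MapReduce fits. As an illustration of that model, here is the canonical word-count job written against the standard org.apache.hadoop.mapreduce API; the input and output paths come from the command line. It is a minimal sketch of a batch job, not code from the original talk.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mappers run in parallel over HDFS blocks, the framework shuffles and sorts the intermediate (word, count) pairs, and the reducers aggregate them: a high-throughput, high-latency pattern suited to the batch row in the table above, not to real-time or interactive workloads.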
7. How have Big Data frameworks evolved?

There has been a 4-5 year gap between Google releasing a paper and us seeing an implementation of it.

Google Paper                                                                | Apache Component
The Google File System (October 2003)                                       | HDFS (became Apache TLP in 2008)
MapReduce: Simplified Data Processing on Large Clusters (December 2004)     | MapReduce (became Apache TLP in 2008)
Bigtable: A Distributed Storage System for Structured Data (November 2006)  | HBase (became Apache TLP in 2010), Cassandra (became Apache TLP in 2010)
Large-scale graph computing at Google (June 2009)                           | Hama, Giraph (became Apache TLP in 2012)
Dremel: Interactive Analysis of Web-Scale Datasets (2010)                   | Apache Drill (incubated in August 2012)
Spanner: Google's Globally-Distributed Database (September 2012)            | ????
8. What happens to the data once it is stored?

If you aren't taking advantage of big data, then you don't have big data, you have just a pile of data.

Descriptive analytics:
- What happened?
- When did it happen?
- What was its impact?

Predictive and prescriptive analytics:
- Why did it happen?
- When will it happen again?
- What caused it to happen?
- What can be done to avoid it?
9. Evolution of Big Data use cases

Hadoop evolved out of Web 2.0 companies like Yahoo and Google to meet their massive text-processing requirements, such as:
- log processing
- search indexing
- recommendations
- context-based advertising

Its use cases now span many domains: Ads & E-commerce, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation, Spatial Data Processing, Information Extraction and Text Processing, Artificial Intelligence/Machine Learning/Data Mining, Search Query Analysis, Information Retrieval (Search), Spam & Malware Detection, Image and Video Processing, Networking, Simulation, Statistics, Numerical Mathematics, Sets & Graphs.

http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
10. A few Big Data use cases
- The World Bank kicked off an initiative to improve sanitation and water that would impact 1B people.
- Neural Networks for Breast Cancer prize by Google.
- Fraud detection in the financial industry.
- Predictive maintenance scheduling (e.g., aircraft engines).
- Walmart and Sears Holdings use POS data to decide which products to stock in each store, and also for supply chain management (SCM).
- Customer profiling and segmentation for targeted campaigns.
- Follow the competitions on Kaggle for more use cases.
11. Democratization of Education

Courses on everything from Machine Learning to Music:
https://www.coursera.org/
http://www.udacity.com/
http://www.khanacademy.org/
http://www.youtube.com/user/nptelhrd/
https://www.edx.org/
12. Keep Looking Out

There is a lot more out there than Hadoop; some of these tools are mature and some are still evolving!