Big Data Systems
• Before 2004 “Google have implemented
hundreds of special-purpose computations
that process large amounts of raw data, such
as crawled documents, web request logs, etc.,
to compute various kinds of derived data, such
as inverted indices etc.”
• In 2004, the Nutch search system was effectively
limited to 100M web pages
Use Cases
• 2002: Doug Cutting started Nutch: crawler & search
system
• 2003: GoogleFS paper
• 2004: Start of NDFS project (Nutch Distributed FS)
• 2004: Google MapReduce paper
• 2005: MapReduce implementation in Nutch
• 2006: HDFS and MapReduce moved out of Nutch into the Hadoop subproject
• 2008: Yahoo! production search index generated by a 10,000-core
Hadoop cluster
• 2008: Hadoop – top-level Apache project
Hadoop History
• Need to process multi-petabyte datasets
• Need to provide a framework for reliable application
execution
• Need to hide node failures from the application
developer
– Failure is expected, rather than exceptional.
– The number of nodes in a cluster is not constant.
• Need a common infrastructure
– Efficient, reliable, open source (Apache License)
Hadoop Objectives

• Hadoop Distributed File System (HDFS)
• Hadoop MapReduce
• Hadoop Common
Hadoop
• Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
• Assumes Commodity Hardware
– Files are replicated to handle hardware failure
– Detects failures and recovers from them
• Optimized for Batch Processing
– Data locations exposed so that computations can move to
where data resides
– Provides very high aggregate bandwidth
Goals of GFS/HDFS
• Data Coherency
– Write-once-read-many access model
– Client can only append to existing files
• Files are broken up into blocks
– Typically 128 MB block size
– Each block replicated on multiple DataNodes
• Intelligent Client
– Client can find location of blocks
– Client accesses data directly from DataNode
HDFS Details
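The block and replica layout described on this slide can be sketched in a few lines of Ruby. This is a toy illustration only: `split_into_blocks`, `place_replicas`, the DataNode names and the round-robin placement are all hypothetical, and the replication factor of 3 is assumed as the usual HDFS default. Real HDFS placement is decided by the NameNode and is rack-aware.

```ruby
# Toy model of HDFS block splitting and replica placement.
# All names are illustrative; real HDFS placement is rack-aware.
BLOCK_SIZE = 128 * 1024 * 1024 # 128 MB, the typical block size from the slide

# Return [offset, length] pairs covering a file of file_size bytes.
def split_into_blocks(file_size, block_size = BLOCK_SIZE)
  blocks = []
  offset = 0
  while offset < file_size
    length = [block_size, file_size - offset].min
    blocks << [offset, length]
    offset += length
  end
  blocks
end

# Assign each block to `replication` distinct DataNodes, round-robin.
def place_replicas(block_count, datanodes, replication = 3)
  raise ArgumentError, "not enough DataNodes" if datanodes.size < replication
  (0...block_count).map do |i|
    (0...replication).map { |r| datanodes[(i + r) % datanodes.size] }
  end
end

blocks = split_into_blocks(300 * 1024 * 1024) # a 300 MB file
placement = place_replicas(blocks.size, %w[dn1 dn2 dn3 dn4])
blocks.each_with_index do |(offset, length), i|
  puts "block #{i}: offset=#{offset} length=#{length} on #{placement[i].join(',')}"
end
```

A client that knows this mapping can read each block directly from any DataNode holding a replica, which is the "intelligent client" idea in the bullets.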
Client reading data from HDFS

Client writing data to HDFS
Compression
• Java API
• Command Line
– hadoop dfs -mkdir /foodir
– hadoop dfs -cat /foodir/myfile.txt
– hadoop dfs -rm /foodir/myfile.txt
– hadoop dfsadmin -report
– hadoop dfsadmin -decommission datanodename
• Web Interface
– http://host:port/dfshealth.jsp
HDFS User Interface
HDFS Web UI

• The Map-Reduce programming model
– Framework for distributed processing of large data sets
– Pluggable user code runs in generic framework
• Common design pattern in data processing
cat * | grep | sort | uniq -c | cat > file
input | map | shuffle | reduce | output
• Natural for:
– Log processing
– Web search indexing
– Ad-hoc queries
Hadoop MapReduce
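The `input | map | shuffle | reduce | output` pipeline can be imitated in memory to show where the pluggable user code sits. The sketch below is an assumption-laden toy: `run_job`, `word_map` and `sum_reduce` are hypothetical names, everything runs in one process, and nothing here is the Hadoop API.

```ruby
# In-memory sketch of input | map | shuffle | reduce | output.
# map_fn and reduce_fn are the "pluggable user code"; run_job is a toy framework.

# Map each record to [key, value] pairs, group by key, then reduce each group.
def run_job(records, map_fn, reduce_fn)
  pairs = records.flat_map { |record| map_fn.call(record) } # map
  grouped = pairs.group_by { |key, _| key }                 # shuffle
  grouped.sort_by { |key, _| key }.map do |key, kvs|        # sort keys, then reduce
    [key, reduce_fn.call(key, kvs.map { |_, value| value })]
  end
end

# Word count: emit (word, 1) per word, then sum the counts per word.
word_map = ->(line) { line.downcase.scan(/\w+/).map { |w| [w, 1] } }
sum_reduce = ->(_key, values) { values.sum }

lines = ["the quick brown fox", "the lazy dog", "The end"]
run_job(lines, word_map, sum_reduce).each { |word, n| puts "#{word}\t#{n}" }
```

In Hadoop the same two functions would run on many nodes, with the framework handling the shuffle, sorting and failure recovery.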
Map function
Reduce function
Run this program as a
MapReduce job
Lifecycle of a MapReduce Job
MapReduce in Hadoop (1)
MapReduce in Hadoop (2)

MapReduce in Hadoop (3)
Hadoop WebUI
Hadoop WebUI
• 190+ parameters in
Hadoop
• Set manually or defaults
are used
Hadoop Configuration

Pro:
• Cheap components
• Replication
• Fault tolerance
• Parallel processing
• Free license
• Linear scalability
• Amazon support
Con:
• No realtime
• Difficult to add MR tasks
• File edit is not supported
• High support cost
Summary
• Distributed Grep
• Count of URL Access Frequency
• Reverse Web-Link Graph
• Inverted Index
Examples
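As one concrete instance of these examples, an inverted index can be written as a map step and a reduce step. This is a minimal in-memory sketch; `index_map`, `index_reduce` and `build_inverted_index` are hypothetical names, and the grouping that Hadoop performs in the shuffle is done here with `group_by`.

```ruby
# Inverted index as a map/reduce pair, run in memory.
# Input: {doc_id => text}. Output: {word => sorted list of doc_ids}.

# Map: emit one [word, doc_id] pair per distinct word in the document.
def index_map(doc_id, text)
  text.downcase.scan(/\w+/).uniq.map { |word| [word, doc_id] }
end

# Reduce: collapse all doc_ids seen for a word into a sorted, de-duplicated list.
def index_reduce(doc_ids)
  doc_ids.uniq.sort
end

def build_inverted_index(docs)
  pairs = docs.flat_map { |doc_id, text| index_map(doc_id, text) }
  pairs.group_by(&:first)
       .transform_values { |kvs| index_reduce(kvs.map(&:last)) }
end

docs = { "d1" => "Hadoop stores data in HDFS", "d2" => "HDFS replicates data" }
build_inverted_index(docs).sort.each { |word, ids| puts "#{word}\t#{ids.join(',')}" }
```

Distributed grep and URL access counting follow the same shape: only the map and reduce bodies change.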
• Streaming
• Hive
• Pig
• HBase
Hadoop
API to MapReduce that uses Unix standard streams
as the interface between Hadoop and your program
MAP: map.rb
#!/usr/bin/env ruby
# Emit "year<TAB>temperature" for every valid record read from stdin.
STDIN.each_line do |line|
  val = line
  year, temp, q = val[15,4], val[87,5], val[92,1]
  puts "#{year}\t#{temp}" if (temp != "+9999" && q =~ /[01459]/)
end
% cat input/ncdc/sample.txt | map.rb
1950 +0000
1950 +0022
1950 -0011
1949 +0111
1949 +0078
LOCAL EXECUTION
Hadoop Streaming (1)

REDUCE: reduce.rb
#!/usr/bin/env ruby
# Input arrives sorted by key; keep the maximum temperature seen per year.
last_key, max_val = nil, 0
STDIN.each_line do |line|
  key, val = line.split("\t")
  if last_key && last_key != key
    puts "#{last_key}\t#{max_val}"
    last_key, max_val = key, val.to_i
  else
    last_key, max_val = key, [max_val, val.to_i].max
  end
end
puts "#{last_key}\t#{max_val}" if last_key
% cat input/ncdc/sample.txt | map.rb | sort | reduce.rb
1949 111
1950 22
LOCAL EXECUTION
Hadoop Streaming (2)
HADOOP EXECUTION
% hadoop jar \
  $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
  -input input/ncdc/sample.txt \
  -output output \
  -mapper map.rb \
  -reducer reduce.rb
Hadoop Streaming (3)
• Intuitive
– Makes unstructured data look like tables, regardless of how
it is actually laid out
– SQL-based queries can be run directly against these tables
– Generates a specific execution plan for each query
• What is Hive?
– A data warehousing system that stores structured data on the
Hadoop file system
– Provides easy querying of this data by executing Hadoop
MapReduce plans
Hive: overview
HDFS
Map Reduce
Hive: architecture

hive> SHOW TABLES;
hive> CREATE TABLE shakespeare (freq
INT, word STRING) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
hive> DESCRIBE shakespeare;
loading data…
hive> SELECT * FROM shakespeare LIMIT 10;
hive> SELECT * FROM shakespeare
WHERE freq > 100 SORT BY freq ASC
LIMIT 10;
Hive: shell
-- max_temp.pig: Finds the maximum temperature by year
records = LOAD 'input/ncdc/micro-tab/sample.txt'
AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records
BY temperature != 9999
AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records
GENERATE group, MAX(filtered_records.temperature);
DUMP max_temp;
Pig
• Initial public launch
– Move from local workstation to shared, remote hosted
MySQL instance with a well-defined schema.
• Service becomes more popular; too many reads hitting the
database
– Add memcached to cache common queries. Reads are
now no longer strictly ACID; cached data must expire.
• Service continues to grow in popularity; too many writes
hitting the database
– Scale MySQL vertically by buying a beefed-up server
with 16 cores, 128 GB of RAM,
and banks of 15k RPM hard drives. Costly.
RDBMS scaling story (1)
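The memcached step, where cached reads must expire, follows the cache-aside pattern. The sketch below is only an illustration of that pattern: `ExpiringCache`, its injectable `clock` and the fake query block are hypothetical stand-ins, not the memcached or MySQL client APIs.

```ruby
# Cache-aside with expiry, standing in for memcached in front of MySQL.
# ExpiringCache and the fake "database" block below are hypothetical.
class ExpiringCache
  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {} # key => [value, expires_at]
  end

  # Return the cached value, or run the block (the real query) and cache its result.
  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at
    fresh = yield
    @entries[key] = [fresh, @clock.call + @ttl]
    fresh
  end
end

db_reads = 0
cache = ExpiringCache.new(30)
2.times { cache.fetch("top_users") { db_reads += 1; ["alice", "bob"] } }
puts "database was queried #{db_reads} time(s)"
```

Between a write to the database and the expiry of the cached entry, readers may see stale data, which is why the slide notes that reads are no longer strictly ACID.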
• New features increase query complexity; now we have
too many joins
– Denormalize your data to reduce joins.
• Rising popularity swamps the server; things are too slow
– Stop doing any server-side computations.
• Some queries are still too slow
– Periodically prematerialize the most complex
queries; try to stop joining in most cases.
• Reads are OK, but writes are getting slower and slower
– Drop secondary indexes and triggers (no indexes?).
RDBMS scaling story (2)

NoSQL
• Tables have one primary index, the row key
• No join operators
• Data is unstructured and untyped
• Not accessed or manipulated via SQL
– Programmatic access via Java, REST, or Thrift APIs
• There are three types of lookups:
– Fast lookup using row key and optional timestamp
– Full table scan
– Range scan from region start to end
HBase: differences from RDBMS
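The three lookup styles can be illustrated with a toy table that has only a row-key index. `ToyTable` is hypothetical and is not the HBase client API; it assumes, as in HBase scans, that the start key is inclusive and the stop key exclusive, and it ignores timestamps and regions.

```ruby
# Toy table with a single index (the row key), illustrating the three lookup styles.
# ToyTable is hypothetical and is not the HBase client API.
class ToyTable
  def initialize
    @rows = {} # row_key => value
  end

  def put(row_key, value)
    @rows[row_key] = value
  end

  # 1. Point lookup by row key.
  def get(row_key)
    @rows[row_key]
  end

  # 2. Full table scan, in row-key order.
  def scan
    @rows.sort
  end

  # 3. Range scan: start_key inclusive, stop_key exclusive.
  def range_scan(start_key, stop_key)
    scan.select { |key, _| key >= start_key && key < stop_key }
  end
end

table = ToyTable.new
%w[row3 row1 row2].each { |k| table.put(k, "value-#{k}") }
puts table.get("row2")
table.range_scan("row1", "row3").each { |key, value| puts "#{key} => #{value}" }
```

Any query that cannot be phrased as one of these three forms needs a full scan or a MapReduce job, since there are no secondary indexes or joins.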
• Automatic partitioning
• Scale linearly and automatically with new
nodes
• Commodity hardware
• Fault tolerance: Apache Zookeeper
• Batch processing: Apache Hadoop
HBase: benefits over RDBMS
• Tables are sorted by row key
• The table schema only defines its column families
– Each family consists of any number of columns
– Each column consists of any number of versions
– Columns only exist when inserted; NULLs are free
– Columns within a family are sorted and stored together
• Everything except table names is byte[]
• (Row, Family:Column, Timestamp) → Value
Row key
Column Family
Value
Timestamp
HBase: data model
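The cell coordinate (Row, Family:Column, Timestamp) → Value maps naturally onto nested hashes. The sketch below is a toy under stated assumptions: `VersionedTable` is a hypothetical name, values are Ruby strings rather than byte[], and nothing is sorted or persisted.

```ruby
# Toy model of the cell coordinate (row, family:column, timestamp) -> value.
# VersionedTable is hypothetical; HBase itself stores all of these as byte[].
class VersionedTable
  def initialize(families)
    @families = families # only column families are fixed by the schema
    # row => { "family:qualifier" => { timestamp => value } }
    @rows = Hash.new { |rows, row| rows[row] = Hash.new { |cols, col| cols[col] = {} } }
  end

  # Columns come into existence on first insert; each put adds a version.
  def put(row, column, value, timestamp)
    family = column.split(":", 2).first
    raise ArgumentError, "unknown column family #{family}" unless @families.include?(family)
    @rows[row][column][timestamp] = value
  end

  # Newest version at or before `timestamp` (latest version if omitted).
  def get(row, column, timestamp = nil)
    return nil unless @rows.key?(row) && @rows[row].key?(column)
    versions = @rows[row][column]
    ts = versions.keys.select { |t| timestamp.nil? || t <= timestamp }.max
    ts && versions[ts]
  end
end

table = VersionedTable.new(["data"])
table.put("row1", "data:1", "value1", 100)
table.put("row1", "data:1", "value1b", 200)
puts table.get("row1", "data:1")      # latest version
puts table.get("row1", "data:1", 150) # version as of timestamp 150
```

Because absent columns simply have no entry, a sparse row costs nothing, which is what "NULLs are free" refers to.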

• Master
– Responsible for monitoring region servers
– Load balancing for regions
– Redirects clients to the correct region servers
• RegionServer slaves
– Serve client requests (write/read/scan)
– Send heartbeats to the Master
HBase: members
$ hbase shell
> create 'test', 'data'
0 row(s) in 4.3066 seconds
> list
test
1 row(s) in 0.1485 seconds
> put 'test', 'row1', 'data:1', 'value1'
0 row(s) in 0.0454 seconds
> put 'test', 'row2', 'data:2', 'value2'
0 row(s) in 0.0035 seconds
> scan 'test'
ROW COLUMN+CELL
row1 column=data:1, timestamp=1240148026198, value=value1
row2 column=data:2, timestamp=1240148040035, value=value2
2 row(s) in 0.0825 seconds
HBase: shell
HBase: Web UI
• Amazon
• Facebook
• Google
• IBM
• Joost
• Last.fm
• New York Times
• PowerSet
• Veoh
• Yahoo!
Who uses Hadoop?

Books

More Related Content

What's hot

SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
Sofian Hadiwijaya
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
Siva Pandeti
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
Geoff Hendrey
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
Brian Enochson
 
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Hadoop Summit 2015: Hive at Yahoo: Letters from the TrenchesHadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Mithun Radhakrishnan
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
InSemble
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
Milind Bhandarkar
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
Douglas Bernardini
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
Bouquet
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
Rajkumar Singh
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
Yahoo Developer Network
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalog
markgrover
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
MapR Technologies
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
Owen O'Malley
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
Xuan-Chao Huang
 
Apache drill
Apache drillApache drill
Apache drill
Jakub Pieprzyk
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Charles Givre
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Apache Hadoop 1.1

  • 2. Use Cases
      • Before 2004: “Google have implemented hundreds of special-purpose computations that process large amounts of raw data, such as crawled documents, web request logs, etc., to compute various kinds of derived data, such as inverted indices etc.”
      • In 2004 the Nutch search system was effectively limited to 100M web pages
  • 3. Hadoop History
      • 2002: Doug Cutting starts Nutch, a crawler and search system
      • 2003: GoogleFS paper
      • 2004: Start of the NDFS project (Nutch Distributed FS)
      • 2004: Google MapReduce paper
      • 2005: MapReduce implementation in Nutch
      • 2006: HDFS and MapReduce move into the Hadoop subproject
      • 2008: Yahoo! production search index generated by a 10,000-core Hadoop cluster
      • 2008: Hadoop becomes a top-level Apache project
  • 4. Hadoop Objectives
      • Need to process multi-petabyte datasets
      • Need to provide a framework for reliable application execution
      • Need to hide node failures from the application developer
          – Failure is expected, rather than exceptional
          – The number of nodes in a cluster is not constant
      • Need a common infrastructure
          – Efficient, reliable, open source (Apache License)
  • 5. Hadoop
      • Hadoop Distributed File System (HDFS)
      • Hadoop MapReduce
      • Hadoop Common
  • 6. Goals of GFS/HDFS
      • Very large distributed file system
          – 10K nodes, 100 million files, 10 PB
      • Assumes commodity hardware
          – Files are replicated to handle hardware failure
          – Detects failures and recovers from them
      • Optimized for batch processing
          – Data locations are exposed so that computations can move to where the data resides
          – Provides very high aggregate bandwidth
  • 7. HDFS Details
      • Data coherency
          – Write-once-read-many access model
          – Clients can only append to existing files
      • Files are broken up into blocks
          – Typically 128 MB block size
          – Each block is replicated on multiple DataNodes
      • Intelligent client
          – The client can find the location of blocks
          – The client accesses data directly from the DataNode
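As a rough illustration of the block model on this slide, the Ruby sketch below splits a file into 128 MB blocks and assigns each block to several DataNodes. The replication factor of 3, the node names `dn1`..`dn4`, and the round-robin placement are assumptions made for the example; they are not taken from the slides, and this is not how HDFS itself chooses replica locations.

```ruby
# Hypothetical sketch, not the HDFS implementation.
BLOCK_SIZE = 128 * 1024 * 1024 # 128 MB, the typical block size named on the slide

# Returns the byte length of each block for a file of the given size.
def split_into_blocks(file_size, block_size = BLOCK_SIZE)
  full, rest = file_size.divmod(block_size)
  sizes = Array.new(full, block_size)
  sizes << rest if rest > 0 # the last block may be shorter
  sizes
end

# Assigns each block to `replication` distinct nodes, round-robin (assumed policy).
def place_replicas(block_count, nodes, replication = 3)
  raise ArgumentError, "not enough nodes" if nodes.size < replication
  (0...block_count).map do |i|
    (0...replication).map { |r| nodes[(i + r) % nodes.size] }
  end
end

blocks = split_into_blocks(300 * 1024 * 1024) # a 300 MB file
placement = place_replicas(blocks.size, %w[dn1 dn2 dn3 dn4])
blocks.each_with_index do |size, i|
  puts "block #{i}: #{size} bytes on #{placement[i].join(', ')}"
end
```

A 300 MB file yields two full blocks and one shorter block, and because every block lives on several nodes, losing one node leaves every block readable from another.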
  • 11. HDFS User Interface
      • Java API
      • Command line
          – hadoop dfs -mkdir /foodir
          – hadoop dfs -cat /foodir/myfile.txt
          – hadoop dfs -rm /foodir/myfile.txt
          – hadoop dfsadmin -report
          – hadoop dfsadmin -decommission datanodename
      • Web interface
          – http://host:port/dfshealth.jsp
  • 13. Hadoop MapReduce
      • The Map-Reduce programming model
          – Framework for distributed processing of large data sets
          – Pluggable user code runs in a generic framework
      • Common design pattern in data processing
          cat * | grep | sort    | uniq -c | cat > file
          input | map  | shuffle | reduce  | output
      • Natural for:
          – Log processing
          – Web search indexing
          – Ad-hoc queries
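The `input | map | shuffle | reduce | output` pattern can be sketched in a single process. The Ruby example below is only an in-memory analogy with hypothetical function names (`map_phase`, `shuffle_phase`, `reduce_phase`); a real Hadoop job runs each phase in parallel across the cluster.

```ruby
# Hypothetical in-memory sketch of input | map | shuffle | reduce | output.

# Map: emit a [key, value] pair for every word in every line.
def map_phase(lines)
  lines.flat_map { |line| line.split.map { |word| [word.downcase, 1] } }
end

# Shuffle: group values by key and order the keys, like `sort` in the shell pipeline.
def shuffle_phase(pairs)
  pairs.group_by(&:first)
       .map { |key, kvs| [key, kvs.map(&:last)] }
       .sort_by(&:first)
end

# Reduce: collapse the values of each key into one result, like `uniq -c`.
def reduce_phase(grouped)
  grouped.map { |key, values| [key, values.sum] }
end

def word_count(lines)
  reduce_phase(shuffle_phase(map_phase(lines)))
end

word_count(["to be or not", "to be"]).each { |word, n| puts "#{word}\t#{n}" }
```

Only `map_phase` and `reduce_phase` correspond to the pluggable user code; the grouping in the middle is what the framework provides.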
  • 14. Lifecycle of a MapReduce Job
      [Figure: a program containing a map function and a reduce function, labeled "Run this program as a MapReduce job"]
  • 20. Hadoop Configuration
      • 190+ parameters in Hadoop
      • Each is set manually; otherwise the default is used
  • 21. Summary
      Pro:
      • Cheap components
      • Replication
      • Fault tolerance
      • Parallel processing
      • Free license
      • Linear scalability
      • Amazon support
      Con:
      • No real-time processing
      • Difficult to add MR tasks
      • Editing files in place is not supported
      • High support cost
  • 22. Examples
      • Distributed grep
      • Count of URL access frequency
      • Reverse web-link graph
      • Inverted index
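One of the listed examples, the inverted index, can be sketched as a map step that emits a (word, document id) pair per occurrence and a reduce step that collects the ids for each word. The Ruby code below is a single-process illustration under that assumption; the document ids and texts are made up.

```ruby
# Hypothetical sketch: document ids and contents are invented for illustration.
def inverted_index(docs)
  # Map: emit [word, doc_id] for each word occurrence.
  pairs = docs.flat_map do |doc_id, text|
    text.downcase.scan(/\w+/).map { |word| [word, doc_id] }
  end
  # Shuffle + reduce: group by word and keep the sorted, unique document ids.
  pairs.group_by(&:first)
       .transform_values { |kvs| kvs.map(&:last).uniq.sort }
end

docs = {
  "d1" => "Hadoop stores data",
  "d2" => "Hadoop processes data in parallel"
}
inverted_index(docs).sort.each { |word, ids| puts "#{word}\t#{ids.join(',')}" }
```

The reverse web-link graph has the same shape: swap words for link targets and document ids for the pages that link to them.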
  • 23. Hadoop
      • Streaming
      • Hive
      • Pig
      • HBase
  • 24. Hadoop Streaming (1)
      An API to MapReduce that uses Unix standard streams as the interface between Hadoop and your program.

      MAP: map.rb

        #!/usr/bin/env ruby
        # Emit "year<TAB>temperature" for every valid reading on standard input.
        STDIN.each_line do |line|
          val = line
          year, temp, q = val[15,4], val[87,5], val[92,1]
          puts "#{year}\t#{temp}" if (temp != "+9999" && q =~ /[01459]/)
        end

      LOCAL EXECUTION

        % cat input/ncdc/sample.txt | map.rb
        1950 +0000
        1950 +0022
        1950 -0011
        1949 +0111
        1949 +0078
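To make the fixed offsets in map.rb easier to follow, the sketch below builds a synthetic record and applies the same extraction and filter. The record layout beyond the three offsets used by the mapper is not shown in the slides, so the padding characters and the helper names `synthetic_record` and `map_record` are purely illustrative.

```ruby
# Hypothetical record builder: "." stands in for every field the mapper ignores.
def synthetic_record(year, temp, quality)
  line = "." * 93
  line[15, 4] = year    # offset 15, 4 characters: year
  line[87, 5] = temp    # offset 87, 5 characters: signed temperature
  line[92, 1] = quality # offset 92, 1 character: quality code
  line
end

# Same extraction and filter as map.rb; returns nil for rejected records.
def map_record(line)
  year, temp, q = line[15, 4], line[87, 5], line[92, 1]
  "#{year}\t#{temp}" if temp != "+9999" && q =~ /[01459]/
end

puts map_record(synthetic_record("1950", "+0022", "1"))
p map_record(synthetic_record("1950", "+9999", "1")) # "+9999" marks a missing reading
p map_record(synthetic_record("1950", "+0022", "2")) # quality code not in [01459]
```

Keeping the per-record logic in a function like this makes the mapper testable without piping a file through standard input.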
  • 25. Hadoop Streaming (2)
      REDUCE: reduce.rb

        #!/usr/bin/env ruby
        # Input arrives sorted by key, so a change of key means the previous
        # year is complete and its maximum can be printed.
        last_key, max_val = nil, 0
        STDIN.each_line do |line|
          key, val = line.split("\t")
          if last_key && last_key != key
            puts "#{last_key}\t#{max_val}"
            last_key, max_val = key, val.to_i
          else
            last_key, max_val = key, [max_val, val.to_i].max
          end
        end
        puts "#{last_key}\t#{max_val}" if last_key

      LOCAL EXECUTION

        % cat input/ncdc/sample.txt | map.rb | sort | reduce.rb
        1949 111
        1950 22
  • 26. Hadoop Streaming (3)
      HADOOP EXECUTION

        % hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
            -input input/ncdc/sample.txt \
            -output output \
            -mapper map.rb \
            -reducer reduce.rb
  • 27.
    • Intuitive
      – Makes unstructured data look like tables, regardless of how it is actually laid out
      – SQL-based queries can be run directly against these tables
      – Generates a specific execution plan for each query
    • What’s Hive
      – A data warehousing system that stores structured data on the Hadoop file system
      – Provides easy querying of this data by executing Hadoop MapReduce plans
    Hive: overview
  • 29.
        hive> SHOW TABLES;
        hive> CREATE TABLE shakespeare (freq INT, word STRING)
              ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
              STORED AS TEXTFILE;
        hive> DESCRIBE shakespeare;
        loading data…
        hive> SELECT * FROM shakespeare LIMIT 10;
        hive> SELECT * FROM shakespeare WHERE freq > 100 SORT BY freq ASC LIMIT 10;
    Hive: shell
  • 30.
        -- max_temp.pig: Finds the maximum temperature by year
        records = LOAD 'input/ncdc/micro-tab/sample.txt'
          AS (year:chararray, temperature:int, quality:int);
        filtered_records = FILTER records BY temperature != 9999 AND
          (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
        grouped_records = GROUP filtered_records BY year;
        max_temp = FOREACH grouped_records GENERATE group,
          MAX(filtered_records.temperature);
        DUMP max_temp;
    Pig
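For readers who do not know Pig Latin, the same dataflow can be written as ordinary collection operations. The Ruby sketch below only shows what each Pig statement does; it is not how Pig executes the script. The records are made-up values shaped like the `(year, temperature, quality)` schema, and the variable names are hypothetical.

```ruby
# Made-up records standing in for the tab-separated sample file:
# [year, temperature, quality]
records = [
  ["1950", 0, 1],
  ["1950", 22, 1],
  ["1950", -11, 1],
  ["1949", 111, 1],
  ["1949", 78, 1],
  ["1949", 9999, 1]  # missing reading, dropped by the filter
]

valid_quality = [0, 1, 4, 5, 9]

# FILTER records BY temperature != 9999 AND quality is one of 0, 1, 4, 5, 9
filtered_records = records.select do |_, temperature, quality|
  temperature != 9999 && valid_quality.include?(quality)
end

# GROUP filtered_records BY year
grouped_records = filtered_records.group_by { |year, _, _| year }

# FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature)
max_temp = grouped_records
           .map { |year, rows| [year, rows.map { |_, temperature, _| temperature }.max] }
           .sort

# DUMP max_temp
max_temp.each { |year, temperature| puts "#{year}\t#{temperature}" }
```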
  • 31.
    Initial public launch
      Move from local workstation to shared, remote hosted MySQL instance with a well-defined schema.
    Service becomes more popular; too many reads hitting the database
      Add memcached to cache common queries. Reads are now no longer strictly ACID; cached data must expire.
    Service continues to grow in popularity; too many writes hitting the database
      Scale MySQL vertically by buying a beefed up server with 16 cores, 128 GB of RAM, and banks of 15 k RPM hard drives. Costly.
    RDBMS scaling story (1)
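The remark that cached reads are no longer strictly ACID can be made concrete with a small cache-aside sketch. `TtlCache` is a hypothetical class written for this illustration and is not the memcached client API; the injectable `clock` exists only so that expiry can be shown without waiting.

```ruby
# Hypothetical cache-aside wrapper with expiry; not the memcached API.
class TtlCache
  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}  # key => [value, expires_at]
  end

  # Return the cached value while it is fresh; otherwise run the block
  # (the real database query) and cache its result.
  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at

    fresh = yield
    @entries[key] = [fresh, @clock.call + @ttl]
    fresh
  end
end

now = 0.0
cache = TtlCache.new(60, clock: -> { now })
db_reads = 0
query = lambda do
  db_reads += 1
  "row-v#{db_reads}"  # stands in for a database row that changes over time
end

first  = cache.fetch("user:1") { query.call }  # miss: reads the database
second = cache.fetch("user:1") { query.call }  # hit: may be stale by now
now = 61.0
third  = cache.fetch("user:1") { query.call }  # expired: reads the database again
puts [first, second, third].join(", ")
```

Between the first read and expiry, every reader sees the cached copy even if the database row has changed, which is the consistency trade-off the slide points at.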
  • 32.
    New features increase query complexity; now we have too many joins
      Denormalize your data to reduce joins.
    Rising popularity swamps the server; things are too slow
      Stop doing any server-side computations.
    Some queries are still too slow
      Periodically prematerialize the most complex queries, try to stop joining in most cases.
    Reads are OK, but writes are getting slower and slower
      Drop secondary indexes and triggers (no indexes?).
    RDBMS scaling story (2)
  • 33. NoSQL
  • 34.
    • Tables have one primary index, the row key
    • No join operators
    • Data is unstructured and untyped
    • Not accessed or manipulated via SQL
      – Programmatic access via Java, REST, or Thrift APIs
    • There are three types of lookups:
      – Fast lookup using row key and optional timestamp
      – Full table scan
      – Range scan from region start to end
    HBase: differences from RDBMS
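The three lookup types follow from having a single index on the row key. The sketch below is an in-memory stand-in, not the HBase client API: `SortedTable` and its methods are hypothetical, versions and timestamps are left out, and the range scan is assumed to include the start key and exclude the stop key.

```ruby
# Hypothetical in-memory stand-in for one table indexed only by row key.
class SortedTable
  def initialize
    @rows = {}  # row key => value
  end

  def put(row_key, value)
    @rows[row_key] = value
  end

  # Fast lookup using the row key.
  def get(row_key)
    @rows[row_key]
  end

  # Full table scan: every row, in row-key order.
  def scan
    @rows.sort
  end

  # Range scan: rows with start_key <= key < stop_key, in row-key order.
  # (A real store keeps rows sorted on disk; sorting on demand is a shortcut.)
  def range_scan(start_key, stop_key)
    scan.select { |key, _| key >= start_key && key < stop_key }
  end
end

table = SortedTable.new
table.put("row3", "c")
table.put("row1", "a")
table.put("row2", "b")

puts table.get("row2")
puts table.range_scan("row1", "row3").inspect
puts table.scan.inspect
```

Any query that cannot be phrased as one of these three operations, such as a lookup by value, falls back to a full scan, which is why row-key design matters so much.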
  • 35. • Automatic partitioning • Scales linearly and automatically with new nodes • Commodity hardware • Fault tolerance: Apache ZooKeeper • Batch processing: Apache Hadoop HBase: benefits over RDBMS
  • 36.
    • Tables are sorted by row
    • The table schema defines only its column families
      – Each family consists of any number of columns
      – Each column consists of any number of versions
      – Columns exist only when inserted; NULLs are free
      – Columns within a family are sorted and stored together
    • Everything except table names is byte[]
    • (Row, Family:Column, Timestamp) → Value
    [Diagram labels: Row key, Column Family, value, TimeStamp]
    HBase: data model
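The mapping `(Row, Family:Column, Timestamp) → Value` can be pictured as nested maps. The sketch below is an in-memory illustration with hypothetical names (`VersionedTable`, `put`, `get`); it is not the HBase API, it uses strings where HBase uses byte arrays, and treating the timestamp as an upper bound in `get` is a simplifying assumption.

```ruby
# Hypothetical sketch of the (row, family:column, timestamp) -> value mapping.
class VersionedTable
  def initialize(families)
    @families = families  # only column families are fixed by the schema
    # row => column => timestamp => value, created lazily on first write
    @cells = Hash.new do |rows, row|
      rows[row] = Hash.new { |columns, column| columns[column] = {} }
    end
  end

  # Columns come into existence on first write; the family must already exist.
  def put(row, column, timestamp, value)
    family = column.split(":", 2).first
    raise ArgumentError, "unknown column family: #{family}" unless @families.include?(family)

    @cells[row][column][timestamp] = value
  end

  # Newest version at or before `timestamp` (newest overall when omitted).
  def get(row, column, timestamp = nil)
    return nil unless @cells.key?(row) && @cells[row].key?(column)

    versions = @cells[row][column]
    candidates = timestamp ? versions.select { |ts, _| ts <= timestamp } : versions
    newest = candidates.keys.max
    newest && candidates[newest]
  end
end

table = VersionedTable.new(["data"])
table.put("row1", "data:1", 100, "value1")
table.put("row1", "data:1", 200, "value2")

latest = table.get("row1", "data:1")       # newest version
older  = table.get("row1", "data:1", 150)  # version visible at timestamp 150
absent = table.get("row1", "data:2")       # never written, so nothing is stored
puts [latest, older, absent.inspect].join(", ")
```

Because a column that was never written simply has no entry, a missing value costs no storage, which is what "NULLs are free" refers to.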
  • 37.
    • Master
      – Monitors the region servers
      – Balances regions across region servers
      – Redirects clients to the correct region server
    • Region servers (slaves)
      – Serve client requests (write, read, scan)
      – Send heartbeats to the master
    HBase: members
  • 38.
        $ hbase shell
        > create 'test', 'data'
        0 row(s) in 4.3066 seconds
        > list
        test
        1 row(s) in 0.1485 seconds
        > put 'test', 'row1', 'data:1', 'value1'
        0 row(s) in 0.0454 seconds
        > put 'test', 'row2', 'data:2', 'value2'
        0 row(s) in 0.0035 seconds
        > scan 'test'
        ROW     COLUMN+CELL
        row1    column=data:1, timestamp=1240148026198, value=value1
        row2    column=data:2, timestamp=1240148040035, value=value2
        2 row(s) in 0.0825 seconds
    HBase: shell
  • 40. • Amazon • Facebook • Google • IBM • Joost • Last.fm • New York Times • PowerSet • Veoh • Yahoo! Who uses Hadoop?
  • 41. Books