Distributed Database Architecture
Search and Indexing
Nick Kabra
Distributed Database Architecture 1
Presentation Agenda
Team Introduction
Basics and History
Use Cases & Current Usage
Highlights
From the web
Migration
Appendix
DISCLAIMER: This is a knowledge-sharing session and not a recommendation for any specific technology / product.
Team Introduction
Name:
Designation:
Experience with Search and Indexing:
How long have you been working with Solr or ElasticSearch:
Basics
• Used for indexing and searching.
• Built on top of the Lucene API. Solr and ES take the Lucene API and build features on top of it; their APIs are accessed over HTTP through a web server.
• Think of each as a smaller version of Google, which has indexed and ranked web pages.
• Lucene: the search engine library itself, packaged as a set of JAR files.
Search platform for websites. Search platform for organizations.
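To make "indexing and searching" concrete: at Lucene's core is an inverted index that maps each term to the documents containing it. The Python sketch below is a toy version under simplifying assumptions. It uses a regex tokenizer in place of a real analyzer, does boolean AND matching, and has no scoring or ranking. It is not how Lucene actually stores its index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (a stand-in for a Lucene analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND, no ranking)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index itself is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Solr is built on Lucene",
    2: "Elasticsearch is built on Lucene too",
    3: "ZooKeeper coordinates SolrCloud",
}
index = build_index(docs)
print(sorted(search(index, "built on lucene")))  # [1, 2]
```

Lookups touch only the posting sets for the query terms, not every document. That is the property Solr and ES inherit from Lucene.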
History
Key players: Solr and ElasticSearch
• Solr became an Apache project in 2006.
• ES was first released in 2010, with additional features.
• The two differ in design and architecture.

Solr
Latest version: Solr 4.6.1, released on Jan 28, 2014
Collection: the main logical structure in Solr
Architecture
• Distributed
• Fault tolerant, with automatic replicas
• Coordination: Solr nodes plus a ZooKeeper ensemble, which relies on a quorum
• A leader per shard
• Automatic leader election

ElasticSearch (ES)
Latest version: ElasticSearch 1.0.0, released on Feb 12, 2014
Index: the main logical structure in ES
Architecture
• Distributed
• Fault tolerant, with automatic replicas
• Coordination: ES nodes only, using Zen discovery, which makes split brain possible
• A single elected master for the cluster
• Automatic leader election
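Both coordination models in this comparison come down to majority voting. A ZooKeeper ensemble needs a majority of its servers to agree. In ES 1.x, split brain is usually guarded against by setting discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes. The following is a minimal sketch of that arithmetic. It assumes a single network partition into two sides, and the function names are illustrative only.

```python
def quorum(voting_nodes: int) -> int:
    """Smallest strict majority of the voting nodes (ZooKeeper servers or ES master-eligible nodes)."""
    if voting_nodes < 1:
        raise ValueError("need at least one voting node")
    return voting_nodes // 2 + 1


def can_elect_leader(partition_size: int, voting_nodes: int) -> bool:
    """A partition may elect a leader only if it still holds a quorum."""
    return partition_size >= quorum(voting_nodes)


def split_brain_possible(side_a: int, side_b: int, min_nodes: int) -> bool:
    """With a configured minimum, split brain occurs if BOTH sides of a partition can elect."""
    return side_a >= min_nodes and side_b >= min_nodes


# 3 master-eligible nodes partitioned 2/1.
print(quorum(3))                               # 2
print(can_elect_leader(2, 3))                  # True
print(can_elect_leader(1, 3))                  # False
print(split_brain_possible(2, 1, 1))           # True: a minimum of 1 lets both sides elect
print(split_brain_possible(2, 1, quorum(3)))   # False
```

Because a strict majority can exist on at most one side of a partition, requiring a quorum rules out two leaders. With an even count, such as 4 nodes split 2/2, neither side can elect.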
Use Case 1: Resume recommendations
Challenge
• Company ABC helps other firms hire skilled developers and project managers, empowering customers to find the right job candidate from a database of 8 million profiles.
• Needs fast and predictable performance.
• Must include geo-spatial search.
Opportunity
• Use ES as the search engine, with real-time indexing and nested querying.
Success
• Customers hire using company ABC.
• ABC stores the searches made by customers.
• Identify candidates, skills, and compensation structures to enhance the customer search experience with better matches.
• Make recommendations to customers on salaries, future market needs, etc.
• Eliminate duplicate profiles with real-time indexing and percolation.
• Provide an enhanced customer experience and faster responses.
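Percolation inverts normal search. Instead of running a query against stored documents, you store the queries and run each new document against them. The Python sketch below shows only that idea. The StoredQuery shape and the profile fields (skills, city) are hypothetical, and this is not the ES percolator API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class StoredQuery:
    """A registered search, e.g. a customer's saved candidate search (hypothetical shape)."""
    query_id: str
    required_skills: FrozenSet[str]
    city: Optional[str] = None


def matches(query, profile):
    """True if the profile has every required skill and, if set, the requested city."""
    if not query.required_skills <= set(profile["skills"]):
        return False
    if query.city is not None and query.city != profile.get("city"):
        return False
    return True


def percolate(stored_queries, profile):
    """Return the ids of all stored queries that the new profile matches."""
    return [q.query_id for q in stored_queries if matches(q, profile)]


queries = [
    StoredQuery("q-java-nyc", frozenset({"java", "solr"}), city="New York"),
    StoredQuery("q-pm", frozenset({"project management"})),
    StoredQuery("q-java-any", frozenset({"java"})),
]
new_profile = {"name": "A. Candidate", "skills": ["java", "solr", "spring"], "city": "New York"}
print(percolate(queries, new_profile))  # ['q-java-nyc', 'q-java-any']
```

When a profile is indexed, the matching query ids tell you which customers' searches it satisfies. Depending on how the queries are built, the same mechanism can also flag profiles that look like existing ones.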
Use Case 2: Integration
The full circle
• Logstash: takes logs, scrubs and parses them, and enriches the data
• ElasticSearch: searches and analyzes in real time
• Kibana: visualization engine for dynamic dashboards, created in real time or on the fly
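The Logstash step turns raw log lines into structured events before they reach ElasticSearch. The Python sketch below mimics what a grok-style parse plus a date filter would do, but it is plain regex code, not Logstash configuration. It assumes the Apache common log format, and the enrichment field is_error is a hypothetical example.

```python
import re
from datetime import datetime

# Apache common log format: host ident user [timestamp] "request" status bytes
COMMON_LOG = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_and_enrich(line):
    """Parse one log line into a structured event, or return None if it does not match."""
    m = COMMON_LOG.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Scrub: convert numeric fields; "-" means no body was sent.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    # Normalize the timestamp to ISO 8601 so it can be indexed as a date.
    ts = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = ts.isoformat()
    # Enrich: a derived flag that dashboards can filter on (hypothetical field).
    event["is_error"] = event["status"] >= 500
    return event


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_and_enrich(sample)
print(event["@timestamp"], event["status"], event["is_error"])  # 2000-10-10T13:55:36-07:00 200 False
```

Each resulting dictionary is the kind of JSON document ElasticSearch indexes and Kibana then charts.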
Use Case 3: Chat agent for 460 million documents
Challenge
6,000 customers around the world use LiveChat daily to communicate with their own customers. They range from one-person businesses to international organizations such as LG, Apple, and Adobe.
LiveChat customers run 3.6 million queries and 220 million "get" operations per day on 460 million documents. LiveChat keeps these documents updated with 70 million indexing operations every day.
Solution
• Reduce query time from 2 seconds to 100 ms
• Streamline updating from hours to seconds
• Guarantee maximum uptime
• Scale to meet the needs of 6,000 customers
• Store and search 460 million documents
• Process 3.6 million queries per day
Advantage
• Scalability, indexing, and full-text search let users search through chat archives.
• Faceting makes it possible to pull various statistics for LiveChat clients.
• ES acts as the single datastore, and data updates are available immediately. Each document in LiveChat is now updated on average 20 to 30 times, every 20 to 60 seconds.
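Faceting is what turns a result set into statistics: for each value of a field, count how many hits carry it. The Python sketch below is a toy terms facet over in-memory documents. The chat fields are hypothetical, and a substring check stands in for full-text search. ES and Solr compute facets inside the index rather than like this.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of `field` (a terms facet), most common first."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return counts.most_common()


chats = [
    {"agent": "ana", "status": "resolved", "text": "refund request"},
    {"agent": "ben", "status": "open", "text": "refund not received"},
    {"agent": "ana", "status": "resolved", "text": "password reset"},
]
print(facet_counts(chats, "agent"))  # [('ana', 2), ('ben', 1)]

# Filter first (a stand-in for a full-text query), then facet over the hits only.
hits = [c for c in chats if "refund" in c["text"]]
print(facet_counts(hits, "status"))
```

Faceting over the hits, rather than over the whole collection, is what lets the same statistics follow each user's search.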
Current Uses
• Use Case 1
• Use Case 2
• Use Case 3
• Use Case 4
• Use Case X
Highlights
• Schema and config: solrconfig.xml (Solr) and elasticsearch.yml (ES). The number of replicas can be changed on a live cluster; ES fixes the number of primary shards when an index is created.
• Scaling: nodes are auto-balanced; Solr supports shard splitting (SOLR-3755) / add a document.
• Nesting (addresses, users and rights, booleans, parent/child).
• Index = different types of documents and analyzers.
• Node discovery and fault detection: ZooKeeper (Solr).
• Multiple document types per schema, and parent/child documents.
• Percolator.
• Aggregations + facets in ES / facets in Solr.
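The schema and scaling points depend on how documents are routed to shards. ES picks a primary shard roughly as hash(routing) % number_of_primary_shards, with the document id as the default routing value. Changing the shard count would therefore move most existing documents, which is why ES 1.x fixes the number of primary shards at index creation while replicas can be added live. The sketch below uses zlib.crc32 as a stand-in hash. ES and Solr each use their own hash functions, and Solr's compositeId router assigns hash ranges rather than a plain modulo, so this illustrates the idea rather than reproducing either engine.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Choose a primary shard: hash(routing key) modulo the number of primary shards.

    zlib.crc32 is an illustrative stand-in, not the hash either engine uses.
    """
    if num_primary_shards < 1:
        raise ValueError("need at least one primary shard")
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Route 1,000 hypothetical document ids with 5 primary shards, then with 6.
ids = ["profile-%d" % i for i in range(1000)]
before = {doc_id: shard_for(doc_id, 5) for doc_id in ids}
after = {doc_id: shard_for(doc_id, 6) for doc_id in ids}
moved = sum(1 for doc_id in ids if before[doc_id] != after[doc_id])
print("%d of %d documents would land on a different shard" % (moved, len(ids)))
```

With a uniformly distributed hash, a document keeps its shard when going from 5 to 6 shards only if its hash mod 30 is below 5. Roughly five in six documents would therefore have to move. Shard splitting avoids this by dividing one shard's hash range in two instead of re-hashing everything.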
Highlights (contd.)
• Auto load balancing and auto-sharding
• Marvel metrics (on 03/13/2014)
• Split-brain problem in ES
• Structured Query DSL and query control
• Real-time indexing / near-real-time indexing
• Query routing; SOLR-5816 to be introduced
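Real-time and near-real-time behavior differ in what is visible when. In ES 1.x, a GET by id returns the latest version of a document immediately, because it can read from the transaction log. Search only sees changes after the next refresh, which defaults to once per second. The class below is a toy Python model of that split. It is not how either engine is implemented.

```python
class ToyNrtIndex:
    """Toy model: GET by id is real-time; search only sees documents after refresh()."""

    def __init__(self):
        self._pending = {}     # indexed but not yet refreshed (stand-in for the buffer/translog)
        self._searchable = {}  # visible to search (stand-in for a refreshed segment)

    def index(self, doc_id, doc):
        self._pending[doc_id] = doc

    def get(self, doc_id):
        # Real-time GET checks the newest, not-yet-refreshed version first.
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._searchable.get(doc_id)

    def search(self, term):
        # Search only looks at refreshed documents.
        return sorted(doc_id for doc_id, doc in self._searchable.items()
                      if term in doc["text"].split())

    def refresh(self):
        # Make everything indexed so far visible to search.
        self._searchable.update(self._pending)
        self._pending.clear()


idx = ToyNrtIndex()
idx.index("c1", {"text": "hello agent"})
print(idx.get("c1"))        # {'text': 'hello agent'}
print(idx.search("agent"))  # []
idx.refresh()
print(idx.search("agent"))  # ['c1']
```

This gap between GET and search is why real-time GET is listed as a separate feature from near-real-time indexing.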
ElasticSearch / Solr funnel
Solr
• UIMA
• Text analysis debugger, spell check
• Decision tree faceting / drilldown
• Cloudera, MapR, and DataStax support Solr
ElasticSearch
• Filters for queries across nested documents
• Query handling analyzer and language, term suggester, autocomplete
• Real-time GET with query routing
• Hortonworks and Couchbase support ElasticSearch
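Autocomplete and term suggestion start from a prefix lookup over indexed terms. The sketch below keeps the terms sorted and uses bisect to jump to the first candidate, then ranks matches by frequency. The term counts are hypothetical, and this is a plain Python illustration, not the ES completion suggester or Solr's suggester component.

```python
import bisect


class PrefixSuggester:
    """Suggest indexed terms that start with the typed prefix, most frequent first."""

    def __init__(self, term_counts):
        self._counts = dict(term_counts)
        self._terms = sorted(self._counts)  # sorted so all terms sharing a prefix are adjacent

    def suggest(self, prefix, size=3):
        prefix = prefix.lower()
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(prefix):
                break  # past the block of terms with this prefix
            matches.append(term)
        # Rank by frequency, then alphabetically for ties.
        matches.sort(key=lambda t: (-self._counts[t], t))
        return matches[:size]


suggester = PrefixSuggester(
    {"elasticsearch": 120, "elastic": 40, "solr": 90, "solrcloud": 30, "election": 5}
)
print(suggester.suggest("el"))    # ['elasticsearch', 'elastic', 'election']
print(suggester.suggest("solr"))  # ['solr', 'solrcloud']
```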
FROM THE WEB
Web CPA
This is only an FYI: we found some customers moving from Solr to ElasticSearch, but could not find any article mentioning clients that moved from ES to Solr.
Caveat: no prejudice intended, but it would be good to hear what customers say.
Let us also check this site: http://www.ymc.ch/en/why-we-chose-solr-4-0-instead-of-elasticsearch
http://www.mgt-commerce.com/magento-elasticsearch.html
Foursquare: http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
Jetwick: http://karussell.wordpress.com/2011/02/07/why-jetwick-moved-from-solr-to-elasticsearch/
Netricos: http://www.netricos.com/blog/posts/how-we-are-using-elastic-search
StumbleUpon: http://www.elasticsearch.org/case-study/stumbleupon/
UK govt. site: https://gds.blog.gov.uk/2012/08/03/from-solr-to-elasticsearch/
Wikimedia: http://thenextweb.com/insider/2014/01/06/wikimedia-will-replace-search-elasticsearch-beta-users-february-users-march-april/#!xDKnd
2 Parts of a whole: The Math
1. Solr: performs very well on small indexes that don't change very often.
2. ElasticSearch: scalability, auto-sharding, GUI admin, schemaless indexing, real-time behavior, nested queries, routing, and the way indexing and queries are handled (which gives faster query execution and better indexing) provide a distinct advantage to using ES.
Migration
Step 1
Use the river plugin to migrate from the existing Solr cluster to ES.
Step 2
The river pulls the content from the existing Solr cluster and indexes it in ES.
Step 3
When you decide to switch to ElasticSearch permanently, switch your indexing to send content directly from your sources to ElasticSearch. Keeping Solr in the middle is not a recommended setup.
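However the content is moved, whether by the river or by a custom script, documents end up in ES through its bulk endpoint (_bulk). That endpoint takes newline-delimited JSON: one action line, then one source line per document. The sketch below builds only that request body from a page of Solr results. Fetching pages from Solr and sending the HTTP POST are left out. The index and type names and the document fields are hypothetical. The _type field was required in ES 1.x and was removed in later major versions.

```python
import json


def solr_docs_to_bulk(solr_docs, index, doc_type, id_field="id"):
    """Turn Solr result documents into an ES bulk request body (newline-delimited JSON).

    Each document becomes two lines: an "index" action naming its target, then the
    document source. Fields starting with "_" (such as Solr's _version_) are dropped.
    """
    lines = []
    for doc in solr_docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc[id_field])}}
        source = {k: v for k, v in doc.items() if not k.startswith("_")}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


solr_page = [
    {"id": "p1", "name": "Ana", "skills": ["java", "solr"], "_version_": 1461234},
    {"id": "p2", "name": "Ben", "skills": ["project management"], "_version_": 1461235},
]
body = solr_docs_to_bulk(solr_page, index="profiles", doc_type="profile")
print(body)
```

Reusing the Solr id as the ES _id keeps the copy idempotent: re-running a page overwrites documents instead of duplicating them.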
Conclusion
If we have a small site and need search features without the distributed bells and whistles, both Solr and ElasticSearch are efficient.
If we are planning a large installation that requires distributed search with nesting, scalability, sharding, and real-time indexing, ElasticSearch can do a better job.
Where do we go from here?
Both products are trying to catch up with each other's capabilities.
The best way to define this is:
Some possible next steps…
Questions to ask
Thank you!
201-925-0488
nikkabs@gmail.com
Architecture – Global Head
Questions session
Appendix
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
Ladislau5
 
SOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERING
SOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERINGSOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERING
SOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERING
PrabhuB33
 
How AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdfHow AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdf
PromptCloud
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
Becky Burwell
 
Selcuk Topal Arbitrum Scientific Report.pdf
Selcuk Topal Arbitrum Scientific Report.pdfSelcuk Topal Arbitrum Scientific Report.pdf
Selcuk Topal Arbitrum Scientific Report.pdf
SelcukTOPAL2
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
talha2khan2k
 
Accounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-RegulationsAccounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-Regulations
DALubis
 
Data Analytics for Decision Making By District 11 Solutions
Data Analytics for Decision Making By District 11 SolutionsData Analytics for Decision Making By District 11 Solutions
Data Analytics for Decision Making By District 11 Solutions
District 11 Solutions
 
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop ServiceCal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Deepikakumari457585
 
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptxParcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
AltanAtabarut
 

Recently uploaded (20)

Vrinda store data analysis project using Excel
Vrinda store data analysis project using ExcelVrinda store data analysis project using Excel
Vrinda store data analysis project using Excel
 
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptxPRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
 
Systane Global education training centre
Systane Global education training centreSystane Global education training centre
Systane Global education training centre
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
 
Audits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdfAudits Of Complaints Against the PPD Report_2022.pdf
Audits Of Complaints Against the PPD Report_2022.pdf
 
Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?
 
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di MalaysiaBimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
Bimbingan kaunseling untuk pelajar IPTA/IPTS di Malaysia
 
Technology used in Ott data analysis project
Technology used in Ott data analysis  projectTechnology used in Ott data analysis  project
Technology used in Ott data analysis project
 
Field Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdfField Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdf
 
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
393947940-The-Dell-EMC-PowerMax-Family-Overview.pdf
 
SOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERING
SOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERINGSOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERING
SOFTWARE ENGINEERING-UNIT-1SOFTWARE ENGINEERING
 
How AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdfHow AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdf
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
 
Selcuk Topal Arbitrum Scientific Report.pdf
Selcuk Topal Arbitrum Scientific Report.pdfSelcuk Topal Arbitrum Scientific Report.pdf
Selcuk Topal Arbitrum Scientific Report.pdf
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
 
Accounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-RegulationsAccounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-Regulations
 
Data Analytics for Decision Making By District 11 Solutions
Data Analytics for Decision Making By District 11 SolutionsData Analytics for Decision Making By District 11 Solutions
Data Analytics for Decision Making By District 11 Solutions
 
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop ServiceCal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
 
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptxParcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
 

Solr and ElasticSearch: demo and speaker, Feb 2014

  • 1. Distributed Database Architecture: Search and Indexing. Nick Kabra
  • 2. Presentation Agenda: Team Introduction; Basics and History; Use Cases & Current Usage; Highlights; From the Web; Migration; Appendix. DISCLAIMER: This is a knowledge-sharing session and not a recommendation for any specific technology or product.
  • 3. Team Introduction: Name; Designation; Experience with search and indexing; How long have you been working with Solr or ElasticSearch?
  • 4. Basics: Solr and ElasticSearch (ES) are used for indexing and searching. Both are built on top of Lucene, a search engine library packaged as a set of JAR files; Solr and ES take the Lucene API, build features on top of it, and expose it through a web server. Each is like a smaller version of Google, which has indexed and ranked the web's pages. Use them as a search platform for websites or as a search platform for an organization.
  • 5. History: Solr was created at CNET in 2004 and donated to the Apache Software Foundation in 2006. ES was first released in 2010, with additional features. The two differ in design and architecture.
  • 6. Key Players: Solr and ElasticSearch. Solr: latest version Solr 4.6.1, released Jan 28, 2014; the collection is the main logical structure. Architecture: distributed; fault tolerant with automatic replicas; coordination through an Apache ZooKeeper ensemble, so cluster decisions require a quorum; a leader per shard; automatic leader election. ElasticSearch (ES): latest version ElasticSearch 1.0.0, released Feb 12, 2014; the index is the main logical structure. Architecture: distributed; fault tolerant with automatic replicas; coordination among the ES nodes themselves through Zen discovery, which is exposed to the split-brain problem; a single elected leader (master node); automatic leader election.
  • 7. Use Case 1: Resume recommendations. Challenge: Company ABC helps other firms hire skilled developers and project managers, and wants to empower its customers to find the right job candidate in a database of 8 million profiles. It needs fast, predictable performance and geospatial search. Opportunity: Use ES as the search engine, with real-time indexing and nested querying. Success: Customers hire through company ABC. ABC stores the searches its customers make and uses them to identify candidates, skills, and compensation structures, improving the search experience with better matches. ABC makes recommendations to customers on salaries, future market needs, etc. Real-time indexing and percolation eliminate duplicate profiles. The result is an enhanced customer experience and faster responses.
  • 8. Use Case 2: Integration, the full circle. Logstash takes logs, then scrubs, parses, and enriches the data; ElasticSearch searches and analyzes it in real time; Kibana, the visualization engine, builds dynamic dashboards in real time or on the fly.
  • 9. Use Case 3: Chat agent for 460 million documents. Challenge: 6,000 customers around the world, from businesses owned by one person to international organizations such as LG, Apple, and Adobe, use LiveChat daily to communicate with their own customers. LiveChat customers run 3.6 million queries and 220 million "get" operations per day against 460 million documents, and LiveChat keeps these documents current with 70 million indexing operations every day. Solution: ES acts as the single datastore, and data updates are available immediately; each document in LiveChat is now updated on average 20 to 30 times every 20 to 60 seconds. Scalability, indexing, and full-text search let users search through chat archives, and faceting makes it possible to pull various statistics for LiveChat clients. Advantage: query time reduced from 2 seconds to 100 ms; updates streamlined from hours to seconds; maximum uptime; scaling to meet the needs of 6,000 customers; storing and searching 460 million documents; processing 3.6 million queries per day.
  • 10. Current Uses: Use Case 1; Use Case 2; Use Case 3; Use Case 4; Use Case X
  • 11. Highlights: Schema and configuration: solrconfig.xml (Solr) and elasticsearch.yml (ES); the number of replicas can be changed on a live cluster, while in ES the number of primary shards is fixed when an index is created. Scaling: ES auto-balances shards across nodes; Solr adds shard splitting (SOLR-3755). Nesting: addresses, users and rights, boolean queries, parent-child. An ES index can hold different types of documents, each with its own analyzers. Node discovery and failure detection: ZooKeeper (Solr). Multiple document types per schema, and parent-child documents. Percolator. Aggregations plus facets in ES; facets in Solr.
  • 12. Highlights (contd.): 1. Auto load balancing and auto-sharding. 2. Marvel metrics (on 03/13/2014). 3. The split-brain problem in ES. 4. Structured query DSL and query control. 5. Real-time / near-real-time indexing. 6. Query routing, with Solr 5816 to be introduced.
  • 13. ElasticSearch / Solr funnel. Solr: UIMA; text analysis debugger; spell check; decision-tree faceting / drill-down; Cloudera, MapR, and DataStax support. ElasticSearch: filters for queries across nested documents; query handling by analyzer and language; term suggester; autocomplete; real-time GET with query routing; Hortonworks and Couchbase support.
  • 14. From the Web: This is only an FYI. We found some customers moving from Solr to ElasticSearch but could not find any article mentioning clients that moved from ES to Solr. Caveat: no prejudice intended; it would be good to hear what customers say. Let us also check this site, which explains a choice of Solr 4.0 over ElasticSearch: http://www.ymc.ch/en/why-we-chose-solr-4-0-instead-of-elasticsearch. Other references: http://www.mgt-commerce.com/magento-elasticsearch.html; Foursquare: http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/; Jetwick: http://karussell.wordpress.com/2011/02/07/why-jetwick-moved-from-solr-to-elasticsearch/; Netricos: http://www.netricos.com/blog/posts/how-we-are-using-elastic-search; StumbleUpon: http://www.elasticsearch.org/case-study/stumbleupon/; UK government site: https://gds.blog.gov.uk/2012/08/03/from-solr-to-elasticsearch/; Wikimedia: http://thenextweb.com/insider/2014/01/06/wikimedia-will-replace-search-elasticsearch-beta-users-february-users-march-april/#!xDKnd
  • 15. Two Parts of a Whole: The Math. 1. Solr performs very well on small indexes that don't change very often. 2. ElasticSearch: scalability, auto-sharding, a GUI admin, schemaless operation, real-time indexing, nested queries, routing, and the way indexing and queries are handled give faster query execution and better indexing, which is a distinct advantage of using ES.
  • 16. Migration. Step 1: Use the river plugin to migrate from the existing Solr cluster to ES. Step 2: The river pulls the content from the existing Solr cluster and indexes it in ES. Step 3: When you decide to switch to ElasticSearch permanently, switch your indexing to send content directly from your sources to ElasticSearch; keeping Solr in the middle is not a recommended setup.
  • 17. Conclusion: If we have a small site and need search features without the distributed bells and whistles, both Solr and ElasticSearch are efficient. If we are planning a large installation that requires distributed search with nesting, scalability, sharding, and real-time indexing, ElasticSearch can do a better job. Both products are trying to catch up with each other's capabilities.
  • 18. Where do we go from here? The best way to define this: some possible next steps, and questions to ask.
  • 19. Thank you! 201-925-0488, nikkabs@gmail.com. Architecture, Global Head
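Slide 4 describes Solr and ES as indexing-and-search layers over Lucene. The structure underneath is an inverted index: a map from each term to the set of documents containing it. The sketch below is a toy model of that idea under stated assumptions, not Lucene's implementation; the tokenizer is a crude stand-in for a Lucene analyzer, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    "1": "Solr is built on Lucene",
    "2": "ElasticSearch is built on Lucene too",
    "3": "ZooKeeper coordinates SolrCloud",
}
index = build_index(docs)
print(sorted(search_all(index, "built lucene")))  # ['1', '2']
```

A query only touches the posting sets of its own terms, which is why lookups stay fast as the number of documents grows; ranking, which Lucene adds on top, is left out here.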
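Slide 6 contrasts ZooKeeper's quorum with the split-brain risk of ES Zen discovery. Both come down to majority arithmetic: a partition should elect a master only if it holds a strict majority of the master-eligible nodes. In ES 1.x that threshold is set through `discovery.zen.minimum_master_nodes`; the Python below only models the arithmetic (the function names are hypothetical) and does not talk to a cluster.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of n nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def masters_after_partition(partition_sizes: list[int], minimum_master_nodes: int) -> int:
    """Count the partitions large enough to elect their own master.

    More than one means two masters could accept writes at once: split brain.
    """
    return sum(1 for size in partition_sizes if size >= minimum_master_nodes)


# A 3-node cluster cut by a network failure into partitions of 2 and 1 nodes.
partitions = [2, 1]
print(quorum(3))                                       # 2
print(masters_after_partition(partitions, 1))          # 2 -> split brain
print(masters_after_partition(partitions, quorum(3)))  # 1 -> only the majority side elects
```

With four nodes split 2/2, neither side reaches the quorum of 3, so no master is elected rather than two; that trade of availability for consistency is the same one a ZooKeeper ensemble makes.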
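Slides 7 and 11 mention percolation. A percolator inverts normal search: queries are registered first, and each incoming document is checked against them, which is how a new profile can be matched to saved searches (or flagged as a likely duplicate) as soon as it is indexed. The class below is a hypothetical toy in which a query is just a set of required terms; the real ES percolator accepts full query DSL, and none of these names come from ES.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric terms of a text (a crude stand-in for an analyzer)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


class Percolator:
    """Toy percolator: register queries first, then match each new document against them."""

    def __init__(self) -> None:
        self._queries: dict[str, set[str]] = {}  # query id -> required terms

    def register(self, query_id: str, query_text: str) -> None:
        self._queries[query_id] = terms(query_text)

    def percolate(self, document_text: str) -> list[str]:
        """Return ids of registered queries whose terms all appear in the document."""
        doc_terms = terms(document_text)
        return sorted(qid for qid, required in self._queries.items() if required <= doc_terms)


p = Percolator()
p.register("java-nyc", "java new york")
p.register("pm-agile", "project manager agile")
print(p.percolate("Senior Java developer based in New York"))  # ['java-nyc']
```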
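Slides 9, 11 and 13 rely on faceting, for example to pull statistics for LiveChat clients out of their chat archives. A terms facet counts, over the documents that matched a query, how many carry each value of a field. The function and the sample chat records below are hypothetical and model only that counting step, not how Solr or ES compute facets internally.

```python
from collections import Counter


def facet_counts(hits: list[dict], field: str) -> list[tuple[str, int]]:
    """Count how many matching documents carry each value of `field` (a terms facet)."""
    counts: Counter = Counter()
    for doc in hits:
        value = doc.get(field)
        if isinstance(value, list):
            counts.update(set(value))  # count a multi-valued field once per document
        elif value is not None:
            counts[value] += 1
    return counts.most_common()


# Hypothetical query hits from a chat archive.
chats = [
    {"agent": "ann", "tags": ["billing", "refund"]},
    {"agent": "bob", "tags": ["billing"]},
    {"agent": "ann", "tags": ["login"]},
]
print(facet_counts(chats, "agent"))  # [('ann', 2), ('bob', 1)]
```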
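Slide 11 lists nesting (addresses, users and rights, parent-child). The reason it matters: by default ES flattens an array of objects into independent multi-valued fields, so the link between sub-fields of the same object is lost, whereas a `nested` mapping keeps each object separate. The sketch below models both behaviours in plain Python on a hypothetical profile; it illustrates the matching semantics only, not how ES stores the data.

```python
# Hypothetical profile with two addresses.
profile = {
    "name": "A. Candidate",
    "addresses": [
        {"city": "Boston", "type": "home"},
        {"city": "Chicago", "type": "work"},
    ],
}


def flatten(doc: dict, field: str) -> dict[str, list]:
    """Object-array flattening: each sub-field becomes an independent list of values."""
    keys = {k for item in doc[field] for k in item}
    return {k: [item[k] for item in doc[field] if k in item] for k in keys}


def match_flattened(doc: dict, field: str, **criteria) -> bool:
    """Each criterion may be satisfied by a different sub-object."""
    flat = flatten(doc, field)
    return all(value in flat.get(key, []) for key, value in criteria.items())


def match_nested(doc: dict, field: str, **criteria) -> bool:
    """Nested semantics: all criteria must hold within the same sub-object."""
    return any(all(item.get(k) == v for k, v in criteria.items()) for item in doc[field])


print(match_flattened(profile, "addresses", city="Boston", type="work"))  # True (false positive)
print(match_nested(profile, "addresses", city="Boston", type="work"))     # False
```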
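Slide 12 lists query routing. ES picks a document's primary shard from a hash of its routing value (the document id unless a custom routing value is supplied), taken modulo the number of primary shards. That is why a query carrying the same routing value can be sent to a single shard, and why the number of primary shards is fixed once an index is created: a new modulus remaps most documents. The sketch uses `zlib.crc32` as a stand-in hash (Elasticsearch uses a different hash function), and `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(routing_value: str, number_of_shards: int) -> int:
    """Pick a primary shard as hash(routing) mod number_of_shards.

    crc32 is only a stand-in hash for illustration.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


doc_ids = [f"profile-{i}" for i in range(1000)]
before = {d: shard_for(d, 5) for d in doc_ids}
after = {d: shard_for(d, 6) for d in doc_ids}
moved = sum(1 for d in doc_ids if before[d] != after[d])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

With any reasonably uniform hash, going from 5 to 6 shards relocates roughly five documents in six, which is why ES grows capacity by adding nodes (and moving whole shards) rather than by changing the shard count of a live index.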
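Slide 16 describes migrating with the Solr river plugin. As a hypothetical alternative for a one-off copy, documents already fetched from Solr can be turned into the newline-delimited JSON body that the ES `_bulk` endpoint accepts. The sketch below only builds that body; fetching from Solr and POSTing to ES are left out, the index and type names are placeholders, the `_type` entry reflects ES 1.x (mapping types were removed in later versions), and Solr's internal `_version_` field is dropped because it has no meaning in ES.

```python
import json


def to_bulk_body(solr_docs: list[dict], index: str, doc_type: str, id_field: str = "id") -> str:
    """Build an ES _bulk request body (newline-delimited JSON) from Solr documents."""
    lines = []
    for doc in solr_docs:
        # Drop Solr's internal version field; keep everything else as the ES source.
        source = {k: v for k, v in doc.items() if k != "_version_"}
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects each line, including the last, to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


docs = [{"id": "1", "name": "Ann", "_version_": 1459}, {"id": 2, "name": "Bob"}]
body = to_bulk_body(docs, "profiles", "profile")
print(body)
```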