Search @twitter 
Michael Busch 
Search @twitter 
‣ Introduction 
- Search Architecture 
- Lucene Extensions 
- Outlook
Search at Twitter: Presented by Michael Busch, Twitter
Twitter has more than 284 million 
monthly active users.
500 million tweets are sent per day.
More than 300 billion tweets have been 
sent since company founding in 2006.
Tweets-per-second record: 
one-second peak of 143,199 TPS.
More than 2 billion search queries per 
Search @twitter 
- Introduction 
‣ Search Architecture 
- Lucene Extensions 
- Outlook
Search Architecture
[Diagram label: RT index]
Search Architecture
[Diagram labels: RT stream, Tweet archive, RT index]
Search Architecture
[Diagram labels: Updates, Deletes/Engagement (e.g. retweets/favs), RT index]
Search Architecture
[Diagram labels: Social graph, RT index]
• Blender is our Thrift service aggregator
• Queries multiple Earlybirds, merges results
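A minimal sketch of the query-many, merge-results pattern the Blender bullets describe. It assumes each Earlybird returns hits already sorted by descending score. The names `BlenderSketch`, `Hit`, and `mergeTopK` are hypothetical, and in-process lists stand in for the Thrift calls; the slides do not show how Blender actually merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BlenderSketch {
    /** Hypothetical search hit: a tweet id plus a relevance score. */
    static final class Hit {
        final long tweetId;
        final double score;

        Hit(long tweetId, double score) {
            this.tweetId = tweetId;
            this.score = score;
        }
    }

    /** Merges per-shard hit lists (each sorted by descending score) into one global top-k list. */
    static List<Hit> mergeTopK(final List<List<Hit>> shardResults, int k) {
        // Each queue entry is {shardIndex, positionInShard}; the head is the best unconsumed hit.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            Comparator.comparingDouble((int[] c) -> -shardResults.get(c[0]).get(c[1]).score));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heads.isEmpty()) {
            int[] cursor = heads.poll();
            List<Hit> shard = shardResults.get(cursor[0]);
            merged.add(shard.get(cursor[1]));
            // Advance this shard's cursor so its next-best hit can compete.
            if (cursor[1] + 1 < shard.size()) {
                heads.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
            Arrays.asList(new Hit(1, 0.9), new Hit(2, 0.4)),
            Arrays.asList(new Hit(3, 0.7)),
            Arrays.asList(new Hit(4, 0.8), new Hit(5, 0.1)));
        for (Hit hit : mergeTopK(shards, 3)) {
            System.out.println(hit.tweetId + " " + hit.score);
        }
    }
}
```

Only the head of each shard's list sits in the queue at any time, so the merge cost depends on k and the number of shards rather than on the total number of hits returned.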
Search Architecture
[Diagram label: RT index]
• For historic reasons, these used 
to be entirely different codebases, 
but had similar features/ 
• Over time cross-dependencies 
were introduced to share code 
Search Architecture 
[Diagram label: RT index]
• New Lucene extension package 
• This package is truly generic and 
has no dependency on an actual 
• It contains Twitter’s extensions for 
real-time search, a thin segment 
management layer and other 
Search @twitter 
- Introduction 
- Search Architecture 
‣ Lucene Extensions 
- Outlook
Lucene Extensions
Lucene Extension Library 
• Abstraction layer for Lucene index segments 
• Real-time writer for in-memory index segments 
• Schema-based Lucene document factory 
• Real-time faceting
Lucene Extension Library 
• API layer for Lucene segments 
• *IndexSegmentWriter 
• *IndexSegmentAtomicReader 
• Two implementations 
• In-memory: RealtimeIndexSegmentWriter (and reader) 
• On-disk: LuceneIndexSegmentWriter (and reader)
Lucene Extension Library 
• IndexSegments can be built ... 
• in realtime 
• on Mesos or Hadoop (MapReduce)
• locally on serving machines 
• Cluster-management code that deals with IndexSegments 
• Share segments across serving machines using HDFS 
• Can rebuild segments (e.g. to upgrade Lucene version, change data 
schema, etc.)
Lucene Extension Library 
[Diagram labels: HDFS, Earlybird (three instances), Hadoop (MR), RT pipeline]
• Modified Lucene index implementation optimized for realtime search 
• IndexWriter buffer is searchable (no need to flush to allow searching) 
• In-memory 
• Lock-free concurrency model for best performance
Concurrency - Definitions 
• Pessimistic locking 
• A thread holds an exclusive lock on a resource, while an action is 
performed [mutual exclusion] 
• Usually used when conflicts are expected to be likely 
• Optimistic locking 
• Operations are attempted atomically without holding a lock;
conflicts can be detected; retry logic is often used in case of conflicts
• Usually used when conflicts are expected to be the exception
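The two definitions above can be contrasted with a small counter, shown here as a sketch rather than anything taken from the Earlybird code. The class names `LockingStyles`, `PessimisticCounter`, and `OptimisticCounter` are hypothetical. The pessimistic version holds a monitor for every update; the optimistic version reads the value, computes the update, and retries if `compareAndSet` reports a conflict.

```java
import java.util.concurrent.atomic.AtomicLong;

final class LockingStyles {
    /** Pessimistic: every update takes an exclusive lock (mutual exclusion). */
    static final class PessimisticCounter {
        private long value;

        synchronized void increment() {
            value++;
        }

        synchronized long get() {
            return value;
        }
    }

    /** Optimistic: no lock is held; a conflict is detected by a failed CAS and retried. */
    static final class OptimisticCounter {
        private final AtomicLong value = new AtomicLong();

        void increment() {
            while (true) {
                long current = value.get();
                if (value.compareAndSet(current, current + 1)) {
                    return; // no other thread changed the value in between
                }
                // Another thread won the race: loop and retry with the fresh value.
            }
        }

        long get() {
            return value.get();
        }
    }

    /** Runs both counters under contention and returns {pessimistic total, optimistic total}. */
    static long[] runDemo(int threadCount, final int incrementsPerThread) {
        final PessimisticCounter pessimistic = new PessimisticCounter();
        final OptimisticCounter optimistic = new OptimisticCounter();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    pessimistic.increment();
                    optimistic.increment();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] {pessimistic.get(), optimistic.get()};
    }

    public static void main(String[] args) {
        long[] totals = runDemo(4, 10_000);
        System.out.println("pessimistic=" + totals[0] + " optimistic=" + totals[1]);
    }
}
```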
Concurrency - Definitions 
• Non-blocking algorithm 
Ensures that threads competing for shared resources do not have their
execution indefinitely postponed by mutual exclusion.
• Lock-free algorithm
A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.
• Wait-free algorithm
A non-blocking algorithm is wait-free if there is guaranteed per-thread progress.
* Source: Wikipedia
• Having a single writer thread simplifies our problem: no locks have to be used 
to protect data structures from corruption (only one thread modifies data) 
• But: we have to make sure that all readers always see a consistent state of 
all data structures -> this is much harder than it sounds! 
• In Java, it is not guaranteed that one thread will see changes that another 
thread makes in program execution order, unless the same memory barrier is 
crossed by both threads -> safe publication 
• Safe publication can be achieved in different, subtle ways. Read the great
book “Java Concurrency in Practice” by Brian Goetz for more information!
Java Memory Model 
• Program order rule 
Each action in a thread happens-before every action in that thread that comes 
later in the program order. 
• Volatile variable rule 
A write to a volatile field happens-before every subsequent read of that same field.
• Transitivity 
If A happens-before B, and B happens-before C, then A happens-before C. 
* Source: Brian Goetz: Java Concurrency in Practice
[Diagram: Thread 1 and Thread 2 share the field int x; RAM holds x = 0]
Thread 1: x = 5;
Thread 1 writes x=5 to cache (cache holds 5, RAM still holds 0)
Thread 2: while(x != 5);
This condition will likely never become false!
[Diagram: Thread 1 and Thread 2 share the fields int x and volatile int b; RAM holds x = 0, Thread 1's cache holds x = 5]
Thread 1:
x = 5;
b = 1;
Thread 1 writes b=1 to RAM, because b is volatile
Thread 2:
int dummy = b; (read volatile b)
while(x != 5);
Memory barrier
• Program order rule: Each action in a thread happens-before every action in
that thread that comes later in the program order.
• Volatile variable rule: A write to a volatile field happens-before every
subsequent read of that same field.
• Transitivity: If A happens-before B, and B happens-before C, then A
happens-before C.
This condition will be false, i.e. x==5
• Note: x itself doesn’t have to be volatile. There can be many variables like x,
but we need only a single volatile field.
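The three rules combine as in the following sketch, which reuses the field names `x` and `b` from the slides; the class name `SafePublication` and its methods are hypothetical. One deliberate difference from the slides: the reader spins until its volatile read of `b` actually observes the value 1, because the happens-before edge only exists once the read sees the write.

```java
final class SafePublication {
    private int x;          // plain field, not volatile
    private volatile int b; // the single volatile field

    /** Thread 1: plain write first, volatile write second (program order rule). */
    void writer() {
        x = 5;
        b = 1; // volatile write: everything written before it becomes visible with it
    }

    /** Thread 2: volatile read first, plain read second. */
    int reader() {
        while (b != 1) {
            Thread.yield(); // wait until the volatile write has been observed
        }
        // Volatile variable rule + transitivity: the write x = 5 happens-before this read.
        return x;
    }

    /** Runs one writer thread against the calling thread as reader; returns the x that was seen. */
    static int runOnce() {
        SafePublication shared = new SafePublication();
        Thread writerThread = new Thread(shared::writer);
        writerThread.start();
        int seen = shared.reader();
        try {
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("reader saw x = " + runOnce());
    }
}
```

If `b` were not volatile, the Java Memory Model would give no guarantee that the loop terminates or that the reader sees `x == 5`.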
IndexWriter: write 100 docs; maxDoc = 100; write more docs
IndexReader: read maxDoc; search up to maxDoc
maxDoc is volatile
• Only maxDoc is volatile. All other fields that IW writes to and IR reads from
don’t need to be!
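A minimal sketch of the maxDoc pattern, assuming a single writer thread and an append-only, preallocated array. The class `RealtimeSegmentSketch` and its methods are hypothetical stand-ins; the real in-memory segment holds postings and other structures, not a plain array of tweet ids.

```java
final class RealtimeSegmentSketch {
    private final long[] tweetIds;   // plain array, written only by the single writer thread
    private volatile int maxDoc = 0; // the only volatile field

    RealtimeSegmentSketch(int capacity) {
        this.tweetIds = new long[capacity];
    }

    /** Writer thread only: fill the slot first, then publish it by advancing maxDoc. */
    void addDocument(long tweetId) {
        int docId = maxDoc;
        tweetIds[docId] = tweetId; // plain write
        maxDoc = docId + 1;        // volatile write makes the slot visible to readers
    }

    /** Any reader thread: read maxDoc once, then touch only doc ids below that limit. */
    int countAtLeast(long minTweetId) {
        int limit = maxDoc; // volatile read; slots below limit are safely published
        int hits = 0;
        for (int docId = 0; docId < limit; docId++) {
            if (tweetIds[docId] >= minTweetId) {
                hits++;
            }
        }
        return hits;
    }

    int maxDoc() {
        return maxDoc;
    }

    public static void main(String[] args) {
        RealtimeSegmentSketch segment = new RealtimeSegmentSketch(1_000);
        for (int i = 0; i < 100; i++) {
            segment.addDocument(i);
        }
        System.out.println("maxDoc=" + segment.maxDoc()
            + " hits=" + segment.countAtLeast(50));
    }
}
```

Documents appended after the reader has sampled maxDoc are simply not part of that search, so the reader never sees a half-written slot.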
• Not a single exclusive lock 
• Writer thread can always make progress 
• Optimistic locking (retry-logic) in a few places for searcher thread 
• Retry logic very simple and guaranteed to always make progress
In-memory Real-time Index 
• Highly optimized for GC - all data is stored in blocked native arrays 
• v1: Optimized for tweets with a term position limit of 255 
• v2: Support for 32-bit positions without performance degradation
• v2: Basic support for out-of-order posting list inserts
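One way to read "blocked native arrays" is a pool of fixed-size primitive `int[]` blocks addressed by a single integer pointer, so the heap holds a few large arrays instead of many small objects for the garbage collector to trace. The sketch below illustrates that reading only; the class `IntBlockPoolSketch`, the block size, and the pointer encoding are assumptions, and the sketch is single-threaded.

```java
import java.util.Arrays;

final class IntBlockPoolSketch {
    private static final int BLOCK_BITS = 15;             // assumed block size: 32,768 ints
    private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    private static final int BLOCK_MASK = BLOCK_SIZE - 1;

    private int[][] blocks = new int[4][]; // table of blocks; blocks are allocated lazily
    private int allocatedBlocks = 0;
    private int size = 0;

    /** Appends a value and returns its pointer (its global position in the pool). */
    int add(int value) {
        int blockIndex = size >>> BLOCK_BITS;
        if (blockIndex == blocks.length) {
            // Only the small table of block references grows; existing blocks never move.
            blocks = Arrays.copyOf(blocks, blocks.length * 2);
        }
        if (blocks[blockIndex] == null) {
            blocks[blockIndex] = new int[BLOCK_SIZE];
            allocatedBlocks++;
        }
        blocks[blockIndex][size & BLOCK_MASK] = value;
        return size++;
    }

    /** Decodes a pointer: upper bits select the block, lower bits the offset within it. */
    int get(int pointer) {
        return blocks[pointer >>> BLOCK_BITS][pointer & BLOCK_MASK];
    }

    int allocatedBlocks() {
        return allocatedBlocks;
    }

    public static void main(String[] args) {
        IntBlockPoolSketch pool = new IntBlockPoolSketch();
        for (int i = 0; i < 200_000; i++) {
            pool.add(i * 2);
        }
        System.out.println("blocks=" + pool.allocatedBlocks()
            + " value=" + pool.get(123_456));
    }
}
```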
In-memory Real-time Index 
• RT term dictionary 
• Term lookups using a lock-free hashtable in O(1) 
• v2: Additional probabilistic, lock-free skip list maintains ordering on terms 
• Perfect skip list not an option: out-of-order inserts would require 
rebalancing, which is impractical with our lock-free index 
• In a probabilistic skip list the tower height of a new (out-of-order) item can
be determined without knowing its insert position by simply rolling a die
In-memory Real-time Index
• Perfect skip list
Inserting a new element in the middle of this skip list requires re-balancing the towers.
In-memory Real-time Index
• Probabilistic skip list
Tower height determined by rolling a die BEFORE knowing the insert location; tower height
never has to change for an element, simplifying memory allocation and concurrency.
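The property described above can be seen in a small single-threaded sketch. It is not the lock-free structure from the talk: the class `SkipListSketch`, the maximum height of 16, and the fair coin flip are all assumptions chosen for illustration. The point is that `randomHeight()` never looks at the key or at the list, so a node's tower is sized once and never changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SkipListSketch {
    private static final int MAX_HEIGHT = 16; // assumed cap on tower height

    private static final class Node {
        final int key;
        final Node[] next; // one forward pointer per level of this node's tower

        Node(int key, int height) {
            this.key = key;
            this.next = new Node[height];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_HEIGHT);
    private final Random random;

    SkipListSketch(long seed) {
        this.random = new Random(seed);
    }

    /** Flips a coin until tails; depends only on the random source, not on the insert position. */
    int randomHeight() {
        int height = 1;
        while (height < MAX_HEIGHT && random.nextBoolean()) {
            height++;
        }
        return height;
    }

    /** Inserts a key anywhere in the order without touching any existing tower height. */
    void insert(int key) {
        int height = randomHeight(); // decided BEFORE the insert position is known
        Node node = new Node(key, height);
        Node current = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (current.next[level] != null && current.next[level].key < key) {
                current = current.next[level];
            }
            if (level < height) {
                node.next[level] = current.next[level];
                current.next[level] = node;
            }
        }
    }

    boolean contains(int key) {
        Node current = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (current.next[level] != null && current.next[level].key < key) {
                current = current.next[level];
            }
        }
        Node candidate = current.next[0];
        return candidate != null && candidate.key == key;
    }

    /** Walks the bottom level, which always holds every key in sorted order. */
    List<Integer> keysInOrder() {
        List<Integer> keys = new ArrayList<>();
        for (Node n = head.next[0]; n != null; n = n.next[0]) {
            keys.add(n.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        SkipListSketch list = new SkipListSketch(42L);
        for (int key : new int[] {50, 10, 30, 20, 40}) {
            list.insert(key); // out-of-order inserts
        }
        System.out.println(list.keysInOrder());
    }
}
```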
Schema-based Document factory 
• Apps provide one ThriftSchema per index and create a ThriftDocument for 
each document 
• SchemaDocumentFactory translates ThriftDocument -> Lucene Document 
using the Schema 
• Default field values 
• Extended field settings 
• Type-system on top of DocValues 
• Validation
Schema-based Document factory
• Validation
• Fill in default values
• Apply correct Lucene field settings
Decouples core package from specific product/index. Similar to Solr/ElasticSearch.
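The validation and default-value steps can be sketched without Thrift or Lucene on the classpath. Everything below is hypothetical: `SchemaDocumentFactorySketch`, `FieldSpec`, and the plain `Map` that stands in for both ThriftDocument and the Lucene Document. The third step, applying Lucene field settings, is left out because the slides do not show how the schema encodes them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaDocumentFactorySketch {
    /** Hypothetical field definition: expected type plus an optional default value. */
    static final class FieldSpec {
        final Class<?> type;
        final Object defaultValue; // null means the field is required

        FieldSpec(Class<?> type, Object defaultValue) {
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    private final Map<String, FieldSpec> schema = new LinkedHashMap<>();

    /** Declares one field of the schema; returns this for chaining. */
    SchemaDocumentFactorySketch field(String name, Class<?> type, Object defaultValue) {
        schema.put(name, new FieldSpec(type, defaultValue));
        return this;
    }

    /** Validates the input against the schema and fills in default values. */
    Map<String, Object> create(Map<String, Object> input) {
        // Validation: reject unknown fields and values of the wrong type.
        for (Map.Entry<String, Object> entry : input.entrySet()) {
            FieldSpec spec = schema.get(entry.getKey());
            if (spec == null) {
                throw new IllegalArgumentException("Unknown field: " + entry.getKey());
            }
            if (!spec.type.isInstance(entry.getValue())) {
                throw new IllegalArgumentException("Wrong type for field: " + entry.getKey());
            }
        }
        // Defaults: every schema field ends up in the document, in schema order.
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map.Entry<String, FieldSpec> entry : schema.entrySet()) {
            Object value = input.containsKey(entry.getKey())
                ? input.get(entry.getKey())
                : entry.getValue().defaultValue;
            if (value == null) {
                throw new IllegalArgumentException("Missing required field: " + entry.getKey());
            }
            document.put(entry.getKey(), value);
        }
        return document;
    }

    public static void main(String[] args) {
        SchemaDocumentFactorySketch factory = new SchemaDocumentFactorySketch()
            .field("text", String.class, null)
            .field("lang", String.class, "en")
            .field("retweets", Integer.class, 0);
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("text", "hello");
        System.out.println(factory.create(input));
    }
}
```

Because the factory only consults the schema, the same code can serve any index; the product-specific knowledge lives in the schema object the application supplies.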
Search @twitter 
- Introduction 
- Search Architecture 
- Lucene Extensions 
‣ Outlook
• Support for parallel (sliced) segments to support partial segment rebuilds 
and other cool posting list update patterns 
• Add remaining missing Lucene features to RT index 
• Index term statistics for ranking 
• Term vectors 
• Stored fields
Michael Busch 
Backup Slides
Searching for top entities within Tweets 
• Task: Find the best photos in a subset of tweets 
• We could use a Lucene index, where each photo is a document 
• Problem: How to update existing documents when the same photos are 
tweeted again? 
• In-place posting list updates are hard 
• Lucene’s updateDocument() is a delete/add operation - expensive and not 
Searching for top entities within Tweets 
• Task: Find the best photos in a subset of tweets 
• Could we use our existing time-ordered tweet index? 
• Facets!
Searching for top entities within Tweets 
[Diagram labels: Query, Doc ids; Term id, Term label; Doc id index, Document; Doc id, Term ids]
Storing tweet metadata
[Diagram labels: Doc id index, Term ids]
Searching for top entities within Tweets
[Diagram: matching doc ids 5, 15, 9000, 9002, 100000, 100090; each doc id points to its term ids, which are counted in a top-k heap]
Top-k heap, earlier state
Id      Count
48239   8
31241   2
Top-k heap, later state
Id      Count
48239   15
31241   12
85932   8
6748    3
Weighted counts (from engagement features) used for relevance scoring
All query operators can be used. E.g. find best photos in San Francisco tweeted by people I follow
Searching for top entities within Tweets 
[Diagram labels: Term id index, Term label]
Searching for top entities within Tweets
Id      Count
48239   45
31241   23
85932   15
6748    11
74294   8
3728    5
[Diagram: a second table with columns Label and Count shows the same counts 45, 23, 15, 11, 8, 5]
• Indexing tweet entities (e.g. photos) as facets makes it possible to search and rank top entities
using a tweets index
• All query operators supported 
• Documents don’t need to be reindexed 
• Approach reusable for different use cases, e.g.: best vines, hashtags, 
@mentions, etc.
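The counting step from these backup slides can be sketched as follows, under stated assumptions: the query has already produced the matching doc ids, a map stands in for the doc id to term ids index, and each occurrence counts as 1 rather than being weighted by engagement features. The class `FacetTopKSketch` and the method `topK` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class FacetTopKSketch {
    /** Counts facet term ids over the matching docs; returns the k most frequent, best first. */
    static List<Integer> topK(int[] matchingDocIds, Map<Integer, int[]> termIdsByDoc, int k) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int docId : matchingDocIds) {
            int[] termIds = termIdsByDoc.get(docId);
            if (termIds == null) {
                continue; // this doc carries no facet terms
            }
            for (int termId : termIds) {
                counts.merge(termId, 1, Integer::sum);
            }
        }
        // Min-heap bounded to k entries: the weakest candidate is evicted first.
        PriorityQueue<Map.Entry<Integer, Integer>> heap = new PriorityQueue<>(
            Comparator.comparingInt((Map.Entry<Integer, Integer> e) -> e.getValue()));
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            heap.add(entry);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Map.Entry<Integer, Integer>> best = new ArrayList<>(heap);
        best.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<Integer> termIds = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : best) {
            termIds.add(entry.getKey());
        }
        return termIds;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> termIdsByDoc = new HashMap<>();
        termIdsByDoc.put(5, new int[] {48239});
        termIdsByDoc.put(15, new int[] {48239, 31241});
        termIdsByDoc.put(9000, new int[] {48239, 31241, 85932});
        termIdsByDoc.put(9002, new int[] {48239});
        int[] matching = {5, 15, 9000, 9002, 100000, 100090};
        System.out.println(topK(matching, termIdsByDoc, 2));
    }
}
```

Since the matching doc ids come from an ordinary query, any query operator can narrow the set before counting, and re-tweeting a photo adds a new document instead of updating an old one.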

Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Verification with LoLA: 4 Using LoLA
Verification with LoLA: 4 Using LoLAVerification with LoLA: 4 Using LoLA
Verification with LoLA: 4 Using LoLA
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014
.NET UY Meetup 7 - CLR Memory by Fabian Alves
.NET UY Meetup 7 - CLR Memory by Fabian Alves.NET UY Meetup 7 - CLR Memory by Fabian Alves
.NET UY Meetup 7 - CLR Memory by Fabian Alves
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012

Search at Twitter: Presented by Michael Busch, Twitter

  • 1. Search @twitter Michael Busch @michibusch
  • 2. Search @twitter Agenda ‣ Introduction - Search Architecture - Lucene Extensions - Outlook
  • 5. Introduction Twitter has more than 284 million monthly active users.
  • 6. Introduction 500 million tweets are sent per day.
  • 7. Introduction More than 300 billion tweets have been sent since company founding in 2006.
  • 8. Introduction Tweets-per-second record: one-second peak of 143,199 TPS.
  • 9. Introduction More than 2 billion search queries per day.
  • 10. Search @twitter Agenda - Introduction ‣ Search Architecture - Lucene Extensions - Outlook
  • 13. Search Architecture [diagram components: RT stream, Analyzer/Partitioner, RT index (Earlybird, several instances), Blender, Archive index, Mapreduce Analyzer, Tweet archive (HDFS), Search requests; edge labels: raw tweets, analyzed tweets; legend: writes, searches]
  • 14. Search Architecture [diagram components: Tweets, Updates (Deletes/Engagement, e.g. retweets/favs), queue, Analyzer/Partitioner, RT index (Earlybird, several instances), Blender, Archive index, Mapreduce Analyzer, HDFS, Search requests; legend: writes, searches]
  • 15. Search Architecture [diagram components: RT index (Earlybird, several instances), Archive index, User search, Social graph, Blender, Search requests; legend: writes, searches] • Blender is our Thrift service aggregator • Queries multiple Earlybirds, merges results
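The merge step that Blender performs can be pictured as a k-way merge over per-partition result lists. The following is only a sketch under assumptions: the `Hit` type, the score-based ordering, and the `mergeTopK` name are hypothetical and are not taken from the talk, which does not describe how Blender ranks or merges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ResultMerger {
    // Hypothetical result type: a tweet id plus a relevance score.
    static final class Hit {
        final long tweetId;
        final double score;

        Hit(long tweetId, double score) {
            this.tweetId = tweetId;
            this.score = score;
        }
    }

    // Merges result lists from several partitions into one top-k list.
    // Assumption: every input list is already sorted by descending score.
    static List<Hit> mergeTopK(List<List<Hit>> perPartition, int k) {
        // Each queue entry is {partition index, position within that partition's list}.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) ->
                Double.compare(perPartition.get(b[0]).get(b[1]).score,
                               perPartition.get(a[0]).get(a[1]).score));
        for (int p = 0; p < perPartition.size(); p++) {
            if (!perPartition.get(p).isEmpty()) {
                heads.add(new int[] { p, 0 });
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heads.isEmpty()) {
            int[] head = heads.poll();
            List<Hit> source = perPartition.get(head[0]);
            merged.add(source.get(head[1]));
            // Advance within the partition the winning hit came from.
            if (head[1] + 1 < source.size()) {
                heads.add(new int[] { head[0], head[1] + 1 });
            }
        }
        return merged;
    }
}
```

Because only the current head of each partition sits in the queue, the merge touches at most k hits plus one per partition, regardless of how long the individual result lists are.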
  • 16. Search Architecture RT index (Earlybird) Archive index User search
  • 17. Search Architecture [diagram components: RT index (Earlybird), Archive index, User search, Lucene] • For historical reasons, these used to be entirely different codebases, but had similar features/technologies • Over time, cross-dependencies were introduced to share code
  • 18. Search Architecture [diagram components: RT index (Earlybird), Archive index, User search, Lucene Extensions, Lucene] • New Lucene extension package • This package is truly generic and has no dependency on an actual product/index • It contains Twitter’s extensions for real-time search, a thin segment management layer and other features
  • 19. Search @twitter Agenda - Introduction - Search Architecture ‣ Lucene Extensions - Outlook
  • 22. Lucene Extension Library • Abstraction layer for Lucene index segments • Real-time writer for in-memory index segments • Schema-based Lucene document factory • Real-time faceting
  • 23. Lucene Extension Library • API layer for Lucene segments • *IndexSegmentWriter • *IndexSegmentAtomicReader • Two implementations • In-memory: RealtimeIndexSegmentWriter (and reader) • On-disk: LuceneIndexSegmentWriter (and reader)
  • 24. Lucene Extension Library • IndexSegments can be built ... • in realtime • on Mesos or Hadoop (Mapreduce) • locally on serving machines • Cluster-management code that deals with IndexSegments • Share segments across serving machines using HDFS • Can rebuild segments (e.g. to upgrade Lucene version, change data schema, etc.)
  • 25. Lucene Extension Library [diagram components: HDFS, Earlybird (several instances), Mesos, Hadoop (MR), RT pipeline]
  • 26. RealtimeIndexSegmentWriter • Modified Lucene index implementation optimized for realtime search • IndexWriter buffer is searchable (no need to flush to allow searching) • In-memory • Lock-free concurrency model for best performance
  • 27. Concurrency - Definitions • Pessimistic locking • A thread holds an exclusive lock on a resource while an action is performed [mutual exclusion] • Usually used when conflicts are expected to be likely • Optimistic locking • Operations are attempted atomically without holding a lock; conflicts can be detected, and retry logic is often used when they occur • Usually used when conflicts are expected to be the exception
  • 28. Concurrency - Definitions • Non-blocking algorithm: ensures that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion. • Lock-free algorithm: a non-blocking algorithm is lock-free if there is guaranteed system-wide progress. • Wait-free algorithm: a non-blocking algorithm is wait-free if there is guaranteed per-thread progress. * Source: Wikipedia
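To make the definitions on the two slides above concrete, here is a minimal sketch that contrasts a pessimistic counter (exclusive lock) with an optimistic one (compare-and-set with retry). The class and method names are illustrative and do not come from the talk. The compare-and-set loop is also a small example of a lock-free but not wait-free algorithm: a failed attempt means some other thread succeeded, so the system as a whole progresses, yet an individual thread has no bound on its retries.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockingStyles {
    // Pessimistic: hold an exclusive lock for the whole update.
    static final class PessimisticCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private int value;

        void increment() {
            lock.lock();
            try {
                value++;
            } finally {
                lock.unlock();
            }
        }

        int get() {
            lock.lock();
            try {
                return value;
            } finally {
                lock.unlock();
            }
        }
    }

    // Optimistic: read, compute, then compare-and-set; retry if another thread got there first.
    static final class OptimisticCounter {
        private final AtomicInteger value = new AtomicInteger();

        void increment() {
            while (true) {
                int current = value.get();
                if (value.compareAndSet(current, current + 1)) {
                    return;
                }
                // Conflict detected: another thread changed the value. Loop and retry.
            }
        }

        int get() {
            return value.get();
        }
    }

    // Runs `threads` threads that each increment both counters `perThread` times
    // and returns {pessimistic total, optimistic total}.
    static int[] run(int threads, int perThread) {
        PessimisticCounter pessimistic = new PessimisticCounter();
        OptimisticCounter optimistic = new OptimisticCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    pessimistic.increment();
                    optimistic.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { pessimistic.get(), optimistic.get() };
    }
}
```

Both counters end up with the same total; the difference lies in what a thread does when it collides with another one: wait for the lock, or retry the update.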
  • 29. Concurrency • Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data) • But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds! • In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication • Safe publication can be achieved in different, subtle ways. Read the great book “Java Concurrency in Practice” by Brian Goetz for more information!
  • 30. Java Memory Model • Program order rule Each action in a thread happens-before every action in that thread that comes later in the program order. • Volatile variable rule A write to a volatile field happens-before every subsequent read of that same field. • Transitivity If A happens-before B, and B happens-before C, then A happens-before C. * Source: Brian Goetz: Java Concurrency in Practice
  • 31. Concurrency [diagram: Thread 1 and Thread 2 on a timeline, a cache, and RAM holding int x = 0]
  • 32. Concurrency [same diagram] Thread 1 executes x = 5; Thread 1 writes x=5 to the cache, while RAM still holds 0.
  • 33. Concurrency [same diagram] Thread 2 executes while(x != 5); This condition will likely never become false!
  • 35. Concurrency [diagram: as before, plus volatile int b] Thread 1 executes x = 5; b = 1; Thread 1 writes b=1 to RAM, because b is volatile.
  • 36. Concurrency [same diagram] Thread 2 reads the volatile b (int dummy = b;) and then executes while(x != 5);
  • 37. Concurrency [same diagram, with a happens-before edge] • Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
  • 38. Concurrency [same diagram, with a happens-before edge] • Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
  • 39. Concurrency [same diagram, with a happens-before edge] • Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
  • 40. Concurrency [same diagram] The condition x != 5 will be false, i.e. x==5 • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
  • 41. Concurrency [same diagram, now labeled "Memory barrier"] • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
  • 43. Demo
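The demo itself is not part of the transcript. A minimal sketch of the x/b example from the preceding slides might look like the following; the class and method names are made up for illustration. One deliberate difference from the slides: the reader spins on the volatile b rather than on x. The happens-before guarantee only applies once the reader's volatile read actually observes the writer's b = 1, and a single read of b gives no guarantee if it happens to run before that write.

```java
class VolatilePiggyback {
    static int x;            // plain field: no visibility guarantee of its own
    static volatile int b;   // the single volatile field that carries the guarantee

    // Writer (Thread 1 in the slides): plain write first, volatile write second.
    static void writer() {
        x = 5;
        b = 1;
    }

    // Reader (Thread 2 in the slides): wait until the volatile write is observed,
    // then read the plain field.
    static int reader() {
        while (b != 1) {
            // Busy-wait. Each iteration is a volatile read, so it cannot be hoisted out of the loop.
        }
        // Program order + volatile rule + transitivity: x = 5 happens-before this read.
        return x;
    }

    // Starts both threads and returns the value of x that the reader observed.
    static int demo() {
        x = 0;
        b = 0;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = reader());
        Thread writer = new Thread(VolatilePiggyback::writer);
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Under the Java Memory Model the reader is guaranteed to return 5 here, even though x is not volatile. If b were a plain field, the loop might never terminate, and even if it did, the value read from x would not be guaranteed.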
  • 45. Concurrency [timeline: IndexWriter writes 100 docs, sets maxDoc = 100, then writes more docs; IndexReader reads maxDoc, then searches up to maxDoc] maxDoc is volatile
  • 46. Concurrency [same timeline, with a happens-before edge] maxDoc is volatile • Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
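The maxDoc pattern can be sketched with a toy append-only structure. This is not Twitter's or Lucene's code: the `AppendOnlySegment` class, the per-document length array, and the fixed capacity are assumptions made to keep the example small. What it does illustrate is the ordering from the slide: the single writer fills in the data first and publishes it with a volatile write to maxDoc, and a reader takes one snapshot of maxDoc and never looks beyond it.

```java
class AppendOnlySegment {
    private final int[] docLengths;   // plain array, modified only by the single writer thread
    private volatile int maxDoc;      // number of fully written, published documents

    AppendOnlySegment(int capacity) {
        this.docLengths = new int[capacity];
    }

    // Writer thread only. Throws ArrayIndexOutOfBoundsException once the capacity is exhausted.
    void addDocument(int length) {
        int docId = maxDoc;           // safe: no other thread ever writes maxDoc
        docLengths[docId] = length;   // 1. write the data with plain stores
        maxDoc = docId + 1;           // 2. publish it with a single volatile write
    }

    // Any reader thread. Sees a consistent prefix of the documents.
    long sumLengths() {
        int limit = maxDoc;           // one volatile read: the reader's snapshot
        long sum = 0;
        for (int docId = 0; docId < limit; docId++) {
            sum += docLengths[docId]; // everything below the snapshot is guaranteed visible
        }
        return sum;
    }

    int maxDoc() {
        return maxDoc;
    }
}
```

Documents the writer appends after the reader has taken its snapshot are simply not part of that search, which is why the reader never needs a lock.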
  • 47. Wait-free • Not a single exclusive lock • Writer thread can always make progress • Optimistic locking (retry-logic) in a few places for searcher thread • Retry logic very simple and guaranteed to always make progress
  • 48. In-memory Real-time Index • Highly optimized for GC - all data is stored in blocked native arrays • v1: Optimized for tweets with a term position limit of 255 • v2: Support for 32 bit positions without performance degradation • v2: Basic support for out-of-order posting list inserts
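One way to picture "blocked native arrays" is a pool of fixed-size int[] blocks addressed by a single integer, so that the index holds a handful of large arrays instead of many small objects for the garbage collector to trace. The sketch below is an assumption about the general technique, not the talk's implementation: the class name, the block size of 1024, and the address layout are all illustrative, and the pool as written is single-threaded.

```java
import java.util.ArrayList;
import java.util.List;

class IntBlockPool {
    static final int BLOCK_SHIFT = 10;                // assumed block size: 1024 ints
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;
    static final int BLOCK_MASK = BLOCK_SIZE - 1;

    private final List<int[]> blocks = new ArrayList<>();
    private int size;                                 // number of ints appended so far

    // Appends a value and returns its address: a plain int, not an object reference.
    int append(int value) {
        int address = size;
        if ((address & BLOCK_MASK) == 0) {
            blocks.add(new int[BLOCK_SIZE]);          // current block is full (or none exists yet)
        }
        blocks.get(address >>> BLOCK_SHIFT)[address & BLOCK_MASK] = value;
        size++;
        return address;
    }

    // Upper bits of the address select the block, lower bits the slot inside it.
    int get(int address) {
        if (address < 0 || address >= size) {
            throw new IndexOutOfBoundsException("address " + address);
        }
        return blocks.get(address >>> BLOCK_SHIFT)[address & BLOCK_MASK];
    }

    int blockCount() {
        return blocks.size();
    }
}
```

Growing the pool never copies existing data, since a new block is allocated and old blocks stay where they are, so addresses handed out earlier remain valid.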
  • 50. In-memory Real-time Index • RT term dictionary • Term lookups using a lock-free hashtable in O(1) • v2: Additional probabilistic, lock-free skip list maintains ordering on terms • Perfect skip list not an option: out-of-order inserts would require rebalancing, which is impractical with our lock-free index • In a probabilistic skip list the tower height of a new (out-of-order) item can be determined without knowing its insert position by simply rolling a die
  • 51. In-memory Real-time Index • Perfect skip list
  • 52. In-memory Real-time Index • Perfect skip list Inserting a new element in the middle of this skip list requires re-balancing the towers.
  • 53. In-memory Real-time Index • Probabilistic skip list
  • 54. In-memory Real-time Index • Probabilistic skip list Tower height determined by rolling a die BEFORE knowing the insert location; tower height never has to change for an element, simplifying memory allocation and concurrency.
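The "die roll" can be sketched as repeated coin flips: each flip that comes up heads adds one level to the tower. The cap of 16 levels, the probability of 1/2 per level, and the names `TowerHeights` and `randomHeight` are assumptions for illustration; the talk does not specify them.

```java
import java.util.Random;

// Hypothetical helper: picks a skip list tower height independently of the insert position.
class TowerHeights {
    static final int MAX_HEIGHT = 16;   // assumed cap on tower height

    // Height 1 with probability 1/2, height 2 with 1/4, and so on, capped at MAX_HEIGHT.
    static int randomHeight(Random rnd) {
        int height = 1;
        while (height < MAX_HEIGHT && rnd.nextBoolean()) {
            height++;
        }
        return height;
    }
}
```

Since the height depends only on the random source, the memory for a tower can be allocated once, before the insert position is searched, and never has to be resized afterwards.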
  • 55. Schema-based Document factory • Apps provide one ThriftSchema per index and create a ThriftDocument for each document • SchemaDocumentFactory translates ThriftDocument -> Lucene Document using the Schema • Default field values • Extended field settings • Type-system on top of DocValues • Validation
  • 56. Schema-based Document factory • Flow: Schema + Thrift Document -> SchemaDocumentFactory -> Lucene Document • Validation • Fill in default values • Apply correct Lucene field settings
  • 57. Schema-based Document factory • Flow: Schema + Thrift Document -> SchemaDocumentFactory -> Lucene Document • Validation • Fill in default values • Apply correct Lucene field settings • Decouples core package from specific product/index. Similar to Solr/ElasticSearch.
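The validate-and-fill-defaults step can be sketched in plain Java without Thrift or Lucene. Everything here is an illustrative assumption: `FieldSpec` and `SimpleSchemaDocumentFactory` are hypothetical names, the input and output documents are plain maps, and Lucene field settings are left out entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical field description: expected type plus an optional default value.
class FieldSpec {
    final Class<?> type;
    final Object defaultValue;   // null means the field is required

    FieldSpec(Class<?> type, Object defaultValue) {
        this.type = type;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical stand-in for a schema-driven factory; maps replace Thrift and Lucene documents.
class SimpleSchemaDocumentFactory {
    private final Map<String, FieldSpec> schema;

    SimpleSchemaDocumentFactory(Map<String, FieldSpec> schema) {
        this.schema = schema;
    }

    Map<String, Object> build(Map<String, Object> input) {
        // Validation: reject fields the schema does not know.
        for (String name : input.keySet()) {
            if (!schema.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, FieldSpec> e : schema.entrySet()) {
            FieldSpec spec = e.getValue();
            Object value = input.get(e.getKey());
            if (value == null) {
                if (spec.defaultValue == null) {
                    throw new IllegalArgumentException("missing required field: " + e.getKey());
                }
                value = spec.defaultValue;   // fill in the default value
            }
            // Validation: enforce the declared type.
            if (!spec.type.isInstance(value)) {
                throw new IllegalArgumentException("wrong type for field: " + e.getKey());
            }
            out.put(e.getKey(), value);
        }
        return out;
    }
}
```

The factory itself knows nothing about tweets or any other product; all index-specific knowledge lives in the schema it is given, which is the decoupling the slide describes.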
  • 61. Outlook • Support for parallel (sliced) segments to support partial segment rebuilds and other cool posting list update patterns • Add remaining missing Lucene features to RT index • Index term statistics for ranking • Term vectors • Stored fields
  • 62. Questions? Michael Busch @michibusch
  • 65. Searching for top entities within Tweets • Task: Find the best photos in a subset of tweets • We could use a Lucene index, where each photo is a document • Problem: How to update existing documents when the same photos are tweeted again? • In-place posting list updates are hard • Lucene’s updateDocument() is a delete/add operation - expensive and not order-preserving
  • 66. Searching for top entities within Tweets • Task: Find the best photos in a subset of tweets • Could we use our existing time-ordered tweet index? • Facets!
  • 67. Searching for top entities within Tweets • Inverted index: Query -> Doc ids; Term id -> Term label • Forward index: Doc id -> Document Metadata • Facet index: Doc id -> Term ids
  • 68. Storing tweet metadata • Facet index: Doc id -> Term ids
  • 69. Searching for top entities within Tweets • Query -> Matching doc ids: 5, 15, 9000, 9002, 100000, 100090 • Facet index: Matching doc id -> Term ids • Top-k heap (Id: Count): 48239: 8, 31241: 2
  • 70. Searching for top entities within Tweets • Query -> Matching doc ids: 5, 15, 9000, 9002, 100000, 100090 • Facet index: Matching doc id -> Term ids • Top-k heap (Id: Count): 48239: 15, 31241: 12, 85932: 8, 6748: 3
  • 71. Searching for top entities within Tweets • Query -> Matching doc ids: 5, 15, 9000, 9002, 100000, 100090 • Facet index: Matching doc id -> Term ids • Top-k heap (Id: Count): 48239: 15, 31241: 12, 85932: 8, 6748: 3 • Weighted counts (from engagement features) used for relevance scoring
  • 72. Searching for top entities within Tweets • Query -> Matching doc ids: 5, 15, 9000, 9002, 100000, 100090 • Facet index: Matching doc id -> Term ids • Top-k heap (Id: Count): 48239: 15, 31241: 12, 85932: 8, 6748: 3 • All query operators can be used. E.g. find best photos in San Francisco tweeted by people I follow
  • 73. Searching for top entities within Tweets • Inverted index: Term id -> Term label
  • 74. Searching for top entities within Tweets • Top-k result (Id: Count): 48239: 45, 31241: 23, 85932: 15, 6748: 11, 74294: 8, 3728: 5 • The inverted index maps each Id to its Label, giving Label/Count rows with counts 45, 23, 15, 11, 8, 5
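The counting flow on the preceding slides (matching doc ids, facet index lookup, top-k heap) can be sketched as follows. This is a simplified illustration, not the production code: `FacetCounter` and `topK` are hypothetical names, the facet index is a plain `Map` from doc id to term ids, and counts are unweighted, whereas the talk uses weighted counts from engagement features.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper that counts facet term ids over the docs matched by a query.
class FacetCounter {
    // Returns up to k rows of {termId, count}, ordered by descending count.
    static int[][] topK(int[] matchingDocIds, Map<Integer, int[]> facetIndex, int k) {
        // 1. Accumulate a count per term id over all matching documents.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int docId : matchingDocIds) {
            int[] termIds = facetIndex.get(docId);
            if (termIds == null) {
                continue;   // document carries no facets
            }
            for (int termId : termIds) {
                counts.merge(termId, 1, Integer::sum);
            }
        }
        // 2. Keep only the k largest counts in a min-heap ordered by count.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            heap.offer(new int[] {e.getKey(), e.getValue()});
            if (heap.size() > k) {
                heap.poll();   // evict the current smallest count
            }
        }
        // 3. Drain the heap (ascending) into the result from the back (descending).
        int[][] result = new int[heap.size()][];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

Only the term ids that survive in the heap need to be resolved to labels through the inverted index, so label lookups are bounded by k rather than by the number of distinct entities.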
  • 75. Summary • Indexing tweet entities (e.g. photos) as facets makes it possible to search and rank top entities using a tweet index • All query operators supported • Documents don’t need to be reindexed • Approach reusable for different use cases, e.g.: best vines, hashtags, @mentions, etc.