Migration from mysql to elasticsearch
Our Service
Users can find jobs that fit them.
Companies can find suitable candidates for their jobs.
Matching on Structured Data
• Both User and Job data are stored in mysql

• Each record in the Job table corresponds to multiple records in child tables, and the same holds for the User table (1:N relationship)
Job -> child tables: Location, Salary, Others
User -> child tables: Location, Salary, Others
Number of records

For user search
Table                   Number of records
User                    529,683
child tableX of user    2,984,111
child tableY of user    1,966,161
…                       …

For job search
Table                   Number of records
Job                     160,305
child tableA of job     2,359,512
child tableB of job     232,202
…                       …
Complicated SQL Query
SELECT parentTable.id FROM parentTable
  INNER JOIN childTableD ON childTableD.id = parentTable.id
  LEFT OUTER JOIN childTableA ON childTableA.id = parentTable.id AND childTableA.code = 273230450
  ….
  INNER JOIN (
    SELECT id FROM childTableB
      WHERE code IN (11232) AND id IN (SELECT id FROM parentTable WHERE code = 1)
    UNION
    ….
    SELECT id FROM childTableC
      WHERE code IN (1, 2, 3) AND id IN (SELECT id FROM parentTable WHERE code = 5)
  ) temp2 ON parentTable.id = temp2.id
WHERE
  AND childTableA.id IS NULL
  ….
  AND parentTable.id IN (SELECT id FROM childTableC WHERE code IN (1, 2, 3))
  AND (
    parentTable.id IN (SELECT id FROM childTableB WHERE code IN (11232))
    OR
    ….
  )
) ORDER BY childTableD.timestamp DESC, parentTable.updatedOn DESC LIMIT 101 OFFSET
• And SQL queries are constructed dynamically in Java.
We are struggling to handle job/user search
• Lots of slow (> 1s) mysql queries

• Some of them are very slow (> 2s)
Top 20 slow queries of a day
Rank | Total Count | Average Sec | Query (truncated)
1 209 1.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
2 197 1.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
3 175 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
4 158 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
5 113 3.2 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
6 96 1.1 INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id, DATE(FROM_UNIXTIME(IFNUL,
7 85 3.3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
8 85 1.1 INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 2, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM
9 84 1.1 INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id,DATE(FROM_UNIXTIME(IFNUL
10 82 1.1 INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 1, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM
11 80 3.2 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
12 79 3.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
13 77 4.1 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
14 70 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
15 70 4.3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
16 69 3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
17 69 5.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
18 67 2.6 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
19 65 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
20 63 4.1 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
Most of them are related to user search
Solution for the slow query problem
• SQL tuning

-> Done (4 different sorted tables, bit-operation). 

-> But still slow

• Introducing elasticsearch(ES)

-> Maybe. But ES is a full-text search engine.

Can ES handle structured data too? Not sure 🤔

• Do an experiment before introducing ES to our project

-> Create an ES cluster with production data in dev env



• The result: the ES query (0.5s) is faster than the mysql one (2s)
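To make the "can ES handle structured data?" question concrete, here is a minimal sketch of how a search like the complicated SQL query above might be expressed as an Elasticsearch bool query. It assumes the child-table codes have been folded into each document as arrays; the function name and the field names (`locationCodes`, `salaryCodes`, `excludedCodes`) are hypothetical and do not come from the slides. The sort fields and the size of 101 simply mirror the ORDER BY and LIMIT of the SQL.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    include_codes: Dict[str, List[int]],
    exclude_codes: Optional[Dict[str, List[int]]] = None,
    size: int = 101,
    offset: int = 0,
) -> Dict[str, Any]:
    """Build a search request body that filters on structured fields only.

    All field names passed in are hypothetical illustrations.
    """
    # Each "terms" clause plays the role of "id IN (SELECT id FROM child WHERE code IN (...))".
    filters = [{"terms": {field: codes}} for field, codes in include_codes.items()]
    # "must_not" plays the role of "LEFT OUTER JOIN child ... WHERE child.id IS NULL".
    must_not = [
        {"terms": {field: codes}} for field, codes in (exclude_codes or {}).items()
    ]
    return {
        "query": {"bool": {"filter": filters, "must_not": must_not}},
        # Mirrors "ORDER BY childTableD.timestamp DESC, parentTable.updatedOn DESC".
        "sort": [{"timestamp": {"order": "desc"}}, {"updatedOn": {"order": "desc"}}],
        # Mirrors "LIMIT 101 OFFSET ...".
        "from": offset,
        "size": size,
    }
```

Clauses under `filter` and `must_not` do not contribute to relevance scoring, which fits a search where a document either matches the conditions or does not.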
The first target
• User data characteristics

• Data are updated in real-time.

• The number of target records is growing quickly 

Fewer than 10,000 in Jan/2017 

More than 500,000 in Aug/2017

• Job data characteristics

• Data are updated three times a day

• Data growth isn’t fast

• The data structure will change for business reasons
Let’s migrate job search first!
Basic Strategy
• Data can be recovered from mysql 

-> Data can be rebuilt by a batch job even if the entire ES cluster collapses

• Keep the ES use cases to a minimum

-> Focus only on searching for the list of jobs that match the given search conditions

-> Do not use ES for trivial lookups of a specific job (use mysql)

• Use high-performance machines

-> 3 physical data nodes (20 CPUs, 32GB RAM, 200GB SSD)

• All job-related data is stored in ES

-> We had the option of getting only ids from ES and fetching the necessary data from another data source (such as mysql or redis), but we did not choose it, in order to keep the implementation simple
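As a rough illustration of the "recover by batch" and "all job-related data in ES" points, the sketch below folds 1:N child rows into one document per job and renders the result as the newline-delimited body of the Elasticsearch bulk API. The function names, field names, and index name are hypothetical; reading the rows from mysql and sending the body to the cluster are left out. The action line shown targets recent Elasticsearch versions; older versions may also require a `_type`.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def denormalize(
    jobs: Iterable[Dict[str, Any]],
    child_rows: Dict[str, Iterable[Tuple[int, int]]],
) -> List[Dict[str, Any]]:
    """Fold child rows into one document per job.

    child_rows maps a (hypothetical) field name to (job_id, code) pairs.
    """
    codes_by_field: Dict[str, Dict[int, List[int]]] = {}
    for field, rows in child_rows.items():
        grouped: Dict[int, List[int]] = defaultdict(list)
        for job_id, code in rows:
            grouped[job_id].append(code)
        codes_by_field[field] = grouped

    docs = []
    for job in jobs:
        doc = dict(job)
        for field, grouped in codes_by_field.items():
            # Jobs without child rows get an empty list.
            doc[field] = grouped.get(job["id"], [])
        docs.append(doc)
    return docs


def to_bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Render documents as an NDJSON bulk body: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Because every document is derived from mysql rows, re-running such a batch against an empty index rebuilds the search data from scratch.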
Performance Tuning
• Servers in Cluster

3 master nodes: To avoid split brain

3 data nodes: 2 may be enough, but we use 3 for safety

• Index settings

1 shard: 1 shard has better performance than 2 shards

2 replicas: each of the 3 data nodes holds one copy of the shard (1 primary + 2 replicas)

• Indexing performance setting

indices.store.throttle.max_bytes_per_sec is set to 200mb (default 20mb)
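The index settings above can be written down as an index-creation body, and the "one shard copy per data node" arithmetic can be checked in a few lines. This is only a sketch: the helper names are hypothetical, and only `number_of_shards` and `number_of_replicas` are actual Elasticsearch index settings.

```python
from typing import Any, Dict


def index_settings(shards: int = 1, replicas: int = 2) -> Dict[str, Any]:
    """Body for creating an index with the given shard and replica counts."""
    return {
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas}
        }
    }


def total_shard_copies(body: Dict[str, Any]) -> int:
    """Each primary shard exists once plus once per replica."""
    index = body["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])
```

With 1 shard and 2 replicas there are 3 copies in total, which matches the 3 data nodes: every node can answer a search from its own local copy.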
Performance Tuning
• Memory Settings

Setting                          Default      Tuned
indices.fielddata.cache.size     Unbounded    10%
indices.queries.cache.size       10%          10%
indices.requests.cache.size      1%           1%

• Thread Pool Settings

Setting                          Default      Tuned
thread_pool.index.queue_size     200          1,000
thread_pool.bulk.queue_size      200          1,000
thread_pool.search.queue_size    1,000        5,000
Preparation Before Migration
• Stress Test

-> Use gatling (a load and performance testing tool)

-> Use queries executed in beta

-> High-load search while indexing

-> Measure the maximum requests per second the cluster can handle (900 req/sec)

-> Long-running test (more than 24 hours under load)

• HA and failure test

-> The search function works even when only one data node is available

-> The top page of our service is available even when the entire cluster is down

-> Circuit breaker behavior

• Monitoring and Alert

-> Use prometheus and alertmanager

-> Server metrics, JVM metrics, ES specific metrics such as query cache size

-> Alert test before operation
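The "circuit breaker behavior" item in the failure test can be pictured with a minimal sketch like the one below. The slides do not say which circuit breaker implementation was used, so this class, its parameters, and its thresholds are hypothetical; it only shows the general pattern of switching to a fallback (for example, a top page without search results) while the cluster is failing.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing dependency for a while and use a fallback instead."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after_sec: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after_sec = reset_after_sec
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after_sec:
            # Cool-down elapsed: allow a trial call; one more failure reopens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            # Do not touch the dependency at all while the breaker is open.
            return fallback()
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

The injectable `clock` is a design choice for testing: a failure test can advance time without sleeping.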
As a result of migration
Latency of API for job search
[Charts: before migration (mysql) vs. after migration (elasticsearch)]
ES Cluster metrics: Server
[Chart]
ES Cluster metrics: JVM
[Chart]
Positive side effects
• A dedicated full-text search engine (groonga) is no longer needed

Everything can be done in ES

• More flexible and precise geo location search than geohash

• Simpler code

-> No SQL tricks and no need to refer to many tables

• No more constructing SQL queries by string concatenation
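For the geo location point, one way such a search could be expressed is Elasticsearch's `geo_distance` query, sketched below as plain data rather than a concatenated string. The slides do not state which geo query was actually used; the helper name and the field name are hypothetical, and the field is assumed to be mapped as `geo_point`.

```python
from typing import Any, Dict


def geo_distance_filter(field: str, lat: float, lon: float, distance: str) -> Dict[str, Any]:
    """Match documents whose geo_point field lies within `distance` of (lat, lon)."""
    return {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}}
```

Such a clause can sit in the `filter` list of a bool query next to the structured conditions, and the radius is a free parameter instead of being tied to a fixed geohash cell size.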
Future Plan
• Apply ES to other features

-> Job-related features that we held back from migrating before 

-> User search (the main target)

• Better full text search with neologd

More Related Content

What's hot

PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
Sergey Petrunya
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performance
oysteing
 
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...
Sveta Smirnova
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
Sergey Petrunya
 
Mentor Your Indexes
Mentor Your IndexesMentor Your Indexes
Mentor Your Indexes
Karwin Software Solutions LLC
 
Billion Goods in Few Categories: How Histograms Save a Life?
Billion Goods in Few Categories: How Histograms Save a Life?Billion Goods in Few Categories: How Histograms Save a Life?
Billion Goods in Few Categories: How Histograms Save a Life?
Sveta Smirnova
 
Billion Goods in Few Categories: how Histograms Save a Life?
Billion Goods in Few Categories: how Histograms Save a Life?Billion Goods in Few Categories: how Histograms Save a Life?
Billion Goods in Few Categories: how Histograms Save a Life?
Sveta Smirnova
 
MySQL Performance Schema in Action
MySQL Performance Schema in ActionMySQL Performance Schema in Action
MySQL Performance Schema in Action
Sveta Smirnova
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
Sergey Petrunya
 
Python and Data Analysis
Python and Data AnalysisPython and Data Analysis
Python and Data Analysis
Praveen Nair
 
Do something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as databaseDo something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as database
Bruce McPherson
 
PostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with groupingPostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with grouping
Alexey Bashtanov
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete Tutorial
Sveta Smirnova
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
Sergey Petrunya
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Serban Tanasa
 
#ajn3.lt.marblejenka
#ajn3.lt.marblejenka#ajn3.lt.marblejenka
#ajn3.lt.marblejenka
Shingo Furuyama
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
FAO
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
Sergey Petrunya
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
Databricks
 

What's hot (20)

PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performance
 
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ...
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
 
Mentor Your Indexes
Mentor Your IndexesMentor Your Indexes
Mentor Your Indexes
 
Billion Goods in Few Categories: How Histograms Save a Life?
Billion Goods in Few Categories: How Histograms Save a Life?Billion Goods in Few Categories: How Histograms Save a Life?
Billion Goods in Few Categories: How Histograms Save a Life?
 
Billion Goods in Few Categories: how Histograms Save a Life?
Billion Goods in Few Categories: how Histograms Save a Life?Billion Goods in Few Categories: how Histograms Save a Life?
Billion Goods in Few Categories: how Histograms Save a Life?
 
MySQL Performance Schema in Action
MySQL Performance Schema in ActionMySQL Performance Schema in Action
MySQL Performance Schema in Action
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
 
Python and Data Analysis
Python and Data AnalysisPython and Data Analysis
Python and Data Analysis
 
Do something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as databaseDo something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as database
 
PostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with groupingPostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with grouping
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete Tutorial
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
#ajn3.lt.marblejenka
#ajn3.lt.marblejenka#ajn3.lt.marblejenka
#ajn3.lt.marblejenka
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
 

Similar to Migration from mysql to elasticsearch

Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
Sveta Smirnova
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
Sveta Smirnova
 
My SQL Skills Killed the Server
My SQL Skills Killed the ServerMy SQL Skills Killed the Server
My SQL Skills Killed the Server
devObjective
 
Sql killedserver
Sql killedserverSql killedserver
Sql killedserver
ColdFusionConference
 
MariaDB Optimizer
MariaDB OptimizerMariaDB Optimizer
MariaDB Optimizer
JongJin Lee
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
HighLoad2009
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
paulguerin
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
Jonathan Levin
 
MySQL Indexing Crash Course
MySQL Indexing Crash CourseMySQL Indexing Crash Course
MySQL Indexing Crash Course
Aaron Silverman
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
PGConf APAC
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON?
Sveta Smirnova
 
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
Hemant Kumar Singh
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
Anurag Srivastava
 
Oracle SQL Tuning
Oracle SQL TuningOracle SQL Tuning
Oracle SQL Tuning
Alex Zaballa
 
MySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellMySQL 5.7 in a Nutshell
MySQL 5.7 in a Nutshell
Emily Ikuta
 
Introduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]sIntroduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]s
Sveta Smirnova
 
Correlated update vs merge
Correlated update vs mergeCorrelated update vs merge
Correlated update vs merge
Heribertus Bramundito
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
MYXPLAIN
 
SFDC Advanced Apex
SFDC Advanced Apex SFDC Advanced Apex
SFDC Advanced Apex
Sujit Kumar
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
Olav Sandstå
 

Similar to Migration from mysql to elasticsearch (20)

Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
 
My SQL Skills Killed the Server
My SQL Skills Killed the ServerMy SQL Skills Killed the Server
My SQL Skills Killed the Server
 
Sql killedserver
Sql killedserverSql killedserver
Sql killedserver
 
MariaDB Optimizer
MariaDB OptimizerMariaDB Optimizer
MariaDB Optimizer
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
MySQL Indexing Crash Course
MySQL Indexing Crash CourseMySQL Indexing Crash Course
MySQL Indexing Crash Course
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON?
 
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
Oracle SQL Tuning
Oracle SQL TuningOracle SQL Tuning
Oracle SQL Tuning
 
MySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellMySQL 5.7 in a Nutshell
MySQL 5.7 in a Nutshell
 
Introduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]sIntroduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]s
 
Correlated update vs merge
Correlated update vs mergeCorrelated update vs merge
Correlated update vs merge
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 
SFDC Advanced Apex
SFDC Advanced Apex SFDC Advanced Apex
SFDC Advanced Apex
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 

Recently uploaded

07. Ruby String Slides - Ruby Core Teaching
07. Ruby String Slides - Ruby Core Teaching07. Ruby String Slides - Ruby Core Teaching
07. Ruby String Slides - Ruby Core Teaching
quanhoangd129
 
Applitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdfApplitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdf
Applitools
 
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
CS Kwak
 
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Andre Hora
 
vSAN_Tutorial_Presentation with important topics
vSAN_Tutorial_Presentation with important  topicsvSAN_Tutorial_Presentation with important  topics
vSAN_Tutorial_Presentation with important topics
abhilashspt
 
How to Secure Your Kubernetes Software Supply Chain at Scale
How to Secure Your Kubernetes Software Supply Chain at ScaleHow to Secure Your Kubernetes Software Supply Chain at Scale
How to Secure Your Kubernetes Software Supply Chain at Scale
Anchore
 
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
quanhoangd129
 
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdfTop 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Banibro IT Solutions
 
Fixing Git Catastrophes - Nebraska.Code()
Fixing Git Catastrophes - Nebraska.Code()Fixing Git Catastrophes - Nebraska.Code()
Fixing Git Catastrophes - Nebraska.Code()
Gene Gotimer
 
09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching
quanhoangd129
 
Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...
Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...
Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...
David D. Scott
 
Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...
Q-Advise
 
How Generative AI is Shaping the Future of Software Application Development
How Generative AI is Shaping the Future of Software Application DevelopmentHow Generative AI is Shaping the Future of Software Application Development
How Generative AI is Shaping the Future of Software Application Development
MohammedIrfan308637
 
Empowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - GrawlixEmpowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - Grawlix
Aarisha Shaikh
 
05. Ruby Control Structures - Ruby Core Teaching
05. Ruby Control Structures - Ruby Core Teaching05. Ruby Control Structures - Ruby Core Teaching
05. Ruby Control Structures - Ruby Core Teaching
quanhoangd129
 
Crowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current StatusCrowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current Status
ramaganesan0504
 
New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta
pyxgy
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
confluent
 
The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
NMahendiran
 
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
Alina Tait
 

Recently uploaded (20)

07. Ruby String Slides - Ruby Core Teaching
07. Ruby String Slides - Ruby Core Teaching07. Ruby String Slides - Ruby Core Teaching
07. Ruby String Slides - Ruby Core Teaching
 
Applitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdfApplitools Autonomous 2.0 Sneak Peek.pdf
Applitools Autonomous 2.0 Sneak Peek.pdf
 
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
240717 ProPILE - Probing Privacy Leakage in Large Language Models.pdf
 
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
 
vSAN_Tutorial_Presentation with important topics
vSAN_Tutorial_Presentation with important  topicsvSAN_Tutorial_Presentation with important  topics
vSAN_Tutorial_Presentation with important topics
 
How to Secure Your Kubernetes Software Supply Chain at Scale
How to Secure Your Kubernetes Software Supply Chain at ScaleHow to Secure Your Kubernetes Software Supply Chain at Scale
How to Secure Your Kubernetes Software Supply Chain at Scale
 
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
 
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdfTop 10 ERP Companies in UAE Banibro IT Solutions.pdf
Top 10 ERP Companies in UAE Banibro IT Solutions.pdf
 
Fixing Git Catastrophes - Nebraska.Code()
Fixing Git Catastrophes - Nebraska.Code()Fixing Git Catastrophes - Nebraska.Code()
Fixing Git Catastrophes - Nebraska.Code()
 
09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching09. Ruby Object Oriented Programming - Ruby Core Teaching
09. Ruby Object Oriented Programming - Ruby Core Teaching
 
Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...
Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...
Tube Magic Software | Youtube Software | Best AI Tool For Growing Youtube Cha...
 
Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...
 
How Generative AI is Shaping the Future of Software Application Development
How Generative AI is Shaping the Future of Software Application DevelopmentHow Generative AI is Shaping the Future of Software Application Development
How Generative AI is Shaping the Future of Software Application Development
 
Empowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - GrawlixEmpowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - Grawlix
 
05. Ruby Control Structures - Ruby Core Teaching
05. Ruby Control Structures - Ruby Core Teaching05. Ruby Control Structures - Ruby Core Teaching
05. Ruby Control Structures - Ruby Core Teaching
 
Crowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current StatusCrowd Strike\Windows Update Issue: Overview and Current Status
Crowd Strike\Windows Update Issue: Overview and Current Status
 
New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by ConfluentUnlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
 
The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
 
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
 

Migration from mysql to elasticsearch

  • 2. Our Service User can find jobs that fit him/her Company can find suitable candidates for its jobs.
  • 3. Matching on Structured Data • Both User and Job data are stored in mysql • Records in Job table corresponds to multiple records in child tables, so do records in User table (1:N relationship) Job Location Salary Others User Location Salary Others
  • 4. Number of records Table Number of records User 529,683 child tableX of user 2,984,111 child tableY of user 1,966,161 … … Table Number of records Job 160,305 child tableA of job 2,359,512 child tableB of job 232,202 … … For user searchFor Job search
  • 5. Complicated SQL Query SELECT parentTable.id FROM parentTable INNER JOIN childTableD ON childTableD.id = parentTable.id LEFT OUTER JOIN childTableA ON childTableA.id = parentTable.id AND childTableA.code = 273230450 …. INNER JOIN ( SELECT id FROM childTableB WHERE code IN (11232) AND id IN (SELECT id FROM parentTable WHERE code = 1 ) UNION …. SELECT id FROM childTableC WHERE code IN (1, 2, 3) AND id IN (SELECT id FROM parentTable WHERE code = 5 ) ) temp2 ON parentTable.id = temp2.id WHERE AND childTableA.id IS NULL …. AND parentTable.id IN (SELECT id FROM childTableC WHERE code IN (1, 2, 3)) AND ( parentTable.id IN (SELECT id FROM childTableB WHERE code IN (11232)) OR …. ) ) ORDER BY childTableD.timestamp DESC, parentTable.updatedOn DESC LIMIT 101 OFFSET • And SQL queries are constructed dynamically in Java.
  • 6. We are struggling to handle job/user search
 • Lots of slow ( > 1s ) mysql queries
 • Some of them are very slow ( > 2s )
  • 7. Top 20 slow queries of a day
 Rank  Total Count  Average Sec  Query (truncated)
 1     209          1.4          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 2     197          1.4          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 3     175          1.9          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 4     158          1.9          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 5     113          3.2          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 6     96           1.1          INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id, DATE(FROM_UNIXTIME(IFNUL,
 7     85           3.3          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 8     85           1.1          INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 2, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM
 9     84           1.1          INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id,DATE(FROM_UNIXTIME(IFNUL
 10    82           1.1          INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 1, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM
 11    80           3.2          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 12    79           3.4          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 13    77           4.1          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 14    70           1.9          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 15    70           4.3          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 16    69           3            SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 17    69           5.9          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 18    67           2.6          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 19    65           1.9          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 20    63           4.1          SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
 Most of them are related to user search
  • 8. Solution for the slow query problem
 • SQL tuning
 -> Done (4 different sorted tables, bit-operation).
 -> But still slow
 • Introducing elasticsearch (ES)
 -> Maybe. But ES is a full-text search engine.
 Can ES handle structured data too? Not sure 🤔
 • Do an experiment before introducing ES to our project
 -> Create an ES cluster with production data in the dev environment
 • The result: the ES query (0.5s) was faster than the mysql one (2s)
  • 9. The first target
 • User data characteristics
   • Data are updated in real time.
   • The number of target records is increasing quickly:
     fewer than 10,000 in Jan/2017, more than 500,000 in Aug/2017
 • Job data characteristics
   • Data are updated three times a day
   • Data growth isn’t fast
   • The data structure will change for business reasons
 Let’s migrate job search first!
  • 10. Basic Strategy
 • Data can be recovered from mysql
 -> Data can be rebuilt by a batch job even if the entire ES cluster collapses
 • Try to minimize the use cases of ES
 -> Focus only on searching for the list of jobs that match the given search condition
 -> Do not handle trivial queries for a specific job in ES (use mysql)
 • Use high-performance machines
 -> 3 physical data nodes (20 CPUs, 32GB RAM, 200GB SSD)
 • All job-related data are stored in ES
 -> We had the option of getting only ids from ES and fetching the necessary data from another data source (such as mysql or redis), but we did not choose it, in order to keep the implementation simple
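The "recover by batch" idea on slide 10 can be sketched as follows. This is only an illustration, not the authors' batch job: the row shapes, the field names (`title`, `location_codes`) and the index name `jobs` are hypothetical. It folds the 1:N child rows into one document per job and emits the newline-delimited body that the Elasticsearch `_bulk` endpoint accepts (older Elasticsearch versions may also require a `_type` in the action line).

```python
import json
from collections import defaultdict


def build_bulk_body(jobs, locations, index_name="jobs"):
    """Denormalize parent rows and their 1:N child rows into one document per job.

    `jobs` and `locations` stand in for rows read from mysql (hypothetical shapes).
    Returns a newline-delimited string suitable for a _bulk request body.
    """
    # Group child rows by their parent id.
    locations_by_job = defaultdict(list)
    for row in locations:
        locations_by_job[row["job_id"]].append(row["code"])

    lines = []
    for job in jobs:
        doc = {
            "title": job["title"],
            "location_codes": sorted(locations_by_job[job["id"]]),
        }
        # Action line first, then the document source line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(job["id"])}}))
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


jobs = [{"id": 1, "title": "Engineer"}, {"id": 2, "title": "Designer"}]
locations = [
    {"job_id": 1, "code": 13},
    {"job_id": 1, "code": 27},
    {"job_id": 2, "code": 13},
]
body = build_bulk_body(jobs, locations)
```

Because each document is derived entirely from mysql rows, rerunning such a batch against an empty index would rebuild it from scratch.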
  • 11. Performance Tuning
 • Servers in Cluster
 3 master nodes: To avoid split brain
 3 data nodes: 2 may be enough, but 3 for safety
 • Index settings
 1 shard: 1 shard performed better than 2 shards
 2 replicas: each data node holds one copy of the shard
 • Indexing performance setting
 indices.store.throttle.max_bytes_per_sec is set to 200mb (default 20mb)
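The index settings on slide 11 can be written as a create-index request body. The sketch below is an assumption about how one might derive it, not the authors' configuration; `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings, while the helper names are made up for illustration.

```python
def index_settings(data_nodes, shards=1):
    """Build a create-index body so that every data node holds a full copy.

    With one primary shard, (data_nodes - 1) replicas give one copy per node.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": data_nodes - 1,
            }
        }
    }


def total_shard_copies(body):
    """Primaries plus replicas, i.e. the number of shard copies in the cluster."""
    index = body["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


settings_body = index_settings(data_nodes=3)
copies = total_shard_copies(settings_body)
```

With 3 data nodes this yields 1 primary and 2 replicas, so any single surviving data node can still serve searches.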
  • 12. Performance Tuning
 • Memory Settings
 Setting                          Default      Tuned
 indices.fielddata.cache.size     Unbounded    10%
 indices.queries.cache.size       10%          10%
 indices.requests.cache.size      1%           1%
 • Thread Pool Settings
 Setting                          Default      Tuned
 thread_pool.index.queue_size     200          1,000
 thread_pool.bulk.queue_size      200          1,000
 thread_pool.search.queue_size    1,000        5,000
  • 13. Preparation Before Migration
 • Stress Test
 -> Use gatling (load and performance test tool)
 -> Use queries executed in beta
 -> High-load search while indexing
 -> Check the maximum requests per second of the cluster (900 req/sec)
 -> Long run (more than 24 hours under load)
 • HA and Failure Test
 -> Search function works even when only one data node is available
 -> Top page of our service is available even when the entire cluster is down
 -> Circuit breaker behavior
 • Monitoring and Alert
 -> Use prometheus and alertmanager
 -> Server metrics, JVM metrics, ES-specific metrics such as query cache size
 -> Alert test before operation
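The circuit breaker behavior checked in the failure test can be illustrated with a minimal sketch. The slides do not say which circuit breaker was used, so the class below, its thresholds and its fallback mechanism are all assumptions: after a number of consecutive failures it stops calling the search backend for a while and serves a fallback instead, which is one way a top page could stay available while the cluster is down.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures, retry after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        # While open and within the timeout, skip the backend entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def search_jobs():
    # Hypothetical stand-in for a request to the search cluster.
    raise ConnectionError("cluster is down")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0)
page = breaker.call(search_jobs, fallback=lambda: [])
```

Here `page` is the empty fallback list, and after a second failure further calls return the fallback immediately without touching the cluster until the timeout passes.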
  • 14. As a result of migration
  • 15. Latency of API for job search [figure: Before Migration (mysql) vs. After Migration (elasticsearch)]
  • 18. Positive side effects
 • A dedicated full-text search engine (groonga) is no longer needed
 Everything can be done in ES
 • More flexible and precise geolocation search than geohash
 • Simpler code
 -> No SQL tricks and no need to refer to many tables
 • No more constructing SQL queries by string concatenation
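To make the "simpler code" point concrete, here is a hedged sketch of building a search request as a data structure instead of concatenating SQL strings. The original code was Java; this Python version and its field names (`location_codes`, `salary_max`, `description`, `updated_on`) are hypothetical. The `bool`, `terms`, `range` and `match` clauses are standard Elasticsearch query DSL.

```python
def build_job_query(location_codes=None, min_salary=None, keyword=None, size=100, offset=0):
    """Compose an Elasticsearch search body from optional search conditions."""
    bool_query = {}

    # Structured conditions go into filter context (no scoring needed).
    filters = []
    if location_codes:
        filters.append({"terms": {"location_codes": list(location_codes)}})
    if min_salary is not None:
        filters.append({"range": {"salary_max": {"gte": min_salary}}})
    if filters:
        bool_query["filter"] = filters

    # Full-text condition goes into query context.
    if keyword:
        bool_query["must"] = [{"match": {"description": keyword}}]

    return {
        "query": {"bool": bool_query},
        "sort": [{"updated_on": {"order": "desc"}}],
        "from": offset,
        "size": size,
    }


query = build_job_query(location_codes=[13, 27], min_salary=3000)
```

Each condition adds one independent clause, so optional conditions need no string assembly and every document already carries its former child-table values.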
  • 19. Future Plan
 • Apply ES to other features
 -> Job-related features that we avoided migrating before
 -> User search (the main target)
 • Better full-text search with neologd