Scaling the World’s Largest Photo Blogging Community
Farhan “Frank” Mashraqi, Senior MySQL DBA, Fotolog, Inc.
[email_address]
Credits: Warren L. Habib (CTO), Olu King (Senior Systems Administrator)

Introduction
Farhan Mashraqi, Senior MySQL DBA, Fotolog, Inc.
- Known on PlanetMySQL as Frank Mash
- Author of the upcoming “Pro Ruby on Rails” (Apress)
- Contact: [email_address], [email_address]
- Blog: http://mysqldatabaseadministration.blogspot.com, http://mashraqi.com

What is Fotolog?
- Social networking
- Guestbook comments
- Friend/Favorite lists
- Members create “social capital”
- “One photo a day”
- Currently the 25th most visited website on the Internet (Alexa)
- History: http://blog.fotolog.com/

Fotolog (screenshot of the home page)

Fotolog (screenshot of a Fotolog member page)

Fotolog Growth
- 228 million member photos
- 2.47 billion guestbook comments
- 20% of members visit the site daily
- The average user spends 24 minutes a day on the site
- 10 guestbook comments per photo
- On average, each photo is seen by 1,000 or more people
- 7 million members and counting
- “Explosive growth in Europe”: Italy and Spain are among the fastest-growing countries
- Recently broke the record of 500K photos uploaded in a single day
- 90 million page views
(Chart: Fotolog vs. Flickr)

Technology
- Sun Solaris 10
- MySQL
- Apache
- Java / Hibernate
- PHP
- Memcached
- 3Par
- IBRIX
- StrongMail

MySQL at Fotolog
- 32 servers
- Server specifications
- Four “clusters”: User, GB, PH, FF
- Non-persistent connections (PHP)
- Connection pooling (Java)
- Mostly MyISAM initially; later mostly converted to InnoDB
- Application-side table partitioning
- Memcache

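The slide lists Memcache next to MySQL but does not say how the application combined them. One common pairing is the cache-aside pattern: read from the cache, fall back to the database on a miss, and invalidate the key after a write. The sketch below assumes that pattern; `FakeCache`, `load_user_from_db` and the `user:<name>` key format are hypothetical stand-ins, not Fotolog's code or a real memcached client API.

```python
from typing import Callable, Dict, List, Optional


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client (get/set/delete only)."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._store.get(key)

    def set(self, key: str, value: object) -> None:
        self._store[key] = value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def cache_aside_get(cache: FakeCache, key: str, load: Callable[[], object]) -> object:
    """Read through the cache; on a miss, load from the database and populate the cache."""
    value = cache.get(key)
    if value is None:  # miss (a cached None would be indistinguishable from a miss)
        value = load()
        cache.set(key, value)
    return value


db_calls: List[int] = []


def load_user_from_db() -> dict:
    """Hypothetical loader standing in for a SELECT against the User cluster."""
    db_calls.append(1)
    return {"user_name": "example", "photo_count": 42}


cache = FakeCache()
first = cache_aside_get(cache, "user:example", load_user_from_db)
second = cache_aside_get(cache, "user:example", load_user_from_db)  # served from the cache
cache.delete("user:example")  # e.g. after an UPDATE, so the next read reloads fresh data
third = cache_aside_get(cache, "user:example", load_user_from_db)
print("database calls:", len(db_calls))
```

Only the first read and the read after invalidation reach the loader; the repeated read in between is absorbed by the cache, which is the load reduction a cache in front of MySQL is meant to provide.
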
Image Storage / Delivery
- MySQL stores image metadata only
- 3Par (utility storage)
- Thin provisioning (dedicate on allocation vs. dedicate on write)
- How fast is storage growing each day?
- Frequently accessed vs. infrequently accessed media
- Third-party CDN: Akamai/Panther

Important Scalability Considerations
- Do you really need five nines of availability?
- Budget
- Time to deploy
- Testing
- Can we afford:
  - A single point of failure (SPF)?
  - Not having read redundancy? (User, PH, GB, FF)
  - Not having write redundancy? (User, PH, GB, FF)

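"Five nines" is a concrete downtime budget. Assuming an average 365.25-day year, the arithmetic below converts a number of nines into the downtime allowed per year; at 99.999% that is roughly 5.3 minutes a year, which helps frame whether the extra redundancy is worth its budget and testing cost.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, counting leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


for nines in range(2, 6):
    availability = 100 * (1 - 10 ** -nines)
    print(f"{availability:.3f}% -> {allowed_downtime_minutes(nines):8.1f} minutes/year")
```

Each extra nine cuts the budget by a factor of ten, while the cost of the redundancy needed to meet it usually grows much faster.
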
Partitioning
(Diagram: SHARD 1, SHARD 2, SHARD 3; tables Table_v1, Table_v2, Table_v3, Table_v4.)

Partitioning thoughts

Ideal distribution

GB Current
(Diagram: application servers with read and write paths to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30 and db32; labeled “Single Point of Failure”.)

GB Scalability
(Diagram: application servers with read and write paths to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30 and db32; key ranges 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, 90-99, one per database; each shown with a slave and a Master/DRBD pair.)

Current Scheme for fl_db1 Replication (PH)
(Diagram: application servers issuing PH queries, with read and write paths; databases DB1, DB2, DB3, DB7, DB8, DB9, DB10, DB11, DB12, DB13, DB14, DB15, DB16 and 29 (FF) linked by replication; a slave; servers annotated with character groups RTX, FSW, 05DHN, AEK, 16JOQUZ, 28IP_, 39B, 4C, 7GLVY and M.)

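Proposed Scheme for PH (Write & Read)
(Diagram: application servers with read and write paths to servers 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 29; key ranges 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, 90-99, one per server; a link labeled “To User Cluster”.)
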
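The GB Scalability and proposed PH diagrams split a two-digit key space into eleven ranges, one per server. The slides do not say how a row's two-digit bucket is derived, so this sketch assumes it is the last two digits of a numeric ID and that the i-th server in each diagram owns the i-th range; `bucket_for`, `shard_for`, `GB_SERVERS` and `PH_SERVERS` are illustrative names.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Bucket ranges as drawn on the GB Scalability and proposed PH slides.
RANGES: List[Tuple[int, int]] = [
    (0, 8), (9, 17), (18, 26), (27, 35), (36, 44), (45, 53),
    (54, 62), (63, 71), (72, 80), (81, 89), (90, 99),
]
# Assumption: the i-th server listed in each diagram owns the i-th range.
GB_SERVERS = ["db4", "db18", "db22", "db23", "db24", "db25",
              "db26", "db27", "db28", "db30", "db32"]
PH_SERVERS = ["7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "29"]

_RANGE_STARTS = [lo for lo, _ in RANGES]


def bucket_for(key: int) -> int:
    """Hypothetical bucketing rule: the last two decimal digits of a numeric ID."""
    return key % 100


def shard_for(key: int, servers: Sequence[str] = GB_SERVERS) -> str:
    """Return the server whose bucket range contains this key's bucket."""
    bucket = bucket_for(key)
    index = bisect_right(_RANGE_STARTS, bucket) - 1  # last range starting at or below bucket
    lo, hi = RANGES[index]
    assert lo <= bucket <= hi
    return servers[index]


print(shard_for(123456), shard_for(123456, PH_SERVERS))
```

Keeping an explicit range table, rather than computing something like `id % 11`, means one range can later be split or moved to a new server by editing the table, without remapping every key.
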
AUTO-INC table lock contention: good times
(Diagram: ten SELECTs running inside MySQL's thread-concurrency slots.)
SELECTs do very well with increased concurrency. QPS: 500+.

AUTO-INC table lock contention: warning
(Diagram: five SELECTs and two INSERTs inside MySQL's thread-concurrency slots, with more SELECTs waiting.)
As more SELECTs arrive alongside INSERTs, AUTO-INC lock contention starts causing problems.

AUTO-INC table lock contention: problem
(Diagram: INSERTs occupy most of MySQL's thread-concurrency slots, while SELECTs and further INSERTs wait.)

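Traditionally, InnoDB held a special table-level AUTO-INC lock until the end of each INSERT statement on a table with an AUTO_INCREMENT column (MySQL 5.1.22 added `innodb_autoinc_lock_mode` to relax this for simple inserts). The toy model below is not InnoDB code; it only shows the consequence for concurrency: with one lock held for the whole statement, concurrent INSERTs on the same table run strictly one at a time. `StatementAutoIncLock` and the sleep-based statement duration are hypothetical.

```python
import threading
import time
from typing import List


class StatementAutoIncLock:
    """Toy model: one table-level lock held from ID allocation until the INSERT ends."""

    def __init__(self) -> None:
        self._table_lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self._next_id = 1
        self._in_flight = 0
        self.max_in_flight = 0

    def insert(self, statement_seconds: float) -> int:
        with self._table_lock:  # every other INSERT on this table waits here
            row_id = self._next_id
            self._next_id += 1
            with self._stats_lock:
                self._in_flight += 1
                self.max_in_flight = max(self.max_in_flight, self._in_flight)
            time.sleep(statement_seconds)  # the rest of the statement runs with the lock held
            with self._stats_lock:
                self._in_flight -= 1
        return row_id


def run_concurrent_inserts(table: StatementAutoIncLock, threads: int,
                           statement_seconds: float) -> List[int]:
    """Start `threads` concurrent INSERTs and collect the IDs they were assigned."""
    ids: List[int] = []
    ids_lock = threading.Lock()

    def worker() -> None:
        row_id = table.insert(statement_seconds)
        with ids_lock:
            ids.append(row_id)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return ids


table = StatementAutoIncLock()
start = time.perf_counter()
ids = run_concurrent_inserts(table, threads=8, statement_seconds=0.01)
elapsed = time.perf_counter() - start
print(f"ids={sorted(ids)} max_in_flight={table.max_in_flight} elapsed={elapsed:.3f}s")
```

Because the lock covers the whole statement, at most one INSERT is ever in flight and the elapsed time grows roughly linearly with the number of inserting threads; holding a lock only for the ID increment would let the statement bodies overlap.
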
InnoDB Tablespace Structure (Simplified)
(Diagram: the PK / clustered index, and a secondary index whose entries hold the PK, i.e. the clustered index key.)
- 6-byte header: links together consecutive records and is used in row-level locking
- Record directory: an array of pointers to each field of the record; each pointer is 1 byte if the total length of the fields in the record is less than 128 bytes, and 2 bytes otherwise
- Data part of record: the clustered index contains fields for all user-defined columns, plus a 6-byte transaction ID, a 7-byte roll pointer, and a 6-byte row ID if no PRIMARY KEY or UNIQUE NOT NULL key is defined

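The byte counts on this slide can be turned into a rough per-record size estimate. This sketch follows the simplified layout described above (the older REDUNDANT row format with its 6-byte header); it ignores NULL handling, externally stored columns, page overhead and the newer COMPACT format, which encodes headers and field lengths differently. The constant and function names are illustrative.

```python
from typing import Sequence

HEADER_BYTES = 6      # per-record header ("6 byte header" on the slide)
TRX_ID_BYTES = 6      # transaction ID, stored in every clustered index record
ROLL_PTR_BYTES = 7    # roll pointer into the undo log
ROW_ID_BYTES = 6      # hidden row ID, only if no PK or UNIQUE NOT NULL key exists
ONE_BYTE_LIMIT = 128  # offsets are 1 byte while the fields total less than this


def directory_bytes(field_lengths: Sequence[int]) -> int:
    """Record directory size: one offset per field, 1 or 2 bytes each."""
    per_field = 1 if sum(field_lengths) < ONE_BYTE_LIMIT else 2
    return per_field * len(field_lengths)


def clustered_record_bytes(user_field_lengths: Sequence[int], has_pk: bool = True) -> int:
    """Rough size of one clustered index record in the slide's simplified layout."""
    system_fields = [TRX_ID_BYTES, ROLL_PTR_BYTES]
    if not has_pk:
        system_fields.append(ROW_ID_BYTES)
    fields = system_fields + list(user_field_lengths)
    return directory_bytes(fields) + HEADER_BYTES + sum(fields)


# guestbook_v4-like row: three 4-byte keys plus a 7-byte user_name value.
print(clustered_record_bytes([4, 4, 4, 7]))
# Same fields in a table with no usable key: adds the hidden 6-byte row ID.
print(clustered_record_bytes([4, 4, 4, 7], has_pk=False))
```

For the example row this gives 44 bytes, of which 19 are user data; the fixed per-record overhead is why narrow rows and narrow keys matter at billions of rows.
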
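InnoDB Index Structure (Simplified)
(Diagram: the PK index / clustered index holds the row data in its data pages; secondary index entries hold the PK, which is used to find the row in the clustered index.)
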
Old Schema
    CREATE TABLE `guestbook_v3` (
      `identifier` bigint(20) unsigned NOT NULL auto_increment,
      `user_name` varchar(16) NOT NULL default '',
      `photo_identifier` bigint(20) unsigned NOT NULL default '0',
      `posted` datetime NOT NULL default '0000-00-00 00:00:00',
      …
      PRIMARY KEY (`identifier`),
      KEY `guestbook_photo_id_posted_idx` (`photo_identifier`,`posted`)
    ) ENGINE=MyISAM

Reads
(Diagram: data pages.) Data is ordered by identifier (the PK) but looked up by a secondary key.

New Schema
    CREATE TABLE `guestbook_v4` (
      `identifier` int(9) unsigned NOT NULL auto_increment,
      `user_name` varchar(16) NOT NULL default '',
      `photo_identifier` int(9) unsigned NOT NULL default '0',
      `posted` timestamp NOT NULL default '0000-00-00 00:00:00',
      …
      PRIMARY KEY (`photo_identifier`,`posted`,`identifier`),
      KEY `identifier` (`identifier`)
    ) ENGINE=InnoDB

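Part of the gain from the new schema is narrower columns. Using MySQL's documented storage sizes (BIGINT 8 bytes, INT 4 bytes, TIMESTAMP 4 bytes, DATETIME 8 bytes before MySQL 5.6.4), the arithmetic below estimates the raw column savings. It ignores row headers, indexes and page fill, and it applies the site-wide 2.47 billion comment count to a single schema purely for illustration, since the comments are actually spread across partitioned tables.

```python
# Storage sizes in bytes, per the MySQL manual (DATETIME is 8 bytes before MySQL 5.6.4).
# Display widths such as int(9) or bigint(20) do not affect storage size or range.
TYPE_BYTES = {"BIGINT": 8, "INT": 4, "DATETIME": 8, "TIMESTAMP": 4}

OLD_COLUMNS = {"identifier": "BIGINT", "photo_identifier": "BIGINT", "posted": "DATETIME"}
NEW_COLUMNS = {"identifier": "INT", "photo_identifier": "INT", "posted": "TIMESTAMP"}

INT_UNSIGNED_MAX = 2**32 - 1  # 4,294,967,295


def row_bytes(columns: dict) -> int:
    """Bytes used by the listed columns in one row."""
    return sum(TYPE_BYTES[t] for t in columns.values())


saved_per_row = row_bytes(OLD_COLUMNS) - row_bytes(NEW_COLUMNS)
comments = 2_470_000_000  # "2.47 billion guestbook comments" from the growth slide
saved_total = saved_per_row * comments

print(f"{saved_per_row} bytes per row, about {saved_total / 1e9:.1f} GB in total")
print("Comment count fits in INT UNSIGNED:", comments <= INT_UNSIGNED_MAX)
```

In InnoDB the saving is multiplied, because every secondary index entry (here the `identifier` key) also stores the primary key columns. The trade-off is range: TIMESTAMP cannot represent dates after January 2038, and INT UNSIGNED tops out at about 4.29 billion IDs per table.
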
Pending preads (Optimizing Disk Usage)
(Diagram: data pages.) Data is ordered by the composite primary key, which starts with photo_identifier (FK), and is looked up by that primary key. Read requests per second are very low.

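The difference between the old and new read patterns comes down to which rows share a data page. This toy model assumes comments arrive round-robin across photos, a fixed 100 rows per page, and 10 comments per photo (the per-photo average from the growth slide); it counts the pages a query for one photo's comments must touch when rows are stored in identifier order versus clustered on (photo_identifier, posted, identifier). It ignores index pages and caching.

```python
from typing import List, NamedTuple


class Comment(NamedTuple):
    identifier: int        # AUTO_INCREMENT value, i.e. arrival order
    photo_identifier: int
    posted: int            # simplified timestamp: the arrival round


ROWS_PER_PAGE = 100  # hypothetical; real InnoDB pages hold a variable number of rows


def make_comments(photos: int, per_photo: int) -> List[Comment]:
    """Comments arrive round-robin across photos, so each photo's comments interleave."""
    rows: List[Comment] = []
    identifier = 1
    for round_no in range(per_photo):
        for photo in range(1, photos + 1):
            rows.append(Comment(identifier, photo, round_no))
            identifier += 1
    return rows


def pages_touched(storage_order: List[Comment], photo: int) -> int:
    """Distinct data pages holding one photo's comments, given the physical row order."""
    return len({
        position // ROWS_PER_PAGE
        for position, row in enumerate(storage_order)
        if row.photo_identifier == photo
    })


comments = make_comments(photos=1_000, per_photo=10)
# Old schema: rows laid out in identifier (arrival) order.
old_layout = sorted(comments, key=lambda r: r.identifier)
# New schema: InnoDB clusters rows by PRIMARY KEY (photo_identifier, posted, identifier).
new_layout = sorted(comments, key=lambda r: (r.photo_identifier, r.posted, r.identifier))
print("old:", pages_touched(old_layout, 42), "new:", pages_touched(new_layout, 42))
```

Under these assumptions one photo's comments span 10 pages in identifier order but a single page when clustered by the new primary key, which is consistent with the slide's observation of very low read requests per second.
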
Pending reads / writes / Proposed
Throughput is not as important as the number of requests.

Pending reads / writes / Proposed

Pending reads

MySQL Performance Challenges
- Finding the source of the problem
- Mature systems are mostly disk-bound
- Is the query cache hurting you?
- Adding RAM helps dodge the bullet
- Disk striping
- Restructuring tables for optimal performance
- Using the Solaris libumem memory allocator: LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so

Considerations for Future Growth
- SQLite?
- File system?
- PostgreSQL?
- Make the application better and optimize tables?

Things to Remember
- Know the problem
- Know your application
- Know your storage engine
- Know your requirements
- Know your budget

Questions?
