These are the slides from my presentation at MySQL Conference and Expo 2007 held in Santa Clara, CA. The talk was focused on scaling InnoDB to meet Fotolog's unique challenges.
This document summarizes Farhan Mashraqi's presentation about scaling the MySQL database that powers the photo blogging website Fotolog. It describes how Fotolog has grown to host over 228 million photos and 2.47 billion comments. The MySQL infrastructure consists of 32 servers split across four clusters to handle the large volume of reads and writes. Key aspects discussed include table partitioning, improving performance through index changes and switching to InnoDB, and strategies for ongoing scalability.
Fotolog: Scaling the World's Largest Photo Blogging Community
1. Scaling the World’s Largest Photo Blogging Community
Farhan “Frank” Mashraqi, Senior MySQL DBA, Fotolog, Inc.
[email_address]
Credits: Warren L. Habib (CTO), Olu King (Senior Systems Administrator)
2. Introduction
Farhan Mashraqi, Senior MySQL DBA, Fotolog, Inc.
- Known on PlanetMySQL as Frank Mash
- Author of the upcoming “Pro Ruby on Rails” (Apress)
- Contact: [email_address] [email_address]
- Blog: http://mysqldatabaseadministration.blogspot.com and http://mashraqi.com
3. What is Fotolog?
- Social networking
- Guestbook comments
- Friend/Favorite lists
- Members create “Social Capital”
- “One photo a day”
- Currently the 25th most visited website on the Internet (Alexa)
- History: http://blog.fotolog.com/
6. Fotolog Growth
- 228 million member photos
- 2.47 billion guestbook comments
- 20% of members visit the site daily
- 24 minutes a day spent by the average user
- 10 guestbook comments per photo
- 1,000 people or more see a photo on average
- 7 million members and counting
- “Explosive growth in Europe”: Italy and Spain among the fastest-growing countries
- Recently broke the 500K photos uploaded a day record
- 90 million page views (chart comparing Fotolog and Flickr)
7. Technology
- Sun Solaris 10
- MySQL
- Apache
- Java / Hibernate
- PHP
- Memcached
- 3Par
- IBRIX
- StrongMail
8. MySQL at Fotolog
- 32 servers
- Specification of servers
- Four “clusters”: User, GB, PH, FF
- Non-persistent connections (PHP); connection pooling (Java)
- Mostly MyISAM initially; later mostly converted to InnoDB
- Application-side table partitioning
- Memcache
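“Application-side table partitioning” means the application, not MySQL, decides which physical table a row lives in. A minimal sketch of the idea; the table-name suffix scheme and hash routing below are illustrative assumptions, not Fotolog’s actual layout:

```sql
-- Hypothetical layout: guestbook rows spread over N physical tables
-- (guestbook_00 … guestbook_15). The application hashes a routing key
-- (e.g. user_name) to pick the suffix, then queries only that table:
SELECT *
FROM guestbook_07            -- suffix computed in PHP/Java, not by MySQL
WHERE user_name = 'frank';   -- hypothetical user, for illustration
```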
9. Image Storage / Delivery
- MySQL is used to store image metadata only
- 3Par (utility storage): thin provisioning (dedicate on allocation vs. dedicate on write)
- How fast is it growing each day?
- Frequently accessed vs. infrequently accessed media
- Third-party CDN: Akamai/Panther
10. Important Scalability Considerations
- Do you really need five-nines availability?
- Budget, time to deploy, testing
- Can we afford a single point of failure (SPF)?
- Can we afford not having read redundancy? (User, PH, GB, FF)
- Can we afford not having write redundancy? (User, PH, GB, FF)
18. AUTO-INC table lock contention (diagram: a thread-concurrency queue of SELECTs entering MySQL)
- SELECTs do very well with increased concurrency; QPS: 500+ (“good times”)

19. AUTO-INC table lock contention (diagram: INSERTs beginning to mix into the SELECT queue)
- As INSERTs mix in with the SELECTs, AUTO-INC lock contention starts causing problems (“warning”)

20. AUTO-INC table lock contention (diagram: INSERTs dominating the queue)
- With INSERTs dominating, the contention becomes a real problem
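The slides above show InnoDB’s table-level AUTO-INC lock at work: each INSERT into an auto_increment table held the lock until the statement finished, so insert-heavy traffic serializes behind it. Below is a hedged sketch of diagnosis and mitigation; note that innodb_autoinc_lock_mode arrived in MySQL 5.1, after this 2007 talk, and is a startup option, not part of the original presentation:

```sql
-- Diagnose: AUTO-INC lock waits show up in the TRANSACTIONS section of
SHOW ENGINE INNODB STATUS;

-- Mitigate (MySQL 5.1+ only; set at server startup, e.g. in my.cnf):
--   [mysqld]
--   innodb_autoinc_lock_mode = 2   -- "interleaved": no table-level AUTO-INC lock
```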
21. InnoDB Tablespace Structure (Simplified)
- The clustered index (PK) contains the fields for all user-defined columns, plus:
  - a 6-byte header that links together consecutive records and is used in row-level locking
  - a 6-byte transaction id
  - a 7-byte roll pointer
  - a 6-byte row id, if no PK or UNIQUE NOT NULL key is defined
- Record directory: an array of pointers to each field of the record
  - 1 byte per entry if the total length of the fields in the record is under 128 bytes; 2 bytes otherwise
- Data part of the record
- A secondary index entry stores the PK (clustered index key)
22. InnoDB Index Structure (Simplified)
(diagram: leaf pages of the PK/clustered index hold the row data itself; a secondary index leaf entry holds the secondary key plus the PK, which is then used to look the row up in the clustered index)
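Reading that structure against the deck’s own guestbook schema: under InnoDB, a lookup through the secondary index costs two B-tree traversals per row. The photo id below is a hypothetical value, for illustration only:

```sql
-- 1) guestbook_photo_id_posted_idx finds matching entries, each carrying
--    the primary key `identifier`;
-- 2) each of those PKs is then descended in the clustered index to fetch
--    the full row.
SELECT *
FROM guestbook_v3
WHERE photo_identifier = 42   -- hypothetical id
ORDER BY posted;
```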
23. Old Schema

CREATE TABLE `guestbook_v3` (
  `identifier` bigint(20) unsigned NOT NULL auto_increment,
  `user_name` varchar(16) NOT NULL default '',
  `photo_identifier` bigint(20) unsigned NOT NULL default '0',
  `posted` datetime NOT NULL default '0000-00-00 00:00:00',
  …
  PRIMARY KEY (`identifier`),
  KEY `guestbook_photo_id_posted_idx` (`photo_identifier`,`posted`)
) ENGINE=MyISAM
24. Reads
- Data pages ordered by `identifier` (PK)
- Looked up by the secondary key
25. New Schema

CREATE TABLE `guestbook_v4` (
  `identifier` int(9) unsigned NOT NULL auto_increment,
  `user_name` varchar(16) NOT NULL default '',
  `photo_identifier` int(9) unsigned NOT NULL default '0',
  `posted` timestamp NOT NULL default '0000-00-00 00:00:00',
  …
  PRIMARY KEY (`photo_identifier`,`posted`,`identifier`),
  KEY `identifier` (`identifier`)
) ENGINE=InnoDB

1 row in set (7.64 sec)
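The payoff of the composite primary key: since InnoDB clusters data by PK, all comments for one photo now sit physically adjacent, so the site’s hottest query becomes a single sequential range scan of the clustered index instead of one random read per comment. The photo id below is a hypothetical value, for illustration only:

```sql
-- PRIMARY KEY (photo_identifier, posted, identifier) means this query
-- reads one contiguous run of clustered-index pages:
SELECT *
FROM guestbook_v4
WHERE photo_identifier = 42   -- hypothetical id
ORDER BY posted;              -- rows are already stored in `posted` order
```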
26. Pending preads (Optimizing Disk Usage)
- Data pages ordered by the composite key, led by `photo_identifier` (FK)
- Looked up by the primary key
- Very low read requests per second
27. Pending reads / writes (proposed)
- Throughput is not as important as the number of requests
30. MySQL Performance Challenges
- Finding the source of the problem
- Mostly disk-bound in mature systems
- Is the query cache hurting you?
- Adding RAM helps dodge the bullet
- Disk striping
- Restructuring tables for optimal performance
- LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so
31. Considerations for future growth
- SQLite?
- File system?
- PostgreSQL?
- Make the application better and optimize tables?
32. Things to remember
- Know the problem
- Know your application
- Know your storage engine
- Know your requirements
- Know your budget