These are the slides from my presentation at MySQL Conference and Expo 2007 held in Santa Clara, CA. The talk was focused on scaling InnoDB to meet Fotolog's unique challenges.
This document summarizes Farhan Mashraqi's presentation about scaling the MySQL database that powers the photo blogging website Fotolog. It describes how Fotolog has grown to host over 228 million photos and 2.47 billion comments. The MySQL infrastructure consists of 32 servers split across four clusters to handle the large volume of reads and writes. Key aspects discussed include table partitioning, improving performance through index changes and switching to InnoDB, and strategies for ongoing scalability.
Fotolog: Scaling the World's Largest Photo Blogging Community
1. Scaling the World’s Largest Photo Blogging Community
Farhan “Frank” Mashraqi, Senior MySQL DBA, Fotolog, Inc.
[email_address]
Credits: Warren L. Habib (CTO), Olu King (Senior Systems Administrator)
2. Introduction
Farhan Mashraqi, Senior MySQL DBA, Fotolog, Inc.
- Known on PlanetMySQL as Frank Mash
- Author of the upcoming “Pro Ruby on Rails” (Apress)
- Contact: [email_address] [email_address]
- Blog: http://mysqldatabaseadministration.blogspot.com and http://mashraqi.com
3. What is Fotolog?
- Social networking
- Guestbook comments
- Friend/Favorite lists
- Members create “Social Capital”
- “One photo a day”
- Currently the 25th most visited website on the Internet (Alexa)
- History: http://blog.fotolog.com/
6. Fotolog Growth
- 228 million member photos
- 2.47 billion guestbook comments
- 20% of members visit the site daily
- 24 minutes a day spent by the average user
- 10 guestbook comments per photo
- 1,000 people or more see a photo on average
- 7 million members and counting
- “Explosive growth in Europe”: Italy and Spain among the fastest-growing countries
- Recently broke the 500K photos uploaded a day record
- 90 million page views (chart comparing Fotolog and Flickr)
7. Technology
- Sun Solaris 10
- MySQL
- Apache
- Java / Hibernate
- PHP
- Memcached
- 3Par
- IBRIX
- StrongMail
8. MySQL at Fotolog
- 32 servers
- Specification of servers
- Four “clusters”: User, GB, PH, FF
- Non-persistent connections (PHP); connection pooling (Java)
- Mostly MyISAM initially; later mostly converted to InnoDB
- Application-side table partitioning
- Memcache
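“Application-side table partitioning” means the application, not MySQL, decides which physical table a row lives in. A minimal sketch of the idea; the table-name suffix scheme and hash routing below are illustrative assumptions, not Fotolog’s actual layout:

```sql
-- Hypothetical layout: guestbook rows spread over N physical tables
-- (guestbook_00 … guestbook_15). The application hashes a routing key
-- (e.g. user_name) to pick the suffix, then queries only that table:
SELECT *
FROM guestbook_07            -- suffix computed in PHP/Java, not by MySQL
WHERE user_name = 'frank';   -- hypothetical user, for illustration
```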
9. Image Storage / Delivery
- MySQL is used to store image metadata only
- 3Par (utility storage): thin provisioning (dedicate on allocation vs. dedicate on write)
- How fast is it growing each day?
- Frequently accessed vs. infrequently accessed media
- Third-party CDN: Akamai/Panther
10. Important Scalability Considerations
- Do you really need five-nines availability?
- Budget, time to deploy, testing
- Can we afford a single point of failure (SPF)?
- Can we afford not having read redundancy? (User, PH, GB, FF)
- Can we afford not having write redundancy? (User, PH, GB, FF)
18. AUTO-INC table lock contention (diagram: a thread-concurrency queue of SELECTs entering MySQL)
- SELECTs do very well with increased concurrency; QPS: 500+ (“good times”)

19. AUTO-INC table lock contention (diagram: INSERTs beginning to mix into the SELECT queue)
- As INSERTs mix in with the SELECTs, AUTO-INC lock contention starts causing problems (“warning”)

20. AUTO-INC table lock contention (diagram: INSERTs dominating the queue)
- With INSERTs dominating, the contention becomes a real problem
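The slides above show InnoDB’s table-level AUTO-INC lock at work: each INSERT into an auto_increment table held the lock until the statement finished, so insert-heavy traffic serializes behind it. Below is a hedged sketch of diagnosis and mitigation; note that innodb_autoinc_lock_mode arrived in MySQL 5.1, after this 2007 talk, and is a startup option, not part of the original presentation:

```sql
-- Diagnose: AUTO-INC lock waits show up in the TRANSACTIONS section of
SHOW ENGINE INNODB STATUS;

-- Mitigate (MySQL 5.1+ only; set at server startup, e.g. in my.cnf):
--   [mysqld]
--   innodb_autoinc_lock_mode = 2   -- "interleaved": no table-level AUTO-INC lock
```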
21. InnoDB Tablespace Structure (Simplified)
- The clustered index (PK) contains the fields for all user-defined columns, plus:
  - a 6-byte header that links together consecutive records and is used in row-level locking
  - a 6-byte transaction id
  - a 7-byte roll pointer
  - a 6-byte row id, if no PK or UNIQUE NOT NULL key is defined
- Record directory: an array of pointers to each field of the record
  - 1 byte per entry if the total length of the fields in the record is under 128 bytes; 2 bytes otherwise
- Data part of the record
- A secondary index entry stores the PK (clustered index key)
22. InnoDB Index Structure (Simplified)
(diagram: leaf pages of the PK/clustered index hold the row data itself; a secondary index leaf entry holds the secondary key plus the PK, which is then used to look the row up in the clustered index)
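Reading that structure against the deck’s own guestbook schema: under InnoDB, a lookup through the secondary index costs two B-tree traversals per row. The photo id below is a hypothetical value, for illustration only:

```sql
-- 1) guestbook_photo_id_posted_idx finds matching entries, each carrying
--    the primary key `identifier`;
-- 2) each of those PKs is then descended in the clustered index to fetch
--    the full row.
SELECT *
FROM guestbook_v3
WHERE photo_identifier = 42   -- hypothetical id
ORDER BY posted;
```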
23. Old Schema

CREATE TABLE `guestbook_v3` (
  `identifier` bigint(20) unsigned NOT NULL auto_increment,
  `user_name` varchar(16) NOT NULL default '',
  `photo_identifier` bigint(20) unsigned NOT NULL default '0',
  `posted` datetime NOT NULL default '0000-00-00 00:00:00',
  …
  PRIMARY KEY (`identifier`),
  KEY `guestbook_photo_id_posted_idx` (`photo_identifier`,`posted`)
) ENGINE=MyISAM
24. Reads
- Data pages ordered by `identifier` (PK)
- Looked up by the secondary key
25. New Schema

CREATE TABLE `guestbook_v4` (
  `identifier` int(9) unsigned NOT NULL auto_increment,
  `user_name` varchar(16) NOT NULL default '',
  `photo_identifier` int(9) unsigned NOT NULL default '0',
  `posted` timestamp NOT NULL default '0000-00-00 00:00:00',
  …
  PRIMARY KEY (`photo_identifier`,`posted`,`identifier`),
  KEY `identifier` (`identifier`)
) ENGINE=InnoDB

1 row in set (7.64 sec)
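The payoff of the composite primary key: since InnoDB clusters data by PK, all comments for one photo now sit physically adjacent, so the site’s hottest query becomes a single sequential range scan of the clustered index instead of one random read per comment. The photo id below is a hypothetical value, for illustration only:

```sql
-- PRIMARY KEY (photo_identifier, posted, identifier) means this query
-- reads one contiguous run of clustered-index pages:
SELECT *
FROM guestbook_v4
WHERE photo_identifier = 42   -- hypothetical id
ORDER BY posted;              -- rows are already stored in `posted` order
```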
26. Pending preads (Optimizing Disk Usage)
- Data pages ordered by the composite key, led by `photo_identifier` (FK)
- Looked up by the primary key
- Very low read requests per second
27. Pending reads / writes (proposed)
- Throughput is not as important as the number of requests
30. MySQL Performance Challenges
- Finding the source of the problem
- Mostly disk-bound in mature systems
- Is the query cache hurting you?
- Adding RAM helps dodge the bullet
- Disk striping
- Restructuring tables for optimal performance
- LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so
31. Considerations for future growth
- SQLite?
- File system?
- PostgreSQL?
- Make the application better and optimize tables?
32. Things to remember
- Know the problem
- Know your application
- Know your storage engine
- Know your requirements
- Know your budget