This document discusses MongoDB capacity planning. It begins with a brief history of databases and the factors driving NoSQL adoption. It then discusses MongoDB's origins and key features like document storage, auto-sharding, and high availability. The document emphasizes that capacity planning requires understanding an application's requirements, resources used, and monitoring metrics over time. It provides examples of measuring and planning for storage, memory, CPU, and network resources as applications and data change. The goal of capacity planning is to continuously and proactively scale resources to meet evolving needs.
This document provides recommendations for optimizing performance of a SharePoint farm. It suggests architecting the farm with separate web, service application, and database servers. It also provides tips for SQL Server tuning, such as setting the maximum RAM, formatting disks, and configuring maintenance plans. Additionally, it recommends techniques like caching, minimizing page size, limiting navigation depth, and leveraging tools to identify bottlenecks. The overall message is to consider each layer of the farm and apply techniques like caching, SQL optimization, and network configuration to improve performance.
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012 – larsgeorge
This document summarizes Lars George's presentation on moving from batch to real-time processing with Hadoop. It discusses using Hadoop (HDFS and MapReduce) for batch processing of large amounts of data and integrating real-time databases and stream processing tools like HBase and Storm to enable faster querying and analytics. Example architectures shown combine batch and real-time systems by using real-time tools to process streaming data and periodically syncing results to Hadoop and HBase for long-term storage and analysis.
Polyglot Persistence - Two Great Tastes That Taste Great Together – John Wood
The days of the relational database being a one-stop-shop for all of your persistence needs are over. Although NoSQL databases address some issues that can’t be addressed by relational databases, the opposite is true as well. The relational database offers an unparalleled feature set and rock-solid stability. One cannot overstate the importance of using the right tool for the job, and for some jobs, one tool is not enough. This talk focuses on the strengths and weaknesses of both relational and NoSQL databases, the benefits and challenges of polyglot persistence, and examples of polyglot persistence in the wild.
These slides were presented at WindyCityDB 2010.
The rise of NoSQL is characterized by confusion and ambiguity, much like any fast-emerging organic movement in the absence of well-defined standards and adequate software solutions. Whether you are a developer or an architect, many questions come to mind when faced with the decision of where your data should be stored and how it should be managed. The following are some of these questions: What does the rise of all these NoSQL technologies mean to my enterprise? What is NoSQL to begin with? Does it mean "No SQL"? Could this be just another fad? Is it a good idea to bet the future of my enterprise on these new exotic technologies and simply abandon proven mature Relational DataBase Management Systems (RDBMS)? How scalable is scalable? Assuming that I am sold, how do I choose the one that fits my needs best? Is there a middle ground somewhere? What is this Polyglot Persistence I hear about? The answers to these questions and many more are the subject of this talk, along with a survey of the most popular NoSQL technologies. Be there or be square.
This document discusses the limitations of relational databases for modern applications and real-time architectures. It describes how NoSQL databases like Aerospike can provide better performance and scalability. Specific examples are given of how Aerospike has been used to power applications in domains like advertising technology, social media, travel portals, and financial services that require high throughput, low latency access to large datasets.
Slides from my talk at ACCU2011 in Oxford on 16th April 2011. A whirlwind tour of the non-relational database families, with a little more detail on Redis, MongoDB, Neo4j and HBase.
This document discusses performance metrics for monitoring and optimizing a social network built using Django/Python. It recommends tools like New Relic for high-level insights, Graphite for detailed metrics storage and querying, and PgFouine for analyzing database queries. Specific metrics discussed include page load times broken down by component, database query analysis, background task performance, and deploy impact. The goal is to identify bottlenecks and optimize performance across development, systems, and pages.
At Yahoo! over the past year we have helped migrate hundreds of our grids' users to YARN. Our YARN clusters have in aggregate run over 18 million jobs with more than 3 billion tasks, consuming over 10 thousand years of compute time, with one single cluster running 90 thousand jobs a day. From this experience we would like to share what we have learned about running YARN well, how this is different from running a 1.0 based cluster, and what it takes to migrate your jobs to YARN from 1.0.
When dealing with infrastructure we often go through the process of determining the different resources needed to meet our application requirements. This talk looks into the way that resources are used by MongoDB and which aspects should be considered to determine the sizing, capacity and deployment of a MongoDB cluster given the different scenarios, different sets of operations and storage engines available.
Eberhard Wolff is an architecture and technology manager at adesso, a leading IT consultancy in Germany. He discusses how cloud computing differs from traditional enterprise computing by having many inexpensive instances with unreliable networks. To build reliable systems, applications need to be stateless, easily scaled, and handle failures. He uses the example of Spring Biking, an e-commerce site, to illustrate an architecture with stateless frontends, elastic scaling, and database replication to achieve reliability in the cloud.
Karmasphere Studio is an integrated development environment (IDE) for Hadoop that allows users to develop, debug, deploy, and monitor Hadoop jobs. It integrates with major Hadoop distributions and cloud providers like Amazon to enable easy deployment of jobs to private or public clusters. The IDE aims to provide all the typical development and deployment tools within a single interface for working with Hadoop.
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget – Cloudera, Inc.
This document discusses YapMap, a visual search platform built on Hadoop and HBase. It summarizes how YapMap interfaces with HBase data, uses HBase as a data processing pipeline with checkpoints, and had to adjust schemas and migrate data as the system evolved. It also covers how YapMap constructs search indexes in shards based on HBase regions and stored indexes on HDFS. The document concludes with some lessons learned around optimizing HBase operations.
Faster Data Integration Pipeline Execution using Spark-Jobserver – Databricks
As you may already know, the open-source Spark Job Server offers a powerful platform for managing Spark jobs, jars, and contexts, turning Spark into a much more convenient and easy-to-use service. The Spark-Jobserver can keep Spark context warmed up and readily available for accepting new jobs. At Informatica we are leveraging the Spark-Jobserver offerings to solve the data-visualization use-case.
MapReduce with Apache Hadoop is a framework for distributed processing of large datasets across clusters of computers. It allows for parallel processing of data, fault tolerance, and scalability. The framework includes Hadoop Distributed File System (HDFS) for reliable storage, and MapReduce for distributed computing. MapReduce programs can be written in various languages and frameworks provide higher-level interfaces like Pig and Hive.
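To make the map and reduce phases concrete, here is a tiny self-contained Python simulation of the classic word-count job (a real Hadoop job would use the Java API or a streaming wrapper; this sketch only mirrors the dataflow):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key, then sum each group's values."""
    grouped = defaultdict(list)
    for word, count in pairs:  # stands in for the framework's shuffle step
        grouped[word].append(count)
    return {word: sum(counts) for word, counts in grouped.items()}

print(reduce_phase(map_phase(["to be or not to be"])))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The same two functions, unchanged, would scale out on a cluster because each map call and each per-key reduction is independent of the others.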
For the last 40 years or so, we used relational databases successfully in nearly all business contexts and systems of nearly all sizes. Therefore, if you feel no pain using an RDBMS, you can stay with it. But, if you always have to work around your RDBMS to get your job done, a document-oriented database might be worth a look.
RavenDB is a 2nd generation document database that allows you to write a data-access layer with much more freedom and far fewer constraints. If you have to work with large volumes of data, thousands of queries per second, unstructured/semi-structured data or event sourcing, you will find RavenDB particularly rewarding.
In this talk we will explore some document database usage scenarios. I will share some data modeling techniques and several architectural criteria to help you decide where RavenDB can safely be adopted as the right choice.
The document discusses migrating from Amazon Redshift to Spark and Presto for data warehousing and querying needs at Stitch Fix. Redshift was experiencing performance issues with too many queries in the morning when production pipelines and ad-hoc queries used the same cluster. Simply scaling Redshift up and down was not feasible. The proposed solution is to use Presto for light ad-hoc queries, Spark for heavy jobs, store data in S3 as the single source of truth, and run the systems on EMR which can scale up and down quickly.
Website redesign, if not done in the proper manner, could spell doom for a site in the search rankings. Soumya Shankar & Vishal, who work as SEO professionals with Convonix, have created a resourceful presentation on the factors that need to be taken care of, while you are redesigning a website. Our website design & usability team follows these processes with the utmost care, and have successfully managed to replace entire websites, without seeing major fluctuations in rankings.
Here are the steps to solve each equation by graphing:
1) x² + 5x + 6 = 0
Write in standard form: x² + 5x + 6 = 0.
Graph the related function y = x² + 5x + 6. The x-intercepts are the solutions: x = −3 and x = −2.
2) x² + 8x + 16 = 0
Write in standard form: x² + 8x + 16 = 0.
Graph the related function y = x² + 8x + 16. The graph touches the x-axis at a single point, so there is one (double) solution: x = −4.
3) x² − 2x + 3 = 0
Write in standard form: x² − 2x + 3 = 0.
Graph the related function y = x² − 2x + 3. The graph has no x-intercepts, so the equation has no real solutions.
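The same three equations can be checked numerically; here is a minimal Python sketch using the quadratic formula, where the sign of the discriminant tells us how many x-intercepts the graph has:

```python
import math

def real_roots(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0 as a sorted list."""
    d = b * b - 4 * a * c  # discriminant: its sign decides the x-intercept count
    if d < 0:
        return []              # graph never crosses the x-axis
    if d == 0:
        return [-b / (2 * a)]  # graph touches the x-axis once (double root)
    sq = math.sqrt(d)
    return sorted([(-b - sq) / (2 * a), (-b + sq) / (2 * a)])

print(real_roots(1, 5, 6))   # equation 1: [-3.0, -2.0]
print(real_roots(1, 8, 16))  # equation 2: [-4.0]
print(real_roots(1, -2, 3))  # equation 3: [] (no real solutions)
```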
Customizable and area exclusive. Stay top of mind, foster trust and stand out from the competition. Includes email, blog and social media content. Small monthly fee. pat@greatreachinc.com
19 Sept 12 - Is social exclusion still important for older people? – ILC-UK
The concept of social exclusion explicitly recognises that material exclusion is both caused by and causes exclusion from other domains essential for wellbeing, and builds on a longstanding tradition within public policy and social science research. However, the terminology ‘social exclusion’ is perhaps most synonymous with the former Labour government, with the coalition government having disbanded the Social Exclusion Unit Taskforce. In its place there exists something of a gulf in terminology to replace the usage of ‘social exclusion’ in policy terms, although the concept itself continues to play some part in policy making, while the term itself is still widely used within academic research and in EU and UN policy.
In comparison to children, young people, and families, social exclusion among older people has received little attention. This is despite the fact that it is perhaps among this group that the notion of social exclusion is most pertinent, with older people at high risk of social isolation and loneliness, as well as exhibiting substantial inequalities in income and housing. In addition, within the extant evidence base, there has been comparatively little longitudinal research into social exclusion patterns among older people.
At this event, ILC-UK presented the results from a report examining social exclusion among older people, 'Is Social Exclusion still important for Older People?', sponsored by Age UK. The work investigated trends in the number of socially excluded people, and examined their outcomes. Other speakers also contributed to a debate exploring the underlying question of whether social exclusion should remain part of public policy and whether ‘social exclusion is still important for older people’.
Agenda from the event:
08:15 – 08:30
Registration with Tea/Coffee/Pastries
08:30 – 08:35
Welcome - David Sinclair, ILC-UK
08:35 - 08:50
Is Social Exclusion still important for Older People? - Dylan Kneale, ILC-UK
08:50 - 09:10
Greg Lewis, Age UK
Justin Russell, Department for Work and Pensions
09:10 - 09:25
Debate
09:25 – 09:30
Close - David Sinclair, ILC-UK
This document provides tips on how to market smarter and save money through direct mail campaigns. It discusses the importance of cleaning mailing lists to remove outdated addresses, which can save significant money on printing and postage costs while improving response rates. Additional ways to market smarter with direct mail mentioned are using various finishing techniques like embossing, foil stamping, and die-cutting to make mail pieces stand out from the crowd at relatively low cost. The document emphasizes that mailing more does not necessarily mean better results and offers examples of companies wasting significant funds by mailing to inactive addresses without properly updating their lists.
This document provides strategies for authors to increase their online visibility and use of tags (keywords). It discusses organizing online credentials, tagging content with relevant keywords that potential customers may search, and uploading photos to image sites and groups with those tags to drive backlinks and search traffic. The goal is to have customers find and spread information about the author and their content online through strategic tagging and sharing across websites and social media.
The document discusses issues related to longevity, aging populations, and their economic impacts. It notes that populations are aging globally as life expectancy increases, which will significantly impact economies by reducing the ratio of working-age to older individuals. This could reduce economic growth by decreasing workforce participation and increasing costs for pensions and healthcare. However, aging populations also represent new opportunities for certain industries that cater to older consumers. Addressing the challenges of aging societies will be important for economic policymakers.
The document introduces EMC's new Data Domain DD800 appliance series and Data Domain Archiver for backup and archive storage. The systems provide faster backup speeds, increased capacity, and cost-effective long-term retention compared to traditional tape storage. The appliances leverage data deduplication and support all major backup software for backup, archive, and disaster recovery across on-premise and off-premise locations.
21 Jan 14 - I can't afford to die - Managing the cost of dying in an ageing society – ILC-UK
The document summarizes a discussion event on managing the costs of dying in an aging society. It provides an agenda for the event including speakers from organizations like Cruse Bereavement Care, Sun Life Direct, and Sue Ryder Manorlands Hospice. The speakers will discuss issues like rising funeral costs, gaps in state support for funerals, and the need for individuals and families to plan financially for end of life. Research presented will examine projections that the total cost of dying in England and Wales could triple by 2037 due to rising costs and an increasing number of deaths. Questions will focus on understanding and reducing the costs associated with dying, end of life, bereavement, and the roles of the public, private, and voluntary sectors.
Three tips are provided to improve online effectiveness: 1) Maximize Google Analytics to identify top traffic sources and content, 2) Write engaging content with a catchy title and proofread, 3) Maintain a consistent social media presence by updating regularly, monitoring brand sentiment, and separating personal and professional accounts.
This document contains examples of solving multi-step equations through addition, subtraction, multiplication, and division. It provides 5 warm up problems involving equations with variables, then explains the steps to solve two-step equations by translating words to math, and using operations to isolate the variable. Examples are given showing division and multiplication used to solve equations with more than one operation.
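The two-step procedure described above, undoing each operation in reverse order to isolate the variable, can be expressed directly in a few lines of Python (the example equation is illustrative, not taken from the document):

```python
def solve_two_step(a, b, c):
    """Solve a*x + b = c by inverse operations:
    first subtract b from both sides, then divide both sides by a."""
    return (c - b) / a

# Example: 3x + 4 = 19  ->  subtract 4 (3x = 15), divide by 3 (x = 5)
print(solve_two_step(3, 4, 19))
```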
Public service and demographic change: an ILC-UK/Actuarial Profession joint debate – ILC-UK
Full details of the event are available here: http://www.ilcuk.org.uk/index.php/events/ilc_uk_and_the_actuarial_profession_debate_public_service_and_demographic_c
The live blog for this event is available here: http://blog.ilcuk.org.uk/2013/04/23/live-blog-public-service-and-demographic-change/
With the advance of social media as a standard way of communicating, being literate now includes being digitally educated. Savvy communicators realize social sites like Facebook can be utilized as platforms to connect, educate, and encourage relationships. In “Tips and Best Practices for Shepherding with Facebook” you’ll walk away with strategies and tools to build community and extend your personal and ministry reach. Facebook is already accomplishing many of the goals we have for church communication. Learn how to take advantage of the Facebook social structure to build a strong local or global community.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, a replica set, or tens of sharded clusters, you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
Deploying MongoDB can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. If you need to deploy, or grow, a MongoDB single instance, replica set, or tens of sharded clusters, then you probably share the same challenges in trying to size that deployment. This talk will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs from the perspective of a new deployment, growing an existing one, and defining the scalability steps along your path to the top. The goal of this presentation will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
MongoDB capacity planning involves determining hardware requirements and sizing to meet performance and availability expectations. Key aspects include measuring the working set, monitoring resource usage, and iteratively planning as requirements and data change over time. Resources like CPU, storage, memory and network need to be considered based on the application's throughput, responsiveness and availability needs.
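As a rough illustration of that iterative planning loop, here is a minimal Python sketch projecting when a growing working set would outgrow a server's memory; the growth rate and sizes below are hypothetical examples, not figures from the talk:

```python
def months_until_ram_exceeded(working_set_gb, ram_gb, monthly_growth):
    """Project how many months until the working set no longer fits in RAM.

    working_set_gb: hot data plus indexes that must stay memory-resident
    ram_gb:         memory available to the database on this server
    monthly_growth: fractional growth per month, e.g. 0.10 for 10%
    """
    months = 0
    while working_set_gb <= ram_gb:
        working_set_gb *= 1 + monthly_growth
        months += 1
    return months

# Hypothetical numbers: 40 GB working set, 64 GB RAM, 10% monthly growth.
print(months_until_ram_exceeded(40, 64, 0.10))  # 5
```

Re-running a projection like this against measured growth each month is one simple way to keep the scaling decision proactive rather than reactive.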
DoneDeal AWS Data Analytics Platform built using AWS products: EMR, Data Pipeline, S3, Kinesis, Redshift and Tableau. Custom-built ETL was written using PySpark.
This document discusses handling massive writes for online transaction processing (OLTP) systems. It begins with an introduction and overview of the topics to be covered, including terminology, differences between massive reads versus writes, and potential solutions using relational databases, NoSQL databases, and code optimizations. Specific solutions discussed for massive writes include using memory, fast disks, caching, column-oriented databases, SQL tuning, database partitioning, reading from slaves, and sharding or splitting data across multiple databases. The document provides pros and cons of each approach and examples of performance improvements observed.
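One of the write-scaling techniques listed above, splitting data across multiple databases, can be sketched in a few lines of Python; the shard count and key format are illustrative assumptions, not details from the talk:

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Route a record to a shard by hashing its key.

    A stable hash (not Python's per-process randomized hash()) keeps the
    key-to-shard mapping consistent across processes and restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards

# Writes for the same key always land on the same shard:
print(shard_for("user:42", 4) == shard_for("user:42", 4))  # True
```

Note that a plain modulo scheme like this reshuffles most keys when `n_shards` changes; consistent hashing is the usual refinement when shards are added or removed often.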
This document discusses capacity planning for deploying MongoDB. It defines capacity planning as planning for requirements like availability, throughput, and responsiveness by determining necessary resources like CPU, memory, storage, and network capacity. It emphasizes starting capacity planning before launch to avoid downtime. Key aspects of capacity planning for MongoDB include estimating working memory set size, storage I/O needs based on data size and access patterns, using tools like IOStat and MongoDB Management Service for monitoring and automation, and conducting iterative testing and deployments. Failure occurs if planned resources cannot meet requirements.
Capacity Planning For Your Growing MongoDB Cluster – MongoDB
This document discusses hardware provisioning best practices for MongoDB. It covers key concepts like bottlenecks, working sets, and replication vs sharding. It also presents two case studies where these concepts were applied: 1) For a Spanish bank storing logs, the working set was 4TB so they provisioned servers with at least that much RAM. 2) For an online retailer storing products, testing found the working set was 270GB, so they recommended a replica set with 384GB RAM per server to avoid complexity of sharding. The key lessons are to understand requirements, test with a proof of concept, measure resource usage, and expect that applications may become bottlenecks over time.
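The retailer sizing in the second case study is back-of-envelope arithmetic, which can be made explicit in a short Python sketch; the 20% headroom figure here is an assumption for illustration, not a number from the talk:

```python
def ram_fits(working_set_gb: float, server_ram_gb: float,
             headroom: float = 0.2) -> bool:
    """True if the working set fits in RAM with a fractional headroom
    reserved for the OS, connection overhead, and near-term growth."""
    usable = server_ram_gb * (1 - headroom)
    return working_set_gb <= usable

# Case-study numbers: a 270 GB working set on 384 GB servers.
print(ram_fits(270, 384))  # True
```

When a check like this passes on a single replica set, the case study's conclusion follows: the operational complexity of sharding can be deferred.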
SharePoint Saturday San Antonio: SharePoint 2010 Performance – Brian Culver
Is your farm struggling to serve your organization? How long is it taking between page requests? Where is the bottleneck in your farm? Is your SQL Server tuned properly? Worried about upgrading due to poor performance? We will look at various tools for analyzing and measuring performance of your farm. We will look at simple SharePoint and IIS configuration options to instantly improve performance. I will discuss advanced approaches for analyzing, measuring and implementing optimizations in your farm.
This document provides a checklist for deploying MongoDB, including application design considerations like schema and sharding, operational requirements for performance, capacity, high availability, backup, security, and monitoring. It also discusses hardware requirements and maintenance processes like upgrades.
SharePoint Saturday The Conference 2011 - SP2010 Performance – Brian Culver
New developers and teams are now polyglot:
- they use multiple programming languages (Java, Javascript, Ruby, ...)
- they use multiple persistence store (RDBMS, NoSQL, Hadoop)
In this talk you will learn about the benefits of being polyglot: using the right language or framework for each purpose, and selecting the right persistence store for specific constraints.
This presentation will show how developers can mix the Java platform with other technologies such as NodeJS and AngularJS to build applications in a more productive way. It is also an opportunity to talk about the Command Query Responsibility Segregation (CQRS) pattern, which allows developers to be more effective and deliver the proper application to the user more quickly.
This presentation was delivered during Devfest Nantes 2014
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg) – MongoSF
Hot Potato is a startup that connects audiences around shared interests through their mobile app and website. They use Scala and MongoDB for their API and data storage due to MongoDB's scalability and flexibility as the app grows. MongoDB allows the data model and queries to evolve over time as reads, writes, and data usage changes, unlike a traditional RDBMS which requires more normalization. Scala provides benefits like immutability, functional programming, and concurrency which are well-suited for Hot Potato's asynchronous services.
Database as a Service on the Oracle Database Appliance Platform – Maris Elsins
Speaker: Marc Fielding, Co-speaker: Maris Elsins.
Oracle Database Appliance provides a robust, highly-available, cost-effective, and surprisingly scalable platform for database as a service environment. By leveraging Oracle Enterprise Manager's self-service features, databases can be provisioned on a self-service basis to a cluster of Oracle Database Appliance machines. Discover how multiple ODA devices can be managed together to provide both high availability and incremental, cost-effective scalability. Hear real-world lessons learned from successful database consolidation implementations.
The document summarizes Michael DelNegro's work operationalizing MongoDB at AOL after it was initially introduced by a developer in 2010. Key aspects of the operationalization effort included establishing standards for host setup, directory structure, and build scripts; implementing monitoring with Argus, Nagios, and MMS; performing backups; sharing information through internal documentation; and addressing challenges like requiring more planning from developers. The effort helped support over 500 MongoDB servers in production running a variety of projects and applications at AOL.
- MongoDB is well-suited for systems of engagement that have demanding real-time requirements, diverse and mixed data sets, massive concurrency, global deployment, and no downtime tolerance.
- It performs well for workloads with mixed reads, writes, and updates and scales horizontally on demand. However, it is less suited for analytical workloads, data warehousing, business intelligence, or transaction processing workloads.
- MongoDB shines for use cases involving single views of data, mobile and geospatial applications, real-time analytics, catalogs, personalization, content management, and log aggregation. It is less optimal for workloads requiring joins, full collection scans, high-latency writes, or five nines uptime.
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...) – Lucas Jellema
This presentation gives a brief overview of the history of relational databases, ACID and SQL, and presents some of their key strengths and potential weaknesses. It introduces the rise of NoSQL - why it arose, what it entails, and when to use it. The presentation focuses on MongoDB as a prime example of a NoSQL document store and shows how to interact with MongoDB from JavaScript (NodeJS) and Java.
This document summarizes Terry Bunio's presentation on breaking and fixing broken data. It begins by thanking sponsors and providing information about Terry Bunio and upcoming SQL events. It then discusses the three types of broken data: inconsistent, incoherent, and ineffectual data. For each type, it provides an example and suggestions on how to identify and fix the issues. It demonstrates how to use tools like Oracle Data Modeler, execution plans, SQL Profiler, and OStress to diagnose problems to make data more consistent, coherent and effective.
Similar to 2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
Retrieval Augmented Generation Evaluation with Ragas – Zilliz
Retrieval Augmented Generation (RAG) enhances chatbots by incorporating custom data in the prompt. Using large language models (LLMs) as judge has gained prominence in modern RAG systems. This talk will demo Ragas, an open-source automation tool for RAG evaluations. Christy will talk about and demo evaluating a RAG pipeline using Milvus and RAG metrics like context F1-score and answer correctness.
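The context F1-score mentioned above can be illustrated with plain set arithmetic; this is only a sketch of the precision/recall/F1 shape of the metric (Ragas itself computes its metrics differently, often using an LLM as judge, and the document names below are made up for the example):

```python
def context_f1(retrieved, relevant):
    """F1 over retrieved context chunks vs. ground-truth relevant chunks."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved or not relevant:
        return 0.0
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)  # how much retrieved was relevant
    recall = true_positives / len(relevant)      # how much relevant was retrieved
    return 2 * precision * recall / (precision + recall)

# Two of three retrieved chunks are relevant; one relevant chunk was missed.
print(context_f1(["doc1", "doc2", "doc3"], ["doc2", "doc3", "doc4"]))
```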
The History of Embeddings & Multimodal Embeddings – Zilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan... – Fwdays
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem. So, this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the re-invention of Serverless from a DevOps perspective as a Platform Engineering trend with Backstage and the recent Radius project from Microsoft. So now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
UiPath Community Day Amsterdam: Code, Collaborate, Connect – UiPathCommunity
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer @UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Finetuning GenAI For Hacking and DefendingPriyanka Aash
Generative AI, particularly through the lens of large language models (LLMs), represents a transformative leap in artificial intelligence. With advancements that have fundamentally altered our approach to AI, understanding and leveraging these technologies is crucial for innovators and practitioners alike. This comprehensive exploration delves into the intricacies of GenAI, from its foundational principles and historical evolution to its practical applications in security and beyond.
"Making .NET Application Even Faster", Sergey Teplyakov.pptxFwdays
In this talk we're going to explore performance improvement lifecycle, starting with setting the performance goals, using profilers to figure out the bottle necks, making a fix and validating that the fix works by benchmarking it. The talk will be useful for novice and seasoned .NET developers and architects interested in making their application fast and understanding how things work under the hood.
Self-Healing Test Automation Framework - HealeniumKnoldus Inc.
Revolutionize your test automation with Healenium's self-healing framework. Automate test maintenance, reduce flakes, and increase efficiency. Learn how to build a robust test automation foundation. Discover the power of self-healing tests. Transform your testing experience.
Cracking AI Black Box - Strategies for Customer-centric Enterprise ExcellenceQuentin Reul
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and securing a formidable competitive advantage in today's competitive market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
It's your unstructured data: How to get your GenAI app to production (and spe...Zilliz
So you've successfully built a GenAI app POC for your company -- now comes the hard part: bringing it to production. Aparavi addresses the challenges of AI projects while addressing data privacy and PII. Our Service for RAG helps AI developers and data scientists to scale their app to 1000s to millions of users using corporate unstructured data. Aparavi’s AI Data Loader cleans, prepares and then loads only the relevant unstructured data for each AI project/app, enabling you to operationalize the creation of GenAI apps easily and accurately while giving you the time to focus on what you really want to do - building a great AI application with useful and relevant context. All within your environment and never having to share private corporate data with anyone - not even Aparavi.
5. Some History
• 1970's Relational Databases Invented
– Storage is expensive
– Data is normalized
– Data storage is abstracted away from app
• 1980's RDBMS commercialized
– Client/Server model
– SQL becomes the standard
• 1990's Things begin to change
– Client/Server => 3-tier architecture
– Internet and the Web
7. Some History
• 2000's Web 2.0
– "Social Media"
– E-Commerce
– Decrease of HW prices
– Increase of collected data
• Result
– Need to scale
– How do we keep up?
9. Developers
• Agile Development Methodology
– Shorter development cycles
– Constant evolution of requirements
– Flexibility at design time
• Relational Schema
– Hard to evolve
• must stay in sync with application
18. MongoDB History
• Designed/developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc.
• First production site March 2008 - businessinsider.com
• Open Source – AGPL, written in C++
• Version 0.8 – first official release February 2009
• Version 2.4 – March 2013
23. Better Data Locality
• Data model means "entities" can reside "together"
• Optimize schema for read and write access patterns
• Minimize "seeks" as they dominate IO slowdown
• Failure to take advantage of document model:
– no improved performance
– all the disadvantages with non of the advantages!
– incorrect model can overshoot "all data embedded"
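To make the locality point concrete, here is a small illustrative sketch (the order/line-item schema is hypothetical, not from the deck) contrasting a referenced model with an embedded document and counting the logical reads needed to assemble one entity:

```python
# Referenced model: order rows and line-item rows live in separate
# collections, so reassembling an order costs one read per related record.
order_row = {"_id": 1001, "customer": "acme"}
line_item_rows = [
    {"order_id": 1001, "sku": "A-1", "qty": 2},
    {"order_id": 1001, "sku": "B-7", "qty": 1},
]

# Embedded model: the same entity stored "together" as a single document,
# which can be fetched in one read (better data locality, fewer seeks).
order_doc = {
    "_id": 1001,
    "customer": "acme",
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}

def reads_referenced(order, items):
    """One read for the order plus one per related line item."""
    return 1 + len(items)

def reads_embedded(doc):
    """The whole entity comes back in a single document read."""
    return 1

print(reads_referenced(order_row, line_item_rows))  # 3
print(reads_embedded(order_doc))                    # 1
```

With the embedded form the entity comes back in one seek; with the referenced form, read cost grows with the number of related records.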
25. In-memory Caching
• memory mapped files
• caching handled by the OS
• naturally leaves the most frequently accessed data in RAM
• for best performance, have enough RAM to fit the indexes and working data set
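A minimal sketch of that RAM guideline (all sizes below are made-up inputs, not measurements), with some headroom reserved for connections and the OS itself:

```python
def fits_in_ram(index_bytes, working_set_bytes, ram_bytes, headroom=0.85):
    """Check whether indexes plus the working set fit in available RAM,
    leaving headroom for connections, journaling, and the OS."""
    return index_bytes + working_set_bytes <= ram_bytes * headroom

GiB = 1024 ** 3
print(fits_in_ram(index_bytes=8 * GiB, working_set_bytes=40 * GiB,
                  ram_bytes=64 * GiB))  # True: 48 GiB <= 54.4 GiB
print(fits_in_ram(index_bytes=8 * GiB, working_set_bytes=60 * GiB,
                  ram_bytes=64 * GiB))  # False: 68 GiB > 54.4 GiB
```

The 0.85 headroom factor is an assumption for illustration; the right value depends on your deployment.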
27. Auto-Sharding
• horizontal scaling is "built-in" to the product
• Replication is for HA
• Sharding is for scaling
• Number of servers in a replica set is based on HA requirements
• Number of shards is based on capacity needed vs. single server/replica set capacity
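The sizing rule above can be sketched with hypothetical throughput figures: shard count comes from required capacity versus what one replica set can sustain, while replica-set size comes from HA needs, not throughput.

```python
import math

def shards_needed(required_ops_per_sec, ops_per_replica_set):
    """Shards scale capacity: round up so no shard is over budget."""
    return math.ceil(required_ops_per_sec / ops_per_replica_set)

def total_servers(num_shards, replica_set_size=3):
    """replica_set_size comes from HA requirements (3 is a common minimum)."""
    return num_shards * replica_set_size

n = shards_needed(required_ops_per_sec=90_000, ops_per_replica_set=25_000)
print(n)                 # 4 shards
print(total_servers(n))  # 12 servers
```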
28. MongoDB Performance*
• Top 5 Marketing Firm
– Data: Key/value
– Queries: Key-based, 1–100 docs/query, 80/20 read/write
– Servers: ~250
– Ops/sec: 1,200,000
• Government Agency
– Data: 10+ fields, arrays, nested documents
– Queries: Compound queries, range queries, MapReduce, 20/80 read/write
– Servers: ~50
– Ops/sec: 500,000
• Top 5 Investment Bank
– Data: 20+ fields, arrays, nested documents
– Queries: Compound queries, range queries, 50/50 read/write
– Servers: ~40
– Ops/sec: 30,000
* These figures are provided as examples. Your application governs your performance.
33. What
• There is one thing that is absolutely mandatory in order to succeed at capacity planning
• Without it, you will not be successful
• We must have REQUIREMENTS from the business
– without requirements, we're building a roadmap without knowing the desired destination
Imagine building a car without knowing its required top speed, acceleration, fuel economy, and cost.
35. What
• Availability: what is the uptime requirement?
• Throughput
– average read/write/users
– peak throughput?
– OPS (operations per second)? per hour? per day?
• Responsiveness
– what is acceptable latency?
– is higher latency acceptable during peak times?
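These requirements can be turned into concrete planning numbers. A small sketch with a hypothetical workload and an assumed 5x peak factor:

```python
def ops_per_sec(ops_per_day):
    """Average throughput: operations per day over seconds in a day."""
    return ops_per_day / 86_400

def peak_ops_per_sec(ops_per_day, peak_factor=5.0):
    """Traffic is rarely flat; plan for peaks, not averages."""
    return ops_per_sec(ops_per_day) * peak_factor

def downtime_minutes_per_year(availability):
    """Express an uptime requirement as allowed downtime."""
    return (1 - availability) * 365 * 24 * 60

print(round(ops_per_sec(432_000_000)))          # 5000 average ops/sec
print(round(peak_ops_per_sec(432_000_000)))     # 25000 ops/sec at a 5x peak
print(round(downtime_minutes_per_year(0.999)))  # 526 minutes/year
```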
38. When
• At the beginning, before production; after you launch, you must continue the process
• Lack of future planning: failure to project the performance drop-off as the amount of data increases
• Process (steps) -> ACTIONS:
– Requirements: ask, guess, try/measure
– Understand application needs
– Choose hardware to meet that pattern (...)
– Determine how many machines you need
– Monitor to recognize growth exceeding current capacity
39. Capacity Planning: What?
• Understand Resources
– Storage
– Memory
– CPU
– Network
• Understand Your Application
– Monitor and Collect Metrics
– Model to Predict Change
– Allocate and Deploy
– (repeat process)
40. Resource Usage
• Storage
– IOPS
– Size
– Data & Loading Patterns
• Memory
– Working Set
• CPU
– Speed
– Cores
• Network
– Latency
– Throughput
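A back-of-the-envelope sizing for the storage resource above (the document counts, sizes, and overhead factors are all assumptions for illustration, not measurements):

```python
def storage_bytes(doc_count, avg_doc_bytes, index_overhead=0.25,
                  padding_overhead=0.15, copies=3):
    """Raw data plus assumed index and padding overhead, multiplied by
    the number of copies kept across the replica set."""
    raw = doc_count * avg_doc_bytes
    per_node = raw * (1 + index_overhead + padding_overhead)
    return per_node * copies

GiB = 1024 ** 3
total = storage_bytes(doc_count=500_000_000, avg_doc_bytes=1_024)
print(round(total / GiB))  # 2003 GiB across a 3-member replica set
```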
58. Starter Questions
• What is the working set?
– How does that equate to memory?
– How much disk access will that require?
• How efficient are the queries?
• What is the rate of data change?
• How big are the highs and lows?
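Rough arithmetic for the first two questions, with hypothetical numbers: estimate the working set from the "hot" fraction of the data, and estimate how many reads must go to disk when the cache misses.

```python
def working_set_bytes(total_data_bytes, hot_fraction):
    """Working set approximated as the fraction of data touched regularly."""
    return total_data_bytes * hot_fraction

def disk_reads_per_sec(reads_per_sec, cache_hit_ratio):
    """Reads that miss the in-memory cache turn into disk IOPS."""
    return reads_per_sec * (1 - cache_hit_ratio)

GiB = 1024 ** 3
ws = working_set_bytes(total_data_bytes=500 * GiB, hot_fraction=0.1)
print(round(ws / GiB, 1))                   # 50.0 GiB should stay in RAM
print(round(disk_reads_per_sec(10_000, 0.99)))  # 100 reads/sec hit disk
```

Both the 10% hot fraction and the 99% hit ratio are assumed figures; only measurement of your own workload can supply real ones.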
59. Deployment Types
All of these use the same resources:
• Single Instance
• Multiple Instances (Replica Set)
• Cluster (Sharding)
• Data Centers
61. Monitoring
• CLI and internal status commands
• mongostat; mongotop; db.serverStatus()
• Plug-ins for munin, Nagios, cacti, etc.
• Integration via SNMP to other tools
• MMS (MongoDB Monitoring Service)
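Most of the counters these tools expose are cumulative, so usable rates come from deltas between successive samples. A sketch of that calculation (the field names mimic serverStatus opcounters; the sample values are fabricated):

```python
def rates(sample_t0, sample_t1, interval_sec):
    """Per-second rates from two cumulative counter snapshots."""
    return {key: (sample_t1[key] - sample_t0[key]) / interval_sec
            for key in sample_t0}

# Two snapshots taken 60 seconds apart (fabricated values).
t0 = {"opcounters.query": 1_000_000, "opcounters.insert": 200_000}
t1 = {"opcounters.query": 1_060_000, "opcounters.insert": 203_000}

per_sec = rates(t0, t1, interval_sec=60)
print(per_sec["opcounters.query"])   # 1000.0 queries/sec
print(per_sec["opcounters.insert"])  # 50.0 inserts/sec
```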
67. Velocity of Change
• Limitations -> change takes time
– Data Movement
– Allocation/Provisioning (servers/mem/disk)
• Improvement
– Limit the size of each change (if you can)
– Increase frequency
– MEASURE its effect
– Practice