NoSQL Databases

What is NoSQL?
• NoSQL is not a standard.
• NoSQL does not mean "No SQL"; it means "Not Only SQL".
• But it is also not an RDBMS replacement.
• CAP [Consistency, Availability, Partition Tolerance] theorem
• BASE [Basically Available, Soft state, Eventual consistency] vs. ACID
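The "eventual consistency" half of BASE is easier to see in a toy model. The sketch below assumes two replicas that accept writes independently and later reconcile with a last-write-wins rule during an anti-entropy pass; `Replica` and `sync` are hypothetical names, not any product's API, and real systems use richer conflict handling.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each value is stored with the logical time it was written."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Anti-entropy pass: for every key, both replicas keep the newest write (last write wins)."""
    for key in set(a.data) | set(b.data):
        candidates = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        newest = max(candidates, key=lambda e: e[0])
        a.data[key] = newest
        b.data[key] = newest


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"], ts=1)   # accepted by r1 only
print(r2.read("cart"))             # stale read on r2: None
sync(r1, r2)
print(r2.read("cart"))             # after sync: ['book']
```

Between the write and the sync, a reader of `r2` sees stale data; this window is exactly what makes eventual consistency "not intuitive to program for".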

Characteristics of a NoSQL Database
• Flexible schema / schema-less
• Non-relational
• Often Distributed (Partitioned)
• Often Replicated
• Horizontally Scalable
• Eventually consistent
• Cheaper than big-name RDBMS products
• Simpler API than SQL (but not standardized across products, or even across
versions).

NoSQL pros/cons
Advantages
– Massive scalability
– High availability
– Lower cost (than competitive solutions at that scale)
– (usually) predictable elasticity
– Schema flexibility, sparse & semi-structured data

Disadvantages
– Limited query capabilities (so far)
– Eventual consistency is not intuitive to program for
• Makes client applications more complicated
– No standardization
• Portability might be an issue
– Insufficient access control

Different types of NoSQL Databases
• NoSQL databases are classified into four major data models:
1. Key-value
2. Document
3. Column family
4. Graph

1. Key-value data model
• The main idea is the use of a hash table
• Access data (values) by strings called keys
• Data has no required format – data may have any format
• Data model: (key, value) pairs
• Basic operations:
– Insert(key, value)
– Fetch(key)
– Update(key, value)
– Delete(key)
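The four basic operations map directly onto a hash table. This is a minimal sketch using a Python `dict` as the hash table; the class and method names simply mirror the operation names above and are not any product's API.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def insert(self, key: str, value) -> None:
        if key in self._table:
            raise KeyError(f"key already exists: {key}")
        self._table[key] = value

    def fetch(self, key: str):
        return self._table[key]  # raises KeyError if the key is absent

    def update(self, key: str, value) -> None:
        if key not in self._table:
            raise KeyError(f"no such key: {key}")
        self._table[key] = value

    def delete(self, key: str) -> None:
        del self._table[key]


store = KeyValueStore()
store.insert("user:42", {"name": "Ada"})  # the value can be any format
store.update("user:42", "just a string now")
print(store.fetch("user:42"))             # just a string now
store.delete("user:42")
```

The store never inspects the value, which is why the data "may have any format"; the price is that it can only be looked up by its key.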

Contd.
• Key/value store
• Can be in-memory only, or backed by disk persistence
• Often supports versioning
• e.g. Voldemort (LinkedIn), Amazon SimpleDB, Memcached,
BerkeleyDB, Oracle NoSQL
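Versioning lets a store keep old values and detect conflicting writes. Here is a minimal sketch, assuming a single integer version counter per key plus an optional "expected version" check (optimistic concurrency). `VersionedStore` is a hypothetical name, and real Dynamo-style stores such as Voldemort track versions with vector clocks rather than one counter.

```python
class VersionedStore:
    """Key-value store that keeps every version of a value (hypothetical sketch)."""

    def __init__(self):
        self._versions = {}  # key -> list of values; list index + 1 is the version number

    def put(self, key, value, expected_version=None):
        history = self._versions.get(key, [])
        current = len(history)
        # Optimistic concurrency: reject the write if the caller read an older version.
        if expected_version is not None and expected_version != current:
            raise ValueError(f"version conflict: expected {expected_version}, current {current}")
        history.append(value)
        self._versions[key] = history
        return current + 1  # the new version number

    def get(self, key, version=None):
        """Return (value, version): the latest one, or a specific older version."""
        history = self._versions[key]
        if version is None:
            return history[-1], len(history)
        return history[version - 1], version


vs = VersionedStore()
v1 = vs.put("profile", "first")
v2 = vs.put("profile", "second", expected_version=v1)
print(vs.get("profile"))             # ('second', 2)
print(vs.get("profile", version=1))  # ('first', 1)
```

A client that read version 1 and tries to write after someone else produced version 2 gets a `ValueError` and must re-read, instead of silently overwriting the newer value.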

1.1 Voldemort
• Distributed key-value store
– Based on Dynamo
• Originally developed by LinkedIn, now open source
• Features
– Simple data model (no joins or complex queries, no referential integrity, …)
– Peer-to-peer (P2P) architecture
– Scale-out / elastic
• Consistent hashing of keyspace
• Fixed partitions (no splits, but owner may change when re-balancing)
– Eventual consistency / High Availability
– Replication
– Failure handling
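Consistent hashing with fixed partitions can be sketched as follows. The assumptions are: the ring is cut into a fixed number of equal partitions, partitions start out assigned round-robin, and replicas go to the next distinct nodes clockwise. `HashRing`, MD5 as the hash function, and the parameter names are illustrative choices, not Voldemort's actual code.

```python
import hashlib


class HashRing:
    """Consistent-hashing ring with a fixed number of partitions (hypothetical sketch).

    The ring is split into `num_partitions` equal slices. Each partition is owned by a
    node; rebalancing changes owners but never splits a partition.
    """

    RING_SIZE = 2 ** 32

    def __init__(self, nodes, num_partitions=8):
        self.num_partitions = num_partitions
        # Round-robin initial assignment of partitions to nodes.
        self.owner = {p: nodes[p % len(nodes)] for p in range(num_partitions)}

    def _hash(self, key):
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")  # value in [0, 2**32)

    def partition_for(self, key):
        return self._hash(key) * self.num_partitions // self.RING_SIZE

    def preference_list(self, key, replicas=2):
        """First `replicas` distinct nodes found walking clockwise from the key's partition."""
        start = self.partition_for(key)
        nodes = []
        for step in range(self.num_partitions):
            node = self.owner[(start + step) % self.num_partitions]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == replicas:
                break
        return nodes

    def move_partition(self, partition, new_node):
        """Rebalance by reassigning a whole partition; keys inside it keep their partition."""
        self.owner[partition] = new_node


ring = HashRing(["A", "B", "C"], num_partitions=6)
p = ring.partition_for("user:42")
print(p, ring.preference_list("user:42", replicas=2))
ring.move_partition(p, "D")  # e.g. a new node D takes over this partition
print(ring.partition_for("user:42") == p)  # True: only the owner changed
```

Because partitions are fixed, a rebalance moves whole partitions between nodes and never has to re-hash individual keys.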

1.2 Riak
• Like Voldemort, Riak is based on Amazon's Dynamo
• Offers a key/value interface
• Designed to run on large distributed clusters
• Uses consistent hashing, which avoids the need for a centralized
index server
• Querying is handled using MapReduce functions written in JavaScript
• Open source, with a commercial edition for enterprise customers
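The shape of a MapReduce query over key/value objects can be shown locally in Python. This sketch does not use Riak's API: `bucket`, `map_phase` and `reduce_phase` are hypothetical stand-ins for a bucket and for the user-supplied map and reduce functions, and everything runs in one process instead of on the nodes holding the data.

```python
from collections import defaultdict

# Objects in a hypothetical bucket: key -> value.
bucket = {
    "post:1": {"author": "ana", "tags": ["nosql", "riak"]},
    "post:2": {"author": "bo", "tags": ["nosql"]},
    "post:3": {"author": "ana", "tags": ["dynamo", "nosql"]},
}


def map_phase(key, value):
    """Emit one (tag, 1) pair per tag on each object."""
    return [(tag, 1) for tag in value["tags"]]


def reduce_phase(pairs):
    """Sum the counts emitted for each tag."""
    totals = defaultdict(int)
    for tag, count in pairs:
        totals[tag] += count
    return dict(totals)


# Here the map phase runs locally; a distributed store would run it where the data lives.
emitted = [pair for key, value in bucket.items() for pair in map_phase(key, value)]
print(reduce_phase(emitted))  # {'nosql': 3, 'riak': 1, 'dynamo': 1}
```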

2. Document-based data model
• Similar to the key-value model, except that the value is a document.
• Usually a JSON-like interchange format.
• Query model: JavaScript-like or custom.
• Aggregations: Map/Reduce
• Indexes are typically implemented as B-trees.
• Unlike simple key-value stores, both keys and values are fully
searchable in document databases.
• e.g. Couchbase, MongoDB, RavenDB, ArangoDB, MarkLogic,
OrientDB, Redis, RethinkDB
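Searching on values (not just keys) needs a secondary index. A B-tree keeps keys sorted so that range queries are cheap. In the sketch below, a sorted Python list searched with the standard `bisect` module stands in for the B-tree. `build_index` and `range_query` are hypothetical helpers, and the sketch rebuilds the index instead of maintaining it on every write.

```python
from bisect import bisect_left, bisect_right

docs = {
    "a1": {"name": "Ada", "age": 36, "city": "London"},
    "b2": {"name": "Linus", "age": 28, "city": "Helsinki"},
    "c3": {"name": "Grace", "age": 45, "city": "New York"},
}


def build_index(documents, field):
    """Sorted field values with their document keys; a stand-in for a B-tree index."""
    pairs = sorted((doc[field], key) for key, doc in documents.items() if field in doc)
    return [v for v, _ in pairs], [k for _, k in pairs]


def range_query(index, low, high):
    """Keys of documents whose indexed value lies in [low, high]."""
    values, keys = index
    start = bisect_left(values, low)
    end = bisect_right(values, high)
    return keys[start:end]


age_index = build_index(docs, "age")
print(range_query(age_index, 30, 50))  # ['a1', 'c3']
```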

2.1 CouchDB
• Schema-free, document oriented database
– Documents stored in JSON format (XML in old versions)
– B-tree storage engine
– MVCC model, no locking
– no joins, no PK/FK (UUIDs are auto assigned)
– Implemented in Erlang
• 1st version in C++, 2nd in Erlang and 500 times more scalable (source: “Erlang
Programming” by Cesarini & Thompson)
– Replication (incremental)
• Documents
– UUID, version
– Old versions retained
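MVCC without locking means that every update names the revision it was based on, and a stale revision is rejected instead of blocking other writers. Here is a minimal sketch with hypothetical names. Real CouchDB revision IDs look like `N-<hash>` rather than plain counters.

```python
import uuid


class RevisionedDocStore:
    """Hypothetical MVCC-style store: writes create a new revision instead of locking."""

    def __init__(self):
        self._docs = {}  # doc_id -> list of (rev, body); old revisions are retained

    def create(self, body):
        doc_id = uuid.uuid4().hex  # auto-assigned UUID, no user-defined primary key
        rev = "1"
        self._docs[doc_id] = [(rev, dict(body))]
        return doc_id, rev

    def get(self, doc_id):
        rev, body = self._docs[doc_id][-1]
        return rev, dict(body)

    def update(self, doc_id, rev, body):
        current_rev, _ = self._docs[doc_id][-1]
        if rev != current_rev:
            # Another writer got there first: the client must re-read and retry.
            raise ValueError(f"conflict: document is at revision {current_rev}, not {rev}")
        new_rev = str(int(current_rev) + 1)
        self._docs[doc_id].append((new_rev, dict(body)))
        return new_rev


couch = RevisionedDocStore()
doc_id, rev = couch.create({"title": "NoSQL"})
couch.update(doc_id, rev, {"title": "NoSQL Databases"})
try:
    couch.update(doc_id, rev, {"title": "stale edit"})  # still uses revision 1
except ValueError as err:
    print(err)  # conflict: document is at revision 2, not 1
```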

2.2 MongoDB
• Another popular Document Database
• Data is stored on disk but cached in memory for speed
• Supports Replication and Partitioning (Sharding)
• Very popular in Web Applications
• Data is stored internally as BSON and exchanged with
applications as JSON.
• Very easy to set up and get started.
• Source-available (SSPL; AGPL until late 2018) and free to use (even
commercially), with a paid support license option.

A sample MongoDB query: MySQL vs. MongoDB [query examples not preserved]
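As an illustration (not the query from the original slide), the SQL filter `WHERE age > 25 AND status = 'A'` is written in MongoDB as the filter document `{ age: { $gt: 25 }, status: "A" }`. The sketch below evaluates such filter documents against in-memory Python dicts. The operator names `$gt`, `$gte`, `$lt`, `$lte`, `$ne` and `$in` are real MongoDB query operators, but `matches` and `find` are hypothetical helpers, not the MongoDB driver, and they simplify semantics: a missing field never matches here.

```python
import operator

# Comparison operators supported by this sketch (a small subset of MongoDB's).
OPS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
}


def matches(doc, query):
    """True if `doc` satisfies every condition in the MongoDB-style `query` document."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # e.g. {"$gt": 25}: every operator in the dict must hold.
            if not all(OPS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:  # a plain value means equality
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"name": "Ada", "age": 36, "status": "A"},
    {"name": "Bo", "age": 22, "status": "A"},
    {"name": "Cy", "age": 41, "status": "B"},
]

# SQL:      SELECT * FROM users WHERE age > 25 AND status = 'A'
# MongoDB:  db.users.find({ age: { $gt: 25 }, status: "A" })
print(find(users, {"age": {"$gt": 25}, "status": "A"}))
```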

2.3 Redis
• Often referred to as a Data Structure Server
• Supports storing strings, hashes, lists, sets, sorted sets, bitmaps and
HyperLogLogs.
• Data is kept in memory (with optional persistence to disk)
• Extremely popular for short-lived data (sessions, caches)
• Can be used as a push/pull message queue
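The two common Redis uses above (expiring session data, and a list used as a queue) can be imitated in a few lines. This is a toy in-process sketch, not the Redis client API: `MiniRedis` is hypothetical, and its methods only loosely follow the real commands `SET key value EX seconds`, `GET`, `LPUSH` and `RPOP`. The clock is injectable so that expiry can be tested, and keys expire only lazily when they are read.

```python
import time
from collections import deque


class MiniRedis:
    """Toy in-memory imitation of a few Redis behaviours (not the Redis client API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock so expiry can be tested
        self._strings = {}   # key -> (value, expires_at or None)
        self._lists = {}     # key -> deque

    # Roughly like SET key value EX seconds: short-lived data such as sessions.
    def set(self, key, value, ex=None):
        expires_at = None if ex is None else self._clock() + ex
        self._strings[key] = (value, expires_at)

    def get(self, key):
        entry = self._strings.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._strings[key]  # lazily expire on access
            return None
        return value

    # LPUSH followed by RPOP on the same list behaves as a FIFO queue.
    def lpush(self, key, value):
        self._lists.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        queue = self._lists.get(key)
        return queue.pop() if queue else None


now = [0.0]
r = MiniRedis(clock=lambda: now[0])
r.set("session:abc", "user42", ex=30)
now[0] = 31.0
print(r.get("session:abc"))  # None: the session has expired
r.lpush("jobs", "job-1")
r.lpush("jobs", "job-2")
print(r.rpop("jobs"))        # job-1 (first in, first out)
```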

3. Column family data model
• The column is the lowest/smallest unit of data.
• It is a tuple that contains a name, a value and a timestamp.
• Multiple columns (values) per key.
• e.g. Cassandra, HBase, Amazon Redshift, HP Vertica, Teradata, BigTable,
Hypertable
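The column-as-tuple idea can be modelled directly. In this sketch each column is a `(name, value, timestamp)` record, a row key maps to any set of columns (rows need not share columns), and a newer timestamp wins when the same column is written twice. `Column` and `ColumnFamily` are hypothetical names, and the last-write-wins rule is an assumption made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """Smallest unit of data: a (name, value, timestamp) tuple."""
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    """Row key -> {column name -> Column}; rows need not share the same columns."""
    rows: dict = field(default_factory=dict)

    def insert(self, row_key, name, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time_ns()
        row = self.rows.setdefault(row_key, {})
        existing = row.get(name)
        # Keep the newer write if two writes hit the same column.
        if existing is None or ts >= existing.timestamp:
            row[name] = Column(name, value, ts)

    def get_row(self, row_key):
        """Column name -> value for one row, ordered by column name."""
        return {name: col.value for name, col in sorted(self.rows.get(row_key, {}).items())}


cf = ColumnFamily()
cf.insert("user:1", "email", "ada@example.com", timestamp=1)
cf.insert("user:1", "city", "London", timestamp=1)
cf.insert("user:2", "phone", "555-0100", timestamp=1)  # a different set of columns
cf.insert("user:1", "city", "Cambridge", timestamp=2)  # newer timestamp wins
print(cf.get_row("user:1"))  # {'city': 'Cambridge', 'email': 'ada@example.com'}
```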

3.1 Cassandra
• Data is stored column-wise as opposed to row-wise
• Supports partitioning (sharding) and replication, even across data
centers.
• Can be used to store petabytes of data.
• Supports CQL, an SQL-like query interface.
• Open source, but commercially supported by DataStax.

3.1 Cassandra – data model, partitioning
• Data model
– Derived from BigTable's
– Super Columns (nested columns) and Super Column Families
– Column order within a column family (CF) can be specified (by name or by time)
• Dynamic partitioning
– Consistent hashing
– Ring of nodes
– Nodes can be “moved” on the ring for load balancing
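Moving a node on the ring changes ownership of a single token range only. In this sketch each node holds one token, and a key belongs to the first node clockwise from the key's hash. Moving node B forward takes over part of C's range, and no other key changes hands. MD5 and one token per node are simplifying assumptions: Cassandra's default partitioner is Murmur3-based, and modern clusters usually give each node many virtual tokens.

```python
import hashlib
from bisect import bisect_left

RING = 2 ** 32


def token_of(key):
    """Hash a key onto the ring (MD5 here; the hash choice is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def owner(ring, key):
    """`ring` maps node -> token; a key belongs to the first node clockwise from its token."""
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    i = bisect_left(tokens, token_of(key))
    return nodes[i % len(nodes)]  # wrap around past the largest token


ring = {"A": RING // 4, "B": RING // 2, "C": 3 * RING // 4}
keys = [f"user:{i}" for i in range(1000)]
before = {k: owner(ring, k) for k in keys}

ring["B"] = RING // 2 + RING // 8  # "move" node B clockwise to take load from C
after = {k: owner(ring, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys between B's old and new token change hands (from C to B).
print(len(moved), {(before[k], after[k]) for k in moved})
```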

3.2 BigTable
• Sparse, distributed, persistent multidimensional sorted map
• Indexed by (row, column, timestamp); each value is an uninterpreted string
• Key features
– Hybrid row/column store
– Single master (stand-by replica)
– Versioning
– Compression
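"Sparse, distributed, persistent multidimensional sorted map" can be unpacked with a small in-memory model. The sketch below keys cells by `(row, column)`, stores versions newest first by timestamp, keeps only the newest `max_versions`, and scans rows in sorted order. Only written cells exist, which makes the map sparse. `SparseSortedMap` is hypothetical and covers neither distribution nor persistence. The example row and column names are modelled on the "webtable" example from the BigTable paper.

```python
from bisect import insort


class SparseSortedMap:
    """(row, column, timestamp) -> value, kept in sorted order (toy BigTable-style map)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._cells = {}  # (row, column) -> list of (-timestamp, value), newest first
        self._keys = []   # sorted list of (row, column) for ordered scans

    def put(self, row, column, timestamp, value):
        key = (row, column)
        if key not in self._cells:
            self._cells[key] = []
            insort(self._keys, key)
        versions = self._cells[key]
        insort(versions, (-timestamp, value))
        del versions[self.max_versions:]  # versioning: keep only the newest N

    def get(self, row, column, at=None):
        """Newest value, or the newest value written at or before timestamp `at`."""
        for neg_ts, value in self._cells.get((row, column), []):
            if at is None or -neg_ts <= at:
                return value
        return None

    def scan(self, start_row, end_row):
        """Latest value of every cell with start_row <= row < end_row, in sorted order."""
        for row, column in self._keys:
            if start_row <= row < end_row:
                yield row, column, self._cells[(row, column)][0][1]


t = SparseSortedMap(max_versions=2)
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "contents:", 6, "<html>v6")  # v3 is dropped (max 2 versions)
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
print(t.get("com.cnn.www", "contents:"))        # <html>v6
print(t.get("com.cnn.www", "contents:", at=5))  # <html>v5
```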

BigTable - architecture
• Master server
– Assign tablets to Tablet Servers
– Balance tablet server (TS) load
– Garbage collection
– Schema management
– Client data does not move through the master server (MS); clients read and write directly through the TS
– Tablet location not handled by MS
• Tablet server (many)
– Typically ten to a thousand tablets per TS
– Manages Read / Write / Split of its tablets
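Tablets are contiguous row-key ranges, so locating a row's tablet is a search over the range boundaries, and splitting a tablet just adds a boundary. This sketch illustrates that idea with hypothetical names. It splits when a tablet exceeds a row-count threshold, whereas real systems split by data size, and it does not model the servers themselves.

```python
from bisect import bisect_right, insort


class TabletMap:
    """Rows are split into tablets by row-key ranges (toy sketch, not BigTable's code).

    Tablet i covers rows from split_keys[i-1] (inclusive) up to split_keys[i] (exclusive).
    """

    def __init__(self, max_rows_per_tablet=4):
        self.max_rows = max_rows_per_tablet  # hypothetical split threshold
        self.split_keys = []                 # sorted boundaries between tablets
        self.tablets = [[]]                  # each tablet holds a sorted list of row keys

    def locate(self, row):
        """Index of the tablet whose range contains `row` (what a location lookup answers)."""
        return bisect_right(self.split_keys, row)

    def insert(self, row):
        i = self.locate(row)
        tablet = self.tablets[i]
        if row not in tablet:
            insort(tablet, row)
        if len(tablet) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split tablet i at its middle row key, as a tablet server does when a tablet grows."""
        rows = self.tablets[i]
        mid = len(rows) // 2
        self.tablets[i:i + 1] = [rows[:mid], rows[mid:]]
        self.split_keys.insert(i, rows[mid])


tm = TabletMap(max_rows_per_tablet=4)
for row in ["a", "c", "e", "g", "i", "k"]:
    tm.insert(row)
print(tm.split_keys, tm.tablets)  # ['e'] [['a', 'c'], ['e', 'g', 'i', 'k']]
```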

3.3 HBase
• Developed by Powerset, now Apache
• Based on BigTable (BigTable equivalents in parentheses)
– HDFS (GFS), ZooKeeper (Chubby)
– Master Node (Master Server), Region Servers (Tablet Servers)
– HStore (tablet), memcache (memtable), MapFile (SSTable)
• Features
– Data is stored sorted (no real indexes)
– Automatic partitioning
– Automatic re-balancing / re-partitioning
– Fault tolerance (HDFS, 3 replicas)
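The memcache/MapFile pairing above (memtable/SSTable in BigTable terms) describes a log-structured write path. Writes go to an in-memory buffer, which is periodically flushed as an immutable sorted file. Reads check memory first and then files from newest to oldest. `MiniLSM` and its flush threshold are hypothetical. The sketch omits the write-ahead log, compaction and HDFS.

```python
from bisect import bisect_left


class MiniLSM:
    """Toy write path in the style of HBase/BigTable (memstore + sorted immutable files)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: flush after N entries
        self.memstore = {}  # recent writes, in memory
        self.files = []     # flushed files, each a sorted list of (key, value); newest last

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as an immutable, sorted file (like an SSTable)."""
        if self.memstore:
            self.files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the memstore first, then files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.files):
            i = bisect_left(f, (key,))  # binary search works because each file is sorted
            if i < len(f) and f[i][0] == key:
                return f[i][1]
        return None


db = MiniLSM(flush_threshold=2)
db.put("row1", "a")
db.put("row2", "b")   # memstore full, flushed to file 1
db.put("row1", "a2")  # the newer value lives in the memstore
print(db.get("row1"), db.get("row2"), len(db.files))  # a2 b 1
```

Because every file is sorted, this layout needs no separate index for key lookups, matching "data is stored sorted (no real indexes)".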

3.4 Hypertable
• An open-source clone of BigTable
• Written in C++
• Designed for high performance

4. Graph data model
• Based on Graph Theory.
• Typically scale vertically rather than through clustering.
• Graph algorithms can be applied easily
• Support for ACID transactions
• Suited to modeling the structure of data (the relationships between items)
• Uses the property graph data model (nodes, relationships,
properties)
• e.g. Neo4j, InfiniteGraph, OrientDB, Titan GraphDB
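A property graph stores properties on both nodes and relationships. Graph algorithms then run as traversals over the relationships. This plain-Python sketch holds a tiny property graph and finds the fewest-hop path with breadth-first search along outgoing relationships. The data and helper names are hypothetical, and it does not use any graph database's API.

```python
from collections import deque

# Nodes and relationships both carry properties (a property graph).
nodes = {
    "ada": {"label": "Person", "name": "Ada"},
    "bo": {"label": "Person", "name": "Bo"},
    "cy": {"label": "Person", "name": "Cy"},
    "acme": {"label": "Company", "name": "Acme"},
}
relationships = [
    ("ada", "KNOWS", "bo", {"since": 2015}),
    ("bo", "KNOWS", "cy", {"since": 2019}),
    ("cy", "WORKS_AT", "acme", {"role": "engineer"}),
]


def build_adjacency(rels):
    """Map each node to the targets of its outgoing relationships."""
    adj = {}
    for src, _type, dst, _props in rels:
        adj.setdefault(src, []).append(dst)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search: fewest hops from start to goal along outgoing relationships."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable


adj = build_adjacency(relationships)
path = shortest_path(adj, "ada", "acme")
print(" -> ".join(nodes[n]["name"] for n in path))  # Ada -> Bo -> Cy -> Acme
```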

Other Types / Special Purpose
• Search databases: Solr, Elasticsearch
• Object Databases
• XML Databases
